INTERNET-DRAFT                                            Claus Faerber
draft-faerber-i18n-email-netnews-names-00                   August 2002
Expires: March 2003

        Internationalisation of Email Addresses, Newsgroup Names
                        and similar Identifiers

Status of this memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This document describes a possible architecture for the
   implementation of internationalised email addresses, newsgroup
   names, and similar identifiers on top of the standards set by the
   Internationalised Domain Names [IDN] working group.

1 Introduction

1.1 Overview

   The advent of internationalised domain names raises the question of
   how other identifiers, such as email addresses and newsgroup names,
   should be internationalised.  As these types of identifiers are
   often included in other types of identifiers, an overall
   architecture is needed.

   This draft proposes a solution derived directly from the
   internationalisation of domain names and from the requirements
   described in section 1.2.

1.2 Requirements

   The author of this draft believes that a specification must meet the
   following requirements:

   -  Legacy mail and news user agents, MTAs (including injection
      agents) and news servers must be able to handle the
      internationalised addresses without problems.

   -  Therefore, the encoding of domain names should be identical to
      that of internationalised domain names [IDN].

   -  Further, the encoding of domain names included within the
      local-part (LHS) of email addresses should be identical to that
      of internationalised domain names.

   -  As delimiters are often exchanged, the result should be identical
      regardless of the order in which the exchange of the delimiters
      and the encoding of the internationalised domain names occur.

   -  A single encoding/decoding function should be able to handle both
      internationalised domain names and other internationalised
      identifiers.

2 Encoding of Internationalised Names

   The requirements set forth in section 1.2 lead directly to the
   following architecture:

   -  Names are split into individual parts at the following
      delimiters:

         SP / %x00-1F / "." / "@" / "+" / "%" / "=" / "/" / "," /
         ";" / ":" / "!" / "(" / ")" / "[" / "]" / "<" / ">"

      [[RATIONALE: As many delimiters as possible are used to increase
      the chance that individual parts of an identifier are encoded
      the same way when they are included in other identifiers:

         "@" - used to separate local-part and domain name

         "+" - used by some mailers for subaddressing

         "%" - used by some MTAs to embed domains within the
               local-part of email addresses ("percent-hack")

         "=" - used within MIXER (RFC 2156)

         "/" - used within MIXER (RFC 2156); used as a newsgroup
               component separator in some legacy non-RFC BBS networks

         ",", ";" - used to separate identifiers in many positions

         ":" - used to separate (obsolete) source routes from the
               destination address

         " " - used to separate source routes from each other

         "!" - used as a separator within the Path header in RFC 1036;
               used as an address separator within (obsolete) UUCP
               bang addresses

         "(", ")" - used for comments; used within the replacement for
               some separators according to MIXER (e.g. "(a)" instead
               of "@")

         "[", "]", "<", ">" - as a precaution
      ]]

   -  Each part is then prepared according to [NAMEPREP] and encoded
      using [PUNYCODE].  The Mixed-case annotation described in
      appendix B of [PUNYCODE] is used.

   -  The parts are then re-assembled to build the encoded name.

   [[NOTE: As this procedure only adds, as delimiters, characters that
   are not allowed in domain names, it will produce the same results as
   [IDN] for all valid domain names (except for the case of the
   resulting string, which does not matter within domain names).]]

3 Usage Within Applications

3.1 General

   The format of identifiers defined by various specifications is not
   altered in any way; all data sent over the network uses the encoded
   form of the identifiers.

   Only display and input of these identifiers are changed in the user
   agent (i.e. the software that interfaces directly with human users).
   It is the task of the user agent to encode all non-ASCII characters
   in identifiers using the method described in section 2.

   Changes to relay agents, transport agents, etc., and to the software
   accompanying them are usually not necessary.

3.2 Email

3.2.1 RFC 2821

   Internationalised identifiers can appear within the following
   lexical elements:

   -  Domain of the EHLO and HELO commands
   -  return-path of the MAIL FROM command
   -  forward-path of the RCPT TO command
   -  String of the VRFY and EXPN commands

   Example:

      C: EHLO zq--frber-gra.muc.de
      C: MAIL FROM:

   SMTP agents do not need to implement this specification to handle
   internationalised identifiers correctly.

   SMTP agents MUST handle addresses that appear to be malformed
   internationalised identifiers.

   The VRFY and EXPN commands may profit from future extensions to
   handle unencoded names.

   [[NOTE: Although outside the scope of this specification, it is
   believed that the interface between MUAs and MTAs will use the
   encoded form of these identifiers, too, so that the MTA can be kept
   completely unchanged.  Local delivery agents might profit from
   extensions to allow pattern matching against internationalised
   identifiers.]]

3.2.2 RFC 2822

   Internationalised identifiers can appear within the following
   lexical elements:

   -  addr-spec
   -  obs-route
   -  domain

   Example:

      From: =?ISO-8859-1?Q?Claus_F=E4rber?=

   Mail user agents that do not implement this specification will
   present the identifiers in encoded form to the user.  Users will
   still be able to reply to messages using these identifiers.

3.3 Netnews/Usenet

3.3.1 RFC 1036

   Internationalised identifiers can appear within the following header
   fields:

   -  parts of the From, Sender, and Reply-To header fields that
      correspond to those described in RFC 2822
   -  Path header
   -  Newsgroups and Followup-To headers

   as well as within the following lexical elements:

   -  groupname argument to the newgroup and rmgroup commands
   -  newsgroup names within checkgroups messages

   Examples:

      Newsgroups: se.test.zq--rksmrgs-5wao1o
      Control: newgroup se.test.zq--rksmrgs-5wao1o

   News user agents that do not implement this specification will
   present the identifiers in encoded form to the user.  Users will
   still be able to read newsgroups and to send followups and replies
   to messages using these identifiers.

   News transfer agents do not need to implement this specification to
   handle internationalised identifiers correctly.

3.3.2 RFC 977/RFC 2980

   Internationalised identifiers can appear within all groupnames
   passed as arguments to NNTP commands or returned by these commands.

   NNTP servers do not need to implement this specification to handle
   internationalised identifiers correctly.
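
   As an illustration only, a news user agent that does implement this
   specification might encode a group name typed by the user before
   passing it to an NNTP command, and decode group names returned by
   the server before displaying them.  The following Python sketch
   follows the steps of section 2 under several assumptions: the
   function names are hypothetical, the "zq--" prefix is taken from the
   examples in this draft, Python's built-in "punycode" codec and
   encodings.idna.nameprep stand in for [PUNYCODE] and [NAMEPREP], and
   the Mixed-case annotation is not implemented.

```python
import re
from encodings.idna import nameprep

# Assumption: the prefix used in this draft's examples.
ACE_PREFIX = "zq--"

# Delimiters listed in section 2: SP, %x00-1F and the listed punctuation.
# The capturing group makes split() return the delimiters as well.
DELIMITERS = re.compile(r"([\x00-\x20.@+%=/,;:!()\[\]<>])")


def encode_part(part: str) -> str:
    """Encode one part; ASCII-only parts are passed through unchanged."""
    if part.isascii():
        return part
    prepared = nameprep(part)  # may raise UnicodeError for prohibited input
    if prepared.isascii():
        return prepared
    # Simplification: no Mixed-case annotation, so case is not preserved.
    return ACE_PREFIX + prepared.encode("punycode").decode("ascii")


def decode_part(part: str) -> str:
    """Decode one part; anything that does not decode is left as it is."""
    if not part.lower().startswith(ACE_PREFIX):
        return part
    try:
        return part[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return part


def encode_name(name: str) -> str:
    """Split at the delimiters, encode each part, and re-assemble."""
    return "".join(encode_part(p) for p in DELIMITERS.split(name))


def decode_name(name: str) -> str:
    """Split at the delimiters, decode each part, and re-assemble."""
    return "".join(decode_part(p) for p in DELIMITERS.split(name))
```

   With these definitions, encode_name("se.test.räksmörgås") should
   give the group name used in the examples of section 3.3.1, and
   decode_name() should reverse it.  Parts that look encoded but do not
   decode are left untouched in this sketch, so that malformed names
   can still be displayed.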

   Extended NNTP commands taking a "wildmat" as an argument may profit
   from an implementation that takes into account that group names
   might be encoded according to this specification and that matches
   against the decoded form of these names.

3.3.3 Submission to moderated newsgroups

   When articles POSTed to a moderated group are submitted to the
   moderator, the moderator's email address is often determined by
   replacing a pattern in a "wildcard" email address with the name of
   the moderated newsgroup, with all "."s within the newsgroup name
   replaced by "-".  This results in email addresses that are not
   formed according to this specification.

   Example: A message sent to the moderated newsgroup

      se.test.zq--rksmrgs-5wao1o.moderated

   will be forwarded to the email address

      se-test-zq--rksmrgs-5wao1o@usenet-se.net,

   although the expected encoding for the email address would be

      zq--se-test-rksmrgs-8kbw71a@usenet-se.net

   Administrators of sites providing such address aliases MUST set up
   aliases for both forms of the email address.

   [[NOTE: This only affects a small number of sites: those providing
   mail aliases for newsgroup moderators.  We can't add "-" to the list
   of part separators as this would be incompatible with [IDN].  [IDN]
   can't be changed as there is no other non-alphanumeric character
   allowed in domain names.]]

4 Relation to other specifications

4.1 IDN

   This specification extends the system of internationalised domain
   names described in [IDN].

4.2 USEFOR

   This specification provides an alternative to the use of unencoded
   domain names as proposed by the USEFOR working group [USEFOR], which
   is believed to cause severe interoperability problems.  This
   specification avoids such problems by using an encoding that
   produces encoded forms of newsgroup names that are fully compliant
   with RFC 1036.

5 References

   [IDN]       Patrik Faltstrom, Paul Hoffman, and Adam Costello,
               "Internationalizing Domain Names in Applications
               (IDNA)", draft-ietf-idn-idna-10.

   [PUNYCODE]  Adam Costello, "Punycode: An encoding of Unicode for use
               with IDNA", draft-ietf-idn-punycode.

   [NAMEPREP]  Paul Hoffman and Marc Blanchet, "Nameprep: A Stringprep
               Profile for Internationalised Domain Names",
               draft-ietf-idn-nameprep.

   [USEFOR]    Charles H. Lindsey, "News Article Format",
               draft-ietf-usefor-article.

6 Author's Address

   Claus Faerber
   Connollystrasse 8
   80809 Muenchen
   GERMANY

   E-Mail: claus@faerber.muc.de

   NOTE: Please write the author's last name with a-umlaut (Unicode
   U+00E4, HTML &auml;) instead of "ae" where possible: Färber

Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.