Network Working Group P. Saint-Andre Internet-Draft Cisco Intended status: Informational October 22, 2010 Expires: April 25, 2011 Internationalized Addresses in XMPP draft-saintandre-xmpp-i18n-02 Abstract The Extensible Messaging and Presence Protocol (XMPP) as defined in RFC 3920 used stringprep in the preparation and comparison of non- ASCII characters within XMPP addresses. This document explores whether it makes sense to move away from the use of stringprep in XMPP. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 25, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Saint-Andre Expires April 25, 2011 [Page 1] Internet-Draft XMPP I18N October 2010 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Characteristics and Uses of XMPP Addresses . . . . . . . . . . 4 3. String Classes . . . . . . . . . . . . . . . . . . . . . . . . 5 3.1. Localpart . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2. Resourcepart . . . . . . . . . . . . . . . . . . . . . . . 7 4. Migration Issues . . . . . . . . . . . . . . . . . . . . . . . 7 5. User Interface Issues . . . . . . . . . . . . . . . . . . . . 8 6. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 8 6.1. Possible Approaches . . . . . . . . . . . . . . . . . . . 8 6.2. Domainpart . . . . . . . . . . . . . . . . . . . . . . . . 9 6.3. Localpart . . . . . . . . . . . . . . . . . . . . . . . . 9 6.4. Resourcepart . . . . . . . . . . . . . . . . . . . . . . . 9 7. Security Considerations . . . . . . . . . . . . . . . . . . . 9 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 9. Informative References . . . . . . . . . . . . . . . . . . . . 9 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 11 Saint-Andre Expires April 25, 2011 [Page 2] Internet-Draft XMPP I18N October 2010 1. Introduction The Extensible Messaging and Presence Protocol [XMPP] is a widely- deployed technology for real-time communication, commonly used for instant messaging (IM) among human users but also for communication among automated systems. XMPP addresses (also called "JabberIDs" or JIDs) are of the form , where the localpart and resourcepart are formally optional but quite common because they are used to identify clients and other entities on the network. In some sense, XMPP addresses have always been internationalized, because the developers of the original Jabber open-source project intended that all data sent over the wire would consist of UTF-8 encoded Unicode code points. However, at that time (1999) the Jabber developers were quite unsophisticated about internationalization, nor they could not simply re-use a reliable internationalization technology that had been developed by the wider Internet community (as they could, for example, by re-using Secure Sockets Layer and Transport Layer Security for channel encryption); this lack of sophistication is evident in the community's first attempt at formally defining the format for JabberIDs in early 2002 [XEP-0029]. When the first instantiation of the IETF's XMPP WG was formed in late 2002, IDNA2003 [RFC3490] had not yet been published and stringprep [RFC3454] was a very new technology. During its work on [RFC3920], the XMPP WG absorbed as best it could the advice of internationalization experts regarding appropriate methods for preparing and comparing XMPP addresses; however, the participants in the XMPP WG did not possess very much knowledge of internationalization and therefore did not necessarily make fully- informed decisions. As a result of this early work, in [RFC3920] the XMPP WG decided to re-use IDNA2003 [RFC3490] and Nameprep [RFC3491] for the domainpart of a JID and to define two additional stringprep profiles: Nodeprep for the localpart and Resourceprep for the resourecepart. Since the publication of [RFC3920] in 2004, the Internet community has gained more experience with internationalization. In particular, IDNA2003, which is based on stringprep, has been superseded by IDNA2008 ([RFC5890], [RFC5891], [RFC5892], [RFC5893], [RFC5894]), which does not use stringprep. This migration away from stringprep for internationalized domain names has prompted other "customers" of stringprep to consider new approaches to the preparation and comparision of internationalized addresses. As a result, the IETF has formed the PRECIS WG as a common forum for seeking solutions to the problem statement outlined in [PROBLEM]. This document has two purposes: (1) provide input to the PRECIS WG and (2) help inform the decisions of the XMPP WG regarding internationalization of XMPP addresses, eventually leading to replacement of [XMPP-ADDR]. Saint-Andre Expires April 25, 2011 [Page 3] Internet-Draft XMPP I18N October 2010 2. Characteristics and Uses of XMPP Addresses As mentioned, XMPP addresses are of the form . For the domainpart, it makes sense for XMPP to simply re-use the most up-to-date technology for internationalized domain names, which [RFC3920] did by re-using [RFC3490]. Naturally, any migration from IDNA2003 to IDNA2008 will introduce migration issues as outlined under Section 4, but those issues need to be overcome so that XMPP technologies can follow best current practices for internationalization of domain names. However, just because XMPP re-uses IDNA2008 does not necessarily imply that the underlying "inclusion approach" taken in IDNA2008 can also be applied directly to the localpart and resourcepart of an XMPP address. To understand whether a new approach makes sense, we need to understand the uses and characteristics of XMPP addresses (and the parts thereof). The inclusion approach used in IDNA2008 makes sense because domain names were always limited to the letter-digits-hyphen ("LDH") pattern; the progression to non-ASCII domain names simply introduced more characters that might qualify as letters and (in some cases) digits. Extrapolating from that pattern, [RFC5894] argues that there is no good reason for a domain name to include characters such as symbols (e.g., hearts and stars), since the purpose of a domain name is to provide an unambiguous, memorable label for identifying and referring to resources on the Internet, not a personally expressive "handle" or a fun "tag" for interaction. The localpart and resourcepart of a JID often serve purposes other than unambiguous, memorable labels. For example, a human user of an XMPP-based IM system might expect that the username (localpart) portion of a JID could be expressive of their identity in some way, e.g. by matching some combination of their given name, surname, or nickname. Similarly, an occupant of an XMPP-based chatroom [XEP-0045] might expect that their in-room nickname (resourcepart) could be a fun conversation-starter; for example, a regular visitor to an XMPP chatroom that the author frequents has an in-room nickname of "The King" where "King" is represented by the Unicode codepoint 'BLACK CHESS KING' (U+265A). Such characters might difficult to communicate in some contexts (e.g., in screen readers for the blind), but are expressive and fun, which is not an unimportant consideration for many IM users -- even at the expense of reliability. Does the desire for an expressive username or nickname trump the need for human-readable identifiers? Given the wide implementation of full-Unicode addresses in human-oriented XMPP applications, IM client developers seem to think so. Saint-Andre Expires April 25, 2011 [Page 4] Internet-Draft XMPP I18N October 2010 These admittedly anecdotal and subjective considerations vaguely indicate that the inclusion approach pursued in the IDNA2008 initiative is quite appropriate for the more restricted class of domain names but perhaps not as appropriate for the localpart or resourcepart of an XMPP address. That being said, some XMPP implementations (e.g., a custom client) or deployments (e.g., an IM system at a large enterprise or branch of the military) might wish to "lock down" the expressive potential of XMPP addresses by limiting provisioned addresses to a particular subset or version of Unicode, by specifying which scripts, languages, code points, and text directions are supported, etc. Currently there is no way for an implementation or deployment to do so in standardized manner that can be communicated to other entities on the network (e.g., during account provisioning). Given that a deployed XMPP service acts in some ways like a registrar does for domain names, such methods might be helpful; although they are out of scope for the XMPP WG, they might be considered by the XMPP Standards Foundation (e.g., in revisions to or a replacement for [XEP-0077]). 3. String Classes Both [PROBLEM] and [FRAMEWORK] propose that it might be valuable to think of internationalized addresses in terms of broad "string classes" such as domain name, email address, restricted identifier, less-restrictive identifier, and perhaps even free-form identifier (just about anything goes). Particular technologies like XMPP could either borrow such a string class unchanged (as we do for domain names) or adapt or "profile" such a string class with modifications (e.g., as could possibly do by profiling the email address class, restricted identifier class, or less-restrictive for localparts and a possible free-form identifier class for resourceparts). This document does not yet make recommendations about borrowing or adapting more general string classes, in part because those classes are not yet clearly defined. However, as input to further discussion, this document explores the two string classes for which [RFC3920] defined new stringprep profiles: localparts and resourceparts. The following subsections refer to the properties described in Section 3 of [PROBLEM] (input restrictions, normalization, case mapping, and bidirectionality). 3.1. Localpart The localpart of an XMPP address is joined with a domainpart (via the '@' separator) in a way that on the surface looks like an email address, such as . However, there are some Saint-Andre Expires April 25, 2011 [Page 5] Internet-Draft XMPP I18N October 2010 subtle differences, even if we assume that the username portion of an email address inherits from the "dot-atom-text" rule of [RFC5322] instead of the more complex "local-part" rule. Specifically, within the ASCII block: o Several characters are allowed in email addresses but not allowed in JabberIDs: "&" (U+0026), "'" (U+0027), and "/" (U+002F). o Several characters are allowed in JabberIDs but not allowed in email addresses: "(" (U+0028), ")" (U+0029), "," (U+002C), "." (U+002E), ";" (U+003B), "[" (U+005B), "\" (U+005C), and "]" (U+005D). Those differences might not be significant enough to prevent the XMPP WG from adapting or "profiling" an email address class if the PRECIS WG produces such a class. On the other hand, they might lead the XMPP WG toward borrowing or adapting either the restricted identifier class or the less-restrictive identifier class, depending on how those are defined. With regard to input restrictions, the characters allowed in an XMPP localpart have always been lightly restricted. Within the ASCII block, the only restricted characters are space, controls, " (U+0022), & (U+0026), ' (U+0027), / (U+002F), : (U+003A), < (U+003C), > (U+003E), and @ (U+0040). Outside the ASCII block, no characters are currently restricted. It is an open issue whether further restrictions are desirable (e.g., do XMPP localparts really need to include symbol characters such as hearts and stars?). With regard to normalization, the Nodeprep profile of stringprep specifies that implementations apply Unicode normalization form NFKC (Compatibility Decomposition followed by Canonical Composition). As briefly described in Section 2.4 of [PROBLEM], it is an open question whether it is more appropriate to apply Unicode normalization form NFKC, form NFC (Canonical Decomposition followed by Canonical Composition), or no normalization at all. These forms are defined in "Unicode Standard Annex #15: Unicode Normalization Forms" [UAX15], along with several examples of the differing outputs they can produce. As two examples, for the source code point U+FB01 (SMALL LATIN LIGATURE FI) NFC produces that same code point whereas NFKC produces "f" followed by "i" (U+0066 and U+0069), and for the source code points U+0032 (DIGIT TWO) and U+2075 (SUPERSCRIPT FIVE) NFC produces those same code points whereas NFKC produces U+0032 (DIGIT TWO) and U+0035 (DIGIT FIVE). Very informally, XMPP developers can think of NFKC as trying to be smart -- and perhaps sometimes too smart. With regard to case mapping, the Nodeprep profile of stringprep specifies that XMPP localparts are case-folded, and we want to retain Saint-Andre Expires April 25, 2011 [Page 6] Internet-Draft XMPP I18N October 2010 that feature (e.g., we want and to identify the same entity on the network). With regard to bidirectionality (i.e., scripts that are written right to left), [RFC3920] did not provide any guidance other than pointing to Section 6 of [RFC3454]. Any treatment of bidirectionality in XMPP localparts is an open issue ([RFC5893] provides some helpful discussion of the general topic, at least as applied to internationalized domain names). 3.2. Resourcepart The resourcepart of an XMPP address has traditionally been a kind of "anything goes" string, even allowing the space character. If the PRECIS WG defines something like a free-form identifier, the XMPP WG might borrow or adapt that class. Another option would be to say that the resourcepart is Net-Unicode as specified in [RFC5198]. With regard to input restrictions, the characters allowed in an XMPP resourcepart have always been lightly restricted. Within the ASCII block, the only restricted characters are controls. Outside the ASCII block, no characters are currently restricted. Although is an open issue whether further restrictions are desirable, as explained under Section 2 XMPP-based IM systems have taken advantage of the lack of restrictions on resource identifiers (e.g., in multi-user chatrooms). With regard to normalization, the Resourceprep profile of stringprep specifies that implementations apply Unicode normalization form NFKC (Compatibility Decomposition followed by Canonical Composition). With regard to case mapping, the Resourceprep profile of stringprep specifies that XMPP localparts are not case-folded (e.g., in an XMPP- based chatroom, the participant "StPeter" could be different from the participant "stpeter"). It is an open question whether this behavior is necessary or desirable in all contexts. With regard to bidirectionality (i.e., scripts that are written right to left), [RFC3920] did not provide any guidance. Any treatment of bidirectionality in XMPP resourceparts is an open issue ([RFC5893] provides some helpful discussion of the general topic, at least as applied to internationalized domain names). 4. Migration Issues Any move away from Nameprep, Nodeprep, and Resourceprep as they are defined today will inevitably introduce the potential for migration Saint-Andre Expires April 25, 2011 [Page 7] Internet-Draft XMPP I18N October 2010 issues, such as JIDs that were not ambiguous before the migration but that become ambiguous after the migration. These issues need to be clearly defined and well understood so that the costs and benefits of any change can be properly assessed -- especially if the change might have an impact on authentication (e.g., as described in [RFC3920]), authorization (e.g., presence subscriptions as described in [XMPP-IM]), access (e.g., joining a chatroom as described in [XEP-0045]), identification (e.g., in XMPP URIs or IRIs as described in [XMPP-URI]), and other security-related functions. 5. User Interface Issues [RFC5895] introduces the helpful concept of "the dividing line between user interface and protocol" and applies that concept to the complexs process of translating the user's (presumed) intentions into bits on the wire. IDNA2003 conflated user interface processing and machine-readable protocols, and in many ways XMPP inherited that same error. It would be desirable for XMPP technologies to define a clear dividing line between user interface and protocol. This might mean that the XMPP community will need to define recommended mappings that are applied to a string before it is considered a JID (or the localpart of resourcepart of a JID). 6. Recommendations 6.1. Possible Approaches This document does not yet provide definitive recommendations, but instead mainly seeks to foster discussion about internationalized addresses in XMPP. However, there are three possible approaches that the XMPP WG might pursue in relation to its existing stringprep profiles: 1. Keep using Nameprep, Nodeprep, and/or Resourceprep as they are defined today. 2. Collaborate with other interested parties or working groups to define a new version of stringprep that tracks changes to Unicode since Unicode 3.2 as currently specified in [RFC3454]. 3. Pursue the general model followed in the IDNA2008 work by defining a tiered model of valid, disallowed, and unassigned characters; such an effort might be pursued only within the XMPP community (for Nodeprep, Resourceprep, or both) or more generally in concert with other users of stringprep. Saint-Andre Expires April 25, 2011 [Page 8] Internet-Draft XMPP I18N October 2010 The XMPP WG might even decide to use a mix of these approaches, e.g. to use the new, non-stringprep IDNA2008 approach for domainparts but the existing Nodeprep and Resourceprep profiles for localparts and resourceparts. In general, given that the PRECIS WG has been formed as a common effort across different technologies, it is reasonable for the XMPP developer community to participate in that WG (and for the XMPP WG to cooperate with that WG) and to adopt whatever solutions are developed in that WG. 6.2. Domainpart RFC 3920 specifies the use of IDNA2003 for the domainpart of a JID (which in the terms of IDNA2008 [RFC5890] is a "domain name slot"). This document does not question the reasoning behind the IDNA2008 work and therefore recommends the use of IDNA2008 technologies in the document that obsoletes [XMPP-ADDR]. 6.3. Localpart This document does not yet provide a recommendation regarding the localpart of a JID. 6.4. Resourcepart This document does not yet provide a recommendation regarding the resourcepart of a JID. 7. Security Considerations The inclusion of non-ASCII characters in XMPP addresses has important security implications, such as the ability to mimic characters or entire addresses through the inclusion of "confusable characters" (see [RFC4690] and [RFC5890]). These issues are explored at some length in [XMPP-ADDR]. Other security considerations might apply and will be described in a future version of this specification. 8. IANA Considerations This document has no actions for the IANA. 9. Informative References [FRAMEWORK] Saint-Andre Expires April 25, 2011 [Page 9] Internet-Draft XMPP I18N October 2010 Blanchet, M., "Precis Framework: Handling Internationalized Strings in Protocols", draft-blanchet-precis-framework-00 (work in progress), July 2010. [PROBLEM] Blanchet, M. and A. Sullivan, "Stringprep Revision Problem Statement", draft-ietf-precis-problem-statement-00 (work in progress), October 2010. [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ("stringprep")", RFC 3454, December 2002. [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003. [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003. [RFC3920] Saint-Andre, P., Ed., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 3920, October 2004. [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and Recommendations for Internationalized Domain Names (IDNs)", RFC 4690, September 2006. [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network Interchange", RFC 5198, March 2008. [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322, October 2008. [RFC5890] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", RFC 5890, August 2010. [RFC5891] Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, August 2010. [RFC5892] Faltstrom, P., "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)", RFC 5892, August 2010. [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)", RFC 5893, August 2010. Saint-Andre Expires April 25, 2011 [Page 10] Internet-Draft XMPP I18N October 2010 [RFC5894] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale", RFC 5894, August 2010. [RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008", RFC 5895, September 2010. [UAX15] The Unicode Consortium, "Unicode Standard Annex #15: Unicode Normalization Forms", September 2010. [XEP-0029] Kaes, C., "Definition of Jabber Identifiers (JIDs)", XSF XEP 0029, October 2003. [XEP-0077] Saint-Andre, P., "In-Band Registration", XSF XEP 0077, September 2009. [XEP-0045] Saint-Andre, P., "Multi-User Chat", XSF XEP 0045, July 2008. [XMPP-ADDR] Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Address Format", draft-ietf-xmpp-address-05 (work in progress), October 2010. [XMPP-IM] Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence", draft-ietf-xmpp-3921bis-15 (work in progress), October 2010. [XMPP-URI] Saint-Andre, P., "Internationalized Resource Identifiers (IRIs) and Uniform Resource Identifiers (URIs) for the Extensible Messaging and Presence Protocol (XMPP)", RFC 5122, February 2008. [XMPP] Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", draft-ietf-xmpp-3920bis-17 (work in progress), October 2010. Saint-Andre Expires April 25, 2011 [Page 11] Internet-Draft XMPP I18N October 2010 Author's Address Peter Saint-Andre Cisco 1899 Wyknoop Street, Suite 600 Denver, CO 80202 USA Phone: +1-303-308-3282 Email: psaintan@cisco.com Saint-Andre Expires April 25, 2011 [Page 12]