Network Working Group                                     P. Saint-Andre
Internet-Draft                                                      &yet
Intended status: Informational                            March 31, 2014
Expires: October 2, 2014


 An Interoperable Subset of Characters for Internationalized Usernames
                  draft-saintandre-username-interop-03

Abstract

   Various Internet protocols define constructs for usernames, i.e.,
   the localpart of an address such as "localpart@example.com".  This
   document describes a subset of Unicode characters to allow in
   internationalized usernames for the sake of maximal interoperability
   across Internet protocols.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 2, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Saint-Andre              Expires October 2, 2014                [Page 1]

Internet-Draft          Username Interoperability             March 2014

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Subset
   4.  IANA Considerations
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Appendix A.  Analysis
   Appendix B.  Acknowledgements
   Author's Address

1.  Introduction

   Various Internet protocols define constructs for usernames, i.e.,
   the localpart of an address such as "localpart@example.com".  As
   further described in Appendix A, examples include the localparts of
   email addresses, Kerberos Principal Names, Network Access
   Identifiers, SIP URIs, instant messaging URIs and presence URIs,
   XMPP addresses, and account URIs, as well as certain forms of SASL
   simple user names (see [I-D.ietf-precis-saslprepbis]).

   This document describes a subset of Unicode characters [UNICODE] to
   allow in internationalized usernames for the sake of maximal
   interoperability across Internet protocols.  This subset might prove
   useful in cases where a provider offers multiple services (say,
   email and instant messaging) using the same underlying identifier,
   or where the same identifier (e.g., an account URI) is used when
   interacting with multiple providers.

2.  Terminology

   Many important terms used in this document are defined in
   [I-D.ietf-precis-framework], [RFC6365], and [UNICODE].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

3.  Subset

   The interoperable subset of characters provided here is defined as a
   profile of the PRECIS IdentifierClass specified in
   [I-D.ietf-precis-framework].  In essence, the IdentifierClass
   restricts the allowable characters to letters and digits from all
   the scripts of Unicode [UNICODE] while grandfathering all the
   characters from the ASCII range [RFC20].  The profile defined here,
   "LocalpartIdentifierClass", further restricts the characters from
   the ASCII range to those known to work across existing application
   protocols (as described under Appendix A).

   The syntax is defined as follows using the Augmented Backus-Naur
   Form (ABNF) as specified in [RFC5234].

      localpart = 1*1023(localpoint)
                  ;
                  ; a "localpoint" is a UTF-8 encoded Unicode code point
                  ; that conforms to the "LocalpartIdentifierClass"
                  ; profile of the PRECIS IdentifierClass

   A "localpart" MUST consist only of Unicode code points that conform
   to the "LocalpartIdentifierClass" profile of the "IdentifierClass"
   base string class defined in [I-D.ietf-precis-framework].

   The LocalpartIdentifierClass profile includes all code points
   allowed by the IdentifierClass base class, with the exception of the
   following characters, which are disallowed (again, see Appendix A
   for the reasoning behind these restrictions):

      U+0022 (QUOTATION MARK), i.e., '"'
      U+0023 (NUMBER SIGN), i.e., '#'
      U+0025 (PERCENT SIGN), i.e., '%'
      U+0026 (AMPERSAND), i.e., '&'
      U+0027 (APOSTROPHE), i.e., "'"
      U+0028 (LEFT PARENTHESIS), i.e., '('
      U+0029 (RIGHT PARENTHESIS), i.e., ')'
      U+002C (COMMA), i.e., ','
      U+002E (FULL STOP), i.e., '.'
      U+002F (SOLIDUS), i.e., '/'
      U+003A (COLON), i.e., ':'
      U+003B (SEMICOLON), i.e., ';'
      U+003C (LESS-THAN SIGN), i.e., '<'
      U+003E (GREATER-THAN SIGN), i.e., '>'
      U+003F (QUESTION MARK), i.e., '?'
      U+0040 (COMMERCIAL AT), i.e., '@'
      U+005B (LEFT SQUARE BRACKET), i.e., '['
      U+005C (REVERSE SOLIDUS), i.e., '\'
      U+005D (RIGHT SQUARE BRACKET), i.e., ']'
      U+005E (CIRCUMFLEX ACCENT), i.e., '^'
      U+0060 (GRAVE ACCENT), i.e., '`'
      U+007B (LEFT CURLY BRACKET), i.e., '{'
      U+007C (VERTICAL LINE), i.e., '|'
      U+007D (RIGHT CURLY BRACKET), i.e., '}'

   The normalization and mapping rules for the LocalpartIdentifierClass
   are as follows, where the operations specified MUST be completed in
   the order shown:

   1.  Fullwidth and halfwidth characters MUST be mapped to their
       decomposition mappings.

   2.  So-called additional mappings MAY be applied, such as mapping of
       characters that are similar to common delimiters (such as '@',
       ':', '/', '+', '-', and '.', e.g., mapping of IDEOGRAPHIC FULL
       STOP (U+3002) to FULL STOP (U+002E)) and special handling of
       certain characters or classes of characters (e.g., mapping of
       non-ASCII spaces to ASCII space); the PRECIS mappings document
       [I-D.ietf-precis-mappings] describes such mappings in more
       detail.

   3.  Uppercase and titlecase characters MUST be mapped to their
       lowercase equivalents.

   4.  All characters MUST be mapped using Unicode Normalization Form C
       (NFC).

   With regard to directionality, applications MUST apply the "Bidi
   Rule" defined in [RFC5893] (i.e., each of the six conditions of the
   Bidi Rule must be satisfied).

   A localpart MUST NOT be zero octets in length and MUST NOT be more
   than 1023 octets in length.  This rule is to be enforced after any
   normalization and mapping of code points.

4.  IANA Considerations

   The IANA shall add the following entry to the PRECIS Profiles
   Registry:

   Name:  LocalpartIdentifierClass.
   Applicability:  Usernames that are intended to be interoperable
      across multiple application protocols.
   Base Class:  IdentifierClass.
   Replaces:  None.
   Width Mapping:  Map fullwidth and halfwidth characters to their
      decomposition mappings.
   Additional Mappings:  None required or recommended.
   Case Mapping:  Map uppercase and titlecase characters to lowercase.
   Normalization:  NFC.
   Directionality:  The "Bidi Rule" defined in RFC 5893 applies.
   Exclusions:  24 non-alphanumeric characters in the ASCII range.
   Enforcement:  Up to the application protocol or deployment.
   Specification:  this document.  [Note to RFC Editor: please change
      "this document" to the RFC number issued for this specification.]

5.  Security Considerations

   Deploying usernames that are interoperable across multiple protocols
   could potentially give malicious entities multiple ways to attack an
   account or user.

   The security considerations described in [I-D.ietf-precis-framework]
   apply to the "IdentifierClass" base string class used in this
   document.

   The security considerations described in [UTS39] apply to the use of
   Unicode characters.

6.  References

6.1.  Normative References

   [I-D.ietf-precis-framework]
              Saint-Andre, P. and M. Blanchet, "Precis Framework:
              Handling Internationalized Strings in Protocols",
              draft-ietf-precis-framework-15 (work in progress), March
              2014.

   [RFC20]    Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC5893]  Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
              Internationalized Domain Names for Applications (IDNA)",
              RFC 5893, August 2010.

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              6.3", 2013.

6.2.  Informative References

   [I-D.ietf-appsawg-acct-uri]
              Saint-Andre, P., "The 'acct' URI Scheme",
              draft-ietf-appsawg-acct-uri-07 (work in progress),
              January 2014.

   [I-D.ietf-precis-mappings]
              Yoneya, Y. and T.
              NEMOTO, "Mapping characters for PRECIS classes",
              draft-ietf-precis-mappings-07 (work in progress),
              February 2014.

   [I-D.ietf-precis-saslprepbis]
              Saint-Andre, P. and A. Melnikov, "Preparation and
              Comparison of Internationalized Strings Representing
              Usernames and Passwords",
              draft-ietf-precis-saslprepbis-07 (work in progress),
              March 2014.

   [RFC821]   Postel, J., "Simple Mail Transfer Protocol", STD 10,
              RFC 821, August 1982.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April
              2001.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3856]  Rosenberg, J., "A Presence Event Package for the Session
              Initiation Protocol (SIP)", RFC 3856, August 2004.

   [RFC3860]  Peterson, J., "Common Profile for Instant Messaging
              (CPIM)", RFC 3860, August 2004.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC4120]  Neuman, C., Yu, T., Hartman, S., and K. Raeburn, "The
              Kerberos Network Authentication Service (V5)", RFC 4120,
              July 2005.

   [RFC4282]  Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The
              Network Access Identifier", RFC 4282, December 2005.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [RFC6120]  Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 6120, March 2011.

   [RFC6365]  Hoffman, P. and J. Klensin, "Terminology Used in
              Internationalization in the IETF", BCP 166, RFC 6365,
              September 2011.

   [UTS39]    The Unicode Consortium, "Unicode Technical Standard #39:
              Unicode Security Mechanisms", July 2012.

Appendix A.  Analysis

   This document takes the following username constructs into
   consideration:

   o  Email addresses [RFC5322]

   o  Kerberos Principal Names [RFC4120]

   o  Network Access Identifiers [RFC4282]

   o  SIP URIs [RFC3261]

   o  Instant messaging URIs [RFC3860] and presence URIs [RFC3856]

   o  XMPP addresses (a.k.a. Jabber Identifiers) [RFC6120]

   o  Account URIs [I-D.ietf-appsawg-acct-uri]

   Each of those address formats defines something that can be used as
   the "localpart" of an address.

   The localpart of an email address uses either the "local-part" or
   the "dot-atom-text" rule in [RFC5322].  Here we make the simplifying
   assumption that the "dot-atom-text" rule applies:

      dot-atom-text = 1*atext *("." 1*atext)
      atext         = ALPHA / DIGIT /   ; Any character except
                      "!" / "#" / "$" / ; controls, SP, and
                      "%" / "&" / "'" / ; specials.  Used for
                      "*" / "+" / "-" / ; atoms.
                      "/" / "=" / "?" /
                      "^" / "_" / "`" /
                      "{" / "|" / "}" /
                      "~"

   We make the same simplifying assumption for im: and pres: URIs
   (although their specifications reference [RFC2822]).

   A Kerberos Principal Name is a sequence of strings of type
   KerberosString, where each KerberosString is a GeneralString that is
   constrained to contain only characters in IA5String.

      PrincipalName   ::= SEQUENCE {
              name-type       [0] Int32,
              name-string     [1] SEQUENCE OF KerberosString
      }

      KerberosString  ::= GeneralString (IA5String)

   A Network Access Identifier inherits from [RFC821].  Here we care
   only about the "username" rule:

      username    =  dot-string
      dot-string  =  string
      dot-string  =/ dot-string "." string
      string      =  char
      string      =/ string char
      char        =  c
      char        =/ "\" x
      c           =  %x21    ; '!'              allowed
                             ; '"'              not allowed
      c           =/ %x23    ; '#'              allowed
      c           =/ %x24    ; '$'              allowed
      c           =/ %x25    ; '%'              allowed
      c           =/ %x26    ; '&'              allowed
      c           =/ %x27    ; '''              allowed
                             ; '(', ')'         not allowed
      c           =/ %x2A    ; '*'              allowed
      c           =/ %x2B    ; '+'              allowed
                             ; ','              not allowed
      c           =/ %x2D    ; '-'              allowed
                             ; '.'              not allowed
      c           =/ %x2F    ; '/'              allowed
      c           =/ %x30-39 ; '0'-'9'          allowed
                             ; ';', ':', '<'    not allowed
      c           =/ %x3D    ; '='              allowed
                             ; '>'              not allowed
      c           =/ %x3F    ; '?'              allowed
                             ; '@'              not allowed
      c           =/ %x41-5a ; 'A'-'Z'          allowed
                             ; '[', '\', ']'    not allowed
      c           =/ %x5E    ; '^'              allowed
      c           =/ %x5F    ; '_'              allowed
      c           =/ %x60    ; '`'              allowed
      c           =/ %x61-7A ; 'a'-'z'          allowed
      c           =/ %x7B    ; '{'              allowed
      c           =/ %x7C    ; '|'              allowed
      c           =/ %x7D    ; '}'              allowed
      c           =/ %x7E    ; '~'              allowed
                             ; DEL              not allowed
      c           =/ %x80-FF ; UTF-8-Octet      allowed
      x           =  %x00-FF ; all 128 ASCII characters

   The localpart of a sip:/sips: URI inherits from the "userinfo" rule
   in [RFC3986] with several changes; here we discuss the SIP "user"
   rule only:

      user             =  1*( unreserved / escaped / user-unreserved )
      user-unreserved  =  "&" / "=" / "+" / "$" / "," / ";" / "?" / "/"
      unreserved       =  alphanum / mark
      mark             =  "-" / "_" / "." / "!" / "~" / "*" / "'"
                          / "(" / ")"

   The localpart of an XMPP address allows any ASCII character except
   space, controls, and the " & ' / : < > @ characters.

   The 'acct' URI syntax borrows the 'host', 'pct-encoded',
   'sub-delims', and 'unreserved' rules from [RFC3986]:

      acctURI  = "acct" ":" userpart "@" host
      userpart = unreserved / sub-delims
                 0*( unreserved / pct-encoded / sub-delims )

   To summarize the foregoing information, the following table lists
   the allowed and disallowed characters in the localpart of
   identifiers for each protocol (aside from the alphanumeric, space,
   and control characters), in order by hexadecimal character number
   (where each "A" row shows the allowed characters and each "D" row
   shows the disallowed characters).

      Table 1: Allowed and Disallowed Characters (Non-Alphanumeric)

   +---+----------------------------------+
   | EMAIL ADDRESSES, IM/PRES URIs        |
   +---+----------------------------------+
   | A | ! #$%&'  *+ - /   = ?    ^_`{|}~ |
   | D |  "     ()  , . :;< > @[\]        |
   +---+----------------------------------+
   | KERBEROS PRINCIPAL NAMES             |
   +---+----------------------------------+
   | A | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
   | D |                                  |
   +---+----------------------------------+
   | NETWORK ACCESS IDENTIFIERS           |
   +---+----------------------------------+
   | A | ! #$%&'  *+ - /   = ?    ^_`{|}~ |
   | D |  "     ()  , . :;< > @[\]        |
   +---+----------------------------------+
   | SIP/SIPS URIs                        |
   +---+----------------------------------+
   | A | !  $ &'()*+,-./ ; = ?     _    ~ |
   | D |  "# %          : < > @[\]^ `{|}  |
   +---+----------------------------------+
   | XMPP ADDRESSES                       |
   +---+----------------------------------+
   | A | ! #$%  ()*+,-.  ; = ? [\]^_`{|}~ |
   | D |  "   &'       /: < > @           |
   +---+----------------------------------+
   | ACCT URIs                            |
   +---+----------------------------------+
   | A | !  $%&'()*+,-.  ; =    \ ^_`{|}~ |
   | D |  "#           /: < >?@[ ]        |
   +---+----------------------------------+

   The interoperable subset allows only characters that are allowed in
   all of the foregoing formats, as shown in the following table.

              Table 2: Subset Characters (Non-Alphanumeric)

   +---+----------------------------------+
   | INTEROPERABLE SUBSET                 |
   +---+----------------------------------+
   | A | !  $     *+ -     =       _    ~ |
   | D |  "# %&'()  , ./:;< >?@[\]^ `{|}  |
   +---+----------------------------------+

Appendix B.  Acknowledgements

   Thanks to Sean Turner for inspiring the work on this document.

   Thanks also to Paul Hoffman, John Klensin, and Glen Zorn for their
   comments.

Author's Address

   Peter Saint-Andre
   &yet
   P.O. Box 787
   Parker, CO  80134
   USA

   Email: ietf@stpeter.im
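
Illustrative Sketch (Non-Normative)

   The derivation in Appendix A and the rules in Section 3 can be
   cross-checked with a short Python sketch.  It intersects the "A"
   rows of Table 1 to recover Table 2, then applies width mapping,
   lowercasing, NFC, the ASCII exclusions, and the octet-length limit.
   All names here (ALLOWED, SUBSET, EXCLUDED, width_map,
   prepare_localpart) are hypothetical and not defined by this
   document.  The sketch makes several simplifying assumptions:
   str.lower() only approximates the case mapping, the optional
   additional mappings are skipped, and neither the full PRECIS
   IdentifierClass check nor the Bidi Rule is implemented.

```python
import string
import unicodedata

# The 32 printable non-alphanumeric ASCII characters (space excluded).
ALL_PUNCT = string.punctuation

# "A" rows of Table 1, transcribed by hand (hypothetical names).
ALLOWED = {
    "email_im_pres": "!#$%&'*+-/=?^_`{|}~",
    "kerberos": ALL_PUNCT,
    "nai": "!#$%&'*+-/=?^_`{|}~",
    "sip": "!$&'()*+,-./;=?_~",
    "xmpp": "!#$%()*+,-.;=?[\\]^_`{|}~",
    "acct": "!$%&'()*+,-.;=\\^_`{|}~",
}

# Table 2: characters allowed by every format, and the complement.
SUBSET = set(ALL_PUNCT)
for chars in ALLOWED.values():
    SUBSET &= set(chars)
EXCLUDED = set(ALL_PUNCT) - SUBSET


def width_map(ch: str) -> str:
    """Rule 1: map a fullwidth/halfwidth character to its decomposition."""
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith(("<wide>", "<narrow>")):
        return "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
    return ch


def prepare_localpart(s: str) -> str:
    """Apply rules 1, 3, and 4, then the exclusion and length checks.

    Assumption: this does NOT verify IdentifierClass membership for
    non-ASCII code points, nor the Bidi Rule; it only covers the ASCII
    exclusions listed in Section 3.
    """
    s = "".join(width_map(ch) for ch in s)   # rule 1: width mapping
    s = s.lower()                            # rule 3 (approximation)
    s = unicodedata.normalize("NFC", s)      # rule 4: NFC
    bad = sorted(ch for ch in set(s) if ch in EXCLUDED)
    if bad:
        raise ValueError(f"disallowed characters: {bad}")
    octets = len(s.encode("utf-8"))          # length is counted in octets
    if not 1 <= octets <= 1023:
        raise ValueError(f"length {octets} octets is out of range 1..1023")
    return s
```

   For example, prepare_localpart("Juliet+Tag") yields "juliet+tag",
   while a string containing FULL STOP is rejected because U+002E is
   one of the 24 excluded characters.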