Network Working Group                                         P. Hoffman
Internet-Draft                                          January 13, 2009
Updates: RFC 3454, 3490, 3491
(if approved)
Intended status: Standards Track
Expires: July 17, 2009


   Internationalizing Domain Names in Applications (IDNA) version 2
                      draft-hoffman-idna2-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 17, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.


Hoffman                   Expires July 17, 2009                 [Page 1]

Internet-Draft                    IDNA2                     January 2009


Abstract

   IDNA has been a world-wide success since it was introduced over five
   years ago. However, it has some notable deficiencies, including being
   tied to an old version of the Unicode standard and needless
   restrictions that prevented some languages from being used.
   This document describes IDNA version 2, which rectifies those
   problems while making the fewest changes necessary to the original
   protocol.

Table of Contents

   1.  Introduction
     1.1.  Acknowledgements
     1.2.  Conventions Used In This Document
   2.  Changes to RFC 3490 (IDNA v.1)
   3.  Changes to RFC 3454 (Stringprep)
   4.  Changes to RFC 3491 (Nameprep)
   5.  Changes to RFC 3492 (Punycode)
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Work Still to be Done
   Appendix B.  Changes between versions
   Author's Address

1.  Introduction

   This document describes Internationalizing Domain Names in
   Applications (IDNA) version 2 (hereafter called "IDNAv2"), a direct
   update to IDNA (hereafter called "IDNAv1"). IDNAv1 consists of four
   RFCs:

   o  [RFC3490], "Internationalizing Domain Names in Applications
      (IDNA)", is the main definition of IDNAv1. This defines the
      processing rules for IDNA and gives the background for how IDNA
      works.

   o  [RFC3454], "Preparation of Internationalized Strings
      ("stringprep")", defines the general framework for processing
      non-ASCII strings that are used in IDNA.

   o  [RFC3491], "Nameprep: A Stringprep Profile for Internationalized
      Domain Names (IDN)", is a short profile of the rules from the
      stringprep framework.
   o  [RFC3492], "Punycode: A Bootstring encoding of Unicode for
      Internationalized Domain Names in Applications (IDNA)", defines
      the encoding used in IDNAv1 labels.

   IDNAv2 is backwards-compatible with IDNAv1, meaning that any DNS
   label that was legal in IDNAv1 has exactly the same representation in
   IDNAv2. New labels are allowed in IDNAv2 that were not allowed in
   IDNAv1.

   IDNA needs to be updated for many reasons, some of which are covered
   in [RFC4690]. If for no other reason, many characters that could
   appear in domain names have been added since Unicode version 3.2
   [UNICODE32], which is the version of the Unicode Standard on which
   IDNAv1 is based.

   One explicit goal of this update is to allow labels with characters
   that have been added since Unicode version 3.2 to be used in IDNA. To
   that end, IDNAv2 is based on Unicode 5.1 [UNICODE51]. The tables in
   stringprep and Nameprep are updated to reflect this change.

   Another explicit goal of this update is to not change the encoding of
   any label that is legal in IDNAv1. If an internationalized label in
   IDNAv1 produces an ACE label, IDNAv2 must produce the same ACE label.
   If an internationalized label in IDNAv1 produces an ASCII label,
   IDNAv2 must produce the same ASCII label.

   A third explicit goal is to update the bidirectional ("bidi")
   algorithm used by IDNAv1 to cover more languages such as Dhivehi and
   Yiddish. This is done to cover an oversight in IDNAv1 that was
   discovered after the work was finished.

1.1.  Acknowledgements

   The first serious work on updating IDNAv1 was undertaken by John
   Klensin, Patrik Faltstrom, Harald Alvestrand, and Cary Karp. It led
   to the formation of the IDNAbis Working Group in the IETF, and they
   produced many revisions of their documents in that WG. Some of the
   ideas in this IDNAv2 document (most notably, the update to the bidi
   algorithm) are derived from their efforts.

   Many, many people worked on IDNAv1.
   In addition to the authors of the standards (Marc Blanchet, Adam
   Costello, Patrik Faltstrom, and me), there were literally dozens of
   active participants in the original IDN Working Group in the IETF
   that began in 2000. Their tireless effort led to IDNAv1.

1.2.  Conventions Used In This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In sections of this document where changes are made to RFCs, those
   changes are shown with a vertical line character ("|") in the first
   column.

2.  Changes to RFC 3490 (IDNA v.1)

   All references to the Unicode Standard are updated to refer to
   [UNICODE51].

   All references to Nameprep are updated to refer to the Nameprep in
   this document. Similarly, all references to stringprep are updated to
   refer to the stringprep in this document.

   In section 3.1, the first bullet point ("1) Whenever dots are
   used...") is changed to add the following at the end of the sentence:
   "U+2CFE (Coptic full stop)".

3.  Changes to RFC 3454 (Stringprep)

   [[[ ============================================================

   NOTE FOR EARLY VERSIONS OF THIS DRAFT

   This section is intentionally incomplete. The tables in Stringprep
   need to be added to based on the characters added to the repertoire
   after Unicode 3.2 up to and including Unicode 5.1. Probably the best
   way for this to be done is for a few dedicated individuals to go
   through the new characters one-by-one, and also to go through them
   programmatically, and see which tables need to be added to. I have
   done a first pass of doing this one-by-one, but I felt that
   publishing my results in the first draft would cause others to get
   lazy about this important task. Future versions of this document will
   reflect the results of that work.
   The character review will be similar to what we did in IDNAv1, except
   that we don't have to create any new buckets. Basically, we have to
   see whether a particular new character should be mapped to nothing,
   or whether it should be prohibited for one of the reasons already
   listed in RFC 3454. In my not-careful first pass, I found very few
   characters that will need to be added to sections 3 or 5. The
   case-mapping will happen algorithmically, with a check that the new
   map does not change any value in the old map.

   ============================================================ ]]]

   This document is significantly revised to reflect the use of Unicode
   version 5.1. All the substantive changes are additions. There has
   been no effort to "correct" perceived mistakes in RFC 3454. (One can
   argue that the extending of the bidi rules in section 6 to allow more
   languages to be expressed is such a correction; however, the change
   allows more strings, and doesn't cause any string that was allowed in
   RFC 3454 to not be allowed in the new version.)

   Most of the changes to RFC 3454 are to add characters to the tables
   in the document. These characters come from Unicode version 5.1.
   Thus, the tables become valid for Unicode version 5.1. However, the
   same tables are still valid for Unicode version 3.2 because a profile
   that is still using version 3.2 will not ever use the added rows in
   the updated tables.

   In all places other than Appendix A, references to "[Unicode3.2]" are
   updated to refer to [UNICODE51]. Similarly, all text references to
   "Unicode version 3.2" are updated to "Unicode version 5.1".

   Characters will be added to the tables in section 3.1. For example,
   U+E0100 to U+E01EF will be added to the second list in the section.

   In section 3.2, change "CaseFolding-3.txt" to "CaseFolding.txt".

   Characters will be added to the tables in subsections of section 5.
   An example is that U+2064 will be added to the list in section 5.2.
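
   The additions-only property described above can be checked
   mechanically. The following Python sketch is illustrative only and is
   not part of the specification. The table names, the helper functions,
   and the tiny table excerpts are assumptions made for the example; the
   "old" entries are a few code points taken from the RFC 3454 tables,
   and the "new" entries are just the two additions named in this
   section. The sketch also ignores the handling of unassigned code
   points.

```python
# Illustrative excerpts only. All names here are hypothetical; the full
# tables are in RFC 3454 and are not reproduced.

# A few "map to nothing" code points from RFC 3454 (Unicode 3.2).
MAP_TO_NOTHING_V1 = {0x00AD, 0x200B}

# A few prohibited control characters from RFC 3454 (Unicode 3.2).
PROHIBITED_V1 = {0x2060, 0x2061, 0x2062, 0x2063}

# The two example additions named in the text above.
MAP_TO_NOTHING_V2 = MAP_TO_NOTHING_V1 | set(range(0xE0100, 0xE01EF + 1))
PROHIBITED_V2 = PROHIBITED_V1 | {0x2064}


def is_additive(old, new):
    """Return True if every entry of the old table is still in the new one."""
    return old <= new


def prepare(text, map_to_nothing, prohibited):
    """Drop mapped-to-nothing characters, then reject prohibited ones."""
    out = "".join(ch for ch in text if ord(ch) not in map_to_nothing)
    for ch in out:
        if ord(ch) in prohibited:
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return out


if __name__ == "__main__":
    # Both updated tables only add rows.
    print(is_additive(MAP_TO_NOTHING_V1, MAP_TO_NOTHING_V2))
    print(is_additive(PROHIBITED_V1, PROHIBITED_V2))
    # A string using only old characters is handled identically.
    sample = "a\u00adb"
    print(prepare(sample, MAP_TO_NOTHING_V1, PROHIBITED_V1)
          == prepare(sample, MAP_TO_NOTHING_V2, PROHIBITED_V2))
```

   Because the updated tables are supersets of the old ones in this
   sketch, any input made up only of characters covered by the old
   tables produces the same output with either set, which is the
   behavior the text relies on.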
   In section 6, at the end of the fourth paragraph (which currently
   ends with "have bidirectional category "EN"."), the following
   sentence is added: "The Unicode Standard also defines a bidirectional
   category "NSM" for "non-spacing marks"."

   In section 6, the third requirement is changed to read:

|  3) If a string contains any RandALCat character, a RandALCat
|     character MUST be the first character of the string, and
|     either a RandALCat character or NSM character MUST be the
|     last character of the string.

   In the references, update the reference for UAX15, and add a
   reference for [UNICODE51].

   Appendix A is changed to read:

|  The following are the only repertoires covered in this document:
|
|  - Unicode 3.2, as defined in [UNICODE32]
|
|  - Unicode 5.1, as defined in [UNICODE51]

   A new appendix, "A.2 Unassigned code points in Unicode 5.1", will be
   added.

   The tables in appendixes B, C, and D will be added to.

4.  Changes to RFC 3491 (Nameprep)

   All references to IDNA and stringprep are updated to refer to the
   stringprep in this document.

   In sections 1 and 2, "Unicode 3.2" is changed to "Unicode 5.1".

   In section 10, change the last table entry to "This is the second
   version of Nameprep."

5.  Changes to RFC 3492 (Punycode)

   IDNAv2 does not change RFC 3492.

6.  IANA Considerations

   IANA is requested to add the following to the stringprep profile
   registry (www.iana.org/assignments/stringprep-profiles).

   Name of this profile:
      Nameprep

   RFC in which the profile is defined:
      This document.

   Indicator whether or not this is the newest version of the profile:
      This is the second version of Nameprep.

7.  Security Considerations

   The security considerations from RFCs 3454, 3490, 3491, and 3492 all
   apply to this document. The changes between IDNAv1 and IDNAv2 are not
   believed to add any new security considerations.

8.  References

8.1.  Normative References

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3454]    Hoffman, P. and M. Blanchet, "Preparation of
                Internationalized Strings ("stringprep")", RFC 3454,
                December 2002.

   [RFC3490]    Faltstrom, P., Hoffman, P., and A. Costello,
                "Internationalizing Domain Names in Applications
                (IDNA)", RFC 3490, March 2003.

   [RFC3491]    Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
                Profile for Internationalized Domain Names (IDN)",
                RFC 3491, March 2003.

   [RFC3492]    Costello, A., "Punycode: A Bootstring encoding of
                Unicode for Internationalized Domain Names in
                Applications (IDNA)", RFC 3492, March 2003.

   [UNICODE32]  The Unicode Consortium, "The Unicode Standard, Version
                3.2", The Unicode Standard version 3.2.

   [UNICODE51]  The Unicode Consortium, "The Unicode Standard, Version
                5.1", The Unicode Standard version 5.1.

8.2.  Informative References

   [RFC4690]    Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
                and Recommendations for Internationalized Domain Names
                (IDNs)", RFC 4690, September 2006.

Appendix A.  Work Still to be Done

   Figure out exactly how we want the reference to Unicode 3.2 and
   Unicode 5.1 to look in the references section, then figure out how to
   wrestle xml2rfc to produce that.

   Fill in all the tables for the updates to stringprep.

Appendix B.  Changes between versions

   (This section is to be removed by the RFC Editor.)

   This is the initial version.

Author's Address

   Paul Hoffman

   Email: phoffman@imc.org