Network Working Group                                          K. Davies
Internet-Draft                                                     ICANN
Intended status: Informational                         February 29, 2012
Expires: September 1, 2012


          Representing registration policy for IDNs using XML
                       draft-davies-idntables-00

Abstract

   This memo describes a method of representing the registration policy
   that a zone administrator uses for registering Internationalised
   Domain Names using Extensible Markup Language (XML).  These registry
   policies, commonly known as "IDN tables", are used to enforce and
   share policy on which specific code-points are permitted for
   registrations, and which alternative code-points are considered
   variants.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 1, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.


Davies                  Expires September 1, 2012               [Page 1]

Internet-Draft        IDN Table XML representation         February 2012


Table of Contents

   1.  Introduction
   2.  Design Goals
   3.  IDN Table XML Format
     3.1.  Basic structure
     3.2.  The meta element
       3.2.1.  The version element
       3.2.2.  The date element
       3.2.3.  The language element
       3.2.4.  The domain element
       3.2.5.  The description element
       3.2.6.  The variant-classes element
     3.3.  The data element
       3.3.1.  Variants
     3.4.  Example table
   4.  Processing a label against a table
     4.1.  Determining eligibility for a label
     4.2.  Determining variants for a label
   5.  Conversion between other formats
     5.1.  RFC 3743 Language Variant Table
     5.2.  RFC 4290 Model Table Format
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
   Appendix A.  RelaxNG Schema
   Appendix B.  Acknowledgements
   Appendix C.  Editorial Notes
     C.1.  Known Issues and Future Work
     C.2.  Sample tables and running code
     C.3.  Change History
   Author's Address


1.  Introduction

   This memo describes how to use Extensible Markup Language (XML) to
   describe the list of permissible code points and variants used in a
   zone administrator's policies.

   Historically, zone administrators - such as top-level domain
   registries - have published their policies using text and HTML based
   formats loosely based around the format used to describe a Language
   Variant Table in [RFC3743].  [RFC4290] further proposes a "Model
   table format" that describes a similar set of functionality.

   Through the first decade of IDN deployment, experience has shown that
   these tables are difficult to implement and compare consistently
   because their formats differ.  A more universal format, such as one
   using structured XML, will assist by improving the
   machine-readability, consistency, and maintainability of IDN tables.
   It will also provide for more complex conditional implementation of
   variants that reflects the known requirements of current zone
   administrator policies.


2.  Design Goals

   The following items are explicit design goals of this format:

   o  The format MUST be one that can be implemented in a reasonably
      straightforward manner in software;

   o  The format SHOULD be able to be checked for formatting errors,
      such that common mistakes can be caught;

   o  An IDN Table MUST be able to express the set of valid code points
      that are allowed for registration under a specific zone
      administrator's policies;

   o  An IDN Table MUST be able to express computed alternatives to a
      given domain name based on a one-to-one or one-to-many
      relationship.  These computed alternatives are commonly known as
      "IDN variants";

   o  IDN Variants SHOULD be able to be tagged with specific categories,
      such that the categories can be used to support registry policy
      (such as whether to list the computed variant in the zone, or to
      merely block it from registration);

   o  IDN Variants MUST be able to be stipulated based on contextual
      information.  For example, specific variants may only be
      applicable when they follow another specific code point, or when
      the code point is displayed in a specific presentation form;

   o  The data contained within the table MUST be unambiguous, such that
      independent implementations that utilise the contents will arrive
      at the same results;

   o  IDN Tables SHOULD be suitable for comparison and re-use, such that
      one could easily compare the contents of two or more to see the
      differences, to merge them, and so on;

   o  As many existing IDN Tables as practicable SHOULD be able to be
      migrated to the new format with all applicable logic retained.

   It is explicitly NOT the goal of this format to:

   o  Stipulate what code points should be listed in an IDN Table by a
      zone administrator.  What registration policies are used for a
      particular zone is outside the scope of this memo.

   o  Stipulate what a consumer of an IDN Table must do when they
      determine a particular domain is valid or invalid, or when they
      arrive at a set of computed IDN variants.  IDN Tables are only
      used to describe rules for computing code points, but do not
      prescribe how registries and other parties utilise them.


3.  IDN Table XML Format

3.1.  Basic structure

   The basic XML framework of the document is as follows:

      ...

   Within the "idntable" element rest two sub-elements.  The first is a
   "meta" element that contains all meta-data associated with the IDN
   table, such as its authorship, what it is used for, implementation
   notes and references.  This is followed by a "data" element that
   contains the substantive code-point data.

      ... ...

   A document should contain exactly one "idntable" element, and within
   that optionally one "meta" element and exactly one "data" element.

3.2.  The meta element

   The "meta" element is used to express meta-data associated with the
   IDN table.  It can be used to identify the author or relevant contact
   person, explain what the usage of the IDN table is, and provide
   implementation notes as well as references.  The data contained
   within is not required by software consuming the IDN table in order
   to calculate valid IDN labels, or to calculate variants.

3.2.1.  The version element

   The "version" element is used to uniquely identify each version of
   the table being represented.  No specific format is required, but it
   is RECOMMENDED that it be a positive integer, which is incremented
   with each revision of the file.

   An example of a typical first edition of a document:

      <version>1</version>

   A common alternative is to use a major-minor number scheme, where two
   decimal numbers are used to represent major and minor changes to the
   table.
   For example, "1.0" would be the first major release, "1.1" would be a
   minor update to that, and "2.0" would represent a major revision.

3.2.2.  The date element

   The "date" element is used to identify the date the table was
   written.  The contents of this element MUST be a valid ISO 8601 date
   string as described in [RFC3339].

   Example of a date:

      <date>2009-11-01</date>

3.2.3.  The language element

   The "language" element signals that the table is associated with a
   specific language or script.  The value of the language element must
   be a valid language tag as described in [RFC5646].  The tag may
   simply refer to a script if the table is not referring to a specific
   language.  There may be multiple language elements for a table if the
   table spans multiple languages and/or scripts.

   Example of an English language table:

      <language>en</language>

   If the table applies to a specific script, rather than a language,
   the "und" language tag should be used followed by the relevant
   [RFC5646] script subtag.  For example, for a Cyrillic script table:

      <language>und-Cyrl</language>

3.2.4.  The domain element

   This optional element refers to a domain to which this policy is
   applied.

      <domain>example</domain>

   Multiple "domain" elements may be used to reflect a list of domains.

3.2.5.  The description element

   The "description" element is a free-form element that contains any
   additional relevant description.  Typically, this field contains
   authorship information, as well as additional context on how the
   table was formulated (such as with references), and how it has been
   applied.

   The element has an optional "type" attribute, which refers to the
   MIME-type of the enclosed data.  If the description lacks a type
   attribute, it will be assumed to be plain text.

   The description elements describe information relating to the IDN
   table that is useful for the user of the table in its interpretation.
   This may explain the history, the rationale, reference sources, etc.
   It may also contain authorship information.

   The "type" attribute may be used to specify the encoding within the
   description element.  The attribute should be a valid MIME type.  If
   supplied, the content is assumed to be a single CDATA section of that
   encoding.  Typical types would be "text/plain" or "text/html".
   "text/plain" will be assumed if no type attribute is specified.

3.2.6.  The variant-classes element

   Consumers of the IDN table may classify the generated variants into
   different classes using the class attribute, discussed below.  This
   class attribute allows the registry to apply different policy (for
   example, whether to block or register specific generated strings).
   The variant-classes block provides human-readable explanations of the
   meaning of the classes used in the IDN table.

3.3.  The data element

   The "data" element contains the code point data that comprises the
   registry policy.  It describes registry policy using a series of XML
   elements that represent either individual code points or ranges of
   code points.

   The data may use the "char" and "range" elements to specify code
   points and code point ranges.

   Discrete permissible code points may be stipulated with a "char"
   element, e.g.

   Ranges of permissible code points may be stipulated with a "range"
   element, e.g.

   Code points must be expressed in hexadecimal, i.e., according to the
   standard Unicode convention but without the "U+" prefix.

   The rationale for not allowing other encoding formats, including
   native Unicode encoding in XML, is explored in [UAX42].  The XML
   conventions used in this format, including the element and attribute
   names, mirror those of [UAX42] where practical and reasonable to do
   so.

3.3.1.  Variants

   While most tables typically only determine code point eligibility,
   others additionally specify a mapping of code points to other code
   points, known as "variants".
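
   As an illustration only, the idea of a variant mapping can be
   sketched in Python.  The sketch below assumes a hypothetical
   in-memory dictionary (TABLE) rather than the XML itself, and borrows
   the "u"/"v" and "o with umlaut"/"oe" mappings that this section uses
   as examples.  It ignores conditional ("when") variants and treats the
   variant set as excluding the input label; the function names and
   these choices are assumptions, not part of this format.

```python
from itertools import product

# Hypothetical in-memory form of a table (an assumption for illustration):
# each permitted code point, written in hexadecimal without the "U+"
# prefix, maps to a list of variants; each variant is a sequence of
# hexadecimal code points.
TABLE = {
    "0075": [["0076"]],          # "u" has the variant "v"
    "0076": [],                  # "v" is permitted, no variants listed
    "00F6": [["006F", "0065"]],  # "o with umlaut" has the variant "oe"
    "006F": [],
    "0065": [],
}


def char_to_cp(ch):
    """Return the uppercase hex form of a character, at least 4 digits."""
    return format(ord(ch), "04X")


def cp_to_char(cp_hex):
    """Return the character for a hex code point such as "00F6"."""
    return chr(int(cp_hex, 16))


def is_eligible(label, table):
    """A label is eligible only if every code point is in the table."""
    return all(char_to_cp(ch) in table for ch in label)


def variant_labels(label, table):
    """Return all labels formed by substituting listed variants.

    The input label itself is excluded (an assumption of this sketch).
    """
    if not is_eligible(label, table):
        raise ValueError("label contains a code point not in the table")
    choices = []
    for ch in label:
        options = [ch]  # the original code point is always one choice
        for seq in table[char_to_cp(ch)]:
            options.append("".join(cp_to_char(cp) for cp in seq))
        choices.append(options)
    combos = {"".join(parts) for parts in product(*choices)}
    combos.discard(label)
    return sorted(combos)
```

   With this hypothetical table, variant_labels("u\u00f6", TABLE)
   combines the choices for each position, so a label with two code
   points that each have one variant yields three variant labels.
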
   What constitutes a variant is a matter of policy, and varies for each
   implementation.

3.3.1.1.  Basic variants

   Variants are specified as one or more children of a "char" element.
   For example, to map "v" as a variant of "u":

   A sequence of multiple code points can be specified as a variant of a
   single code point.  For example, the sequence of "o" then "e" can be
   specified as a variant for an "o with umlaut" (U+00F6) as follows:

   It is not possible to specify variants for ranges.

3.3.1.2.  Null variants

   To specify a null variant, which is a variant string that maps to no
   code point, use the null code point 0000.  For example, to map a
   string with a zero width non-joiner to the same string without the
   zero width non-joiner:

3.3.1.3.  Conditional variants

   At its most basic, the generation of variants is conditional on a
   specific code point or set of code points.  However, in some
   instances registries perform control based on other attributes that
   cannot be determined solely from simple code point comparisons.

   For example, in some tables utilising the Arabic script, the Arabic
   contextual form is a determinant in which variants are used.  The
   contextual form cannot be derived solely from the code point, as the
   code point is the same for the various forms.

   The IDN table provides for conditioning the generation of variants on
   specific instances as follows, using the "when" attribute.

   arabic-initial
      Based on context, the code point would be presented in its Arabic
      Initial form.

   arabic-isolated
      Based on context, the code point would be presented in its Arabic
      Isolated form.

   arabic-medial
      Based on context, the code point would be presented in its Arabic
      Medial form.

   arabic-final
      Based on context, the code point would be presented in its Arabic
      Final form.

   For example, to mark U+0673 as a variant of U+0625, but only when it
   appears in isolated or final forms:

3.4.  Example table

   A sample complete XML IDN table is as follows.

      1
      2010-01-01
      sv
      example
      This language table was developed with the Swedish examples
      institute.


4.  Processing a label against a table

4.1.  Determining eligibility for a label

   In order to test a specific label for membership in an IDN table, a
   consumer of the table must iterate through each code point within the
   given U-label, and test that each code point is a member of the IDN
   table.  If any code point is not a member of the IDN table, the label
   shall be deemed not eligible in accordance with the table.

   A code point is deemed a member of the table when it is listed with
   the element, and all necessary conditions listed in "when" attributes
   are correctly satisfied.

4.2.  Determining variants for a label

   For a given eligible label, the set of variants is deemed to be each
   possible permutation of elements, whereby all "when" attributes are
   correctly satisfied for each code point in the given permutation.


5.  Conversion between other formats

5.1.  RFC 3743 Language Variant Table

   All attributes can be retained in conversion from an [RFC3743]
   language variant table to this XML format.

   This XML format can be converted to the format described in
   [RFC3743], with the following caveats:

   o  Version numbers not expressed as integers will not satisfy the
      ABNF formatting for [RFC3743].

   o  Much of the additional meta-data cannot be expressed in the text
      format (although it can be supplied as comments in the text file).

   o  The [RFC3743] format only allows for two variant classes, those
      that are preferred and those that are regular.  Other distinctions
      will be lost.

   o  Conditional variants cannot be retained.

5.2.  RFC 4290 Model Table Format

   All attributes can be retained in conversion from the [RFC4290] model
   table format to this XML format.

   Tables can similarly be converted to the format described in
   [RFC4290] with the same caveats as for the [RFC3743] format, and
   additionally the inability to classify variants into groups such as
   "preferred".


6.  IANA Considerations

   This document does not specify any IANA actions.


7.  Security Considerations

   There are no security considerations for this memo.


8.  References

   [RFC3339]  Klyne, G., Ed. and C. Newman, "Date and Time on the
              Internet: Timestamps", RFC 3339, July 2002.

   [RFC3743]  Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
              Engineering Team (JET) Guidelines for Internationalized
              Domain Names (IDN) Registration and Administration for
              Chinese, Japanese, and Korean", RFC 3743, April 2004.

   [RFC4290]  Klensin, J., "Suggested Practices for Registration of
              Internationalized Domain Names (IDN)", RFC 4290,
              December 2005.

   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
              "Linguistic Guidelines for the Use of the Arabic Language
              in Internet Domains", RFC 5564, February 2010.

   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

   [UAX42]    Unicode Consortium, "Unicode Character Database in XML".


Appendix A.  RelaxNG Schema


Appendix B.  Acknowledgements

   This format builds upon the work on documenting IDN tables by a
   number of other parties, most significantly that of the Joint
   Engineering Team published as [RFC3743], and [RFC5564] published by
   the Arabic-language community.

   Contributions that have helped shape this document have been provided
   by Francisco Arias, Nicholas Ostler, Steve Sheng and Andrew Sullivan.


Appendix C.  Editorial Notes

   This appendix is to be removed prior to final publication.

C.1.  Known Issues and Future Work

   o  The text does not currently provide a mechanism for deriving
      variants based on a sequence of two or more code points.  Such a
      mechanism would be required to perform an inverse mapping from one
      provided in this document, namely, mapping the sequence of "o" and
      "e" to an "o with umlaut".

   o  A mechanism for a specific code point only being eligible when
      preceded or followed by a specific sequence of characters is not
      provided.  Such a mechanism is needed to support the contextual
      rule required by the .CAT top-level domain, which supports the
      middle dot (U+00B7) only when both preceded and followed by the
      letter "l" (U+006C).

   o  An optional mechanism for explicitly nominating the registry
      action associated with a computed variant could be added.  For
      example, an "action" attribute to the element could specify one of
      the following: "allocate", "block", "delegate", "mirror" or
      "withhold".  Each of these actions would need to be formally
      defined.

   o  The tables may benefit from a unique identifier, such as an "id"
      attribute on the element.

   o  A more formal step-wise description of how variants are computed
      needs to be supplied.

C.2.  Sample tables and running code

   Some sample tables using this format, as well as a basic
   implementation of this specification, are posted at
   https://github.com/kjd/idntables

C.3.  Change History

   -00  Initial draft.


Author's Address

   Kim Davies
   Internet Corporation for Assigned Names and Numbers
   4676 Admiralty Way, Suite 330
   Marina del Rey, CA  90292
   US

   Phone: +1 310 823 9358
   Email: kim.davies@icann.org
   URI:   http://www.iana.org/