





Network Working Group                            Jean-Francois C. Morfin
Internet-Draft                                                   Intlnet
Intended status: Independent submission                September 9, 2009
Expires: March 10, 2010


                  WG-IDNABIS/LC comments and responses
                      draft-iucg-wgidnabislc-01.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008.  The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process.  Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on March 10, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal



Morfin                   Expires March 10, 2010                 [Page 1]

Internet-Draft                  wgidnalc                  September 2009


   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.


Abstract

   IDNA is a key issue for the IUCG, as a paradigm for the future of
   the Internet. The document set that describes it therefore needs to
   reflect a full consensus of both the IETF and the users. To help
   with this, this memo keeps track of the WG-IDNABIS/LC comments, and
   of the answers requested and received. The IAB Draft on IDN has been
   added because some important remarks have been made about it. The
   author is quoted if the comment is not from IUCG.



Table of Contents

   1.  Introduction................................................... 3
   2.  General appreciation........................................... 3
   3.  IDN - IAB...................................................... 4
   4.  IDNA Definitions............................................... 6
   5.  IDNA Rationale................................................ 16
   6.  IDNA Mapping.................................................. 19
   7.  IDNA Protocol................................................. 21
   8.  IDNA BIDI..................................................... 31
   9.  IDNA Tables................................................... 41


Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

















1.  Introduction
   
   IDNA is a key issue for the IUCG, as a paradigm for the future of
   the Internet. The document set that describes it therefore needs to
   reflect a full consensus of both the IETF and the users. To help
   with this, this memo keeps track of the WG-IDNABIS/LC comments, and
   of the answers requested and received. The IAB Draft on IDN has been
   added because some of the remarks made about it are important to the
   background and to the future of the Multilingual Internet.
   
   The author is quoted if the comment is not from IUCG. This
   compilation is current up to the answer of Harald Alvestrand on
   08/29.
   
2.  General appreciation
   
   *  The division of the material among the documents seems adequate.
      However, even if the Mapping memo is not a part of the IDNA
      document set (why?), it is more than logical, and enlightening, to
      read it prior to the Protocol parts.
      
   *  The documents are rather confusing because it is impossible to
      decide whether:
      
      *  they consider IDNA to be a part of the DNS or not (we may also
         be influenced by the ML-DNS pile we work on).
         
      *  they differentiate (which) between characters and codepoints.
         
      *  they use NFKC or NFC, and what the differences between the two
         are, both intrinsically and from an IDNA point of view.
         
      *  they are meant to be a complete set of standards or a partial
         set of suggestions. This uncertainty results from:
         
         *  non-normative forms being used in places that one would
            deem normative;
         
         *  the constant discussion of Registries'
            capacities/obligations, and the lack of documentation on
            the tools for executing them and managing the related
            registration/coding metadata and rules.
   
   *  (Martin Duerst): - Use only one name for talking about the
      document collection. Currently:
      
      *  'collection' (Abstract, 1.1)
         
      *  'series' (1.1)
         
      *  'set' (1.3)
         
      *  'and the associated ones' (1.1.1)
         
      *  'these documents' (2.1; very unclear when reading whether that
         phrase indeed refers to the document collection or to Unicode
         documents or what, similar again in 2.2)
         
      This variability is confusing.
   
3.  IDN - IAB
   
   *  (Martin Duerst): Mentioning ISO-2022-JP for encoding Japanese
      domain names raises some suspicion. ISO-2022-JP may well be (or
      have been) used in the DNS or a similar system, but such use would
      be atypical, and should be documented by a reference. Based on the
      general "division of labor" of the three classical Japanese
      encodings (ISO-2022-JP, EUC-JP, Shift_JIS), one would expect
      EUC-JP or Shift_JIS rather than ISO-2022-JP in such a case. [Among
      the three, ISO-2022-JP makes it easiest to explain the "heuristic
      encoding detection" scenario described at the end of Section 1.1.
      But without a reference, it may look to some as if ISO-2022-JP was
      a made-up example.]
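
      As an illustration of why ISO-2022-JP makes the heuristic
      detection scenario easiest to explain, the following Python
      sketch compares the three encodings on one sample string. It is
      not taken from the IAB draft; the sample string and the helper
      name encoding_profile are assumptions made for this
      illustration only.

```python
# Illustrative only: compare how the three classical Japanese encodings
# represent the same (arbitrarily chosen) two-character string.
SAMPLE = "\u65e5\u672c"  # hypothetical test input (two Kanji)


def encoding_profile(text: str, codec: str) -> dict:
    """Report whether the encoded bytes contain ESC or any 8th-bit octet."""
    data = text.encode(codec)
    return {
        "has_esc": b"\x1b" in data,
        "has_high_bit": any(octet >= 0x80 for octet in data),
    }


profiles = {
    codec: encoding_profile(SAMPLE, codec)
    for codec in ("iso2022_jp", "euc_jp", "shift_jis")
}
for codec, profile in profiles.items():
    print(codec, profile)
```

      For this sample, only the ISO-2022-JP form stays within 7 bits
      and carries ESC sequences, which is the property the detection
      heuristic relies on.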
   
   *  1. "An Internationalized Domain Name (IDN) is a name that contains
      one or more non-ASCII characters.". What is a "name" vs. a domain
      name?
   
   *  1. "When an IDN is encoded with Punycode, it is prefixed with
      "xn--",". Is an IDN a label?
   
   *  1. "it is prefixed with "xn--", which assumes that ASCII names do
      not start with this prefix." Isn't the whole thing supposed to use
      "xn--" ASCII labels instead of non-ASCII entries?
   
   *  1. "reversible Unicode-to-Punycode conversion .... reversible
      Punycode-to-Unicode conversion". Unicode is a table, Punycode is
      an algorithm. Punycode is not reversible, but its use can be
      restricted to codepoints in turn permitting it to perform
      reversibly.
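
      To make the distinction between the algorithm and the code
      points it is applied to more concrete, here is a minimal Python
      sketch using the "punycode" codec from the standard library,
      which works on raw strings (no "xn--" prefix and no IDNA
      checks). The sample string and the helper names are assumptions
      made for this illustration.

```python
# Illustrative only: Punycode (RFC 3492) applied to a raw string.
def to_punycode(text: str) -> str:
    """Encode a Unicode string with the Punycode algorithm."""
    return text.encode("punycode").decode("ascii")


def from_punycode(encoded: str) -> str:
    """Decode a Punycode string back to Unicode."""
    return encoded.encode("ascii").decode("punycode")


original = "b\u00fccher"  # hypothetical sample label
encoded = to_punycode(original)
restored = from_punycode(encoded)
print(encoded, restored == original)
```

      The sketch only shows the round trip for one lowercase sample;
      it says nothing about which code points IDNA permits.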
   
   *  1. "UTF-8 [RFC3629] is a mechanism for encoding a Unicode
      character in a variable number of 8-bit octets, where an ASCII
      character is preserved as-is." Characters belong to scripts that
      may or may not be supported by ASCII and Unicode encoding tables.
   
   *  1.1. (Martin Duerst): For the bulleted list at the end of Section
      1.1, it should be pointed out that UTF-8 can be detected, and
      distinguished from other 8-bit encodings, with much higher
      precision than just "a byte in the string has the 8th bit set".
      For details, please see
      http://www.ifi.unizh.ch/mml/mduerst/papers/PDF/IUC11-UTF-8.pdf.
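
      One possible reading of this remark, sketched in Python under
      the assumption that "detecting UTF-8" means checking that the
      octets form a well-formed UTF-8 sequence (this is not
      necessarily the method of the cited paper, and the function
      name is hypothetical):

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if data has an 8th-bit octet and is well-formed UTF-8."""
    if all(octet < 0x80 for octet in data):
        return False  # pure ASCII: nothing to distinguish
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False  # 8th bit set, but not a valid UTF-8 sequence
    return True


print(looks_like_utf8("b\u00fccher".encode("utf-8")))
print(looks_like_utf8("b\u00fccher".encode("latin-1")))
```

      Both sample inputs have a byte with the 8th bit set, but only
      the first one is well-formed UTF-8.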
   
   *  1.1. (Martin Duerst): The heuristic for punycode that is given in
      Section 1.1 is "starts with xn--". However, on the level of
      getaddrinfo, we are dealing with domain names, not single labels,
      and something like www.xn--foo.jp should definitely be punycode
      even if it doesn't start with xn--.
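
      A label-by-label form of the heuristic could be sketched as
      follows in Python. The function name is hypothetical, and the
      check only looks at the prefix; it does not validate the
      labels.

```python
ACE_PREFIX = "xn--"


def has_ace_label(domain_name: str) -> bool:
    """True if any label of the domain name starts with the ACE prefix."""
    labels = domain_name.rstrip(".").split(".")
    return any(label.lower().startswith(ACE_PREFIX) for label in labels)


print(has_ace_label("www.xn--foo.jp"))
print(has_ace_label("www.example.jp"))
```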
   
   *  1.1. "When used with a DNS resolver library, IDNA is inserted as a
      "shim" between the application and the resolver library": that
      shim should be located in Figure 2.
   
   *  1.1. "Again this assumes that ASCII names never start with "xn--",
      and also that UTF-8 strings never contain an ESC character". It
      may be worth documenting whether "IDN", "names", "labels", and
      "strings" are or are not the same thing.
   
   *  2. "Applications that convert an IDN to Punycode before calling
      getaddrinfo() will result in name resolution failures if the
      Punycode name is directly used in such protocols. " Why? Is that
      not actually due to the reason that was given in 3? "Applications
      that convert an IDN to Punycode before calling getaddrinfo() will
      result in name resolution failures if the name is actually
      registered in a private name space in some other encoding (e.g.,
      UTF-8)."
   
   *  3. "While implementations of the DNS protocol must not place any
      restrictions on the labels that can be used, applications that use
      the DNS are free to impose whatever restrictions they like, and
      many have." Wouldn't these two rules contradict the proposed
      WG-IDNABIS charter change? Wouldn't they permit the support of
      cases such as Tatweel, Tamil figures, and French majuscules?
   
   *  3.4. "The DNS resolver will append suffixes in the suffix search
      list". Where is the "suffix search list" documented?
   
   *  4. "even when DNS is used, the conversion to Punycode should be
      done only by an entity that knows which name space will be used."
      Fundamental. Yet, this is not considered by IDNA rationale or
      protocol.
   
   *  4.1. "Indeed the choice of conversion within the resolver
      libraries is consistent with the quote from [RFC3490] section 6.2
      stating that Punycode conversion "might be performed inside these
      new versions of the resolver libraries". - "the recommendation is
      that a resolver library be more liberal in what it would accept
      from an application would mean that such a name would be accepted
      and re-encoded as needed". These recommended architectures (such
      as ML-DNS) are not considered in the IDNA rationale. Will IDNA be
      interoperable with these recommended architectures?
   
   *  4. "encoding conversion between Punycode and UTF-8 is
      unambiguous". (?). This could lead to stabilization through
      punycode and A-labels, in turn making A-labels the DNS referent
      entry for UTF support?
   
   *  5. Security considerations. This kind of consideration is already
      posing a problem for TM protection. This is the "Babel Names"
      case. This occurs when someone trademarks the U-label
      corresponding to a protected Roman script TM. When that U-label
      displays under its ASCII label form, it infringes on the Roman
      script TM rights. Ex.: "xn--coca-cola" or "xn--vint-cerf".
   
   *  (Martin Duerst): The solution that the document seems to be
      pushing most is heuristic detection, i.e. an API where strings in
      different encodings are fed in and the API sorts things out
      heuristically, converting if necessary. To some extent, this may
      be an unavoidable evil, but it would be good if the document were
      pushing more for clear encoding identification (for which I think
      GetAddrInfoW() (UTF-16) would be an example).
   
   *  (Martin Duerst): It may be a good idea to also look into the issue
      of escaped forms of domain names being fed into resolver APIs. One
      form of escaping is (UTF-8-based) %-encoding in URIs (and IRIs),
      which is allowed in URIs according to RFC 3986, is the only way to
      encode non-ASCII in the host part of an URI where punycode isn't
      appropriate, and may be the result of a conversion from an IRI to
      an URI. For further background and discussion, please see
      http://lists.w3.org/Archives/Public/public-iri/2009Aug/0012.html
      and
      http://lists.w3.org/Archives/Public/public-iri/2009Aug/0024.html
      and the followup discussion.
      
      Another potential kind of escaping are HTML/XML numeric character
      references (of the form "& #xABCD ;"), although I expect them to
      be less of a problem because they are used higher up in the
      application and usually removed early on.
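
      As a rough illustration of the two escaped forms mentioned in
      this comment, the following Python sketch undoes numeric
      character references and UTF-8-based %-encoding in a host
      string. The function name, the order of the two steps, and the
      sample host names are assumptions made for this illustration; a
      real resolver API might deliberately refuse such input instead.

```python
import html
from urllib.parse import unquote


def unescape_host(host: str) -> str:
    """Undo HTML/XML numeric character references, then %-encoding."""
    without_ncr = html.unescape(host)
    # errors="strict" rejects %-escapes that are not valid UTF-8.
    return unquote(without_ncr, encoding="utf-8", errors="strict")


print(unescape_host("www.b%C3%BCcher.example"))
print(unescape_host("www.b&#xFC;cher.example"))
```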
   
4.  IDNA Definitions
   
   *  Information on Unicode is scattered throughout the document.
      Wouldn't it be much better to describe a clear sequence?
      
      *  what an IDNA is,
         
      *  what IDNs are,
         
      *  what IDNA labels are,
         
      *  what they are made of,
         
      *  how Unicode supports them, including NFC in the same 2.1.
         section,
         
      *  how a zone manager may impose profiling rules (description,
         enforcement).
   
   *  Most of the new terms are discussed before being defined. This
      starts with the confusing "looking them up" in part 1.1.1 (which
      means resolving, and not just asking about validity or existence,
      as opposed to "registering"). IDNs are introduced in 2.3.2.3,
      etc. This certainly reflects how difficult it is to define all
      these terms, but it is still quite confusing. For example, it
      would be advisable to begin with part 4.4.
   
   *  The different classes of domain names that are discussed only seem
      to be related to IDNA without an exhaustive presentation of the
      DNS domain name context. The names are somewhat confusing. The
      drafts are certainly clear, but they do not reflect a progressive
      logic of discovery of the nature of a name/label that could be
      ported to programming functions.
   
   *  References to the lower/uppercase image can be understood by DNS
      old-timers, but are confusing to newcomers, as the image does not
      reflect the same functionality and because U-label/A-label
      lower/uppercase treatment is not the same.
   
   *  Different keyboards and encodings are discussed, stressing that a
      DNS resolution calls for a U-label conversion, but nothing obliges
      local applications to transcode user entries to Unicode when they
      interoperate at a layer other than DNS. However, these
      applications may want to canonicalize these entries in their own
      way. Interplus supports the idea that an application layer may use
      an intermediate coding that is neither Unicode nor ASCII. Among
      other things, this facilitates interoperability with the UTF-8
      that Microsoft supports within private nets: the user interface
      may be common and the underlying machinery either IDNA or UTF-8.
   
   *  1.1.1 (Martin Duerst): Audiences - "what names are permitted in
      DNS zone files," -> "what names are permitted in DNS zones,"
      (whether these are files or whatever is implementation-dependent).
      
      This section is very important, and would be much more effective
      with less circumscription. Just use straightforward terms for the
      people/functions that everybody else names directly, such as
      'registries', 'registrars', 'administrators creating subdomains',
      and so on, and then say that this list isn't exclusive. That will
      have the additional benefit of bringing the document up in more of
      the relevant searches.
      
      The second paragraph is also overly circumscriptive. Using "the
      one containing explanatory material" to refer to Rationale is a
      strong disservice to every reader, even if strictly speaking it
      may be preferable to a forward reference. Please use [] style
      references, or labels such as "Rationale" with a short sentence
      pointing to 1.3, or move the "who should read what" info to 1.3
      with a general pointer from 1.1.1. But please stop talking around
      stuff that can be easily expressed more directly (this general
      comment applies in many other places, too).
   
   *  1.1.1 (Martin Duerst): should be 1.2, and 1.1.2 should be 1.3, and
      1.3 should be 1.4, to simplify structure.
   
   *  2.1 (Martin Duerst): Say that 0x means hexadecimal (first para)
   
   *  2.3.1 (Martin Duerst): title: This looks as if this section
      defines one term, "LDH-Label". Change the title to something more
      general, such as "Definitions for ASCII-only Labels".
   
   *  2.3.1. (Paul Hoffman): Anchor 10 above Figure 1 indicates that the
      figure might be shrunk. I propose instead that the four footnotes
      simply be moved to immediately after the figure, which makes the
      figure itself fit on one page.
   
   *  2.3.1. (Paul Hoffman) In Figure 2, the terms "Binary Label
      (including high bit on)" and "Bit String Label" are not defined
      and are confusing without definition. Do we need this figure at
      all any more?
   
   *  2.3.1. (Martin Duerst): general (but most urgently 3rd para): Make
      sure that the terms defined stick out, at least the same way as in
      2.1 (one para per def, defined word is first word of para). Move
      clear and simple definition to front, and rationale,
      relationships,... to the end of the paragraph.
   
   *  2.3.1. (Martin Duerst): Move normative text to Protocol ("those
      labels MUST NOT be processed as ordinary LDH-labels by
      IDNA-conforming programs and SHOULD NOT be mixed with IDNA-labels
      in the same zone")
   
   *  2.3.1. (Martin Duerst): 3rd para: "but which otherwise conform to
      LDH-label rules" -> "but otherwise conform to LDH-label rules"
   
   *  2.3.1. (Martin Duerst): 3rd para: "case-independent" ->
      "case-insensitive"
   
   *  2.3.1. (Martin Duerst): 3rd para: "divided in" -> "divided into"
   
   *  2.3.1. (Martin Duerst): "for future extensions that use extensions
      based on the same "prefix and encoding" model": a) 'extensions'
      is repeated; b) the IETF is great at not talking about future
      eventualities and describing general models that never may be
      used. In this and other sections, such stuff should also be cut
      out.
   
   *  2.3.1. (Martin Duerst): anchor10: I do not understand why we need
      the (1)..(4) notes. Either the definitions are clear enough, or
      they should be fixed. Something like "NON-RESERVED LDH LABELS
      (NR-LDH-labels) NR-LDH LABELS" is also total overkill. The only
      thing that's necessary is "NR-LDH labels", with exactly the same
      capitalization and hyphenation as in the definition.
   
   *  2.3.1. (Martin Duerst): Fig. 2: I'm somewhat confused here. Note
      (5) seems to suggest that U-labels have a fixed binary encoding
      (e.g. UTF-8) and are used directly in the DNS. Otherwise, the note
      doesn't make sense.
   
   *  2.3.2.1. (Martin Duerst): "While that constraint may be tested in
      any of several ways, an A-label must be capable of being produced
      by conversion from a U-label and a U-label must be capable of
      being produced by conversion from an A-label.": This puts the
      cart before the horse. Change to "An A-label must be capable of
      being produced by conversion from a U-label and a U-label must be
      capable of being produced by conversion from an A-label. There are
      several ways in which this constraint may be tested."
   
   *  2.3.2.1. (Andrew Sullivan): Para. 2.3.2.1: An "A-label" is the
      ASCII-Compatible Encoding (ACE, see Section 2.3.2.5) form of an
      IDNA-valid string. It must be a complete label: IDNA is defined
      for labels, not for parts of them and not for complete domain
      names. This means, by definition, that every A-label will begin
      with the IDNA ACE prefix, "xn--" (see Section 2.3.2.5), followed
      by a string that is a valid output of the Punycode algorithm
      [RFC3492] and hence a maximum of 59 ASCII characters in length.
      The prefix and string together must conform to all requirements
      for a label that can be stored in the DNS including conformance to
      the rules for the preferred form described in RFC 1034, RFC 1035,
      and RFC 1123. A string meeting the above requirements is still not
      an A-label unless it can be decoded into a U-label.
      
      So, to be less vague: the section is supposed to define certain
      terms, and that bullet ought to define "A-label". It does not. It
      tells us the necessary conditions for being an A-label, but not
      the sufficient. This could be remedied if the last sentence said
      instead, "If and only if a string meeting the above requirements
      can be decoded into a U-label, then it is an A-label." But I'm no
      longer sure that's true, given that we've lived with the I-D
      definition so long and yet not had it fully operationalized. Is
      there anything else? If there is, it needs to be added. These
      definitions, I say, must be completely operationalized (or else we
      have no excuse to call this document the definitions document).
      Since people have to write code on the basis of these definitions,
      they must be completely unambiguous.
   
         Author's current answer: These changes, with Paul's suggested
         modifications, have been tentatively accepted and incorporated
         in the document. Anyone who objects should say so quickly.
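
      To show what "operationalizing" the quoted paragraph might look
      like, here is a minimal Python sketch of the necessary
      conditions it lists. It is an assumption-laden illustration,
      not the working group's definition: the function name is
      hypothetical, the LDH pattern is a simplification of the
      preferred-form rules, and no IDNA code point, contextual or
      Bidi rules are checked, so a True result is not sufficient to
      call a string an A-label.

```python
import re

ACE_PREFIX = "xn--"
# Simplified LDH pattern: letters, digits, hyphen; no leading or
# trailing hyphen.
LDH_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def meets_a_label_preconditions(label: str) -> bool:
    """Check necessary (not sufficient) conditions for an A-label."""
    lowered = label.lower()
    if not lowered.startswith(ACE_PREFIX):
        return False
    if len(lowered) > 63:  # 4-character prefix plus at most 59
        return False
    if not LDH_LABEL.fullmatch(lowered):
        return False
    try:
        decoded = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    # A U-label must contain at least one non-ASCII character.
    return any(ord(ch) > 0x7F for ch in decoded)


print(meets_a_label_preconditions("xn--bcher-kva"))
print(meets_a_label_preconditions("example"))
```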
   
   *  2.3.2.1. (Martin Duerst): "Among other things, this implies that
      both U-labels and A-labels must be strings in Unicode NFC
      [Unicode-UAX15] normalized form.": A-labels are by definition in
      NFC, because they are ASCII-only. If you want to say that they
      must *represent* labels that are in NFC, that would be fine, but I
      think mentioning NFC here isn't really necessary.
      
      MAJOR!!!!!
   
   *  2.3.2.1. (Martin Duerst): says: "Any rules or conventions that
      apply to DNS labels in general, such as rules about lengths of
      strings, apply to whichever of the U-label or A-label would be
      more restrictive. For the U-label, constraints imposed by existing
      protocols and their presentation forms make the length restriction
      apply to the length in octets of the UTF-8 form of those labels
      (which will always be greater than or equal to the length in code
      points)."
      
      Now this is TOTALLY NEW to me. There sure is a restriction to 63
      octets in the DNS itself, but because U-labels don't enter the DNS
      as such (neither as UTF-8 nor as UTF-16 or whatever), an arbitrary
      UTF-8-based length restriction seems totally unjustified. I'm not
      at all aware of such a restriction in IDNA2003.
      
      Indeed, punycode was explicitly designed, among other things, to
      perform well for scripts with few characters. For small scripts
      that need 3 bytes per character in UTF-8 (all Indic scripts,
      Georgian, Sinhala, Thai, Lao, Tibetan, Myanmar, Ethiopic,
      Cherokee, Unified Canadian Aboriginal Syllabics, Khmer,...), this
      restriction would mean a drastic reduction of the number of
      characters usable in a label. To give an example, when at W3C, I
      created some IRI tests (http://www.w3.org/2001/08/iri-test/).
      
      The tests use Hiragana (http://www.???????.w3.mag.keio.ac.jp and
      http://???????.???????.???????.w3.mag.keio.ac.jp), which is
      atypical in that Hiragana-only Japanese is rarely used except in
      children's books, but it is typical in that punycode is able to
      represent 41 Hiragana (123 octets in UTF-8) in 58 octets. Hiragana
      overall contains about 80 letters in a single block; punycode
      efficiency will vary with the size of the script (more efficient
      for smaller scripts, less efficient for larger scripts) as well as
      of course with every individual label.
      
      Currently, all (on Windows) of IE7, Mozilla Firefox, Safari, and
      Opera pass both length tests (single label and multiple labels).
      It would be very counterproductive if IDNA2008 required further
      artificial restrictions which essentially disfavor languages and
      cultures that haven't been lucky to get short encodings for their
      scripts in UTF-8. (I'd be fine if the Security section warns about
      the potential of some protocols or implementations not having
      appropriate space, but that's on a completely different level.)
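
      The two length measures at issue can be compared with a short
      Python sketch. The label below is a hypothetical sample of 41
      distinct Hiragana code points, not the label used in the tests
      cited above, so its encoded length will not necessarily match
      the figure quoted in the comment.

```python
ACE_PREFIX = "xn--"

# Hypothetical sample: 41 consecutive Hiragana code points from U+3042.
label = "".join(chr(0x3042 + i) for i in range(41))

utf8_octets = len(label.encode("utf-8"))
ace_octets = len(ACE_PREFIX) + len(label.encode("punycode"))

print("code points:", len(label))
print("UTF-8 octets:", utf8_octets)
print("ACE octets:", ace_octets)
```

      For this sample the UTF-8 form needs 3 octets per code point,
      while the ACE form is shorter, which is the asymmetry the
      comment objects to.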
   
   *  2.3.2.1. (Andrew Sullivan): "A "U-label" is an IDNA-valid string
      of Unicode characters, in normalization form NFC and including at
      least one non-ASCII character, expressed in a standard Unicode
      Encoding Form (in an Internet transmission context this will
      normally be UTF-8)." The parenthetical remark, I think, encourages
      implementers not to recognise as U-labels strings that come in as
      (say) UTF-32, but that are otherwise perfectly valid. Who cares
      what is normal in an Internet transmission context, when we're
      defining terms? Why does that matter?
   
         Author's current answer: The comment was made because there is
         no requirement at all in IDNA (either 2003 or 2008) that UTF-8
         be used; many applications on particular operating systems
         actually use something else (UTF-16 is most common). But I
         dropped the additional text. It now just says "(such as UTF-8)"
         as Paul suggested. Again, anyone who doesn't like this should
         speak up.
   
   *  2.3.2.1. (Andrew Sullivan): "To be valid, U-labels and A-labels
      must obey an important symmetry constraint. While that constraint
      may be tested in any of several ways, an A-label must be capable
      of being produced by conversion from a U-label and a U-label must
      be capable of being produced by conversion from an A-label. Among
      other things, this implies that both U-labels and A-labels must be
      strings in Unicode NFC [Unicode-UAX15] normalized form. These
      strings MUST contain only characters specified elsewhere in this
      document series, and only in the contexts indicated as
      appropriate."
      
      This passage nowhere actually says that _the_ A-label produced by
      conversion from a particular U-label must in turn produce, by the
      application of the algorithm, the _same_ U-label. There is a
      symmetry (though not an obvious one) in U[1] being convertible to
      A[2] which is convertible to U[2] which is convertible to A[1],
      for instance. I have no idea whether such is possible, but there's
      no reason our formal definitions need to allow for it. This can be
      fixed so:
      
      To be valid, U-labels and A-labels must obey an important symmetry
      constraint. While that constraint may be tested in any of several
      ways, an A-label A' must be capable of being produced by
      conversion from a U-label U', and that U-label U' must be capable
      of being produced by conversion from A-label A'. Among other
      things, this implies that both U-labels and A-labels must be
      strings in Unicode NFC [Unicode-UAX15] normalized form. These
      strings MUST contain only characters specified elsewhere in this
      document series, and only in the contexts indicated as
      appropriate.
      
      I don't care about the notation, as long as it is unambiguously
      clear that we're always talking about the "very same" label on
      both sides of the transformation. We could go so far as to define
      IDNA-equivalent A-labels and U-labels formally. I think this would
      do it:
      
      A-label1 and U-label1 are equivalent if and only if all the
      following four conditions are true:
      
      1. The encoding of A-label1 according to [RFC3492] results in
      U-label1.
      
      2. The decoding of U-label2 according to [RFC3492] results in
      A-label2.
      
      3. A-label1 is equivalent to A-label2 according to DNS matching
      rules for labels.
      
      4. U-label1 is bitstring equivalent to U-label2.
      
      Some may reject the above as a bit of needless formalism, or want
      to reduce some steps. I argue that this is the most basic and
      therefore most clear (but admittedly inelegant) formulation. As
      usual, however, I'm utterly prepared to admit that I've actually
      got the rule incorrect. But if I have, that amounts to a hint of
      trouble with the document, because I've managed to misunderstand
      it (and though I be dim, I have been following this effort).
   
   *  2.3.2.1 (Wil Tan): In Protocol section 5.3, the A-label Input
      section should add the lowercasing step prior to using the
      Punycode decoding algorithm. The section on the symmetry
      constraint (-defs-10, section 2.3.2.1) should have similar
      wording.
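
      The "very same label on both sides" reading of the symmetry
      constraint, with the lowercasing step applied before decoding,
      might be sketched in Python as follows. The function names are
      hypothetical and the check covers only the round trip and NFC;
      it does not test which code points are permitted.

```python
import unicodedata

ACE_PREFIX = "xn--"


def a_to_u(a_label: str) -> str:
    """Lowercase the A-label, strip the prefix, and Punycode-decode."""
    lowered = a_label.lower()
    if not lowered.startswith(ACE_PREFIX):
        raise ValueError("not an ACE-prefixed label")
    return lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def u_to_a(u_label: str) -> str:
    """Punycode-encode the U-label and add the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def is_symmetric(a_label: str) -> bool:
    """True if A' -> U' -> A' gives back the same lowercased A-label
    and U' is already in NFC."""
    u_label = a_to_u(a_label)
    if unicodedata.normalize("NFC", u_label) != u_label:
        return False
    return u_to_a(u_label) == a_label.lower()


print(is_symmetric("XN--BCHER-KVA"))
print(is_symmetric(u_to_a("bu\u0308cher")))  # decomposed form, not NFC
```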
   





   *  2.3.1. Para5 (Wil Tan): Labels within the class of R-LDH labels
      that are not prefixed with "xn--" are also not valid IDNA-labels.
      To allow for future use of mechanisms similar to IDNA, those
      labels MUST NOT be processed as ordinary LDH-labels by
      IDNA-conforming programs and SHOULD NOT be mixed with IDNA-labels
      in the same zone.
   
         Author's current response: Unless, in the moving around of
         text, we have slipped up, it is important to note that the
         restriction here applies _only_ to IDNA-aware applications.
         That prevents it from being a restriction on the DNS generally.
         However, for IDNA-aware applications, it is a precaution
         against possible future prefix-altering changes as well as
         something of a mechanism for making it harder for bad guys to
         game future changes. If any non-IDNA arrangements come along
         that use "??--" label encodings, they will of course have to be
         coordinated with each other and with IDNA; in the interim, this
         provision keeps IDNA out of their way (i.e., avoids preempting
         such approaches).
         
         And, yes, the WG did discuss this at great length.
   
      "I may have missed it, but don't recall any discussions about
      restricting the processing of other tagged domains. Is this the
      right draft to prescribe restrictions on how non-XN-Labels are
      processed?"
   
         Author's current response: IMO, we are much too tied up in
         special definitions and confusing terminology already. Please
         let's not make it worse by introducing more unnecessary
         terminology in the form of "tagged domains". And it is the
         right place for defining how IDNA-aware applications handle
         R-LDH labels that are not valid A-labels, at least IMO and in
         the opinion of the mailing list the last two or three times we
         went through that topic.
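      
      For illustration, the label classes discussed in this comment
      can be sketched as a small Python classifier. It assumes the
      -defs terminology (an R-LDH label is an LDH label with "--" in
      its third and fourth positions, and an XN-label is an R-LDH
      label beginning with "xn"); the function name and the returned
      strings are hypothetical.

```python
import re

# LDH label: ASCII letters, digits and hyphens, 1 to 63 characters,
# with no leading or trailing hyphen.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify_label(label: str) -> str:
    if not _LDH.fullmatch(label):
        return "non-LDH"
    if label[2:4] != "--":
        return "NR-LDH"        # ordinary LDH label
    if label[:2].lower() == "xn":
        return "XN-label"      # candidate A-label, still to be validated
    return "other R-LDH"       # reserved; not to be treated as ordinary LDH
```

      Under this sketch a label such as "ab--cd" falls into the
      reserved class that the comment and the response are about.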
   
   *  2.3.2.2. (Martin Duerst): NR-LDH-label and Internationalized
      Label: The section doesn't say anything about "Internationalized
      Label"s, although this term appears in the title. (the definition
      is in 2.3.2.3)
   
   *  2.3.2.3. (Martin Duerst): SRV record labels are not
      Internationalized labels, and therefore domain names used for SRV
      aren't IDNs. That's fine by me, but it should nevertheless be made
      clear (here or elsewhere) that IDNs can be used with SRV,... (this
      seems to be done at the end of 2.3.2.6, so this should be okay)
   
   *  2.3.2.4. (Martin Duerst): This seems to say that there is no
      equivalence between an all-lowercase A-label and an otherwise
      equal label where some letters (maybe accidentally) have been
      upper-cased. I think the cause of the problem is (as often in this
      document) the lack of consistent language. Instead of "and then
      testing for an exact match between the A-labels", say "and then
      testing for equivalence between the A-labels [using normal DNS
      matching rules]". If that's not what's intended, then some more
      background may be appropriate.
   
   *  2.3.2.5. (Martin Duerst): "a string of ASCII characters" -> "the
      string of ASCII characters"
   
   *  2.3.3. (Martin Duerst): "Because IDN labels may contain characters
      that are read, and preferentially displayed, from right to left,":
      Remove 'preferentially'.
      
      This may refer to some hopelessly broken systems, or to the
      fact that Arabic Braille is LTR, or something else, but is totally
      irrelevant and potentially misleading in this context.
   
   *  2.3.3. (Martin Duerst): Why doesn't this paragraph just refer to
      'logical' representation, a term that people who know bidi are
      familiar with and that's widely used in Unicode?
   
   *  2.3.3. (Andrew Sullivan): Also, as an aside, it would be helpful
      if defs pointed out more explicitly in its Para. 2.3.3 that there
      are BIDI-only terms defined in the BIDI document and not in defs.
   
         Author's current answer: I moved it to the beginning. I don't
         think there's anything bidi-specific in -defs.
   
   *  2.3.4. (Martin Duerst): "There has been some confusion about
      whether a "Punycode string" does or does not include the ACE
      prefix and about whether it is required that such strings could
      have been the output of the ToASCII operation":
      
      a) The combination of 'required' and 'could' doesn't make ANY
      sense to me.
      
      b) It is unclear what "such strings" refers to (with ACE prefix?
      without ACE prefix?)
   
   *  2.3.4. (Martin Duerst): "much more clear" -> "much clearer"
   
   *  4. (Martin Duerst): There should be a very short paragraph saying
      that this section provides an overview and pointers into the
      security sections of the other documents. (or whatever else
      exactly the relationships are)
   
   *  4.1. (Martin Duerst): "In addition to characters that are
      permitted by IDNA2003 and its mapping conventions": Does this mean
      "In addition to characters that are permitted by (IDNA2003 and its
      mapping conventions)" or "In addition to characters that are
      permitted by IDNA2003 and [in addition to] its mapping
      conventions"? Please clarify.
   
   *  4.1. (Martin Duerst): "problems that might raise" -> "problems
      that might arise"
   
   *  4.1. "Security on the Internet partly relies on the DNS. Thus, any
      change to the characteristics of the DNS can change the security
      of much of the Internet." This sentence seems extremely confusing,
      as IDNA does not change the characteristics of the DNS but is
      rather built on the fact that they will not be changed.
   
   *  4.1. Similarly: "The security of the Internet is compromised if a
      user entering a single internationalized name is connected to
      different servers based on different interpretations of the
      internationalized domain name." The security of the Internet is
      not compromised; however, trust in the IDNA proposition might be.
   
   *  4.2. (Martin Duerst): "these specifications"? The IDNA2008
      collection of specifications? Or the specifications for the local
      character sets?
   
   *  4.2. (Martin Duerst): "(or different versions of one application)"
      -> "(or different versions or parts of one application)" (yes,
      this can and does happen)
   
   *  4.4. (Martin Duerst): "comparisons be done properly, as specified
      in the Requirements section of [IDNA2008-Protocol]": If
      comparisons are dealt with in Protocol, what's the purpose of
      2.3.2.4? And what's the purpose of trying to explain it all again
      just after the quoted sentence?
   
   *  4.5. (Martin Duerst): "Despite that prohibition, there are a
      significant number of files and databases on the Internet in which
      domain name strings appear in native-character form;": This makes
      it appear as if such files and databases are in violation of some
      spec. But they may simply contain IRIs instead of URIs. I would
      simply start the subsection with something like "As long as
      IDNA2003 labels have been kept in A-label form, the only
      differences in interpretation arise for characters whose ..." and
      then, in a new paragraph, continue "For IDNA2003 labels that have
      been kept in native encoding,..."
   
   *  4.7. The Summary might be considered adventurous? Corporations
      such as Nominum propose services that are supposed to protect the
      DNS. One of the purposes of ML-DNS is precisely to permit an
      architectural protection.
   
   *  Acknowledgments (Andrew Sullivan): "As is usual with IETF
      specifications, while the document represents rough consensus, it
      should not be assumed that all participants and contributors agree
      with all provisions," I don't feel comfortable with starting to
      make the Acknowledgements section a platform for disclaimers about
      WG consensus. I object pretty strongly to this addition. I don't
      think we're served well by trying to state in any document how
      rough the rough consensus is: the document either has to stand
      through the IETF process, or not. Besides, this evaluation is a
      prerogative of the Chair, the ADs, and the IESG. If this sort of
      disclaimer is needed, it ought to be added by the IESG (and even
      then I would object). I would like the sentence to be removed.
   
5.  IDNA Rationale
   
   *  1.1 (Andrew Sullivan): bugs me: "Traditionally, DNS labels are
      matched case-insensitively [RFC1034][RFC1035]. That convention was
      preserved in IDNA2003 by a case-folding operation that generally
      maps capital letters into lower-case ones. However, if case rules
      are enforced from one language, another language sometimes loses
      the ability to treat two characters separately. Case-sensitivity
      is treated slightly differently in IDNA2008." It makes it sound as
      though there's just a kooky tradition in the DNS, and we could fix
      that up. But that's not true: the matching rules are _defined_ to
      be case insensitive, so changing that would be a protocol change
      to DNS. Also, the text slides a little too easily between what
      are different contexts of "label" here, and I think it could be a
      source of confusion. I suggest this instead:
      
      The DNS matching rules for DNS labels are case-insensitive
      [RFC1034][RFC1035]. That convention was preserved for
      internationalized labels in IDNA2003 by a case-folding operation
      that generally maps capital letters into lower-case ones. However,
      if case rules are enforced from one language, another language
      sometimes loses the ability to treat two characters separately.
      Case-sensitivity is treated slightly differently in IDNA2008.
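      
      As a minimal sketch of the matching rule referred to above,
      assuming labels are handled as octet strings: only the ASCII
      letters are folded, so non-ASCII octets must match exactly. The
      function name is hypothetical.

```python
def dns_labels_match(label1: bytes, label2: bytes) -> bool:
    """Compare two labels, folding only ASCII A-Z onto a-z."""
    # bytes.lower() changes ASCII letters only; all other octets,
    # including UTF-8 encoded non-ASCII characters, are left untouched.
    return label1.lower() == label2.lower()
```

      For example, b"Example" matches b"eXAMPLE", whereas the UTF-8
      encodings of an upper-case and a lower-case "e" with acute accent
      do not match. Any case handling for non-ASCII characters
      therefore has to happen before the label reaches the DNS.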
   
   *  1.3.1. DNS "Name" Terminology. "" would be better read as
      "orthotypographic" as an orthographic error that can be a way to
      lose some special semantics differences due to orthotypographic
      conventions.
   
   *  1.3.2. "IDNA-landr" typo?
   
   *  1.4. "Reduce the dependency on mapping, in order that the
      pre-mapped forms (which are not valid IDNA labels) tend to appear
      less often in various contexts, in favor of valid A-labels." calls
      for the Charter to be revised. Alternatively, it could say: remove
      dependence on mapping as per a mapping document, in which case
      this document would include a section on the various ways to
      ensure DNS security and the barring of some U+codes in some
      presentations.
   
   *  1.5. "This model has served the existing applications well, but it
      requires, with or without internationalized domain names, that
      users know the exact spelling of the domain names that are to be
      typed into applications such as web browsers and mail user agents.
      The introduction of the larger repertoire of characters
      potentially makes the set of misspellings larger, especially given
      that in some cases the same appearance, for example on a business
      card, might visually match several Unicode code points or several
      sequences of code points." may be read as if the users of these
      languages were more prone to errors than users of ASCII-based
      languages.
   
   *  "If an application wants to use non-ASCII characters in public DNS
      domain names, IDNA is the only currently-defined option." IDNA is
      not a DNS option. It is an application-level way to transcode
      Unicode domain names into LDH domain names for the convenience of
      ASCII-oriented international managers. The idea is to attain the
      adherence of local users and managers to IDNA and not to impose
      ASCII on them. DNS is UTF-8 compatible.
   
   *  "IDNA2008 divides all possible Unicode code-points into four
      categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED
      and UNASSIGNED.
      
      3.1.1. PROTOCOL-VALID
      
      Characters identified as "PROTOCOL-VALID" (often abbreviated
      "PVALID") are permitted in IDNs." Are we talking of code-points or
      of characters?
   
   *  3.1.2.1 Not in the TOC
   
   *  3.1.3 Disallowed "various HEART symbols" - is U+38FA also
      disallowed? or U+3966?
   
   *  3.1.3. This is the first time anyone has spoken of NFKC. In IDNA
      Defs and other cases, it is NFC. Shouldn't both of them be
      documented? Shouldn't someone explain in which specific case each
      is used?
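      
      The difference between the two normalization forms can be seen
      with Python's standard unicodedata module. This is only an
      illustration of the Unicode forms themselves; it makes no claim
      about which form a given IDNA document requires. The helper name
      is hypothetical.

```python
import unicodedata
from typing import Tuple


def nfc_and_nfkc(text: str) -> Tuple[str, str]:
    """Return the NFC and NFKC normalizations of text."""
    return (
        unicodedata.normalize("NFC", text),
        unicodedata.normalize("NFKC", text),
    )


# "e" followed by U+0301 COMBINING ACUTE ACCENT: both forms compose it
# into the single code point U+00E9.
composed = nfc_and_nfkc("e\u0301")

# U+FB01 LATIN SMALL LIGATURE FI: NFC leaves it alone, NFKC applies the
# compatibility mapping and yields the two letters "fi".
ligature = nfc_and_nfkc("\ufb01")
```

      NFKC thus removes compatibility distinctions that NFC preserves,
      which is why the choice between them matters for a list of
      disallowed characters.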
   
   *  "The character is an upper-case form or some other form that is
      mapped to another character by Unicode casefolding." this seems to
      create a very large mapping scheme that depends on a
      non-documented Unicode system needing correction (at least when it
      does not specifically support majuscules). Moreover, are we
      dealing with characters (that are orthogonal to Unicode) or with
      codepoints that represent characters and that are subject to
      Unicode casefolding? The proposal is to: (1) clarify the
      character/codepoint issue, (2) explain what Unicode case folding
      is and its limitations, (3) move them to CONTEXTO when these
      codepoints are both used as upper-cases and as majuscules, (4)
      explain that majuscules that are supported by upper-cases will be
      transcoded by punycode.
   
   *  4.4. Case mapping. One may regret that Unicode's current support
      of French majuscules, which is perfectly adequate in other
      circumstances yet inadequate in this case, is not discussed. This
      would explain the upgrade above.
   
         Author's current answer: First, you want certain characters
         treated differently for some languages that use a given script
         than for others that use the same script. That is nearly
         impossible to think about, just because there is, in general,
         no way to know what language a particular label is supposed to
         be associated with, nor is there a way to know what top-level
         domain has the label in one of its subtrees (even if one could
         reliably associate top-level domains with languages).
         
         -- Second, if I understand your latest note correctly, you
         would like to have those characters treated via some contextual
         rule ("CONTEXTO"). But the contextual rules yield either
         "valid" or "invalid" based on adjacent or nearby characters --
         they do not provide different mappings, nor different rules for
         different languages (the latter at least partially for the
         reasons above).
         
         -- And, finally, your suggestion requires treating capital
         letters (or at least some capital letters) as distinct from
         their lower-case forms, which would create massive
         inconsistencies with IDNA2003 (not just the two characters of
         inconsistency with which we have had such extensive
         debates) as well as inconsistencies with DNS and host table
         practices that go back to the 1970s. No matter how strong your
         justification, and even if it were not also tied to
         differential treatment for a particular language, I cannot
         imagine the WG (or the IETF more broadly) agreeing to that
         change.
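      
      The uniform, language-independent nature of case folding that
      this answer relies on can be illustrated with Python's
      str.casefold(), which applies Unicode case folding. This is a
      sketch of the general behavior, not of the IDNA2003 mapping
      tables; the helper name is hypothetical.

```python
def folds_together(a: str, b: str) -> bool:
    """True if Unicode case folding maps both strings to the same string."""
    return a.casefold() == b.casefold()


# A capital letter with an accent folds onto its lower-case form
# regardless of the language the label is meant to be read in.
same_word = folds_together("\u00c9COLE", "\u00e9cole")

# Folding never removes the accent, so these remain distinct.
different_word = folds_together("ECOLE", "\u00e9cole")
```

      Because the folding takes no language parameter, there is no
      place in it to express a rule that applies to one language but
      not to another that shares the same script.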
   
   *  4.5. "Examples of this are Yiddish, written with an extended
      Hebrew script, and Dhivehi (the official language of Maldives)
      which is written in the Thaana script (which is, in turn, derived
      from the Arabic script)" It seems that some explanation about
      Yiddish would be welcome so that the language will obtain the same
      support as Dhivehi and Thaana.
   
   *  5. "Conversely, lookup applications are expected to reject labels
      that clearly violate global (protocol) rules (no one has ever
      seriously claimed that being liberal in what is accepted requires
      being stupid)." The remark between the parentheses is confusing:
      it possibly qualifies as "stupid" a behavior that is not
      recommended, but that is acceptable by the document set.
   
   *  "Application implementors should be aware that where DNS wildcards
      are used, the ability to successfully resolve a name does not
      guarantee that it was actually registered." In what way is this
      specific to IDNA?
   
   *  7.6. The Symbol Question. That part actually discusses the
      difficulties originating in Unicode. Yet, the choice of Unicode
      has not yet been discussed.
   
   *  8.2. (Andrew Sullivan) the reference for DNSSEC should probably
      refer to DNSSECbis, which is usually [RFC4033], [RFC4034],
      [RFC4035], since RFC2535 is obsolete. One can make a strong
      argument for also including NSEC3 [RFC5155], but I don't feel too
      strongly about that.
   
   *  9. "Adding languages (or similar context) to IDNs generally, or to
      DNS matching in particular, would imply context dependent matching
      in DNS, which would be a very significant change to the DNS
      protocol itself". This sentence seems confusing. Natural languages
      are quoted throughout this IDNs document.
   
   *  12. (Andrew Sullivan): Here is where I make my now-familiar plea
      for the removal of the following sentence "As is usual with IETF
      specifications, while the document represents rough consensus, it
      should not be assumed that all participants and contributors agree
      with all provisions."
   
6.  IDNA Mapping
   
   *  (Andrew Sullivan): In general, I think the document is in
      reasonably good shape. I was hoping, however, for it to have some
      advice to registries on what sort of considerations are worth
      taking into account when formulating policies around what will and
      will not be registered.
      
      In particular, I think it would be very helpful to outline what it
      would mean for characters to be possibly mappable, and also to
      outline the different strategies that are available for preventing
      different alternative mappings from ending up with a different
      resolution.
      
      The basic advice is that registries should try to detect whether a
      candidate label may be expected to be the result of some
      (plausible) application of a mapping appropriate in the probable
      user community, and similarly that when faced with a candidate
      IDN, registries should consider the probable user community and
      consider the plausible applications of mappings appropriate to
      that community. If the registry does not have the expertise to
      evaluate the probable user community for a given code point, then
      it should simply reject the code point outright. (This is, in
      effect, the advice that you should have a policy, it should be a
      policy based on knowledge of the use cases, and that it should
      default closed.)
      
      After the registry has detected whether the candidate label may be
      the result of such a mapping, there are two basic strategies it
      might follow:
      
      1. Detect and reserve. In this case, the registry detects
      potential mappings, and reserves other candidate labels that might
      be the result of such mappings. This reservation takes the form of
      preventing registration of that label.
      
      2. Detect and bundle. In this case, the registry detects the
      potential mappings, and creates identical entries in the registry
      conforming to those "alternative forms" of the candidate label.
      There is the potential for a very large number of these bundled
      labels.
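      
      The advice and the two strategies above can be sketched as
      follows. Everything here is hypothetical: the Registry class,
      the variants_of callback that stands in for the registry's
      knowledge of plausible mappings in its user community, and the
      convention that the callback returns None when the registry
      lacks the expertise to judge a label (in which case the sketch
      defaults closed).

```python
from typing import Callable, Dict, Optional, Set

# Returns the alternative forms of a label, or None when the registry
# cannot evaluate the label's probable user community.
VariantsOf = Callable[[str], Optional[Set[str]]]


class Registry:
    def __init__(self, variants_of: VariantsOf, strategy: str) -> None:
        if strategy not in ("reserve", "bundle"):
            raise ValueError("strategy must be 'reserve' or 'bundle'")
        self.variants_of = variants_of
        self.strategy = strategy
        self.entries: Dict[str, str] = {}  # label -> registrant
        self.reserved: Set[str] = set()    # labels blocked from registration

    def register(self, label: str, registrant: str) -> bool:
        if label in self.entries or label in self.reserved:
            return False
        variants = self.variants_of(label)
        if variants is None:
            return False  # no policy for this label: default closed
        alternatives = variants - {label}
        if any(alt in self.entries for alt in alternatives):
            return False  # an alternative form is already registered
        self.entries[label] = registrant
        if self.strategy == "reserve":
            # Detect and reserve: block the alternative forms.
            self.reserved |= alternatives
        else:
            # Detect and bundle: create identical entries for them.
            for alt in alternatives:
                self.entries[alt] = registrant
        return True
```

      The bundling branch makes the cost noted above visible: every
      alternative form becomes an entry of its own, so a label with
      many plausible mappings multiplies the size of the zone.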
   
   *  (Andrew Sullivan): There's also a great deal of material at the
      end of rationale that would be more appropriate in this document
      (or else it should just go away, I think).
   
   *  (Andrew Sullivan): I think the document is ready to go, assuming
      this is what we want to say. Yet its content is a little flabby as
      recommendations go. I can imagine a reader being a little
      surprised at this advice, for example: "These are a minimal set of
      mappings that an application should strongly consider doing. Of
      course, there are many others that might be done." That boils down
      to, "You might want to do this. Or not. Or something else. Up to
      you." I know why we're saying this, but I would not be surprised
      if people object to such thin advice. If that's all the advice we
      want to offer, however, this is the right document and it should
      go ahead.
   
   *  Not sure that the terminology of "make sense" is adequate or
      clear.
   
   *  1. Introduction - This document is supposed to be separated from
      the IDNA document set. It should then document what the IDNA
      protocol is. It seems that the IDNA2008 protocols boil down to
      "DNS domain names are to be expressed in LDH form. IDNA is a
      commonly agreed upon convention wherein if they are entered by the
      user in another form, applications are advised to convert them to
      UTF in order to filter and map them, as is discussed in the
      present document, as well as to transcode them by using the
      punycode algorithm. Depending on the Registry policy, their
      registration can be carried out in the UTF and/or the transcoded
      ASCII form."
   
   *  2.3. NFC is confirmed, NFKC is not discussed.
   
7.  IDNA Protocol
   
   *  As a general comment:
      
      *  The SHOULD/MUST chains may be somewhat awkward. MUSTs are used
         in a protocol procedure and then an alternative to that
         procedure is pragmatically considered. It could be of interest
         to draft a MUST tree to consider which cases are, or are not,
         covered.
         
      *  There is some confusion as to what the "string" is compared to
         the label and domain name, in which "Label" may be used instead
         of "U-Label" or sometimes "A-Label". Wouldn't it be better to
         review the text and qualify the "labels" in order to be
         certain that all the cases are clearly covered?
   
   *  1.para 1: (Martin Duerst): I missed the term IDNA2003 in Defs, it
      would have been useful. I didn't complain because I thought it had
      been 'deprecated'. Seeing it here, I think it should go back to
      Defs, and be actively used in Defs and Protocol at least, to
      simplify and clarify prose.
   
   *  1.para 2: (Martin Duerst): "does not changes" -> "does not change"
   
   *  1.para 2: (Martin Duerst): "IDNA does not depend on any changes to
      DNS servers, resolvers, or protocol elements" -> "IDNA does not
      depend on any changes to DNS servers, resolvers, or DNS protocol
      elements" or "IDNA does not depend on any changes to DNS (servers,
      resolvers, or protocol elements)" (Otherwise, it's possible to
      understand 'protocol elements' as being not limited to DNS.)
   
   *  1.para 4: (Martin Duerst): ", that share some terminology,
      reference data and operations." -> "These two protocols share
      terminology, reference data and operations."
   
   *  2.para 1: (Martin Duerst): "Terminology used in IDNA, but also in
      Unicode or other character set standards and the DNS, appears in
      [IDNA2008-Defs]." -> "Terminology used in IDNA appears in
      [IDNA2008-Defs]." (where else these terms are used, or where they
      are from, can be explained in Defs where necessary, but is
      absolutely irrelevant here)
   
   *  2.para 1: (Martin Duerst): "Terminology that is required as part
      of the IDNA definition, including the definitions of "ACE",
      appears in that document as well." -> remove (first, the word
      'definition' is used with two slightly different meanings, and
      second, I don't see the point of singling out "ACE".)
   
   *  3.1. Requirement 2: (Martin Duerst): Equivalence is already
      defined in Defs. Please make sure there is only a single
      definition.
   
   *  3.1. Requirement 2: (Martin Duerst): Why is it a MUST that
      U-Labels are compared without case-folding (even for ASCII chars?)
      or other steps?
   
   *  3.1. Requirement 2: (Martin Duerst): "In many cases, validation
      may be important for other reasons and SHOULD be performed.": Is
      this restricted to when trying to compare? Or in general?
   
   *  3.1. Requirement 3: (Martin Duerst): This does double duty, and
      should be removed. The alternative is to convert 3.1 into a
      conformance section as usual e.g. for ISO standards, but then a
      lot more rewriting will be necessary in all of 3.1.
   
   *  3.2. "It does not apply to domain name slots which do not use the
      Letter/Digit/Hyphen (LDH) syntax rules." Confusing. Would some of
      the DN slots not accept both?
   
   *  3.2. para 1: (Martin Duerst): "IDNA applies to": What does
      "applies to" mean?
   
   *  3.2. para 2: (Martin Duerst): "Because it uses the DNS, IDNA
      applies" -> "Because IDNA uses the DNS, it applies", or even
      better "Because IDNA uses the DNS, IDNA applies" (repetitions
      don't hurt in standards, reference before referent does)
   
   *  3.2. para 2: "unless those protocols and implementations of them"
      -> "unless those protocols and their implementations"
   
   *  3.2. para 2: (Martin Duerst): "be aware of IDNs in Unicode" -> "be
      aware of IDNs" (whether they are in Unicode or not is irrelevant
      here)
   
   *  3.2.1. The word CLASS appears in only two sentences in the whole
      document set: "IDNA applies only to domain names in the NAME and
      RDATA fields of DNS resource records whose CLASS is IN. See RFC
      1034 [RFC1034] for precise definitions of these terms. The
      application of IDNA to DNS resource records depends entirely on
      the CLASS of the record, and not on the TYPE except as noted
      below."
      
      What about internationalized domain names in a non-IN CLASS?
   
         Author's answer: I have received no further input on this and
         will assume that the current text is ok unless I do.
   
         Additional comment: My reading of that text was that it was a
         _restriction_, not a claim of fact. In other words, I
         interpreted that text as saying that, if a new class is
         invented, IDNA (if it were specified) would need to be
         specified separately for that class.
         
         By definition, the only place domain name labels can appear is
         in the NAME and RDATA fields of resource records. It's
         important to remember that the RDATA field can have subfields
         (as it does in the SOA record). I think that's clear enough
         from the rest of the discussion, and because the documents
         already say, "You need to understand DNS too."
   
   *  3.2.1. (Paul Hoffman): Section 3.2.1 says: "IDNA applies only to
      domain names in the NAME and RDATA fields of DNS resource records
      whose CLASS is IN." It would be good for the DNS-centric folks in
      the WG to verify that they think that this restriction is correct.
      Are there really no other fields where domain labels would appear?
   
   *  3.2.1. para 1: (Martin Duerst): The first paragraph reads as if
      IDNA applied to domain names in e.g. TXT records in CLASS IN. I
      think it would help here to say exactly what is meant by "IDNA
      applies". In some sense, IDNA applies nowhere in DNS records, they
      are all just ASCII. In some sense (labels starting with xn-- are
      presumed to be IDNA labels; you can add an IDN (or a label
      thereof) to a DNS record by using A-labels), IDNA applies.
   
   *  3.2.1. para 2: (Martin Duerst): The SRV discussion has significant
      overlap with Defs, please reduce.
   
   *  4. "This section defines the procedure for registering an IDN. The
      procedure is implementation independent; any sequence of steps
      that produces exactly the same result for all labels is considered
      a valid implementation." A procedure does provide but does not
      define a result?
   
   *  4. para 1: (Martin Duerst): "defines *the* procedure" ... : This
      would work better if there were really only one procedure, and it
      were written as a procedure. However, there are often variations,
      and different, often non-procedural ways in which things are
      expressed (e.g. 'labels must ...' instead of 'if a label doesn't
      satisfy x, abort')
   
   *  4. para 2: (Martin Duerst): "the registration and lookup protocols
      (Section 5)" -> "the registration protocol (this section) and the
      lookup protocol (Section 5)" (shortcuts are the enemies of
      specifications)
   
   *  4. para 2: (Martin Duerst): "while ... are very similar in most
      respects, they are different" -> "while ... are very similar in
      most respects, they are not identical"
   
   *  4. para 2: (Martin Duerst): "follow the appropriate steps":
      "appropriate" appeals to a value judgement, which isn't adequate
      here.
   
   *  4.1. The obligation chain reads: "By the time a string enters the
      IDNA registration process [], it is expected to be in Unicode []",
      yet "registries [] SHOULD avoid any possible ambiguity by
      accepting registrations only for A-labels []."
   
   *  4.1. (Andrew Sullivan): "By the time a string enters the IDNA
      registration process as described in this specification, it is
      expected to be in Unicode and MUST be in Unicode Normalization
      Form C (NFC [Unicode-UAX15])." The "expected to be" part is
      redundant, since we have a subsequent MUST.
   
   *  4.1. (Andrew Sullivan): I found this slightly confusing: "The
      registry SHOULD permit submission of labels in A-label form and is
      encouraged to accept both the A-label form and the U-label one. If
      it does so,". The "does so" reference there is ambiguous: is it
      the submission of A-labels or the A-label+U-label case? The
      subsequent text suggests it's the former.
   
   *  4.1. title: (Martin Duerst): Why suddenly "Process" instead of
      "Procedure"? Why not just "Input"? And why singular in the title,
      and then plural in the first line of the text?
   
   *  4.1. (Martin Duerst): "are outside the scope of these protocols":
      How many protocols are there? Only one that's relevant here.
   
   *  4.1. (Martin Duerst): Why is NFC a condition on the input? Please
      make it a validation step afterwards, to streamline things.
   
   *  4.1. (Martin Duerst): "Entities responsible for zone files
      ("registries") are expected to accept only the exact string for
      which registration is requested, free of any mappings or local
      adjustments.": It's clear to me what we want here, but it's much
      better to write this as a condition on the later processing,
      rather than on input, something like: "Entities responsible for
      zone files ("registries") MUST NOT apply any mappings or local
      adjustments of any kind to the exact string for which registration
      is requested."
   
   *  4.1. (Martin Duerst): "They SHOULD avoid any possible ambiguity by
      accepting registrations only for A-labels, possibly paired with
      the relevant U-labels so that they can verify the correspondence."
      This has to be improved. First, the SHOULD doesn't belong on the
      reason, and the reason, if anywhere, belongs at the end. Second,
      there are three possible ways input can come in, so let's list
      things up: "Entities responsible for zone files ("registries") MAY
      accept input in any of three forms:
      
      1) As a pair of A-label and U-label
      
      2) As an A-label only
      
      3) As a U-label only.
      
      1) and 2) are RECOMMENDED because the use of A-labels avoids any
      possibility for ambiguity." (The first sentence in 4.2.1 can then
      be removed.)
   
   *  4.2.1. (Martin Duerst): This is a complex jungle of conditions on
      input, conversions,... What should be done is:
      
      a) extract the 'raw' (without any preconditions) U-label->A-label
      and A-label->U-label 'functions' into subsections e.g. in Section
      3; these will serve as building blocks both in Section 4 and
      Section 5.
      
      b) As the first step of the registration procedure, make sure we
      have both an A-label and a U-label. One way to write this is:
      
      "4.2.1: Preprocessing
      
      1) If the input contained an A-label and a U-label, check that
      they are equivalent (or whatever that was called; the conditions
      are somewhere in Defs). If the check fails, abort registration.
      
      2) If the input contained an A-label, but no U-label, calculate
      the U-label according to @@@.
      
      3) If the input contained a U-label, but no A-label, calculate
      the A-label according to @@@."
      
      The above makes sure we have both an A-label and a U-label from
      here on. Checking on these can be performed independently (e.g.
      length check on A-label, NFC check on U-label). Conversion to
      punycode is no longer needed in 4.4, because we simply put the
      A-label we have now into the zone (assuming we have passed all the
      checks up to here, of course).
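
      The preprocessing steps proposed above can be sketched as follows.
      This is an illustration under stated assumptions, not text from
      any IDNA document: the names u_to_a, a_to_u and preprocess are
      hypothetical, "equivalent" is read as "the U-label converts to
      exactly the supplied A-label", and any supplied U-label is assumed
      to contain at least one non-ASCII character. Python's built-in
      "punycode" codec is used for the raw conversions.

```python
ACE_PREFIX = "xn--"


def u_to_a(u_label: str) -> str:
    """Raw U-label -> A-label conversion: Punycode, then the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def a_to_u(a_label: str) -> str:
    """Raw A-label -> U-label conversion: strip the prefix, then decode."""
    if not a_label.startswith(ACE_PREFIX):
        raise ValueError("not an A-label: missing ACE prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def preprocess(a_label=None, u_label=None):
    """Return an (A-label, U-label) pair, or raise ValueError to abort."""
    if a_label is not None and u_label is not None:
        # Step 1: both forms supplied, so check that they correspond.
        if u_to_a(u_label) != a_label:
            raise ValueError("A-label and U-label do not match")
        return a_label, u_label
    if a_label is not None:
        # Step 2: A-label only, so calculate the U-label.
        return a_label, a_to_u(a_label)
    if u_label is not None:
        # Step 3: U-label only, so calculate the A-label.
        return u_to_a(u_label), u_label
    raise ValueError("no label supplied")


print(preprocess(u_label="b\u00fccher"))    # ('xn--bcher-kva', 'bücher')
print(preprocess(a_label="xn--bcher-kva"))  # ('xn--bcher-kva', 'bücher')
```

      After this step both forms are available, so later checks (length
      on the A-label, NFC on the U-label) can run independently.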
   
   *  4.2.1. (Martin Duerst): (probably not needed anyway) "both the
      A-label form and the U-label one" -> "both the A-label form and
      the U-label form"
   
   *  4.2.1. (Martin Duerst): Word the A-label checks more clearly, and
      create section "4.2.2 A-label Validation"
   
   *  4.2.3.2. "a combining mark or combining character" -> "a combining
      character" (combining marks are a special case of combining
      characters, and as such irrelevant here)
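
      A minimal sketch of the leading-character check, assuming that
      "combining character" is read as a code point whose Unicode
      General_Category is Mn, Mc or Me. That reading and the function
      name starts_with_combining are assumptions made for illustration;
      the authoritative definition is the one in the Unicode Standard.

```python
import unicodedata


def starts_with_combining(label: str) -> bool:
    """True if the first code point has General_Category Mn, Mc or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


print(starts_with_combining("\u0301a"))   # True: starts with U+0301
print(starts_with_combining("a\u0301"))   # False: starts with "a"
```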
   
   *  4.2.3.3. (Martin Duerst): "To check this, each code-point marked
      as CONTEXTJ and CONTEXTO in [IDNA2008-Tables] MUST have a non-null
      rule." Is this a requirement on Tables? Are there "null rules"?
      What purpose do they serve, what's the difference between them and
      DISALLOWED?
   
   *  4.2.3.4. (Martin Duerst): What are "characters written from right
      to left"? Either we define this clearly here, or we leave it (or
      put it) in Bidi, but then we have to rewrite the sentence here
      (just requiring conformance to the conditions in Bidi).
   
   *  4.2.4. (Martin Duerst): This is totally unnecessary, please
      remove. If we need a summary for what's essentially just about a
      page of text, we better give up.
   
   *  4.3. Registry restriction inheritance is not alluded to.
   
   *  4.3. (Martin Duerst): "Policies are likely to be informed by the
      local languages" -> "Policies are likely to be informed by the
      local scripts and languages" (IDNs are mostly a script issue, much
      less a language issue. ICANN has fixed their documents to avoid
      only talking about languages (they still could move a bit further
      to scripts), so let's not commit the same mistake here again.)
   
   *  4.3. (Martin Duerst): "or the application of special restrictions
      to others": like what? Like that such a label can only be resolved
      on Tuesdays?
   
   *  4.4. (Martin Duerst): The generic parts of the conversion need to
      go somewhere else (Section 3?). The actual conversion (or
      checking) needs to go at the start of 4.2. Then this section is
      empty and can be removed.
   
   *  4.5. (Martin Duerst): "The A-label is registered in the DNS by
      insertion into a zone." -> "The label is registered in the DNS by
      inserting the A-label into a zone." (distinguish registration of
      the abstract thing from insertion of the concrete thing)
   
   *  5. Does this repetition (already in IDNA Rationale), "the presence
      of wild cards in the DNS might cause a string that is not actually
      registered in the DNS to be successfully looked up", reflect what
      the BIDI document states slightly differently: "Wildcards create
      the odd situation where a label is "valid" (can be looked up
      successfully) without the zone owner knowing that this label
      exists. So an owner of a zone whose name starts with a digit and
      contains a wildcard has no way of controlling whether or not names
      with RTL labels in them are looked up in his zone."?
   
   *  5. (Martin Duerst): para 2: " The two steps described in Section
      5.2 are required.": Superfluous. Make sure there's a MUST at the
      right place in that section. (Looking at 5.2, I have no clue what
      the two steps should be. This shows that indirect requirements
      like the above are rather unhelpful.)
   
   *  5.1. (Martin Duerst): first paragraph: Although IDNs will often get
      extracted from IRIs or URIs, there are many cases where these
      constructs are not involved. Examples would be telnet or ping
      commands, and so on. So IRIs and URIs should be deemphasized more.
   
   *  5.1. (Martin Duerst): "Processing in this step and the next two are
      local matters, to be accomplished prior to actual invocation of
      IDNA.": Again, which steps? Before, we supposedly had two steps in
      5.2, now it looks as if we are talking about 5.2 and 5.3 as two
      steps. -> Create a subsection such as "Input preparation" or
      similar, where all the preliminary material goes. Alternatively,
      talk about subsections, with subsection numbers for clear
      identification.
   
   *  5.2. The case of a character that is not supported by Unicode is
      not discussed.
   
   *  5.2. (Martin Duerst): "is not already Unicode" -> "is not already
      in Unicode" (in parallel to 'into' in the line before)
   
   *  5.2. (Martin Duerst): "A Unicode string may require normalization
      as discussed in Section 4.1.": There is no "discussion" in 4.1
      (and no need for discussion). Express the requirements here
      independently of Section 4.
   
   *  5.3. (Wil Tan): Section 5.3 (A-label Input) should add a
      lowercasing step prior to using the Punycode decoding algorithm.
      The section on the symmetry constraint (-defs-10, section 2.3.2.1)
      should also have similar wording.
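
      A minimal sketch of why the lowercasing step matters for the
      symmetry constraint, using Python's built-in "punycode" codec. The
      names decode_a_label and round_trips are hypothetical, and the
      comparison against the lowercased input is an assumption about
      how the symmetry check would be worded.

```python
ACE_PREFIX = "xn--"


def decode_a_label(label: str) -> str:
    """Lowercase the putative A-label, then Punycode-decode it."""
    lowered = label.lower()
    if not lowered.startswith(ACE_PREFIX):
        raise ValueError("missing ACE prefix")
    return lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def round_trips(label: str) -> bool:
    """Re-encode the decoded form and compare with the lowercased input."""
    u_label = decode_a_label(label)
    re_encoded = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    return re_encoded == label.lower()


print(decode_a_label("XN--BCHER-KVA") == decode_a_label("xn--bcher-kva"))  # True
print(round_trips("XN--BCHER-KVA"))                                        # True
```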



   
   *  5.3. (Martin Duerst): (just checking) "See the Name Server
      Considerations section of [IDNA2008-Rationale] for additional
      discussion on this topic.": From the context, Name Server doesn't
      look related (we are client-side here).
   
   *  5.3. (Martin Duerst): "That conversion and testing SHOULD":
      Replace 'That' with something clearer and more precise.
   
   *  5.3. (Martin Duerst): para 2: List up the alternatives that are
      possible. Avoid mishmash textual paragraphs.
   
   *  5.4. The use of "U-Labels" in this part instead of "Labels" would
      probably clarify it.
   
   *  5.4. (Paul Hoffman): Section 5.4 assumes that an application knows
      the version of Unicode that is being used in the application. We
      should state that assumption in 5.4 or maybe further up near the
      beginning of section 5.
   
         Author's answer: It assumes that either the application or the
         operating system or library support keeps the two consistent.
         That range of options is the reason why Section 5.4 is not more
         explicit about which particular software elements or modules
         know what. If this is to be changed, I need suggestions about
         textual fixes that do not imply that the knowledge must be in
         the application itself.
   
         Further comment: The first bullet in 5.4 is the first time that
         "version of Unicode" is mentioned, so the note is probably most
         effective right there. I propose adding:
         
         This requirement means that the application must use a list of
         unassigned characters that is matched to the version of Unicode
         that is being used for the other requirements in this section.
         It is not required that the application know which version of
         Unicode is being used; that information might be part of the
         operating environment in which the application is running.
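
      A minimal sketch of the point made above: the unassigned-character
      test can be tied to whatever Unicode version the operating
      environment provides, without the application naming a version.
      Here the Python runtime plays the role of the environment. The
      function name has_unassigned is hypothetical, and General_Category
      Cn is used as an approximation of the UNASSIGNED category (Cn also
      covers noncharacters, which the Tables document treats
      separately).

```python
import unicodedata


def has_unassigned(label: str) -> bool:
    """True if any code point is Cn in the runtime's Unicode database."""
    return any(unicodedata.category(ch) == "Cn" for ch in label)


# The version string comes from the runtime, not from application code.
print(unicodedata.unidata_version)

# U+0378 is unassigned in the Unicode versions published so far.
print(has_unassigned("a\u0378"))
print(has_unassigned("abc"))
```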
   
   *  5.4. (Paul Hoffman) The paragraph in section 5.4 that starts "This
      test may..." is out of date because the rules in the Bidi document
      no longer do inter-label checking. The whole paragraph can be
      removed.
   
   *  5.4. (Paul Hoffman) In the light of this, does the WG want to
      change the requirement level for checking Bidi on lookup from
      SHOULD to MUST? Given the above, I see no reason why not.
   
   *  5.4. "applying the test is likely to give much better information
      about the reason for a lookup failure -- information that may be
      usefully passed to the user when that is feasible -- than DNS
      resolution failure information alone": might this lead to the
      idea that such information could also be carried along with the
      failure, to better document it?
   
   *  5.4. "For all other strings, the lookup application MUST rely on
      the presence or absence of labels in the DNS to determine the
      validity of those labels and the validity of the characters they
      contain". Is it correct to assume that the first "labels" stands
      for "A-labels" and the second for "their corresponding U-labels"?
   
   *  5.4. (Martin Duerst): para 1: Mishmash again. Most of this para is
      best removed.
   
   *  5.4. (Martin Duerst): para 1: "Putative labels": Both in Section 4
      and 5, labels are for the most part putative, because they don't
      conform to the definitions unless checked. Either before section
      4, or once at the start (Input subsection) of both section 4 and
      section 5, say that for the most part, we are dealing with
      putative labels, but 'putative' isn't repeated all the time to
      make the text easier to read.
   
   *  5.4. (Martin Duerst): page 12: Finally a bullet list. I almost
      thought that the author didn't know how to create bullet lists, or
      was of the opinion that bullet lists don't have a place in spec.
      Quite to the contrary, please make sure there are many more bullet
      lists. It will make everything much easier to read and clearer.
   
   *  5.4. (Martin Duerst): "Labels that are not in NFC form as defined
      in [Unicode-UAX15].": There is only one definition of NFC, but the
      sentence suggests there are several. Please change to "Labels that
      are not in NFC [Unicode-UAX15]."
   
   *  5.4. (Martin Duerst): Please move bullet 1 (UNASSIGNED) and bullet
      4 (DISALLOWED) and all the other table-related bullets together. I
      think it's best to put UNASSIGNED last (and mention that this is
      the category most subject to change).
   
   *  5.4. (Martin Duerst): Streamline the wording used to refer to
      Tables and a category. Currently, we have: "in the UNASSIGNED
      category of [IDNA2008-Tables]", "in the "DISALLOWED" category in
      the permitted character table [IDNA2008-Tables]", and "that are
      identified in [IDNA2008-Tables] as "CONTEXTJ"".
   
   *  5.4. (Martin Duerst): "Labels whose first character is a combining
      mark (see Section 4.2.3.2).": Refer directly to the relevant
      Unicode definition, rather than to section 4.2.3.2 (which contains
      a MUST, which is already implicit here).
   
   *  5.4. (Martin Duerst): "In any event, lookup applications should
      avoid attempting to resolve labels that are invalid under that
      test.": Remove. We already have a SHOULD, no need for a should on
      top of that.
   
   *  5.4. (Martin Duerst): last para: I assume this is e.g. about
      labels with mixed scripts,... What it essentially seems to say is
      that a browser may warn users if it detects mixed scripts, but if
      the user still wants to see the page, s/he is entitled to it. In
      such a context, the word 'validity' seems quite a bit out of
      place; it would be better to speak about 'other tests' or some
      such in a more general way.
   
   *  5.5. (Martin Duerst): para 1: "using the Punycode algorithm (with
      the ACE prefix added)": The parenthetical seems to suggest that
      addition or not of the ACE prefix is an (optional) part of the
      Punycode algorithm, but RFC 3492 does not define the prefix, nor
      is the addition of the prefix part of the punycode algorithm. ->
      Convert parenthetical to a clause or sentence ("... and then
      adding the ACE prefix." or so).
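
      The distinction drawn above can be seen with Python's built-in
      "punycode" codec, which implements RFC 3492 only and therefore
      produces no prefix. The variable names are illustrative.

```python
# Punycode conversion alone: the output carries no ACE prefix.
encoded = "b\u00fccher".encode("punycode").decode("ascii")
print(encoded)    # bcher-kva

# Adding the ACE prefix is a separate step performed afterwards.
a_label = "xn--" + encoded
print(a_label)    # xn--bcher-kva
```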
   
   *  5.5. (Martin Duerst): rest from second sentence in para 1: As said
      in my comments on Section 4, a summary is unnecessary. Also, it
      has nothing to do with punycode conversion. In addition, the
      second bullet point is confusing, because an A-label (checked or
      not) cannot be punycode-converted again. -> remove
   
   *  5.6. (Martin Duerst): "That ... string" -> "The string resulting
      from the conversion in Section 5.5"
   
   *  5.6. (Martin Duerst): "That lookup" -> "The lookup"
   
   *  5.7. (Martin Duerst): What about (streamlined): Security
      Considerations for this version of IDNA are described in
      [IDNA2008-Defs], except for the special issues associated with
      right to left scripts and characters, which are discussed in
      [IDNA2008-BIDI].
   
   *  7. IANA Considerations - There is no commitment from UNICODE to
      not update those Unicode documents that are accepted as normative
      in the IDNA documentation set. Should their copy at the time of
      the publication of this set not be stored by the IANA?
   
   *  8. (Andrew Sullivan): I suggest "This second-generation version
      would not have been possible without the work that went into that
      first version, due to its authors." Or something like that.
   
   *  8./9. (Martin Duerst): These should be merged. The text explains it
      all.
   
   *  8. (Martin Duerst): "Hoffman and Costello ... should not be held
      responsible for any errors or omissions.": Remove, this is
      implicitly clear, in the end it's the WG and the IETF that's
      responsible. Similar for "As is usual with IETF specifications,
      while the document represents rough consensus, it should not be
      assumed that all participants and contributors agree with all
      provisions."
   
   *  9. (Andrew Sullivan): I object very strongly to the inclusion of
      the sentence, "As is usual with IETF specifications, while the
      document represents rough consensus, it should not be assumed that
      all participants and contributors agree with all provisions."
      Rough consensus is always rough on everyone, but if you are a
      participant who urges this sentence on the product of the WG, I
      ask you to reconsider. It is unworthy of your effort and the
      efforts of your colleagues. It would be better to have an outright
      flamewar on the mailing list than to have that sort of
      not-with-a-bang-but-a-whimper remark live forever in the WG
      output. If we as a WG really have such deep disagreements that we
      have to send drafts with this sort of disclaimer to the IESG, I
      feel pretty uncomfortable that the WG has in fact reached
      consensus.
   
   *  References. (Martin Duerst): [Unicode-RegEx], [Unicode-Scripts],
      [Unicode-UAX15] (and maybe others): Unicode data files don't have
      explicit authors, but Unicode TRs (and similar stuff) have
      authors/editors, same as RFCs. Please don't drop this information.
   
8.  IDNA BIDI
   
   *  (Paul Hoffman): This draft is not yet ready for publication. The
      text is still very confusing about the relationship between this
      document and RFC 3454. In many places, the wording makes it sound
      like this new algorithm is a replacement for that in RFC 3454,
      which it is not: it is the algorithm to use with IDNA2008. I would
      be happy to do a thorough edit to remove this ambiguity and make
      it clear that, while the algorithm is an improvement on the old
      one, it does not "change" or "fix" or "replace" the old one.
   
         Author's current answer: It changes the old definition in
         exactly the same meaning of the word "change" as the way
         IDNA2008 changes IDNA2003.
   
         Author's current answer: I'll send you the XML under separate
         cover; if you can make the changes you feel are needed, I can
         see if I agree with them.
   
   *  (Paul Hoffman) The terminology section needs to define "the end of
      the label". Tests 3 and 6 of section 2 are confusing without some
      definition.
   
         Author's current answer: Will add "beginning" and "end" to the
         paragraph that says "<t>In this memo, we use "network order" to
         describe the sequence of characters as transmitted on the wire
         or stored in a file; the terms "first", "next", "previous",
         "before" and "after" are used to refer to the relationship of
         characters and labels in network order.</t>"
   
   *  Abstract, and potentially elsewhere. (Martin Duerst): Avoid the
      word 'new'. RFCs are archival documents.
   
   *  1.1. Is it advisable to specify "when U-labels" instead of
      "labels"?
   
         Author's current answer: The first paragraph says that the
         document is about U-labels. This should not need repeating.
   
   *  1.1. (Martin Duerst): para 2: "When labels satisfy the rule, and
      when certain other conditions are satisfied, they can be used with
      a minimal chance of these labels being displayed in a confusing
      way by a bidirectional display algorithm.": "they" .. "these
      labels" is confusing. What about "When labels satisfy the rule,
      and when certain other conditions are satisfied, there is only a
      minimal chance that these labels will be displayed in a confusing
      way by a bidirectional display algorithm."
   
   *  1.1. (Martin Duerst): "A bidirectional display algorithm": How
      many of them do we have? (I only know one, the Unicode one (with
      some minor variants)). How many of them have been used for
      testing/verification?
   
   *  1.1. (Martin Duerst): para 3: what exactly is a "right-to-left
      character"?
   
   *  1.2. (Andrew Sullivan): "While the document proposes completely
      new text, most reasonable labels that were allowed under the old
      criterion will also be allowed under the new criterion, so the
      operational impact of the rule change is limited." It would be
      nice here, I suggest, to offer some definition of what labels fall
      outside "most reasonable labels". The description sounds too much
      like, "The labels we picked when we wrote this," which is
      indubitably not the impression anyone intended.
   
         Author's current answer: I added some examples (mixtures of
         numerals, and AN inside LTR labels). Hope it helps.
   
   *  1.2. (Martin Duerst): This section ideally should also be moved to
      after Section 2.
   
   *  1.2. (Martin Duerst): para 1: "The IDNA specification
      "Stringprep"": change to something like "Stringprep, part of
      IDNA2003". Otherwise, it's not clear that this is an old spec.
   
   *  1.2. (Martin Duerst): para 4: "However, this makes certain words"
      -> "However, this made certain words" (past tense)
   
   *  1.2. (Martin Duerst): para 7: "While the document specifies rules"
      -> "While this document specifies rules"
   
   *  1.2. (Martin Duerst): para 7: "(the most important being label
      that mix Arabic and European digits (AN and EN) inside an RTL
      label, and labels that use AN in an LTR label)": Very weird. Such
      cases may not be completely impossible, but they are much less
      frequent than e.g. Arabic numbers inside Arabic letters, European
      numbers inside Arabic letters, and so on. There was even a strong
      movement to prohibit number mixing at the protocol level; this
      would never have happened if such mixing would have been deemed to
      be "most important". Also, after looking at the actual conditions,
      we either have an RTL label, which by condition 4 excludes mixing
      EN and AN, or we have an LTR label, which by condition 5 excludes
      AN and therefore the mixture of EN and AN.
   
   *  1.3. (Martin Duerst): title: "Layout" -> "Structure" or
      "Organization"
   
   *  1.3. (Martin Duerst): para 1: Change from "bidi test" to "bidi
      rule". (or unify otherwise)
   
   *  1.3. (Martin Duerst): para 1: ", that" -> ", which"
   
   *  1.3. (Martin Duerst): para 1: "no matter what the direction of the
      label is": What does this mean? It could either mean that you can
      apply the test forwards or backwards, or it could mean that it
      doesn't depend on what directionality the characters in the label
      have, or whatever. In the latter case, I'd write e.g.: "This test
      [->rule, see above and below] can be applied to any kind of label,
      but becomes trivial if the input is guaranteed to contain only LTR
      characters."
   
   *  1.3. (Martin Duerst): "The primary initial use of that test":
      "that test" -> "this test" (this sentence talks about relationship
      with other documents, so it's the test in this document, not the
      test in that other section)
   
   *  1.3. (Martin Duerst): para 2: "a BIDI rule" -> "the BIDI rule"
   
   *  1.3. (Martin Duerst): para 3: "new rule proposed here" -> "new
      rule proposed" (we are talking about document organization, so
      it's "the rule in that other section over there", so "here"
      doesn't fit)
   
   *  1.3. (Martin Duerst): para 4: "Section 5 to Section 9 describe" ->
      "Section 5 to Section 7 describe": Section 8 is IANA
      consideration.
   
   *  1.4. (Mark Davis) "An RTL label is a label that contains at least
      one character of type R or AL." I believe you should also add
      "AN". There are cases where it affects ordering. What I mean is
      that if you had AN + L in a label (not necessarily in that order),
      you wouldn't even count it as a BIDI domain name, and thus none of
      the bidi doc would apply (according to the text). Yet such labels
      would be legal according to protocol, and I think they can cause
      reordering, and could thus cause the kind of visual confusion that
      BIDI is supposed to prevent.
   
         Author's current answer: Good point. I'll modify.
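
      A minimal sketch of the two candidate definitions of "RTL label"
      discussed above, using the Bidi_Class values reported by Python's
      unicodedata module. The function is_rtl_label and its include_an
      switch are hypothetical and exist only to contrast the quoted
      definition (R or AL) with the suggested one (R, AL or AN).

```python
import unicodedata


def is_rtl_label(label: str, include_an: bool = True) -> bool:
    """True if the label contains a character of type R, AL (or AN)."""
    rtl_classes = {"R", "AL", "AN"} if include_an else {"R", "AL"}
    return any(unicodedata.bidirectional(ch) in rtl_classes for ch in label)


# Latin letters followed by U+0661 ARABIC-INDIC DIGIT ONE (type AN).
mixed = "abc\u0661"

print(is_rtl_label(mixed, include_an=False))   # False
print(is_rtl_label(mixed, include_an=True))    # True
```

      Under the narrower definition such a label is not treated as RTL,
      so none of the BIDI conditions for RTL labels would be applied to
      it.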
   
   *  1.4. BIDI properties come from Unicode. They might not be complete
      or could be completed in the future. What then?
   
         Author's current answer: See section 7.2, "This memo does not
         propose a solution for this problem".
   
   *  1.4. (Andrew Sullivan): There are some terms defined in Para 1.4.
      I think it would be way helpful to a naive reader to be directed
      to defs at the beginning of this section first, and then to say
      "there are specific BIDI-only terms also defined here". So I'd
      move the reference that's at the end of this section to the
      beginning.
   
   *  1.4. (Andrew Sullivan): The third paragraph now ends with a comma,
      so it looks like something was supposed to be added and wasn't. Or
      is this just a typo?
   
         Author's current answer: Typo, fixed.
   
   *  1.4. (Andrew Sullivan): I find this peculiar: A "Bidi domain name"
      is a domain name that contains at least one RTL label. If a domain
      name is RTL.RTL.RTL, it qualifies under this definition, even
      though there is no bidirectionality (all labels have the same
      directionality). Explaining why this is still "bidi" would leave
      me less confused.
   
         Author's current answer: Added some text to say "adding a
         separate RTL-name category would just make the spec more
         complicated".
   
   *  1.4. (Martin Duerst): "non spacing" -> "nonspacing"
   
   *  1.4. (Martin Duerst): "The directionality of such examples" ->
      "The display order of such examples"
   
   *  1.4. (Martin Duerst): "it means ..., approximately" -> "it
      approximately means"
   
   *  1.4. (Martin Duerst): "An RTL label": This seems to be the
      definition that Protocol might want to refer to.
   
   *  1.4. (Martin Duerst): 'Having a separate category of "RTL domain
      names" would not make this specification simpler, so has not been
      done.' -> 'Providing a separate category of "RTL domain names"
      would not make this specification simpler.'
   
   *  2. A replacement for the RFC 3454 BIDI rule: it would probably be
      good to indicate the applying order.
   
         Author's current answer: The 6 conditions can be checked in any
         order. All must be satisfied in order to make the test pass;
         different implementations may find that different checking
         orders make the code more or less efficient.
   
   *  2. (Martin Duerst): (title), and elsewhere: Both "Bidi rule" and
      "Bidi test" are used, that's confusing. The term is always in
      singular. The document works that way in general, but "The
      following test" at the start of Section 2 is confusing, because
      the only 'tests' that one can see are the ones labeled 1. to 6.
      Maybe use something like "In order to pass the BIDI test, the
      following conditions 1. to 6. must all be satisfied."
   
   *  2. (Martin Duerst): conditions 2/4: Why are BN (control
      characters) allowed in RTL but not in LTR?
   
   *  2.1. (Andrew Sullivan): Rule 1 in Para. 2 says, "The first
      character must be a character with BIDI property L, R or AL." I
      can't tell whether that must is a requirement or a statement of
      fact that is entailed by other IDNA rules. If it's a requirement,
      it presumably ought to be a 2119 MUST; even if not, it seems that
      we have to know what to do in case the first character doesn't
      match this rule. If it's an entailment, it'd help to make that
      plain, which could be done by restating it, "The first character
      will be a character with BIDI property L, R, or AL due to [reason
      reference]."
   
         Author's current answer: I'm not sure what an entailment is -
         but this is a rule. In order to execute the text, you must look
         at the first character and check its BIDI property. This
         document isn't using 2119 (wow); I used to have the reference
         way back when, but it didn't seem to help any, so I removed it.
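
      A minimal sketch of Rule 1 as quoted above (the first character
      must have BIDI property L, R or AL), again relying on Python's
      unicodedata module. The function name first_char_ok is
      hypothetical, and only this one condition is covered; the other
      conditions of the rule are not reproduced here.

```python
import unicodedata


def first_char_ok(label: str) -> bool:
    """Rule 1 only: the first character has BIDI property L, R or AL."""
    return bool(label) and unicodedata.bidirectional(label[0]) in {"L", "R", "AL"}


print(first_char_ok("abc123"))         # True: "a" has property L
print(first_char_ok("123abc"))         # False: "1" has property EN
print(first_char_ok("\u05d0\u05d1"))   # True: Hebrew letters have property R
```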
   
   *  3. (Andrew Sullivan): "One specific requirement was thought to be
      problematic, but turned out to be satisfied by a string that obeys
      the proposed rules:
      
      *  The Character Grouping requirement should be satisfied when
         directional controls (LRE, RLE, RLO, LRO, PDF) are used in the
         same paragraph (outside of the labels). Because these controls
         affect presentation order in non-obvious ways, by affecting the
         "sor" and "eor" properties of the Unicode BIDI algorithm, the
         conditions above require extra testing in order to figure out
         whether or not they influence the display of the domain name.
         Testing found that for the strings allowed under the rule
         presented in this document, directional controls do not
         influence the display of the domain name."
         
      comes after the discussion of things considered and rejected. This
      leaves me confused about whether the text above is in fact a
      requirement or not. If it is a requirement, then I'd move this
      segment to the part before the rejected requirements.
   
         Author's current answer: Added a little more text to explain
         the status of the requirement - it's a "nice to know".
   
   *  3. (Martin Duerst): "A requirement" -> "The requirement" (see
      above)
   
   *  3. (Martin Duerst): para 2: As this restricts things to the
      Unicode bidi algorithm, please say this earlier. (see above)
   
   *  3. (Martin Duerst): para 3: "requirements proposed" ->
      "requirements" (we are working on finalizing this document, we are
      no longer in the proposal stage)
   
   *  3. (Martin Duerst): requirement 2: Is the choice of 'characters
      delimiting the labels' open? Is it only the ASCII dot, or a small
      set? (I'm interested in this both for spec clarity and because the
      answer might strongly affect draft-duerst-iri-bis.)
   
   *  3. (Martin Duerst): 'possible requirement' related to
      directionality controls: "(outside of the labels)" -> "(outside,
      but potentially directly adjacent to the labels)" (does this
      include cases with directionality controls inside a domain name,
      i.e. before/after a dot?) "the conditions above require extra
      testing" -> "the conditions above required extra testing"
   
   *  3. (Martin Duerst): 'Delimiterchars': FULL STOP not allowed in
      domain names?
   
   *  4.1. (Martin Duerst): Thaana 'Computer' example: "UBIUFILI" ->
      "UBUFILI"
   
   *  4.2. (Martin Duerst): This section could be shortened
      considerably. "Greater latitude here than ... Dhivehi." is
      irrelevant; as long as a significant part of a language's words
      cannot be used in IDN, there's a problem. The subsection is
      interesting for people interested in Yiddish, but the average
      reader of the spec will try to find something relevant for the
      algorithm, and mostly be more confused than enlightened.
   
   *  4.2.3.4 (James Mitchell): If the proposed label contains any
      characters that are written from right to left it MUST meet the
      "bidi" criteria [IDNA2008-BIDI].
      
      The above implies that the label must meet the BIDI criteria;
      however, the BIDI criteria are applied to a BIDI domain name. From
      draft-ietf-idnabis-bidi-04: "The following test has been developed
      for labels in BIDI domain names" and "A 'Bidi domain name' is a
      domain name that contains at least one RTL label." I hesitate to
      provide alternative text until the following question has been
      answered.
      
      The protocol states that if the proposed label contains any
      characters written from right to left it MUST meet the bidi
      criteria. It does not impose such requirements on labels
      containing no right to left characters. Consider the registration
      of label 123abc in a zone containing labels written right to left.
      The label does not contain any right to left characters, therefore
      does not have to meet the BIDI criteria. However this name is a
      BIDI domain name, yet such a name would fail as the first label
      (LTR) does not begin with BIDI property L. Is the label intended
      to be valid for registration given a right to left zone?
   
         Author's current answer: In your specific case, registering
         "123abc" in the RTL zone "ABC" (usual convention applies) will
         lead to a domain name (network order: 123abc.ABC) that, on its
         own, will display as "123abc.CBA" in an RTL context, but if
         prepended by "DEF:", forming the network order string
         DEF:123abc.ABC, will display as "CBA.abc123:FED" - which may be
         surprising to some. (I'm only 90% confident on this - people
         more used to bidi in practice may be more confident).
         
         The WG has rejected inter-label tests, therefore all tests
         defined by the protocol as normative (MUST or SHOULD) apply
         only to one label at a time.
         
         Given the WG decision, I tried to make the BIDI document quite
         clear that certain properties can only be guaranteed for domain
         names where all the labels meet the test, but this is a case
         where people have to read the warnings and do something
         reasonable, rather than having the rules define that being
         unreasonable is forbidden.
         
         "Warning: Contains hot liquids".
   
      I am not concerned about the display ordering of the name in
      question. The issue is a mismatch between the registration and
      lookup protocols.
      
      The registration protocol asks the question of the label '123abc',
      which, being left-to-right, is not required to satisfy BIDI.
      However, the lookup protocol says that one SHOULD apply the BIDI
      test (on, I assume, the name). Applying the BIDI test to this name
      will fail and the name will not be looked up.
      
      As a registry, should I allow registration of the name 123abc.RTL?
   
         Author's current answer: My recommendation is that you should
         (as a registry) establish policy that says "it is not allowed".
         The protocol does not require you to do so.
   
      As an application, should I look up the name 123abc.RTL?
   
         Author's current answer: The protocol does not say that you
         can't. For the obvious reasons, I think it's a legitimate
         implementor decision to decide not to.
   
   *  4.2.3.4 (James Mitchell): I note also the inconsistent use of the
      term BIDI. In draft-ietf-idnabis-protocol-14 section 4.2.3.4. it
      is quoted and in lower-case, whereas the
      draft-ietf-idnabis-bidi-04 uses the upper-case version
      extensively. Also, within draft-ietf-idnabis-bidi-04 the term
      "Bidi domain name" in Section 1.4 is inconsistent with BIDI domain
      names in Section 2, and the term Bidi rule in Section 10 is
      inconsistent with the several other occurrences in the document.
   
         Author's current answer: Thanks, I tried to normalize it to
         uppercase in an earlier round, but didn't remain consistent in
         later edits. Will fix!
   
   *  4.3. (Martin Duerst): "(with the 5 being considered right-to-left
      because of the leading ALEF)": No, the 5 itself is never
      right-to-left. Change to "(the overall directionality being
      right-to-left because of the leading ALEF)"
   
   *  4.3. (Martin Duerst): "but barring them both seems to require
      justification" -> "but barring them both seems unnecessary" or
      "but barring them both turned out to be unnecessary"
   
   *  5. (Martin Duerst): "Even if a label is registered under a "safe"
      label,": 'under' should be explained more clearly (I assume this
      refers to the hierarchical relationship in the DNS)
   
   *  5. (Martin Duerst): last paragraph: It would be better to change
      this into a SHOULD, such as "Where implementations see a way to
      avoid ..., they SHOULD avoid". That will bring this issue on the
      radar screen of implementers, whereas it currently will just be
      glossed over.
   
   *  6. (Mark Davis): "Rules can also be specified at the protocol
      level, but while the example above involves right-to-left
      characters, this is not inherently a BIDI problem." I think the
      issue is that the word "can" was appropriate for when this was a
      proposal, but the situation is different in heading for release;
      the "can" should be changed according to what you mean.
      
      *  If you are referring to the situation as of when these are all
         released, then the rules either "are specified" or they are not
         (in which case the statement should be removed).
         
      *  If you are referring to a future time, then the "can" becomes
         "could" (or for clarity, adding "in a future version" or some
         such language).
         
      *  If you are referring to what could have been done, then "could
         also have been" would be appropriate.
         
      Because I can't tell what you want to say, I don't know which of
      these you would mean.
   
         Author's current answer: There's no guarantee of synchrony in
         further updates, so the situation isn't really all that
         different (BIDI doesn't place any constraint on future
         -tables), but I'll change to "are".
   
   *  6. (Martin Duerst): first paragraph: "All other issues with these
      scripts": What scripts???
   
   *  6. (Martin Duerst): "wishes to create rules for the mixing of
      digits" -> "wishes to create rules against the mixing of digits"
      or "wishes to restrict the mixing of digits"
   
   *  6. (Martin Duerst): "Rules are also specified at the protocol
      level, but while the example above involves right-to-left
      characters, this is not inherently a BIDI problem." -> "This
      example is not inherently a BIDI problem, so such restrictions are
      not specified at the protocol level." ("Rules are also specified
      at the protocol level" is inherently vague; it seems to mean "Some
      rules against mixing digits are also specified at the protocol
      level, but only when this is necessary to avoid a BIDI problem.")
   
   *  6. (Martin Duerst): "It is unrealistic to expect that applications
      will display domain names using embedded formatting codes between
      their labels (for one thing, no reliable algorithms for
      identifying domain names in running text exist);": Please add that
      it is also unrealistic that formatting codes are removed before
      IDNA processing, and that allowing formatting codes could lead to
      many kinds of 'mischief' that would go against the two
      requirements in section 3.
   
   *  6. (Martin Duerst): "which might surprise someone expecting to see
      labels displayed in hierarchical order.": Please add that this may
      not be such a big problem to general users familiar with BIDI,
      because they are used to seeing/reading a sequence of RTL units
      (e.g. words) from right to left. (for wording alternatives, see
      http://tools.ietf.org/html/rfc3987#section-4.4, first para,
      *second para*, ...)
   
   *  7. Does that restriction mean that telephone numbers cannot be
      registered in BIDI zones?
   
         Author's current answer: If the registry desires that domain
         names behave sensibly, yes; if the registry only desires that
         domain names pass the test, no. There are no inter-label tests.
   
   *  7.1 (Martin Duerst): Bullet points 1 and 2 are major, whereas
      bullet point 3 is really farfetched (not impossible just because
      there is no guarantee against weird implementations). It would be
      good to indicate that somehow. (this includes the paragraph
      following bullet point 3)
   
   *  7.1. (Martin Duerst): "The editors believe": change to something
      less specific; this is a WG document, we either have rough
      consensus or we don't. (I for one fully agree with this point)
   
   *  7.2. (Martin Duerst): This should be slightly reworded to more
      clearly send the message that changes to Unicode bidi properties,
      while not totally impossible, are expected to be rare, and to
      affect mostly symbols and the like, which will limit their effect
      on what the BIDI rule(/test) allows and what not.
   
   *  8. IANA considerations. Same remark as in the Protocol case.
   
         Author's current answer: The Unicode Consortium does not make
         changes to published versions of its standards; I believe we
         can trust them to keep version 5.1 available for a while.
   
      Moreover, the section above then states: "the determination of
      validity for any string depends on the Unicode BIDI property
      values, which are not declared immutable by the Unicode
      Consortium."
   
         Author's current answer: See section 7.2.
   
   *  8. (Martin Duerst): "It is possible that differences in the
      interpretation of the specification": Wrong. There are no
      differences in interpretation for the old spec. There are no
      differences in the interpretation of the new spec. There are
      differences in the specs themselves.
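   
   The registration/lookup mismatch raised in the 4.2.3.4 comment
   above (label "123abc" in a right-to-left zone) can be illustrated
   with a minimal Python sketch. It is an illustration under stated
   assumptions, not the normative test: it checks only the
   first-character condition mentioned in that comment, it assumes
   that an "RTL label" is one containing a character of Bidi class R,
   AL or AN, and all function names are hypothetical. Property values
   come from the Unicode version bundled with the Python interpreter,
   which is not necessarily Unicode 5.1.

```python
import unicodedata

# Assumption: Bidi classes that make a label "RTL" for this sketch.
RTL_CLASSES = {"R", "AL", "AN"}
# Assumption: classes permitted for the first character of a label.
FIRST_CHAR_CLASSES = {"L", "R", "AL"}


def is_rtl_label(label: str) -> bool:
    """True if any character of the label has an RTL Bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def is_bidi_domain_name(labels: list[str]) -> bool:
    """True if at least one label of the name is an RTL label."""
    return any(is_rtl_label(label) for label in labels)


def first_char_ok(label: str) -> bool:
    """First-character condition only; the full test has more rules."""
    return unicodedata.bidirectional(label[0]) in FIRST_CHAR_CLASSES


# Network order "123abc.<RTL zone>"; three Hebrew letters stand in
# for the zone label.
name = ["123abc", "\u05d0\u05d1\u05d2"]

# Registration view: the label contains no RTL character, so the
# per-label requirement is not triggered.
registration_requires_test = is_rtl_label(name[0])

# Lookup view: the name is a BIDI domain name, so every label is tested.
lookup_passes = (not is_bidi_domain_name(name)) or all(
    first_char_ok(label) for label in name
)
```

   Under these assumptions, a registry that tests only labels
   containing right-to-left characters would accept "123abc", while an
   application that tests every label of a BIDI domain name would
   decline to look the name up.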
   
9.  IDNA Tables
   
   *  disorder in paragraphs.
   
         Author's current answer: The exact order of the sections will
         be decided upon (and sections will potentially be moved around)
         at the time of publication by the RFC Editor while doing other
         formatting changes.
         
         I have been in contact with the other document editors, and our
         suggestion to our wg chair is to *NOT* risk destroying the
         actual content of the messages by moving things around at the
         time of last call (both wg and IETF).
   
   *  (Gihan Dias): Tamil digits. John Klensin: "I see a considerable
      difference between, e.g.,
      
      - "exclude Tamil numerals"
      
      and
      
      - "this character looks like that character, so exclude one of
      them".
      
      The latter is clearly part of a case-by-case character analysis.
      The former, whatever it might be, is a decision about a class of
      characters, whether Unicode's selection of properties identifies
      it as a class or not."
      
      The Sri Lanka IDN Task Force considered the document
      draft-ietf-idnabis-tables-06.txt and has an issue with the
      inclusion of the Tamil Digits as valid IDNA characters.
      
      0BE6..0BEF ; PVALID # TAMIL DIGIT ZERO..TAMIL DIGIT NINE
      
      However, we consider that the potential for confusion is
      sufficient that they be disallowed in the protocol, and request
      that they be excluded.
   
         Author's current answer: Given the current rules, and the
         properties in the Unicode Database, the only way to treat
         0BE6..0BEF as DISALLOWED is to add them to the exceptions table
         one by one. I.e. add them to section "2.6. Exceptions (F)" with
         the explicit value DISALLOWED.
   
   *  1. Introduction. "In particular, some combinations of allowed code
      points are not advisable for use in IDNs due to rules specific to
      a script or class of characters" introduces the concept of a
      "class of characters", but does not document it. IDNA Rationale
      7.1.3 states "Maintain IDNA and Unicode tables that are consistent
      with regard to versions, i.e., unless the application actually
      executes the classification rules in [IDNA2008-Tables]" yet the
      only place "classification rules" appears in IDNA Tables is in
      "4. Code points" as "The Categories and Rules defined in Section 2
      and Section 3 apply to all Unicode code points. The table in
      Appendix B shows, for illustrative purposes, the consequences of
      the categories and classification rules, and the resulting
      property values."
      
      What is a "class of characters"?
   
   *  1. Introduction ends with "This document is part of a series
      that, together, constitute a proposal for updating the IDNA
      standards to resolve issues uncovered in recent years, cover a
      broader range of scripts, and provide for migration to newer
      versions of Unicode. See [IDNA2008-rationale] for a broader
      discussion." Should this not be removed or edited?
   
   *  2.1. "For more information, see section 4.5 of The Unicode
      Standard [Unicode5]." Is it also the case in Unicode 5.1?
      Shouldn't this document be stored by the IANA?
   
   *  2.2. NFKC or NFC?
   
   *  2.2. (Paul Hoffman) Section 2.2 uses NFKC, but the protocol itself
      uses NFC. I think it is useful to make a note to this effect in
      Section 2.2.
   
   *  2.8. (Andrew Sullivan): "JoinControl (H)
      
      H: Join_Control(cp) = True
      
      This category consists of Join Control characters (i.e., they are
      not in LetterDigits (Section 2.1)) but are still required in IDN
      labels under some circumstances. They require extended special
      treatment in Lookup and Resolution."
      
      I think we previously agreed just to call the action "lookup".
      Strictly, all the special treatment is part of the lookup process,
      but not the resolution process (which is a straight DNS activity
      that happens to be using an A-label as its QNAME). As I've argued
      before, I want the documents to stay very far away from any
      suggestion that they are changing the operation of the DNS as
      such.
   
   *  2.10. "It should be noted that Unicode distinguishes between
      'unassigned code points' and 'unassigned characters'". Can the
      differences (nature and in relation to IDNA) between the
      characters and codepoints be explained here?
   
   *  5. IANA considerations. It is suggested that IANA should retain
      online copies of the version of external documents that are
      normatively referenced in the IETF documents.
   
   *  7. (Andrew Sullivan): " As is usual with IETF specifications,
      while the document represents rough consensus, it should not be
      assumed that all participants and contributors agree with all
      provisions." I'll spare participants my speech on why this is a
      bad thing this time.
   
   *  "A table from which that registry can be initialized, and some
      further discussion, appears in Appendix A. " - Who is to decide
      and maintain the table and according to which rules/procedures?
   
   *  Appendix A. As a comment, we do not understand, from the kind of
      logic presented, why:
      
      *  Tamil digits cannot be made subject to a rule and added to
         CONTEXTO?
         
      *  The same for French majuscules?
         
      *  The same for any zone specific restriction?
         
      It seems implied that the logic should be the same on the sending
      and receiving end? The receiving end is only for decoding what the
      sending end chose to encode in its own context. That context needs
      to be considered and supported. If my application is in Tamil or
      French, it knows it and can be required to proceed accordingly.
   
   *  Appendix A (Andrew Sullivan): paragraph "Note that "Before" and
      "After" do not refer to the visual display order of the character
      in a label, which may be reversed or otherwise modified by the
      bidirectional algorithm for labels including characters from
      scripts written right-to-left." might benefit from the addition of
      another sentence, "Instead, 'Before' and 'After' refer to the
      network order of the character in the label."
   
   *  Appendix A (Andrew Sullivan): "Appendix A.7. KATAKANA MIDDLE DOT
      
      Code point:
      
      U+30FB
      
      Overview:
      
      Note that the Script of Katakana Middle Dot is not any of
      "Hiragana", "Katakana" or "Han". The effect of this rule is to
      require at least one character in the label to be in one of those
      scripts. ...." There is no "End For" as called for in the
      pseudocode definition.
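   
   The effect described in the KATAKANA MIDDLE DOT comment above can
   be sketched in Python with the loop closed explicitly. This is a
   rough illustration, not the Appendix A.7 rule itself: the Python
   standard library does not expose the Unicode Script property, so
   the three code point ranges below are an assumed approximation of
   "Hiragana", "Katakana" and "Han", and the function names are
   hypothetical.

```python
KATAKANA_MIDDLE_DOT = "\u30fb"

# Assumption: approximate ranges standing in for the Script property.
_APPROX_SCRIPT_RANGES = (
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters (U+30FB itself is excluded)
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs block (Han)
)


def in_required_script(ch: str) -> bool:
    """True if ch falls in one of the approximate script ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _APPROX_SCRIPT_RANGES)


def middle_dot_rule_ok(label: str) -> bool:
    """Contextual check applied only when U+30FB occurs in the label."""
    if KATAKANA_MIDDLE_DOT not in label:
        return True  # the rule does not apply to this label
    for ch in label:
        if in_required_script(ch):
            return True
    # The loop ends here; this is the point an "End For" would mark.
    return False
```

   With these assumed ranges, a label consisting of Katakana letters
   around U+30FB passes, while a label that combines U+30FB only with
   ASCII letters does not.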


Author's address

   Jean-Francois C. Morfin
   INTLNET
   23 rue Saint Honore
   Versailles
   78000 Versailles
   France

   Phone: (33.1) 39 50 05 10
   Email: jefsey@jefsey.com
   URI:   http://intlnet.org