INTERNET-DRAFT
John C Klensin
21 October 2002
Expires April 2003

National and Local Characters in DNS TLD Names
draft-klensin-idn-tld-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

In the context of work on internationalizing the Domain Name System (DNS), there have been extensive discussions about "multilingual" or "internationalized" top level domain names (TLDs), especially for countries whose predominant language is not written in a Roman-based script. This document reviews some of the motivations for such domains and the constraints that the DNS imposes. It then suggests an alternative, local translation, that may solve a superset of the problem while avoiding protocol changes, serious deployment delays, and other difficulties.

Table of Contents

1. Introduction
   1.1 Background on the "Multilingual Name" Problem
   1.2 Domain Name System Constraints
   1.3 Internationalization and Localization
2.
Client-side solutions
   2.1 IDNA and the client
   2.2 Local translation tables for TLD names
3. Advantages and disadvantages of local translation
   3.1 Every TLD in the local language and character set
   3.2 Unification of country code domains
   3.3 User understanding of local and global references
   3.4 Limits on TLD propagation
4. Security Considerations
5. References
6. Acknowledgements
7. Author's Address

1. Introduction

1.1 Background on the "Multilingual Name" Problem

People who share a language prefer to communicate in it, using whatever characters are normally used to write that language, rather than in some "foreign" one. There have been standards for using mutually-agreed characters and languages in electronic mail message bodies and selected headers since the introduction of MIME in 1992 [MIME] and the Web has permitted multilingual text since its inception. However, since domain names are exposed to users in email addresses and URLs, and corresponding arrangements in other protocols, demand rapidly arose to permit domain names in applications that used characters other than those of the very restrictive, ASCII-subset, "LDH" conventions [LDH].

The effort to do this rapidly became known as "multilingual domain names", although that is a misnomer, since the DNS deals only with characters and identifier strings, and not, except by accident, what people usually think of as "names". There has also been little interest in what would actually be a "multilingual name" -- i.e., a name that contains components from more than one language -- but only in the use of strings conforming to different languages in the context of the DNS.

1.1.1 Approaches to the requirement

If the requirement is seen, not as "modifying the DNS", but as "providing users with access to the DNS from a variety of languages and character sets", three sets of proposals have emerged in the IETF and elsewhere.
They are:

(1) Perform processing in client software that recodes a user-visible string into an ASCII-compatible form that can safely be passed through the DNS protocols and stored in the DNS. This is the approach used, for example, in the IETF's "IDNA" protocol [IDNA].

(2) Modify the DNS to be more hospitable to non-ASCII names and strings. There have been a variety of proposals to do this in almost as many ways, some of which have been implemented on a proprietary basis by various vendors. None of them have gained acceptance in the IETF community, primarily because they would take a long time to deploy and would leave many problems unsolved.

(3) Move the problem out of the DNS entirely, relying instead on a "directory" or "presentation" layer to handle internationalization. The rationale for this approach is discussed in [DNSROLE].

This document proposes a fourth approach, applicable to the top level domains (TLDs) only (see section 1.2.1 for a discussion of the special issues that make TLDs problematic). That approach could be used as an alternative or supplement to the strategies summarized above.

1.1.2 Writing the name of one's country in its own characters

An early focus of the "multilingual domain name" efforts was expressed in statements such as "users in my country, in which ASCII is rarely used, should be able to write an entire domain name in their own character set." In particular, since all top-level domain names, at present, follow the LDH rules, the somewhat more restrictive naming rules discussed in [STD3], and the coding conventions specified in [RFC1591], all fully-qualified DNS names were effectively required to contain at least one ASCII label (the TLD name), and that was considered inappropriate. One should, instead, be able to write the name of the ccTLD for China in Chinese, the name of the ccTLD for Saudi Arabia in Arabic, and so on.
1.1.3 Countries with multiple languages and countries with multiple names

From a user interface standpoint, writing ccTLD names in local characters is a problem. As discussed in section 1.2.2, the DNS itself does not easily permit a domain to be referred to by more than one name (or spelling or translation of a name). Countries with more than one official language would require that the country name be represented in each of those languages. And, just as it is important that a user in China be able to represent the name of the Chinese ccTLD in Chinese characters, she should be able to access a Chinese-language site in France using Chinese characters, requiring that she be able to write the name of the French ccTLD in those characters rather than in a form based on a Roman character set.

1.2 Domain Name System Constraints

1.2.1 Administrative hierarchy

The domain name system is designed around the idea of an "administrative hierarchy", with the entity responsible for a given node of the hierarchy responsible for policies applicable to its subhierarchies (Cf. [STD13]). The model works quite well for the domain and subdomains of a particular enterprise, where the hierarchy can be organized to match the organizational structure, there are established ways to set policies and there are, at least presumably, shared assumptions about overall goals and objectives among all registrants in the domain. It is more problematic when a domain is shared by unrelated entities which lack common policy assumptions. It is difficult to reach agreement on rules that should apply to all of them. That situation always prevails for the labels registered in a TLD (second-level names) except in those TLDs for which the second level is structural (e.g., the .CO, .AC, .GOV conventions in many ccTLDs), in which case it exists for the labels within that structural level.

TLDs may, but need not, have consistent registration policies for those second (or third) level names.
Countries (or ccTLD administrators) have often adopted rules about what entities may register in those ccTLDs, and the forms the names may take. RFC 1591 outlined registration norms for most of the gTLDs, even though those norms have been largely ignored in recent years. And some recent "sponsored" domains are based on quite specific rules about appropriate registrations. Homogeneous registration rules for the root are, by contrast, impossible: almost by definition, the subdomains registered in it are diverse and no single policy applying to all root subdomains (TLDs) is feasible.

1.2.2 Aliases

In an environment different from the DNS, a rational way to permit assigning local-language names to a country code (or other) domain would be to set up an alias for the name, or to use some sort of "see instead" reference. But the DNS does not have quite the right facilities for either. Instead, it supports a "CNAME" record, whose label can refer only to a particular label and not to a subtree. For example, if A.B.C is a fully-qualified name, then a CNAME reference from X to A would make X.B.C appear to have the same values as A.B.C. However, a CNAME reference from Y to C would not make A.B.Y referenceable (or even defined) at all. A second record type, DNAME [RFC2672], can provide an alias for a portion of the tree. But it is problematic technically, and its use is strongly discouraged except for transition uses from one domain to another.

1.3 Internationalization and Localization

It has often been observed that while many people talk about "internationalization" (a term we typically use for making something globally accessible while incorporating a broad-range "universal" character set and conventions appropriate to all languages), they often really mean, and want, "localization" (making things work well in a particular locality, or well, but potentially differently, for a broad range of localities).
Anything that actually involves the DNS must be global and hence internationalized since the DNS cannot meaningfully support different responses based, e.g., on the location of the user making a query. While the DNS cannot support localization internally, many of the features discussed earlier in this section are much more easily thought about in local terms -- whether localized to a geographical area, users of a language, or using some other criteria -- than in global ones.

2. Client-side solutions

Traditionally, the IETF has avoided becoming involved in standardization for actions that take place strictly on individual hosts on the network, assuming that it should confine itself to behavior that is observable "on the wire", i.e., in protocols between network hosts. Exceptions to this general principle have been made when different clients were required to utilize data or interpret values in compatible ways to preserve interoperability: the standards for email and web body formats, and IDNA itself, are examples of these exceptions.

Regardless of what is required to be standardized, it is almost never required, and often unwise, that a user interface, by default, present on-the-wire formats to the user. However, in most cases when the presentation format and the wire format differ, the client program must take precautions to ensure that the wire format can be reconstructed from user input, or must keep the wire format, while hidden, bound to the presentation mechanism so that it can be reconstructed. And, while it is rarely a goal in itself, it is often necessary that the user be at least vaguely aware that the wire ("real") format is different from the presentation one and that the wire format be available for debugging.

2.1 IDNA and the client

As mentioned above, IDNA itself is entirely a client-side protocol. It works by providing labels to the DNS in a special format (so-called "ACE").
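
As a rough illustration of the ACE transformation, the following sketch uses the "idna" codec from the Python standard library. That codec follows the IDNA rules as later published in RFC 3490, so treating it as equivalent to the draft cited as [IDNA] is an assumption. The helper names to_ace and from_ace, and the name "bücher.example", are invented for this example.

```python
# Illustrative sketch only; relies on Python's built-in "idna" codec.
# Helper names and the sample domain are assumptions made for the example.

def to_ace(name: str) -> str:
    """Convert a user-visible (Unicode) name to the ASCII-compatible wire form."""
    return name.encode("idna").decode("ascii")

def from_ace(name: str) -> str:
    """Convert a wire-form (ACE) name back to Unicode for presentation."""
    return name.encode("ascii").decode("idna")

wire = to_ace("bücher.example")
print(wire)            # xn--bcher-kva.example
print(from_ace(wire))  # bücher.example
```

Labels that are already plain ASCII pass through both helpers unchanged, which is why the transformation can be applied to every name the client handles.
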
When labels in that format are encountered, they are transformed, by the client, back into internationalized (normally Unicode) characters. In the context of this document, the important observation about IDNA is that any application program that supports it is already doing considerable transformation work on the client; it is not simply presenting the on-the-wire formats to the user.

2.2 Local translation tables for TLD names

We suggest that, in addition to maintaining the code and tables required to support IDNA, clients may want to maintain a table that contains a list of TLDs and that maps between them and locally-desirable names. For ccTLDs, these might be the names (or locally-standard abbreviations) by which the relevant countries are known locally (whether in ASCII characters or others). With some care on the part of the application designer (e.g., to ensure that local forms do not conflict with the actual TLD names), a particular TLD name input from the user could be either in local or standard form without special tagging or problems. When DNS names are received by these client programs, the TLD labels would be mapped to local form before IDNA is applied to the rest of the name; when names are received from users, local TLD names would be mapped to the global ones before being passed into IDNA or for other DNS processing.

3. Advantages and disadvantages of local translation

3.1 Every TLD in the local language and character set

The notion of a top-level domain whose name matches, e.g., the name that is used for a country in that country or the name of a language in that language is, as mentioned above, immediately appealing. But most of the reasons for it argue equally strongly for other TLDs being accessible from that language.
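
A local translation table of the kind suggested in section 2.2 might look like the following minimal sketch. The table entries (Spanish-language forms), the set of known TLDs, and all function names are hypothetical, and IDNA processing of the remaining labels is deliberately left out.

```python
# Hypothetical table for a Spanish-language locale: local form -> standard TLD.
# Entries and names are illustrative assumptions, not part of any standard.
LOCAL_TO_GLOBAL = {"alemania": "de", "francia": "fr", "comercial": "com"}
KNOWN_TLDS = {"de", "fr", "com", "org", "es"}

def check_table(table: dict, known_tlds: set) -> None:
    """Reject local forms that collide with real TLDs, and targets that are not TLDs."""
    for local, standard in table.items():
        if local in known_tlds:
            raise ValueError(f"local form {local!r} conflicts with a real TLD")
        if standard not in known_tlds:
            raise ValueError(f"{standard!r} is not a known TLD")

def _swap_tld(name: str, mapping: dict) -> str:
    """Replace only the rightmost label, leaving all other labels untouched."""
    head, _, tld = name.rpartition(".")
    tld = mapping.get(tld.lower(), tld)
    return f"{head}.{tld}" if head else tld

def to_global(name: str, table: dict) -> str:
    """User input -> standard form; names already in standard form pass through."""
    return _swap_tld(name, table)

def to_local(name: str, table: dict) -> str:
    """Standard form -> local form for display."""
    reverse = {standard: local for local, standard in table.items()}
    return _swap_tld(name, reverse)

check_table(LOCAL_TO_GLOBAL, KNOWN_TLDS)
print(to_global("example.alemania", LOCAL_TO_GLOBAL))  # example.de
print(to_global("example.de", LOCAL_TO_GLOBAL))        # example.de
print(to_local("example.de", LOCAL_TO_GLOBAL))         # example.alemania
```

Because check_table refuses any local form that is also a real TLD, input in either local or standard form can be accepted without tagging, which is the property the text above asks the application designer to preserve.
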
A user in Korea who can access the national ccTLD in the Korean language and character set has every reason to expect that both generic top level domains and domains associated with other countries would be similarly accessible, especially if the second-level domains bear Korean names. A user in Spain or Portugal, or in Latin America, would presumably have similar expectations, but would expect to use Spanish or Portuguese names, not Korean ones.

That level of local optimization is not realistic -- some would argue not possible -- with the DNS since it would ultimately require that every top level domain be replicated for each of the world's languages. That replication process would involve not just the top level domain itself: in principle, all of its subtrees would need to be completely replicated as well (or at least all of the subtrees for which the language associated with a given replica was relevant). The administrative hierarchy characteristics of the DNS (see section 1.2.1) turn the replication process into an administrative nightmare: every administrator of a second-level domain in the world would be forced to maintain dozens, probably hundreds, of similar zone files for the replicas of the domain. Even if only the zones relevant to a particular country or language were replicated, the administrative and tracking problems to bind these to the appropriate top-level domain and keep all of the replicas synchronized would be extremely difficult at best. And many administrators of third- and fourth-level domains, and beyond, would be faced with similar problems.

By contrast, dealing with the names of TLDs as a localization problem, using local translation, is fairly simple. Each function represented by a TLD -- a country, generic registrations, or purpose-specific registrations -- could be represented in the local language and character set as needed.
And, for countries with many languages, or users living, working, or visiting countries where their language was not dominant, "local" could be defined in terms of the needs or wishes of each particular user.

3.2 Unification of country code domains

It follows from some of the comments above that, while there appears to be some immediate appeal in having (at least) two domains for each country, one using the ISO 3166-1 code and another one using a name based on the national name in the national language, such a situation would create considerable problems for registrants in the multiple domains. For registrants maintaining enterprise or organizational subdomains, ease of administration in a single family of zone files will usually make a registration in a single top-level domain preferable to replicated sets of them, at least as long as their functional requirements (such as local-language access) are met by the unified structure. Of course, having replicated domains might be popular with registries and registrars, since replication would almost inevitably increase the total number of domains to be registered.

3.3 User understanding of local and global references

While the IDNA tables (actually Nameprep and Stringprep -- see the IDNA specification) must be identical globally for IDNA to work reliably, the tables for mapping between local names and TLD names could be locally determined, and differ from one locale to another, as long as users understood that international interchange of names required using the standard forms. That understanding could be assisted by software. It is likely that, at least for the foreseeable future, DNS names being passed among users in different countries, or using different languages, will be forced to be in ACE form to guarantee compatibility in any event, so the marginal knowledge or effort needed to put TLD names into standard form and transmit them that way would be very small.
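
The interchange rule described in section 3.3 can be illustrated with a short sketch in which two locales keep different tables but exchange names only in standard form. Both tables, the function names, and the sample name are hypothetical, and Python's standard "idna" codec stands in for IDNA processing as an assumption.

```python
# Hypothetical per-locale tables (local form -> standard TLD); illustrative only.
SPANISH = {"francia": "fr", "corea": "kr"}
KOREAN = {"프랑스": "fr", "한국": "kr"}

def swap_tld(name: str, mapping: dict) -> str:
    """Replace only the rightmost label according to the given mapping."""
    head, _, tld = name.rpartition(".")
    tld = mapping.get(tld, tld)
    return f"{head}.{tld}" if head else tld

def to_standard(name: str, local_table: dict) -> str:
    """Local presentation form -> standard TLD, remaining labels in ACE form."""
    return swap_tld(name, local_table).encode("idna").decode("ascii")

def to_local(name: str, local_table: dict) -> str:
    """Standard (wire) form -> presentation form for one particular locale."""
    reverse = {standard: local for local, standard in local_table.items()}
    return swap_tld(name.encode("ascii").decode("idna"), reverse)

typed = "año.francia"                # entered by a user of the Spanish table
wire = to_standard(typed, SPANISH)   # all-ASCII form, safe to pass between users
print(wire.isascii())                # True
print(to_local(wire, KOREAN))        # año.프랑스
print(to_local(wire, SPANISH))       # año.francia
```

Neither locale needs to know the other's table: only the standard form crosses the boundary, and each side applies its own mapping for display.
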
3.4 Limits on TLD propagation

The concept of using local translation does have one side-effect, which some portions of the Internet community might consider undesirable. The size and complexity of translation tables, and maintaining those tables, will be, to a considerable extent, a function of the number of top-level domains, the frequency with which new domains are added, and the number of domains that are added at a time. A country or other locale that wished to maintain a full set of translations (i.e., so that every TLD had a representation in the local language) would presumably find setting up a table for the current collection of a few hundred domains to be a task that would take some days. If the number of TLDs was relatively stable, with a relatively small number being added at infrequent intervals, the updates could probably be dealt with on an ad hoc basis. But, if large numbers of domains were added frequently, or if the total number of TLDs became very large, maintaining the table might require dedicated staff. Worse, updating the tables stored on client machines might require update and synchronization protocols and all of the related complexities.

4. Security Considerations

IDNA provides a client-based mechanism for presenting Unicode names in applications while passing only ASCII-based names on the wire. As such, it constitutes a major step along the path of introducing a client-based presentation layer into the Internet. Client-based presentation layer transformations introduce risks from variant tables that can change meaning without external protection. For example, if a mapping table normally maps A onto C and that table is altered by an attacker so that A maps onto D instead, much mischief can be committed. On the other hand, these are not the usual sort of network attacks: they may be thought of as falling into the "users can always cause harm to themselves" category.
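
The table-alteration example can be made concrete with a small sketch. The tables, names, and the show_both helper are hypothetical; the point is only that an identical user-visible name yields a different wire name once the table has been changed, and that displaying the wire form alongside the local form makes the change visible.

```python
# Hypothetical tables for illustration: local form -> standard TLD.
GOOD = {"francia": "fr"}
ALTERED = {"francia": "example"}  # same local form, silently redirected

def to_wire(name: str, table: dict) -> str:
    """Map the rightmost label through the table; other labels are untouched."""
    head, _, tld = name.rpartition(".")
    tld = table.get(tld, tld)
    return f"{head}.{tld}" if head else tld

def show_both(name: str, table: dict) -> str:
    """Present the local form together with the wire form it will produce."""
    return f"{name} [{to_wire(name, table)}]"

typed = "tienda.francia"
print(show_both(typed, GOOD))     # tienda.francia [tienda.fr]
print(show_both(typed, ALTERED))  # tienda.francia [tienda.example]
```
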
The local translation model outlined here does not significantly increase the risks over those associated with IDNA, but may provide some new avenues for exploiting them. Both this approach and IDNA rely on having updated programs present information to the user in a very different form than the one in which it is transmitted on the wire. Unless the internal (wire) form is always used in interchange, there are possibilities for ambiguity and confusion about references.

5. References

[DNSROLE] Klensin, J.C., "Role of the Domain Name System", work in progress (draft-klensin-dns-role-04.txt).

[IDNA] Faltstrom, P., P. Hoffman, A. M. Costello, "Internationalizing Domain Names in Applications (IDNA)", work in progress (draft-ietf-idn-idna-13.txt).

[LDH] STD13 and comments

[MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1341, June 1992. Updated and replaced by Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996. Also, Moore, K., "Representation of Non-ASCII Text in Internet Message Headers", RFC 1342, June 1992. Updated and replaced by Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.

[RFC1591] Postel, J., "Domain Name System Structure and Delegation", RFC 1591, March 1994.

[RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", RFC 2672, August 1999.

[STD3] Braden, R., Ed., "Requirements for Internet Hosts - Application and Support", RFC 1123, October 1989.

[STD13] Mockapetris, P.V., "Domain names - concepts and facilities", RFC 1034, and "Domain names - implementation and specification", RFC 1035, November 1987.

6.
Acknowledgements

This document was inspired by a number of conversations in ICANN, IETF, MINC, and private contexts about the future evolution and internationalization of top level domains. Discussions within, and about, the ICANN IDN Committee have been particularly helpful, although several of the members of that committee may be surprised about where those discussions led.

7. Author's Address

John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140
USA
email: john+ietf@jck.com