IDN Working Group                              Edmon Chung & David Leung
Internet Draft                                               Neteka Inc.
                                                             August 2000


             The DNSII Multilingual Domain Name Protocol


STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The reader is cautioned not to depend on the values that appear in
   examples to be current or complete, since their purpose is primarily
   educational. Distribution of this memo is unlimited.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Historically, the DNS has been capable of handling only names within
   the basic English alphanumeric character set (plus the hyphen), yet
   the standards were so elegantly and openly designed that extending
   the DNS into a multilingual and symbol-based system proves to be
   possible with simple adjustments. These adjustments will be made on
   both the client side and the server side. However, DNSII works on
   the principle that the transition to multilingual domain names
   should be seamless and transparent to the end user. This means that
   initially the server, or more specifically the resolver, SHOULD take
   primary responsibility for the technical implementation of the
   changes required for a multilingual Internet.
   The DNSII protocol is designed to preserve the interoperability,
   consistency and simplicity of the original DNS, while being
   expandable and flexible enough to handle any character or symbol
   used in the naming of an Internet domain. This draft is the
   introduction to a series of drafts that include intended resolution
   processes and other DNSII documents.

DNSII-MDNP       Multilingual Domain Name Protocol       August 2000

1. Introduction

   This Internet-Draft describes the details of the DNSII Multilingual
   Domain Name Protocol. It assumes that the reader is familiar with
   the concepts discussed in the widely distributed RFCs "Domain Names
   - Concepts and Facilities" [RFC1034] and "Domain Names -
   Implementation and Specification" [RFC1035].

1.1 Terminology

   The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
   and "MAY" in this document are to be interpreted as described in
   RFC 2119 [RFC2119].

   A number of multilingual characters are used in the examples in
   this document. Please set your viewer's encoding to UTF-8 so that
   they are displayed properly.

1.2 DNSII

   Many of the current proposals for a multilingual domain name system
   involve working around the current ANSI-based DNS. Doing so either
   compromises the spirit of the original DNS or fails to address the
   conflicts that arise between different character encoding schemes.

   The DNSII specification takes a radically different approach: it
   distinguishes original DNS packets from DNSII packets within the
   labels themselves, and at the same time allows multiple charsets to
   be incorporated in a standardized manner. It causes no harm to the
   current DNS because it embraces the original DNS format laid out in
   RFC 1035, complemented with the ideas incorporated in EDNS
   [RFC2671].

2. DNSII Protocol

   The DNSII Protocol consists mainly of two parts: the InPacket DNSII
   Identifier and the InPacket Label Encoding Type.
   In addition, there are several special considerations for specific
   record types.

2.1 InPacket DNSII Identifier

   In the DNSII specification, an InPacket DNSII Identifier MUST be
   inserted before a label to signify that it contains extended
   characters that are not supported by the current DNS. This DNSII
   flag, which occupies the first two bits of a label, distinguishes a
   DNSII-compliant request from the existing format without having to
   guess from the name whether the incoming packet is multilingual
   aware. This is a substantial improvement over character encoding
   schemes and multilingual implementations in which it is almost
   impossible to determine the language of an incoming request. The
   DNSII flag makes the process clear and simple.

   Currently:

      "00"  regular label [RFC1035]
      "11"  a redirection for DNS compression [RFC1035]
      "01"  indicates the use of EDNS for multiple UDP packets
            [RFC2671]

   DNSII calls for the use of the bit sequence "10" to identify that
   the querying node is DNSII aware. This means that all possible
   values of the top two label bits will be in use. Therefore, the
   following two bits MUST be reserved for future flagging use. These
   2 bits SHOULD be set to "00". This leaves 3 more values available
   for future enhancements.

   The motivation for this approach is the belief that there should be
   no ambiguity in name resolution. Any name that the client wishes to
   resolve should resolve, regardless of the client-side encoding
   scheme.

2.2 InPacket Label Encoding Type (ILET)

   Immediately following the 2 DNSII flag bits and the 2 reserved bits
   are 12 bits assigned to the InPacket Label Encoding Type (ILET).
   The ILET is a 12-bit number that identifies the encoding scheme
   used by the characters of the label. The MIBenum numbers [RFC1700]
   SHOULD be used in this field.
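   As an illustration only, the header layout described so far (2 flag
   bits, 2 reserved bits, 12 ILET bits) could be unpacked as in the
   following Python sketch. The names LABEL_KINDS, label_kind and
   split_dnsii_header are hypothetical and are not defined by this
   specification; only the bit positions come from the text above.

```python
# Hypothetical helper names; only the bit layout is taken from the draft.
LABEL_KINDS = {
    0b00: "regular",      # RFC 1035 length-prefixed label
    0b01: "edns",         # extended label, RFC 2671
    0b10: "dnsii",        # flag value proposed by this draft
    0b11: "compression",  # RFC 1035 compression pointer
}


def label_kind(first_octet: int) -> str:
    """Classify a label by the top two bits of its first octet."""
    return LABEL_KINDS[(first_octet >> 6) & 0b11]


def split_dnsii_header(octet0: int, octet1: int) -> tuple:
    """Return (reserved_bits, ilet) from the two header octets of a DNSII label."""
    word = (octet0 << 8) | octet1
    if word >> 14 != 0b10:
        raise ValueError("not a DNSII label header")
    reserved = (word >> 12) & 0b11   # the two bits reserved for future flags
    ilet = word & 0x0FFF             # 12-bit encoding type (MIBenum)
    return reserved, ilet


print(label_kind(0x03))                # regular
print(label_kind(0x83))                # dnsii
print(split_dnsii_header(0x83, 0xE8))  # (0, 1000)
```

   The last call uses the header octets 0x83 0xE8, that is, flag "10",
   reserved bits "00" and ILET 1000.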
   The allocation of 12 bits fits the MIBenum specification, whose
   values go up to over 2200. With 12 bits there are 4096 possible
   values (with 11 bits, the largest value that can be represented is
   only 2047, slightly short of the specification). MIBenum is adopted
   in order to reuse the existing list of encoding numbers rather than
   re-inventing the wheel. The value in the ILET field SHOULD be
   restricted to the valid encoding schemes defined in the MIBenum
   list.

   After the encoding type is identified, the regular count-label
   scheme of the DNS resumes. The resulting label should look like
   this:

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +---+---+-------+---------------+
   |1 0| z |         ILET          |
   +---------------+---------------+
   |     COUNT     | characters... |
   +---------------+---------------+

   To minimize the size of a DNS packet, if the entire label consists
   only of characters from the ANSI table, the DNS label will appear
   identical to current implementations. The first two bits will
   remain "00".
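   The label layout and the ANSI-only shortcut above can be sketched in
   Python as follows. This is a minimal sketch under stated
   assumptions, not a reference implementation: all function and
   constant names are hypothetical; non-ANSI labels are assumed to be
   sent as UCS-2 (MIBenum 1000) through Python's "utf-16-be" codec,
   which matches UCS-2 only for characters in the Basic Multilingual
   Plane; and COUNT is taken to be an 8-bit character count, as drawn
   in the diagram.

```python
import struct

DNSII_FLAG = 0b10
MIB_US_ASCII = 3   # MIBenum used for "ANSI" in this draft's examples
MIB_UCS2 = 1000    # MIBenum for ISO-10646-UCS-2


def encode_dnsii_label(text: str, ilet: int, codec: str) -> bytes:
    """Build flag + reserved + ILET (16 bits), COUNT (8 bits), characters."""
    if not 0 <= ilet < 4096:
        raise ValueError("ILET must fit in 12 bits")
    if len(text) > 255:
        raise ValueError("COUNT must fit in 8 bits")
    header = (DNSII_FLAG << 14) | ilet   # reserved bits are left as 00
    return struct.pack("!HB", header, len(text)) + text.encode(codec)


def encode_label(text: str) -> bytes:
    """Use the unchanged RFC 1035 form when the label is ASCII-only."""
    if text.isascii():
        raw = text.encode("ascii")
        if len(raw) > 63:
            raise ValueError("RFC 1035 labels are limited to 63 octets")
        return bytes([len(raw)]) + raw
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("this sketch only handles BMP characters as UCS-2")
    return encode_dnsii_label(text, MIB_UCS2, "utf-16-be")


print(encode_label("dns").hex())                                 # 03646e73
print(encode_dnsii_label("dns", MIB_US_ASCII, "ascii").hex())    # 800303646e73
print(encode_label("\u57df\u540d\u7cfb\u7d71").hex())            # 83e80457df540d7cfb7d71
```

   Choosing UCS-2 for every non-ANSI label is only a convenience for
   the sketch; the ILET field exists so that a sender could pick any
   encoding listed in the MIBenum registry.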
   For example, using the DNSII format the label for "dns" MAY be
   represented as:

     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   | 1  0| 0  0| 0  0  0  0  0  0  0  0  0  0  1  1|  MIBenum 3 = ANSI
   +-----------------------------------------------+
   |           3           |          6 4          |  "d"=64
   +-----------------------------------------------+
   |          6 E          |          7 3          |  "n"=6E "s"=73
   +-----------------------------------------------+

   Or, the same domain label "dns" MAY also be represented as:

     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |           3           |           d           |
   +-----------------------------------------------+
   |           n           |           s           |
   +-----------------------------------------------+

   With the multilingual domain name ns.域名系統.tld as an example:

                        1 1 1 1 1 1                     1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +---------------------------------------------------------------+
   |1 0| z |        ANSI=3         |       2       |       n       |
   +---------------------------------------------------------------+
   |       s       |1 0|0 0|      UCS-2=1000       |       4       |
   +---------------------------------------------------------------+
   |          域 (U+57DF)          |          名 (U+540D)          |
   +---------------------------------------------------------------+
   |          系 (U+7CFB)          |          統 (U+7D71)          |
   +---------------------------------------------------------------+
   |0 0|     3     |       t       |       l       |       d       |
   +---------------------------------------------------------------+
   |       0       |
   +---------------+

   From the above example, we can see that the DNSII format is used
   for the first label "ns", as well as for the second label, which is
   in Chinese (the MIBenum for UCS-2 or ISO 10646 [ISO10646] is 1000).
   The third label "tld", however, uses the current format. In any
   case, the count-label-count-label mechanism is largely preserved.
   This matters especially for extended characters, where in other
   proposals the "count" no longer represents the character count.
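   To make the walk through such a name concrete, the following Python
   sketch decodes a wire-format name that mixes DNSII and regular
   labels. It is illustrative only. The names FIXED_WIDTH and
   decode_name are hypothetical, the table maps just two MIBenum
   values to Python codecs, and only fixed-width encodings are handled
   because COUNT is a character count rather than an octet count.
   EDNS labels and compression pointers are deliberately left out.

```python
import struct

# Assumed mapping: MIBenum (ILET) -> (Python codec, octets per character).
FIXED_WIDTH = {3: ("ascii", 1), 1000: ("utf-16-be", 2)}


def decode_name(wire: bytes) -> list:
    """Split an uncompressed wire-format name into its text labels."""
    labels, pos = [], 0
    while True:
        kind = wire[pos] >> 6
        if kind == 0b00:                      # regular RFC 1035 label
            count = wire[pos]
            pos += 1
            if count == 0:                    # root label ends the name
                return labels
            labels.append(wire[pos:pos + count].decode("ascii"))
            pos += count
        elif kind == 0b10:                    # DNSII label
            header, count = struct.unpack_from("!HB", wire, pos)
            codec, width = FIXED_WIDTH[header & 0x0FFF]
            pos += 3
            size = count * width              # COUNT is in characters
            labels.append(wire[pos:pos + size].decode(codec))
            pos += size
        else:
            raise ValueError("EDNS and compression labels are out of scope")


# Octets of the ns.<four Chinese characters>.tld example above.
wire = bytes.fromhex("8003026e73" "83e80457df540d7cfb7d71" "03746c64" "00")
print(decode_name(wire))
```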
   In the above example, the domain is still represented as
   2ns4域名系統3tld0, exactly in line with the original specifications.
   Note that the first label in any query SHOULD be represented in
   DNSII format to alert the destination server that the querying node
   is DNSII aware. This is specifically intended for the
   considerations with CNAME, A6, DNAME and PTR records.

   This approach is used to ensure that there is no confusion about
   the encoding format of the label. ILET allows all existing encoding
   schemes to be employed (UTF-7, UTF-8, ISO 10646 [UCS-2], ISO 10646
   [UCS-4]). ILET also leaves room for employing future encoding
   schemes.

2.3 The Rationale for using ILET

   Besides preserving the count-label-count-label structure, which is
   itself important because of the problems caused by encoding schemes
   with non-uniform byte lengths, the use of ILET aligns with previous
   IETF specifications and helps with tricky case folding and
   canonicalization issues.

   We know that all protocols MUST identify, for all character data,
   which charset is in use [RFC2277]. It is therefore necessary to
   specify which encoding scheme is being used, whether it be UTF-8,
   UTF-7, 16-bit UCS-2 or ISO 8859. In essence, it is paramount that a
   charset be clearly identified, especially in a situation like the
   DNS where no direct communication is established.

      "At times and in specific cases, language information may be
      required to achieve a particular level of quality for the
      purpose of displaying a text stream. For example, UTF-8 encoded
      Han may require transmission of a language tag to select the
      specific glyphs to be displayed at a particular level of
      quality. Note that information other than language may be used
      to achieve the required level of quality in a display process.
      In particular, a font tag is sufficient to produce identical
      results.
      However, the association of a language with a specific block of
      text has usefulness far beyond its use in display. In
      particular, as the amount of information available in multiple
      languages on the World Wide Web grows, it becomes critical to
      specify which language is in use in particular documents, to
      assist automatic indexing and retrieval of relevant documents."
      [RFC2130]

   In effect, this means that for different languages it is beneficial
   to be able to identify the language in order to perform
   language-specific functions on the characters, including case
   folding. With ILET, the local encoding scheme can be used, and
   local encoding schemes come with well-defined folding methods.
   Therefore, the use of ILET enables an optimized folding mechanism
   through the preservation of local encoding schemes, which is
   otherwise very difficult or virtually impossible if only UTF-8 is
   used.

   For the DNS, however, a language tag is less feasible, because if a
   name consists of multiple languages it would be very difficult to
   tag. Names that mix multiple languages are entirely plausible, and
   are frequently used as trademarks around the world. For example,
   the famous Toys"Я"Us name uses a character from the Cyrillic
   script.

2.4 Considerations for Specific Requests

   For certain requests, an ANSI-only name could result in a
   multilingual domain as an answer. These include PTR, CNAME, A6 and
   DNAME requests. Special considerations are made within the DNSII
   protocol to make sure that non-DNSII-aware servers will not be fed
   a DNSII format packet.

2.4.1 PTR Records

   For all PTR requests, the first label of the query MUST use DNSII
   format to alert the destination server. The server will then reply
   with a DNSII packet should the name contain extended characters. If
   the DNSII format is not used, and the PTR lookup encounters a
   multilingual domain name, one of the following responses SHOULD be
   given:

   a. The implementer of DNSII MAY choose to reject the request; or

   b. An ACE format domain with a "for.ref.only" suffix MAY be
      returned; or

   c. A DNSII compliant server MAY return an 8-bit format of the
      requested domain.

   Since the PTR record is usually used for display purposes only,
   rejection (the IP address will then be used) or the ACE format is
   acceptable. If, however, the response is used for further
   resolution, an ACE format MUST NOT be used.

2.4.2 CNAME, A6 & DNAME

   For queries concerning the record types CNAME, A6 or DNAME, a DNSII
   aware server should first check whether the incoming request is
   DNSII compliant (flagged by the "10" bits in the first label):

   If so, and the domain to be returned includes extended characters,
   the response SHOULD be in DNSII format.

   If not, any multilingual domains returned should be in an 8-bit
   form.

   For the above record types it is strongly RECOMMENDED not to
   associate an alphanumeric label to a multilingual label as the
   RDATA. However, it is permissible to associate a multilingual label
   with an alphanumeric label as the RDATA.

3. Alternate Implementations

   The DNSII-MDNP is intended to be a framework for the implementation
   of multilingual domain names. While the core concepts and the
   design principles remain consistent, it is possible to contemplate
   alternative implementations, which some people may find easier to
   implement.

3.1 Restricted ILET Values

   One possible implementation guideline is for the ILET to be
   restricted to values representing only ISO 10646 transformations,
   including UCS-2, UCS-4, UTF-7, UTF-8, UTF-16 and others as they
   become available and are included as standard MIBenum values.
   Although this takes away some of the benefits of keeping the local
   encoding scheme, including those relating to case folding,
   canonicalization and other related concerns, it creates a system
   that on one hand contains only encoding schemes from ISO 10646, but
   on the other hand still provides the flexibility of deploying new
   encoding schemes that stem from ISO 10646, such as the 32-bit
   format that is due to be used soon.

   We understand it is specified that in protocols which up to now
   have used US-ASCII only, UTF-8 forms a simple upgrade path;
   however, its use should be negotiated either by negotiating a
   protocol version or by negotiating charset usage, and a fallback to
   UTF-7 MUST be available [RFC2130]. With DNSII, the required
   fallback to UTF-7 could easily be done by setting the ILET value to
   reflect UTF-7.

3.2 Reduced ILET Bit Allocation

   Going further than restricting the ILET to ISO 10646
   transformations only, the ILET bit allocation could also be reduced
   from 12 bits to 5 bits. This gives a total of 32 possible values.
   The reserved bits are also reduced to one.

                        1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +---+-+---------+---------------+
   |1 0|z|  ILET   |     COUNT     |
   +---------------+---------------+
   | characters... |
   +---------------+

   For example, the label "域名系統" will now be reflected in DNS
   packets in the following form:

                        1 1 1 1 1 1                     1 1 1 1 1 1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +---------------------------------------------------------------+
   |1 0|z| ILET=1  |       4       |          域 (U+57DF)          |
   +---------------------------------------------------------------+
   |          名 (U+540D)          |          系 (U+7CFB)          |
   +---------------------------------------------------------------+
   |          統 (U+7D71)          |
   +-------------------------------+

   To start off with, the ILET values MAY be determined as follows:

      0 = reserved for ANSI only
      1 = 16 bit UCS-2
      2 = UTF-8
      3 = UTF-7
      4 = 32 bit UCS-4

4. Implementation & Deployment Strategies

   The first step in any multilingual domain name implementation
   should be to encourage an 8-bit clean approach to DNS. However,
   even when the system is 8-bit clean, the problem of conflicting
   characters still exists. This is where the DNSII protocol becomes
   most valuable.

   Although the DNSII protocol could be implemented at any level of
   the DNS, the following phased rollout is contemplated.

   (1) Registry Level - The most meaningful starting point for
       deployment would be at the registry level, since this creates
       the demand from end users to use multilingual and extended
       character domain names for Second Level Domains.

   (2) Host Level - At the same time, registrants of the new extended
       domain names could start to implement DNSII to host these
       special kinds of domain names. All other hosts that do not wish
       to use extended characters do not have to migrate to DNSII.

   (3) Client Level - Once the multilingual aspect and the DNSII
       specifications become mainstream, the user level resolvers will
       begin to migrate. This will include both the client resolver
       and the ISP's DNS.

   (4) Root Level - Eventually, as DNSII is proven to be stable and
       beneficial for the Internet at large, it could be used at the
       Root Level so that new multilingual TLDs could be created.

5. IDN Requirements Considerations

   The DNSII protocol specification is in line with most if not all of
   the requirements identified by the IDN working group.

6. DNSSEC, EDNS and IPv6 Considerations

   The use of DNSII should not require any adjustments to the
   implementation of DNSSEC, EDNS or IPv6. EDNS as well as compression
   will in fact be done exactly as in the existing system.
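   For reference, the existing compression mechanism that DNSII leaves
   untouched is the two-octet pointer of RFC 1035: the bits "11"
   followed by a 14-bit offset from the start of the message. A
   minimal Python sketch, with hypothetical function names, is:

```python
import struct


def make_pointer(offset: int) -> bytes:
    """Build an RFC 1035 compression pointer to the given message offset."""
    if not 0 <= offset < (1 << 14):
        raise ValueError("offset must fit in 14 bits")
    return struct.pack("!H", 0xC000 | offset)


def read_pointer(two_octets: bytes) -> int:
    """Return the offset carried by a compression pointer."""
    (word,) = struct.unpack("!H", two_octets)
    if word >> 14 != 0b11:
        raise ValueError("not a compression pointer")
    return word & 0x3FFF


print(make_pointer(21).hex())               # c015
print(read_pointer(bytes.fromhex("c015")))  # 21
```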
   For example, the domain host.dns.域名系統.tld running with EDNS as
   well as compression after "host" will look as follows:

                          1 1 1 1 1 1                     1 1 1 1 1 1
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
     +---------------------------------------------------------------+
   20|0 1|    ELT    |0 0|     3     |       d       |       n       |
     +---------------------------------------------------------------+
     |       s       |1 0|0 0|      UCS-2=1000       |       4       |
     +---------------------------------------------------------------+
     |          域 (U+57DF)          |          名 (U+540D)          |
     +---------------------------------------------------------------+
     |          系 (U+7CFB)          |          統 (U+7D71)          |
     +---------------------------------------------------------------+
     |0 0|     3     |       t       |       l       |       d       |
     +---------------------------------------------------------------+
     |       0       |
     +---------------+

     +---------------------------------------------------------------+
     |0 0|     4     |       h       |       o       |       s       |
     +---------------------------------------------------------------+
     |       t       |1 1|            21             |
     +-----------------------------------------------+

7. Intellectual Property Considerations

   It is the intention of Neteka to submit the DNSII protocol and
   other elements of the multilingual domain name server software to
   the IETF for review, comment or standardization.

   Neteka Inc. has applied for one or more patents on the technology
   related to multilingual domain name server software and
   multilingual email server software suite. If a standard is adopted
   by the IETF and any patents are issued to Neteka with claims that
   are necessary for practicing the standard, any party will be able
   to obtain the right to implement, use and distribute the technology
   or works when implementing, using or distributing technology based
   upon the specific specifications under fair, reasonable and
   non-discriminatory terms.

8. References

   [RFC1700]   J. Reynolds, J. Postel, "Assigned Numbers", RFC 1700,
               October 1994.

   [ISO10646]  ISO/IEC 10646-1:2000. International Standard --
               Information technology -- Universal Multiple-Octet
               Coded Character Set (UCS).

   [RFC1034]   P. Mockapetris, "Domain Names - Concepts and
               Facilities", STD 13, RFC 1034, USC/ISI, November 1987.

   [RFC1035]   P. Mockapetris, "Domain Names - Implementation and
               Specification", STD 13, RFC 1035, USC/ISI, November
               1987.

   [RFC2119]   S. Bradner, "Key words for use in RFCs to Indicate
               Requirement Levels", RFC 2119, March 1997.

   [RFC2130]   C. Weider, et al., "The Report of the IAB Character Set
               Workshop held 29 February - 1 March, 1996", RFC 2130,
               April 1997.

   [RFC2277]   H. Alvestrand, "IETF Policy on Character Sets and
               Languages", RFC 2277, January 1998.

   [RFC2671]   P. Vixie, "Extension Mechanisms for DNS (EDNS0)",
               RFC 2671, August 1999.

Authors:

   Edmon Chung
   Neteka Inc.
   2462 Yonge St. Toronto,
   Ontario, Canada M4P 2H5
   edmon@neteka.com

   David Leung
   Neteka Inc.
   2462 Yonge St. Toronto,
   Ontario, Canada M4P 2H5
   david@neteka.com