INTERNET-DRAFT                                              James SENG
draft-jseng-idn-admin-00.txt                   Kazunori KONISHI, JPNIC
6th May 2002                                        Kenny HUANG, TWNIC
Expires 6th Nov 2002                                QIAN Hualin, CNNIC
                                                  KO YangWoo, PeaceNet

        Internationalized Domain Name Administration Guideline

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

There are many complex issues revolving around internationalized
access to domain names (IDN), such as the IDN protocol, IDN
deployment, IDN transition and IDN administration. While the IDN
working group focuses on the standards-track specification for access
to IDN, an administration guideline is also necessary to ensure
smooth deployment and transition.

This document provides a guideline on the administration of these
domain names for all zone administrators, including but not limited
to registry and registrar operators, and for all domain name holders.

Comments on this document can be sent to the authors at
idn-admin@jdna.jp.

Definitions

Unless otherwise stated, the definitions of the terms used in this
document are consistent with "Terminology Used in
Internationalization in the IETF" [I18NTERMS].

A locale is defined as a language, qualified by a region where
applicable. RFC3066 [RFC3066] defines how a locale should be
represented.

Characters mentioned in this document are identified by their
position, or code point, in ISO/IEC 10646/Unicode. The notation
U+12AB, for example, indicates the character at position 12AB
(hexadecimal) in ISO/IEC 10646/Unicode.

1. Introduction

Internationalized Domain Names (IDN) is one of the most controversial
tasks the IETF has taken on in recent years. The domain name system
is the fundamental naming architecture of the Internet; many Internet
protocols and applications rely on the stability and continuity of
the DNS.

The introduction of IDN amplifies the difficulty of putting names
into identifiers and the confusion between scripts and languages. It
affects many Internet protocols and applications and adds complexity
to technical administration and services.

While the IDN working group [IDN-WG] focuses on the technical
problems of IDN, an administration guideline is also important in
order to avoid unnecessary domain name disputes between domain name
holders. This is the main purpose of this guideline.

The IDN working group has completed working group last call for the
following Internet-Drafts:

   1. Internationalizing Host Names In Applications [IDNA]
   2. Punycode version 0.3.3 [PUNYCODE]
   3. Preparation of Internationalized Strings [STRINGPREP]
   4. A Stringprep Profile for Internationalized Domain Names
      [NAMEPREP]

This set of drafts proposes that the domain name system
infrastructure remain unchanged. Instead, internationalization (I18N)
is introduced only on the client side (IDNA), using an ASCII
Compatible Encoding (ACE) known as Punycode.

Domain names are normally case-insensitive. With the introduction of
characters beyond [US-ASCII], and the possibility of representing a
single character in multiple ways in ISO10646/Unicode [UNICODE], a
normalization process for these identifiers, known as Nameprep, has
been proposed. Nameprep is also done on the client side, as described
in IDNA.

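
As an illustration of the client-side processing described above, the
following minimal Python sketch applies Nameprep to a label and then
encodes the result with Punycode. It assumes that the nameprep
function in encodings.idna and the "punycode" codec of the Python
standard library, which implement the versions later published as
RFCs, are close enough to the drafts cited here for illustration.
The helper name to_ace_label is hypothetical, and the ACE prefix is
deliberately left out.

```python
from encodings.idna import nameprep


def to_ace_label(label: str) -> bytes:
    """Nameprep a single label, then encode it with Punycode.

    Hypothetical helper: no ACE prefix is added and no length or
    hostname-syntax checks are made.
    """
    prepared = nameprep(label)
    return prepared.encode("punycode")


# Case folding: "A" (U+0041) and "a" (U+0061) end up as the same label.
assert nameprep("EXAMPLE") == nameprep("example")

# One character, several representations: FULLWIDTH LATIN SMALL
# LETTER A (U+FF41) is normalized to "a" (U+0061).
assert nameprep("\uff41") == "a"

# The encoded label is pure ASCII, so the DNS infrastructure itself
# does not need to change.
ace = to_ace_label("\u98db\u6a5f")  # U+98DB U+6A5F
assert ace.isascii()

# Decoding recovers the Nameprep'ed label.
assert ace.decode("punycode") == "\u98db\u6a5f"
```

Note that the sketch normalizes only; as the next paragraph explains,
nothing in this processing relates U+98DB to U+98DE.
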
While Nameprep normalizes domain names so that users have the highest
chance of getting the right domain name, in the interest of I18N,
Nameprep does not handle any localization (L10N).

This becomes significant when a domain name holder uses as a domain
name a string of I18N characters forming a "name", "word" or "phrase"
that has a certain meaning in a certain language. Such a string of
I18N characters may have different variants in the context of the
language, culture or locale.

Generally, these localized variants can be classified into four
categories [C2C] (please see "Disclaimer" below):

a. Character (or Code) variants

Character (or code) variants are variants generated by
character-by-character (or code-by-code) substitution. An example in
English would be A/a (U+0041/U+0061). An example in Chinese would be
飛/飞 (U+98DB/U+98DE) or 機/机 (U+6A5F/U+673A). Note that this does
not mean U+6A5F/U+673A is bicameral like A/a; the equivalence holds
only for Chinese, not for Japanese.

A character variant may also correspond to null. For example, points
and vowel characters in Hebrew (U+05B0 to U+05C4) and Arabic (U+064B
to U+0652) are optional.

Code variants may also occur when different code points are assigned
to the "same" character, possibly due to compatibility issues,
typeface differences or script range. For example, LATIN CAPITAL
LETTER A (U+0041) looks similar to GREEK CAPITAL LETTER ALPHA
(U+0391). CJK has font variants for compatibility (U+4E0D/U+F967) and
"zVariant" U+5154/U+514E (*). The difficulty lies in defining which
characters are the "same" and which are not.

b. Orthographic variants

Orthographic variants are variants generated by word-by-word
substitution. An example in English would be color/colour.

Some orthographic variants can be generated by character variants.
For example, airplane in Chinese: 飛機/飞机 (U+98DB U+6A5F/U+98DE
U+673A).
Other orthographic variants cannot be generated by character
variants. For example, in Chinese, "發" (U+767C) and "髮" (U+9AEE)
are both related to "发" (U+53D1), depending on the word. For hair,
"头发" (U+5934 U+53D1), the variant should be "頭髮" (U+982D U+9AEE)
and not "頭發" (U+982D U+767C).

c. Lexemic variants

Lexemic variants are variants generated by word-by-word substitution
with locale consideration. An example in English would be cab/taxi,
or check/cheque. An example in Chinese would be 資訊/信息 (U+8CC7
U+8A0A/U+4FE1 U+606F). Note that there is no relationship between
U+8CC7 and U+4FE1, or between U+8A0A and U+606F.

d. Contextual variants

Contextual variants are variants generated by word-by-word
substitution with contextual consideration. In English, the word
"plane" has different meanings and could be substituted with
different equivalent words, such as "airplane" or "plane" (as in a
flat surface), depending on context. Similarly, the word "文件"
(U+6587 U+4EF6) could be either document "文件" (U+6587 U+4EF6) or
data file "檔案" (U+6A94 U+6848), depending on context.

Although domain names were designed to be identifiers without any
language context, this has not stopped users from putting "words" or
"names" into domain names. It is foreseeable that users will do
likewise with IDN. Therefore, precautions are required when deploying
IDN.

The intention of this guideline is to provide a mechanism to deploy
IDN with language context only at the category of character variants,
to increase the likelihood of successful resolution and to reduce
confusion.

Note:
* The variants in CJK are very complex and require many different
  layers of solution. This guideline is one component of the
  solution, but it is not sufficient to solve the whole problem
  alone.

2. Administration Framework

Zone administrators are responsible for the administration of the
domain names under their control.
A zone administrator could be responsible for a large zone such as a
Top Level Domain (TLD), generic or country code, or a smaller one
such as a second-level or third-level domain. A large zone has more
complexity than a smaller one, but administration tasks such as
addition, deletion, delegation and transfer of zones between domain
name holders are similar for all zone administrators.

Different zones also have different policies and processes. For
example, the pay-per-domain policy and registry/registrar model of
.COM may not be applicable to other zones such as .SG or .IBM.COM.
The latter is likely to have restrictive policies on who can have a
zone under IBM.COM, and its procedure is very different.

Recognizing these differences, this document provides only a
guideline on how I18N characters with locale consideration should be
handled within a zone and how these IDN should be administered
(registration, deletion and transfer).

IDN policies, such as new TLDs or cost, are out of scope for this
document. Such discussions should be conducted in other forums
outside the IETF.

Technical implementations are also out of scope. Zone administrators
have to decide where (registry or registrar side) and how to
implement this guideline.

2.1. Guideline Principles

The principles provided are for a single zone, on a per-label basis.
The word "IDN" should be read as "domain name label" and not "Fully
Qualified Domain Name".

The document also assumes that "First-Come-First-Served" (FCFS) is
used to determine the rights of domain name holders, although it is
not one of the principles. If FCFS is not used, then replace all
occurrences of FCFS with the appropriate policy for the zone.

(a) Each IDN should be associated with a set of locales.

Although some IDN may be pure identifiers made up of a random
selection of characters, IDN are likely to be names or phrases that
have a certain meaning in some locale.
Zone administrators should associate a locale with each IDN
administratively, either pre-determined by the zone administrator or
chosen by the domain name holder. An IDN could also have multiple
locale associations or no locale association, but these are not
recommended.

With a locale association, the zone administrator can also verify the
validity of the IDN requested.

(b) The domain name holder of an IDN should also hold all character
    variants of the requested IDN, as determined by the associated
    locale(s).

Depending on the associated locale(s), there are different character
variants for the IDN. To minimize domain name disputes between
holders over similar IDNs, these character variants should be
reserved.

Reserved IDNs are not placed in the DNS zone file. In other words,
reserved names do not resolve. The domain name holder may request
that reserved IDNs be placed in the zone file, i.e. that the reserved
names be made active.

Where reserved names overlap, the conflict should be resolved with
the same registration policy, usually FCFS.

(c) Some IDN may have a preferred character variant that should be
    recommended to the domain name holder.

Some locale rules may prefer certain character variants over others.
To increase the end user's chance of resolving the IDN, the preferred
variant should be active.

(d) The IDN and its reserved character variants, together with the
    locale(s) association, should be atomic.

The IDN and its reserved character variants, together with the
locale(s) association, should be contained in a single package (the
"IDN Package"). The IDN Package is created upon registration.

The IDN Package is atomic: transfer and deletion of IDN are done on
the IDN Package as a whole. IDNs within the IDN Package, whether
active or reserved, must not be transferred or deleted individually.

2.2. Registration of IDN

In conformance with the principles described in 2.1, the registration
of IDN requires at least two components: character variant tables for
the locale, and the registration algorithm.

2.2.1. Locale character variant table

Every locale group should provide a character variant table. The
table should be generated based on established language standards,
documenting its references. For example,

   Reference 1: CP936 or commonly known as GBK
   Reference 2: zVariant, zTradVariant, zSimpVariant in Unihan.txt
   Reference 3: List of Simplified character Table (Simplified column)
   Reference 4: zSimpVariant in Unihan.txt
   Reference 5: variant that exists in GB2312, common simplified hanzi

The table has three fields, separated by semicolons: "valid code
point", "recommended code point" and "character variant(s)".

Only code points listed in the "valid code point" column are allowed
to be registered in the language.

There can be at most one "recommended code point". If the
"recommended code point" column is empty, then the recommended code
point is "null".

By default, "character variant(s)" always includes the "valid code
point". If a variant is composed of a series of code points, then the
code points should be listed in the appropriate order, separated by
spaces, in the "character variant(s)" column. If there are multiple
variants, the variants must be separated by commas in the "character
variant(s)" column. A code point in the "character variant(s)" column
may itself not be allowed to be registered in the locale.

Every code point in the table should have a corresponding reference
number (associated with the references) specified for justification.
The reference number is placed in round brackets after each code
point. If there is more than one reference, the numbers are placed in
the round brackets separated by spaces.

Any content after a hash "#" is treated as a comment.

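
The table format described above can be read with a few lines of
code. The following Python sketch is one possible reading of the
format, not a normative parser; the names Row and parse_row are
hypothetical, and the sample row is taken from the illustrative
zh-cn table in Section 3.

```python
import re
from dataclasses import dataclass, field

# One code point followed by its reference number(s), for example
# "56E2(2)" or "56E2(1 2)".
_CODE_POINT = re.compile(r"([0-9A-Fa-f]{4,6})\(([0-9 ]+)\)")


@dataclass
class Row:
    valid: str                 # the "valid code point", as a string
    recommended: str           # "" stands for the "null" recommendation
    variants: set = field(default_factory=set)  # always includes `valid`


def _to_string(text: str) -> str:
    """Convert 'XXXX(n) YYYY(n m)' into the characters it names."""
    return "".join(chr(int(cp, 16)) for cp, _refs in _CODE_POINT.findall(text))


def parse_row(line: str):
    """Parse one table row; return None for blank or comment-only lines."""
    body = line.split("#", 1)[0].strip()      # drop the trailing comment
    if not body:
        return None
    # Pad so that rows with missing trailing fields still unpack.
    valid, recommended, variants = (body.split(";") + ["", ""])[:3]
    row = Row(valid=_to_string(valid), recommended=_to_string(recommended))
    # Variants are comma-separated; each may be several code points.
    row.variants = {_to_string(v) for v in variants.split(",") if v.strip()}
    row.variants.add(row.valid)               # included by default
    return row


row = parse_row("5718(1);56E2(4);56E2(2),56E3(2) # sphere, ball, circle")
assert row.valid == "\u5718"
assert row.recommended == "\u56e2"
assert row.variants == {"\u5718", "\u56e2", "\u56e3"}
```

The reference numbers are matched but discarded here; an
implementation that needs to audit a table would keep them.
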
This document does not define any locale variant tables. Each locale
group will have to supply its own documents, including the rules from
which its tables are derived.

2.2.2. Registration Algorithm

   1. IN = IDN and {L} = Set of locale(s) associated with IN
   2. NP(IN) = Nameprep processed IN and check for availability of
      NP(IN)
   3. For each AL in {L}
      3.1. Check validity of NP(IN) in AL. If failed, stop processing.
      3.2. PV(IN,AL) = Preferred character variant of IN in AL
      3.3. RV(IN,AL) = Set of character variants of IN in AL
      3.4. End of Loop
   4. {ZV} = Set of all PV(IN,AL) + NP(IN)
   5. {RV} = Set of all RV(IN,AL) (all character variants) minus {ZV}
   6. Create IDN Package for IN using IN, {L}, {ZV} and {RV}
   7. Put {ZV} into zone file

Step 1 takes the IDN to be registered and the associated locale(s) as
input to the process.

Following that, the IDN goes through Nameprep in Step 2. If the
Nameprep'ed IDN is already registered or reserved, then the IDN
cannot be registered, based on FCFS.

Step 3 goes through every locale associated with the IDN, checks the
validity of the IDN in each, and generates the preferred variant and
the reserved variants.

In Step 3.1, the IDN is validated by checking that every code point
in the Nameprep'ed IDN is allowed by the "valid code point" column of
the character variant table of the locale.

Step 3.2 generates the preferred variant of the IDN by replacing
every code point in the IDN with the code point in the associated
"recommended code point" column, followed by Nameprep. If the
preferred variant of the IDN is already registered or reserved, then
there is no preferred variant for that locale, based on FCFS.
However, this does not prevent the IDN from being registered.

Step 3.3 generates the list of reserved variants by permuting all the
possible variants listed in the "character variant(s)" column for
each code point in the Nameprep'ed IDN. Generated variants should
also be Nameprep'ed.
If any of the variants is already registered or reserved, then that
variant must be removed from the list, based on FCFS. Similarly, this
does not prevent the IDN from being registered.

An "IDN Package" for the IDN is then created in Step 6 with the
original IDN, the associated locale(s), the list of activated IDNs
(Step 4) and the list of reserved variants (Step 5).

Lastly, in Step 7, the activated IDNs are put into the zone file and
delegated. They may be delegated to different domain name servers so
long as they are owned by the same domain name holder.

2.3. Deletion and Transfer of IDN

In normal domain administration, every domain name is atomic.
Registration, deletion and transfer of domain names are done on a
per-domain-name basis.

However, with IDN, each domain name is tied to a list of variant
domain names, depending on the locale association, bound together in
an IDN Package. Because all variants of the IDN should belong to a
single domain name holder, the IDN Package should be atomic.

An IDN within the IDN Package, whether active or reserved, must not
be deleted or transferred on its own. If an IDN is to be deleted or
transferred, this must be done on the IDN Package as a whole.

2.4. Activation and De-activation of IDN variants

With the introduction of the IDN Package containing active and
inactive IDN, a new process is required to activate or de-activate
IDN variants in the IDN Package.

The activation algorithm is described below:

   1. IN = IDN & PA = IDN Package
   2. NP(IN) = Nameprep processed IN
   3. If NP(IN) not in {RV} then stop
   4. {RV} = {RV} - NP(IN) and {ZV} = {ZV} + NP(IN)
   5. Put {ZV} into the zone file

Similarly, the deactivation algorithm:

   1. IN = IDN & PA = IDN Package
   2. NP(IN) = Nameprep processed IN
   3. If NP(IN) not in {ZV} then stop
   4. {RV} = {RV} + NP(IN) and {ZV} = {ZV} - NP(IN)
   5. Put {ZV} into the zone file

2.5. Adding/Deleting locale(s) association

The list of variants is generated from the IDN and its locale(s)
association.
If there is a change in the locale(s) association, then the list of
variants has to be updated.

On the other hand, the IDN Package is atomic and the list of variants
should not be changed after creation.

Therefore, to add or delete a locale association from the IDN
Package, this document recommends deleting the IDN Package and then
registering again with the new set of locales.

3. Example of Guideline Adoption

To provide a meaningful example, some locale character variant tables
have to be defined. Assume the following four locale character
variant tables are defined:

a) locale character variant table for zh-cn and zh-sg

   Reference 1: CP936 or commonly known as GBK
   Reference 2: zVariant, zTradVariant, zSimpVariant in Unihan.txt
                [UNIHAN]
   Reference 3: List of Simplified character Table (Simplified column)
   Reference 4: zSimpVariant in Unihan.txt
   Reference 5: variant that exists in GB2312, common simplified hanzi

   56E2(1);56E2(5);5718(2)          # sphere, ball, circle; mass, lump
   5718(1);56E2(4);56E2(2),56E3(2)  # sphere, ball, circle; mass, lump
   60F3(1);60F3(5);                 # think, speculate, plan, consider
   654E(1);6559(5);6559(2)          # teach
   6559(1);6559(5);654E(2)          # teach, class
   6DF8(1);6E05(5);6E05(2)          # clear
   6E05(1);6E05(5);6DF8(2)          # clear, pure, clean; peaceful
   771E(1);771F(5);771F(2)          # real, actual, true, genuine
   771F(1);771F(5);771E(2)          # real, actual, true, genuine
   8054(1);8054(3);806F(2)          # connect, join; associate, ally
   806F(1);8054(3);8054(2),8068(2)  # connect, join; associate, ally
   96C6(1);96C6(5);                 # assemble, collect together

b) locale character variant table for zh-tw

   Reference 1: CP950 or commonly known as BIG5
   Reference 2: zVariant, zTradVariant, zSimpVariant in Unihan.txt
   Reference 3: List of Simplified Character Table (Traditional column)
   Reference 4: zTradVariant in Unihan.txt
   Reference 5: reference itself

   5718(1);5718(4);56E2(2),56E3(2)  # sphere, ball, circle; mass, lump
   60F3(1);60F3(5);                 # think, speculate, plan, consider
   6559(1);6559(5);654E(2)          # teach, class
   6E05(1);6E05(5);6DF8(2)          # clear, pure, clean; peaceful
   771F(1);771F(5);771E(2)          # real, actual, true, genuine
   806F(1);806F(3);8054(2),8068(2)  # connect, join; associate, ally
   96C6(1);96C6(5);                 # assemble, collect together

c) locale character variant table for ja

   Reference 1: CP932 or commonly known as Shift-JIS
   Reference 2: zVariant in Unihan.txt
   Reference 3: variant that exists in JIS X0208, commonly used Kanji
   Reference 4: reference itself

   5718(1);5718(3);56E3(2)          # sphere, ball, circle; mass, lump
   60F3(1);60F3(3);                 # think, speculate, plan, consider
   654E(1);6559(3);6559(2)          # teach
   6559(1);6559(3);654E(2)          # teach, class
   6DF8(1);6E05(3);6E05(2)          # clear
   6E05(1);6E05(3);6DF8(2)          # clear, pure, clean; peaceful
   771E(1);771E(4);771F(2)          # real, actual, true, genuine
   771F(1);771F(4);771E(2)          # real, actual, true, genuine
   806F(1);806F(4);8068(2)          # connect, join; associate, ally
   96C6(1);96C6(3);                 # assemble, collect together

d) locale character variant table for ko

   Reference 1: CP949 or commonly known as EUC-KR
   Reference 2: zVariant in Unihan.txt
   Reference 3: reference itself

   5718(1);56E2(3);56E3(2)          # sphere, ball, circle; mass, lump
   60F3(1);60F3(3);                 # think, speculate, plan, consider
   654E(1);6559(3);6559(2)          # teach
   6DF8(1);6E05(3);6E05(2)          # clear
   771E(1);771F(3);771F(2)          # real, actual, true, genuine
   806F(1);8054(3);8068(2)          # connect, join; associate, ally
   96C6(1);96C6(3);                 # assemble, collect together

(Note that neither these tables nor the rules that define them are
official, nor are they a sample snippet of the real tables. The
tables serve only as an illustration.)

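
The registration steps of Section 2.2.2, and the activation and
deactivation steps of Section 2.4, can be sketched against a
hand-copied subset of the illustrative zh-cn/zh-sg and zh-tw tables
above. This is a minimal Python sketch under stated assumptions, not
a reference implementation: the names TABLES, register, activate and
deactivate are hypothetical, the set "taken" stands in for whatever
store of registered and reserved names a zone administrator keeps,
and Nameprep is taken from the Python standard library, which
implements the version later published as an RFC. Steps 6 and 7
(creating the IDN Package and writing the zone file) are left out.

```python
from encodings.idna import nameprep
from itertools import product

# valid code point -> (recommended code point, character variants)
ZH_CN = {
    "\u806f": ("\u8054", {"\u8054", "\u8068"}),
    "\u8054": ("\u8054", {"\u806f"}),
    "\u60f3": ("\u60f3", set()),
    "\u96c6": ("\u96c6", set()),
    "\u5718": ("\u56e2", {"\u56e2", "\u56e3"}),
    "\u56e2": ("\u56e2", {"\u5718"}),
}
ZH_TW = {
    "\u806f": ("\u806f", {"\u8054", "\u8068"}),
    "\u60f3": ("\u60f3", set()),
    "\u96c6": ("\u96c6", set()),
    "\u5718": ("\u5718", {"\u56e2", "\u56e3"}),
}
TABLES = {"zh-cn": ZH_CN, "zh-sg": ZH_CN, "zh-tw": ZH_TW}


def register(idn, locales, taken=frozenset()):
    """Steps 1 to 5 of 2.2.2: return ({ZV}, {RV}) or raise ValueError."""
    np_in = nameprep(idn)                                    # step 2
    if np_in in taken:
        raise ValueError("already registered or reserved")
    zv, rv = {np_in}, set()
    for al in locales:                                       # step 3
        table = TABLES[al]
        if any(ch not in table for ch in np_in):             # step 3.1
            raise ValueError("invalid in locale " + al)
        pv = nameprep("".join(table[ch][0] for ch in np_in))  # step 3.2
        if pv not in taken:
            zv.add(pv)
        # Step 3.3: every combination of per-code-point variants; the
        # valid code point itself is always one of the choices.
        choices = [table[ch][1] | {ch} for ch in np_in]
        for combo in product(*choices):
            variant = nameprep("".join(combo))
            if variant not in taken:
                rv.add(variant)
    return zv, rv - zv                                       # steps 4, 5


def activate(zv, rv, idn):
    """Section 2.4: move a reserved variant into the active set."""
    np_in = nameprep(idn)
    if np_in in rv:
        rv.remove(np_in)
        zv.add(np_in)


def deactivate(zv, rv, idn):
    """Section 2.4: move an active name back to the reserved set."""
    np_in = nameprep(idn)
    if np_in in zv:
        zv.remove(np_in)
        rv.add(np_in)


# U+806F U+60F3 U+96C6 U+5718 registered for zh-cn, zh-sg and zh-tw.
zv, rv = register("\u806f\u60f3\u96c6\u5718", ["zh-cn", "zh-sg", "zh-tw"])
print(len(zv), len(rv))   # 2 active names, 7 reserved variants
```

With these inputs the sketch yields two active names (the original
label and its zh-cn preferred variant) and seven reserved variants.
A label containing U+8054 is rejected for zh-tw because that code
point is not in the zh-tw "valid code point" column.
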
Example 1:

   IDN = 清真教 (U+6E05 U+771F U+6559)
   {L} = {zh-cn, zh-sg, zh-tw}
   NP(IN) = 清真教 (U+6E05 U+771F U+6559)
   PV(IN,zh-cn) = 清真教 (U+6E05 U+771F U+6559)
   PV(IN,zh-sg) = 清真教 (U+6E05 U+771F U+6559)
   PV(IN,zh-tw) = 清真教 (U+6E05 U+771F U+6559)
   {ZV} = {清真教 (U+6E05 U+771F U+6559)}
   {RV} = {清眞教 (U+6E05 U+771E U+6559),
           清眞敎 (U+6E05 U+771E U+654E),
           清真敎 (U+6E05 U+771F U+654E),
           淸眞教 (U+6DF8 U+771E U+6559),
           淸眞敎 (U+6DF8 U+771E U+654E),
           淸真教 (U+6DF8 U+771F U+6559),
           淸真敎 (U+6DF8 U+771F U+654E)}

Example 2:

   IDN = 清真教 (U+6E05 U+771F U+6559)
   {L} = {ja}
   NP(IN) = 清真教 (U+6E05 U+771F U+6559)
   PV(IN,ja) = 清真教 (U+6E05 U+771F U+6559)
   {ZV} = {清真教 (U+6E05 U+771F U+6559)}
   {RV} = {清眞教 (U+6E05 U+771E U+6559),
           清眞敎 (U+6E05 U+771E U+654E),
           清真敎 (U+6E05 U+771F U+654E),
           淸眞教 (U+6DF8 U+771E U+6559),
           淸眞敎 (U+6DF8 U+771E U+654E),
           淸真教 (U+6DF8 U+771F U+6559),
           淸真敎 (U+6DF8 U+771F U+654E)}

Example 3:

   IDN = 清真教 (U+6E05 U+771F U+6559)
   {L} = {zh-cn, zh-sg, zh-tw, ja, ko}
   NP(IN) = 清真教 (U+6E05 U+771F U+6559)
   Invalid registration because U+6E05 is invalid in L = ko

Example 4:

   IDN = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
   {L} = {zh-cn, zh-sg, zh-tw}
   NP(IN) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
   PV(IN,zh-cn) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
   PV(IN,zh-sg) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
   PV(IN,zh-tw) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
   {ZV} = {联想集团 (U+8054 U+60F3 U+96C6 U+56E2),
           聯想集團 (U+806F U+60F3 U+96C6 U+5718)}
   {RV} = {联想集団 (U+8054 U+60F3 U+96C6 U+56E3),
           联想集團 (U+8054 U+60F3 U+96C6 U+5718),
           聯想集团 (U+806F U+60F3 U+96C6 U+56E2),
           聯想集団 (U+806F U+60F3 U+96C6 U+56E3),
           聨想集团 (U+8068 U+60F3 U+96C6 U+56E2),
           聨想集団 (U+8068 U+60F3 U+96C6 U+56E3),
           聨想集團 (U+8068 U+60F3 U+96C6 U+5718)}

Example 5:

   IDN = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
   {L} = {zh-cn, zh-sg}
   NP(IN) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
   PV(IN,zh-cn) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
   PV(IN,zh-sg) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
   {ZV} = {联想集团 (U+8054 U+60F3 U+96C6 U+56E2)}
   {RV} = {联想集団 (U+8054 U+60F3 U+96C6 U+56E3),
           联想集團 (U+8054 U+60F3 U+96C6 U+5718),
           聯想集团 (U+806F U+60F3 U+96C6 U+56E2),
           聯想集団 (U+806F U+60F3 U+96C6 U+56E3),
           聯想集團 (U+806F U+60F3 U+96C6 U+5718),
           聨想集团 (U+8068 U+60F3 U+96C6 U+56E2),
           聨想集団 (U+8068 U+60F3 U+96C6 U+56E3),
           聨想集團 (U+8068 U+60F3 U+96C6 U+5718)}

Example 6:

   IDN = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
   {L} = {zh-cn, zh-sg, zh-tw}
   NP(IN) = 联想集团 (U+8054 U+60F3 U+96C6 U+56E2)
   Invalid registration because U+8054 is invalid in L = zh-tw

Example 7:

   IDN = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
   {L} = {ja, ko}
   NP(IN) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
   PV(IN,ja) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
   PV(IN,ko) = 聯想集團 (U+806F U+60F3 U+96C6 U+5718)
   {ZV} = {聯想集團 (U+806F U+60F3 U+96C6 U+5718)}
   {RV} = {聯想集団 (U+806F U+60F3 U+96C6 U+56E3),
           聨想集團 (U+8068 U+60F3 U+96C6 U+5718),
           聨想集団 (U+8068 U+60F3 U+96C6 U+56E3)}

While the guideline uses examples from zh-cn, zh-tw, zh-sg, ja and
ko, it can be applied to other locales.

4. Other Issues

It is possible that many of the variants generated have no meaning in
the language or locale. The intention is not to generate meaningful
"words" but to generate similar variants to be reserved.

The locale character variant tables are critical to the success of
the guideline. A badly designed table may either generate too many
meaningless variants or fail to generate enough meaningful variants.
However, the tables and the rules used to generate them are not
within the scope of this document.

This document does not recommend allowing registration of IDN in a
locale that has not defined its locale character variant table.

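
To give a feel for how a table can generate "too many" variants:
Step 3.3 of the registration algorithm permutes the per-code-point
variant lists, so the number of candidate strings for one locale is
the product of the sizes of those lists. The following Python sketch
shows the arithmetic; candidate_count is a hypothetical helper, and
the ten-character label is an invented illustration rather than data
from any real table.

```python
from math import prod


def candidate_count(forms_per_code_point):
    """Number of strings produced by permuting per-code-point variants.

    Each entry is the number of forms a code point can take: the
    valid code point itself plus its listed character variants.
    """
    return prod(forms_per_code_point)


# In the illustrative zh-cn table, U+806F and U+5718 each have three
# forms, while U+60F3 and U+96C6 have only one.
assert candidate_count([3, 1, 1, 3]) == 9

# An invented ten-character label in which every code point has
# three forms already yields tens of thousands of candidates.
assert candidate_count([3] * 10) == 59049
```

The count includes the active names, so the nine candidates in the
first case correspond to two active names plus seven reserved
variants when zh-cn and zh-tw are both associated.
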
Disclaimer

Every human language is unique and therefore every linguistic and
localization issue is also unique. It is difficult to draw
comparisons across multiple languages or to classify them into
categories.

For example, classifying Traditional Chinese/Simplified Chinese
(TC/SC) as upper/lower case makes as much sense as classifying TC/SC
as a "spelling variant" like "color" and "colour". Both are close
comparisons, but neither is 100% correct.

This document does not claim that any classification or analogy
across different languages is linguistically accurate. It only
attempts to provide a generic framework for a linguistically
challenging problem.

Unresolved Issues

   1. How do we deal with updates of tables? Different versions?
   2. Should we have multiple recommended variants per locale?

Acknowledgement

The authors gratefully acknowledge the contributions of:

   V.Chen, N.Hsu, H.Hotta, S.Tashiro, Y.Yoneya and other Joint
   Engineering Team members in the JET Bangkok meeting.

   Yves Arrouye, an observer during JET Bangkok, for his contribution
   on the IDN Package.

   Soobok Lee
   L.M Tseng
   Patrik Faltstrom
   Paul Hoffman
   Erin Chen

Author(s)

   James SENG
   Title
   Address
   Email: jseng@pobox.org.sg

   Kazunori KONISHI
   JPNIC
   Kokusai-Kougyou-Kanda Bldg 6F
   2-3-4 Uchi-Kanda, Chiyoda-ku
   Tokyo 101-0047 JAPAN
   Phone: +81 49-278-7313
   Email: konishi@jp.apan.net

   Kenny HUANG
   Title
   Address
   Email: huangk@alum.sinica.edu

   QIAN Hualin
   Title
   Address
   Email:

   KO YangWoo
   PeaceNet
   Yangchun P.O. Box 81
   Seoul 158-600
   Email: newcat@peacenet.or.kr

References

   [I18NTERMS]   Terminology Used in Internationalization in the
                 IETF, draft-hoffman-i18n-terms, Jan 2002,
                 Paul Hoffman

   [RFC3066]     Tags for the Identification of Languages, RFC3066,
                 Jan 2001, H. Alvestrand

   [IDN-WG]      IETF Internationalized Domain Names Working Group,
                 idn@ops.ietf.org, James Seng, Marc Blanchet

   [IDNA]        Internationalizing Domain Names in Applications,
                 draft-ietf-idn-idna, Feb 2002, Patrik Faltstrom,
                 Paul Hoffman, Adam M. Costello

   [PUNYCODE]    Punycode: An encoding of Unicode for use with IDNA,
                 draft-ietf-idn-punycode, Feb 2002, Adam M. Costello

   [STRINGPREP]  Preparation of Internationalized Strings,
                 draft-hoffman-stringprep, Feb 2002, Paul Hoffman,
                 Marc Blanchet

   [NAMEPREP]    Nameprep: A Stringprep Profile for Internationalized
                 Domain Names, draft-ietf-idn-nameprep, Feb 2002,
                 Paul Hoffman, Marc Blanchet

   [C2C]         Pitfalls and Complexities of Chinese to Chinese
                 Conversion, http://www.cjk.org/cjk/c2c/c2c.pdf,
                 Jack Halpern, Jouni Kerman

   [UNIHAN]      Unicode Han Database, Unicode Consortium,
                 ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt