Internet Draft                                            Yoshiro Yoneya
draft-ietf-idn-ace-eval-jp-00.txt                                  JPNIC
Jun 26, 2001                                        Expires Dec 26, 2001

     Evaluation of various ACEs with existing Japanese Domain Names

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The ACE (ASCII Compatible Encoding) Design Team in the IDN Working
   Group is working on choosing a good ACE proposal.  To help the ACE
   Design Team's work, this document reports the results of applying
   various ACEs to existing Japanese Domain Names, proposes evaluation
   criteria, and considers which ACE is appropriate for Japanese Domain
   Names.

1. Evaluation data outline

   The sample data for this evaluation consisted of over 50000 existing
   Japanese Domain Names under the JP ccTLD that were registered
   between 22 Feb 2001 and 23 Mar 2001.  The Japanese Domain Names
   include many names of the following kinds, which are familiar to
   Japanese people.

      - Company names
      - Registered names
      - University names
      - Personal names

   A Japanese Domain Name consists of a combination of one or more
   Japanese characters defined in [JPCHAR] and zero or more LDH
   (Letters, Digits, Hyphens) characters defined in [RFC1035].  A
   Japanese Domain Name is 1 through 15 characters long, due to a
   restriction in the JP ccTLD registration rules.
   The Japanese Domain Names are normalized according to [NAMEPREP] and
   [JPCHAR] at registration time.

   The following graph shows the distribution of the number of
   characters in the sample data.

   N 15|**
   u 14|***
   m 13|****
   b 12|******
   e 11|*******
   r 10|************
      9|**************
   o  8|**********************
   f  7|*************************
      6|********************************
   c  5|********************************
   h  4|************************************************
   a  3|***************************
   r  2|**********************
   s  1|**
       +----+----+----+----+----+----+----+----+----+----
            1    2    3    4    5    6    7    8    9  (x1000)
                  Number of Japanese Domain Names

   The ACEs applied are RACE-03 [RACE], BRACE-00 [BRACE], LACE-01
   [LACE], UTF-6-00 [UTF6], DUDE-02 [DUDE], AMC-ACE-M-00 [AMCACEM],
   AltDUDE-01 [AltDUDE], AMC-ACE-O-00 [AMCACEO], AMC-ACE-R-01
   [AMCACER], AMC-ACE-V-00 [AMCACEV], AMC-ACE-W-00 [AMCACEW] and
   MACE-00 [MACE].  JPNIC's mDNkit-2.2 [MDNKIT] will provide
   implementations of all of them; the current version, mDNkit-2.1,
   already provides most of them.  This evaluation was done with a
   snapshot of mDNkit-2.2.

   The evaluation data was created as follows.

      1) Convert each Japanese Domain Name with each ACE and measure
         the conversion time.
      2) Strip the ACE signature from the resulting string.
      3) Compare the resulting string length with the original
         Japanese Domain Name string length.
      4) Calculate statistical values.

2. Result of each ACE

   In this section, CHARS is the length of the original Japanese Domain
   Name in characters; MAX and MIN are the worst-case and best-case
   lengths of the resulting string in octets, respectively; MEAN is the
   mean length of the resulting string; VAR is the variance of the
   resulting string length; and SLOPE is the slope of the regression
   line of MAX, MIN or MEAN against CHARS.
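   The statistics defined above can be sketched as follows.  This is
   only an illustration, not the tool used for the evaluation: the
   function names are hypothetical, the slope is assumed to be an
   ordinary least-squares fit against CHARS, and VAR is assumed to be
   the population variance, neither of which is stated in this
   document.  Under the least-squares assumption, the RACE MAX row of
   section 2.1 reproduces the SLOPE value listed there.

```python
from collections import defaultdict
from statistics import mean, pvariance


def regression_slope(xs, ys):
    """Least-squares slope of ys against xs (assumed fitting method)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def summarize(samples):
    """Group (chars, octets) pairs by CHARS and compute the table rows.

    Returns {chars: {"MAX", "MIN", "MEAN", "VAR"}}.  VAR is the
    population variance, which is an assumption.
    """
    by_chars = defaultdict(list)
    for chars, octets in samples:
        by_chars[chars].append(octets)
    return {
        chars: {
            "MAX": max(lengths),
            "MIN": min(lengths),
            "MEAN": mean(lengths),
            "VAR": pvariance(lengths),
        }
        for chars, lengths in sorted(by_chars.items())
    }


# RACE MAX row from section 2.1, for CHARS = 1..15.
race_max = [5, 8, 12, 15, 18, 21, 24, 28, 31, 34, 37, 40, 44, 47, 50]
race_max_slope = regression_slope(range(1, 16), race_max)
print(f"{race_max_slope:.2f}")  # prints 3.21
```

   In a real run, summarize() would be fed one (CHARS, octets) pair per
   converted domain name, and regression_slope() would then be applied
   to the MAX, MIN and MEAN values collected for CHARS 1 through 15.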

2.1 RACE

   CHARS      1      2      3      4      5      6      7      8
   MAX        5      8     12     15     18     21     24     28
   MIN        4      5      7      8     10     12     13     15
   MEAN    4.01   7.63   9.80  12.06  13.73  16.23  17.53  21.81
   VAR     0.01   0.96   6.14  11.86  15.58  19.73  28.47  41.14

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       31     34     37     40     44     47     50   3.21
   MIN       16     18     20     21     23     24     26   1.59
   MEAN   23.71  27.14  29.10  32.27  34.72  37.36  39.71   2.53
   VAR    54.12  60.27  69.14  82.95 103.23 122.69 138.72

2.2 BRACE

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      7     10     14     17     20     23     26
   MIN        4      5      7      8      9     11     12     14
   MEAN    4.00   6.78   8.62  11.32  12.66  14.34  15.62  18.86
   VAR     0.00   0.36   2.11   7.55   9.06  13.75  19.04  24.83

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       30     33     36     39     42     46     49   3.21
   MIN       15     16     17     19     21     22     24   1.40
   MEAN   20.27  23.31  24.16  26.72  27.62  29.72  30.84   1.93
   VAR    32.43  40.05  42.54  54.13  48.78  57.94  47.31

2.3 LACE

   CHARS      1      2      3      4      5      6      7      8
   MAX        5      8     12     15     18     21     24     28
   MIN        5      7      8     10     12     13     15     16
   MEAN    5.00   7.88  10.25  12.85  14.72  16.52  18.27  21.74
   VAR     0.00   0.11   3.94   5.90   8.39  13.91  15.86  30.83

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       31     34     37     40     44     47     50   3.21
   MIN       18     20     21     23     24     26     28   1.61
   MEAN   23.58  26.55  27.99  31.01  32.29  34.83  35.74   2.23
   VAR    33.45  36.77  46.60  51.81  63.82  69.28  60.90

2.4 UTF-6

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      8     12     16     20     24     28     32
   MIN        4      6      6     10     13     14     16     18
   MEAN    4.00   7.86  10.64  13.86  16.18  19.20  21.45  25.79
   VAR     0.00   0.14   2.23   6.05  11.77  19.51  27.94  40.51

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       36     40     44     48     52     56     60   4.00
   MIN       21     22     25     27     29     31     33   2.12
   MEAN   28.66  32.73  35.25  39.46  42.24  45.75  48.20   3.17
   VAR    53.18  66.86  84.41  98.58 120.62 141.40 170.42

2.5 DUDE

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      8     12     16     20     24     28     32
   MIN        4      5      6      7      8      9     12     14
   MEAN    4.00   7.57   9.75  12.69  14.01  16.45  18.22  21.94
   VAR     0.00   0.57   3.82   7.93  10.31  15.55  18.58  29.66

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       36     40     44     48     52     56     58   3.95
   MIN       15     17     19     20     23     25     28   1.70
   MEAN   23.55  26.67  28.15  30.96  32.38  34.71  35.91   2.29
   VAR    30.62  41.07  41.05  47.99  45.43  49.33  36.39

2.6 AMC-ACE-M

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      8     12     15     18     21     24     27
   MIN        4      5      6      7      8     10     12     14
   MEAN    4.00   7.57   9.30  11.71  13.04  15.17  16.78  19.83
   VAR     0.00   0.58   3.05   5.47   6.69   9.36  11.06  16.02

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       30     33     36     38     42     44     47   3.01
   MIN       15     17     18     20     22     23     26   1.58
   MEAN   21.40  24.06  25.64  28.08  29.49  31.68  33.08   2.05
   VAR    17.30  21.74  23.11  26.29  26.49  29.50  24.81

2.7 AltDUDE

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      8     12     16     20     24     28     32
   MIN        4      5      6      7      8      9     12     14
   MEAN    4.00   7.57   9.75  12.69  14.01  16.45  18.22  21.94
   VAR     0.00   0.57   3.82   7.93  10.31  15.55  18.58  29.66

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       36     40     44     48     52     56     58   3.95
   MIN       15     17     19     20     23     25     28   1.70
   MEAN   23.55  26.67  28.15  30.96  32.38  34.71  35.91   2.29
   VAR    30.62  41.07  41.05  47.99  45.43  49.33  36.39

2.8 AMC-ACE-O

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      8     12     16     20     24     27     31
   MIN        4      5      6      7      8     10     12     14
   MEAN    4.00   7.58   9.62  12.33  13.59  15.81  17.46  20.94
   VAR     0.00   0.55   3.76   7.75   9.82  14.57  17.65  27.78

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       35     39     42     46     50     53     57   3.77
   MIN       15     17     18     20     22     23     26   1.58
   MEAN   22.37  25.32  26.73  29.29  30.49  32.74  33.74   2.12
   VAR    28.73  37.75  37.60  43.39  40.41  44.92  36.04

2.9 AMC-ACE-R

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      8     12     16     20     24     28     32
   MIN        4      5      6      7      8     10     12     14
   MEAN    4.00   7.58   9.79  12.74  14.28  16.69  18.49  22.28
   VAR     0.00   0.55   3.92   8.38  11.49  16.94  21.65  33.06

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       36     40     44     48     52     56     59   3.98
   MIN       15     17     18     20     22     23     27   1.60
   MEAN   23.93  27.08  28.61  31.47  32.68  34.98  36.14   2.31
   VAR    36.33  45.99  48.35  57.62  54.90  60.44  48.34

2.10 AMC-ACE-V

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      8     11     15     18     20     24     26
   MIN        4      6      7      8      9     12     13     15
   MEAN    4.00   6.89   9.10  11.63  13.41  15.64  17.52  20.47
   VAR     0.00   0.12   1.10   2.24   3.25   4.69   5.99   9.20

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       29     32     36     38     41     44     46   2.98
   MIN       15     17     18     21     23     25     28   1.62
   MEAN   22.23  24.88  26.62  29.02  30.66  32.77  34.33   2.17
   VAR    10.35  13.30  14.15  17.08  16.36  17.98  14.11

2.11 AMC-ACE-W

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      8     11     15     18     21     25     28
   MIN        4      6      7      8      9     11     13     14
   MEAN    4.00   6.89   9.07  11.60  13.32  15.52  17.37  20.37
   VAR     0.00   0.12   1.27   2.62   4.06   5.85   7.36  10.89

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       32     33     36     38     42     44     46   3.01
   MIN       15     17     18     20     23     25     29   1.64
   MEAN   22.14  24.77  26.46  28.92  30.60  32.77  34.29   2.17
   VAR    12.42  15.74  16.64  19.59  19.37  21.07  16.14

2.12 MACE

   CHARS      1      2      3      4      5      6      7      8
   MAX        4      7     10     13     16     19     22     25
   MIN        4      6      7      8      9     11     13     15
   MEAN    4.00   6.98   9.43  11.97  13.86  16.08  17.96  20.85
   VAR     0.00   0.02   0.51   1.48   2.75   4.35   5.80   8.66

   CHARS      9     10     11     12     13     14     15  SLOPE
   MAX       28     31     34     37     40     43     46   3.00
   MIN       15     17     19     21     24     25     29   1.68
   MEAN   22.67  25.26  26.98  29.36  31.16  33.29  34.78   2.19
   VAR    10.16  13.20  13.97  16.65  17.02  18.12  13.97

2.13 Conversion Time

   Conversion time (in seconds) was measured by converting all of the
   sample data at once, on the same machine and under the same
   conditions (with no other load).

                 To    From
   AltDUDE     1.98    2.41
   DUDE        1.99    2.43
   MACE        2.03    2.58
   RACE        2.08    2.57
   AMC-ACE-W   2.08    2.60
   UTF-6       2.11    2.71
   LACE        2.12    2.89
   AMC-ACE-R   2.22    2.85
   BRACE       2.23    2.87
   AMC-ACE-M   2.89    3.47
   AMC-ACE-V   3.49    5.36
   AMC-ACE-O   4.86    5.42

3. Consideration of results

3.1 MAX

   Following is a compiled graph of each ACE's MAX value for each
   CHARS.

   AMC-ACE-V|   1   2  3   4  5 6   7 8  9  A   B C  D  E F                2.98
        MACE|   1  2  3  4  5  6  7  8  9  A  B  C  D  E  F                3.00
   AMC-ACE-W|   1   2  3   4  5  6   7  8   9A  B C   D E F                3.01
   AMC-ACE-M|   1   2   3  4  5  6  7  8  9  A  B C   D E  F               3.01
       BRACE|   1  2  3   4  5  6  7  8   9  A  B  C  D   E  F             3.21
        LACE|    1  2   3  4  5  6  7   8  9  A  B  C   D  E  F            3.21
        RACE|    1  2   3  4  5  6  7   8  9  A  B  C   D  E  F            3.21
   AMC-ACE-O|   1   2   3   4   5   6  7   8   9   A  B   C   D  E   F     3.77
     AltDUDE|   1   2   3   4   5   6   7   8   9   A   B   C   D   E F    3.95
        DUDE|   1   2   3   4   5   6   7   8   9   A   B   C   D   E F    3.95
   AMC-ACE-R|   1   2   3   4   5   6   7   8   9   A   B   C   D   E  F   3.98
       UTF-6|   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F  4.00
            +----+----+----+----+----+----+----+----+----+----+----+----+
                 5   10   15   20   25   30   35   40   45   50   55   60

   The X-axis is the resulting string length, the Y-axis is the ACE
   name, and each letter in the graph is the number of characters in
   the domain name, with A-F representing 10-15 respectively.  The
   decimal at the right end of each row is the slope of the regression
   line.

3.2 MIN

   Following is a compiled graph of each ACE's MIN value for each
   CHARS.

       BRACE|   12 345 67 89AB C DE F                                      1.40
   AMC-ACE-M|   12345 6 7 89 AB C DE  F                                    1.58
   AMC-ACE-O|   12345 6 7 89 AB C DE  F                                    1.58
        RACE|   12 34 5 67 89 A BC DE F                                    1.59
   AMC-ACE-R|   12345 6 7 89 AB C DE   F                                   1.60
        LACE|    1 23 4 56 78 9 AB CD E F                                  1.61
   AMC-ACE-V|   1 2345  67 9 AB  C D E  F                                  1.62
   AMC-ACE-W|   1 2345 6 789 AB C  D E   F                                 1.64
        MACE|   1 2345 6 7 9 A B C  DE   F                                 1.68
     AltDUDE|   123456  7 89 A BC  D E  F                                  1.70
        DUDE|   123456  7 89 A BC  D E  F                                  1.70
       UTF-6|   1 3   4  56 7 8  9A  B C D E F                             2.12
            +----+----+----+----+----+----+----+----+----+----+----+----+
                 5   10   15   20   25   30   35   40   45   50   55   60

   The X-axis is the resulting string length, the Y-axis is the ACE
   name, and each letter in the graph is the number of characters in
   the domain name, with A-F representing 10-15 respectively.  The
   decimal at the right end of each row is the slope of the regression
   line.

3.3 MEAN

   Following is a compiled graph of each ACE's MEAN value for each
   CHARS.

       BRACE|   1 2 3  45 67  8 9  AB CD EF                                1.93
   AMC-ACE-M|   1  2 3 4 5 67  8 9  AB  CD E F                             2.05
   AMC-ACE-O|   1  2 3  45 6 7  8 9  AB  CD EF                             2.12
   AMC-ACE-W|   1 2  3 4 5 6 7  8 9 A B C D E F                            2.17
   AMC-ACE-V|   1 2  3 4 5 6 7  8 9 A B  CD E F                            2.17
        MACE|   1 2  3 4 5  67  8 9  AB  C D EF                            2.19
        LACE|    1 2  3 4 5 6 7  8 9  AB   CD EF                           2.23
     AltDUDE|   1  2 3  4 5 6 7  8 9  A B C D EF                           2.29
        DUDE|   1  2 3  4 5 6 7  8 9  A B C D EF                           2.29
   AMC-ACE-R|   1  2 3  4 5 6 7   89   AB  CD E F                          2.31
        RACE|   1  2 3  45  67   8 9   A B  C D  E F                       2.53
       UTF-6|   1  2  3  4  5  6 7   8  9   A  B   C  D  E  F              3.17
            +----+----+----+----+----+----+----+----+----+----+----+----+
                 5   10   15   20   25   30   35   40   45   50   55   60

   The X-axis is the resulting string length, the Y-axis is the ACE
   name, and each letter in the graph is the number of characters in
   the domain name, with A-F representing 10-15 respectively.  The
   decimal at the right end of each row is the slope of the regression
   line.

3.4 VAR

   Following is a compiled graph of each ACE's VAR value for each
   CHARS.

        MACE|24 56 7  89  AF  DE
   AMC-ACE-V|2345 67  89  AF DCE
   AMC-ACE-W|23 45 67   89   FB DCE
   AMC-ACE-M|12 3 4 5 6 7    89    AB FD   E
   AMC-ACE-O|12  3   4 5    6  7         89      F B D  C E
     AltDUDE|12  3   4 5     6  7          89    F    B   D  CE
        DUDE|12  3   4 5     6  7          89    F    B   D  CE
       BRACE|2 3     45    6    7     8      9       A  B   F DCE
        LACE|2   3 4 5     6 7              8 9   A         B  CFE
   AMC-ACE-R|12  3   4  5     6    7          8  9         A F  E
        RACE|12    3     4   5   6       7            8        9ABC D E F
       UTF-6|2 3   4     5       6       7            8        9 AB C D E  F
            ++----+----+----+----+----+----+----+----+----+----+----+----+----
             0    5   10   15   20   25   30   35   40   45   50  100  150

   The X-axis is the value of the variance, the Y-axis is the ACE name,
   and each letter in the graph is the number of characters in the
   domain name, with A-F representing 10-15 respectively.  Note that
   above 50 the X-axis scale is compressed by a factor of 10.

3.5 Consideration

   From the results for each ACE, the following can be said:

      - MIN (best case) is not significant.  It is small enough for
        every ACE.

      - Conversion time is significant.  The slowest ACE takes more
        than twice as long as the fastest.

      - MAX (worst case) is significant.  It determines the maximum
        number of characters that fit in a domain name label.  Many
        Japanese company or organization names exceed 15 characters.
        From the Japanese point of view, an ACE should allow more than
        15 characters.

      - MEAN (average) is significant.  It reflects the efficiency of
        the ACE.

      - VAR (variance) is significant.  It also reflects the efficiency
        of the ACE.  When the variance is small, end users can easily
        estimate the resulting length of a conversion.

   In other words, an ACE with a smaller slope of the regression line
   for MAX and MEAN, faster conversion and smaller VAR is effective.
   Therefore, the following could be additional criteria for evaluating
   ACEs for a given language.

      1) The length of the resulting string (in octets) in MAX (worst
         case) is shorter than that of other ACEs.
      2) The length of the resulting string (in octets) in MEAN
         (average) is shorter than that of other ACEs.
      3) The conversion time (in seconds) is shorter than that of other
         ACEs.
      4) The variance of the resulting string length for each number of
         characters is smaller than that of other ACEs.

4. Conclusion

   For Japanese Domain Names, by the criteria described above, MACE or
   AMC-ACE-W is preferable.

5. References

   [JPCHAR]   "Japanese characters in multilingual domain name labels",
              draft-ietf-idn-jpchar-01.txt, Mar 2001, Y Yoneya,
              Y Morishita

   [RFC1035]  "DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION",
              RFC1035, Nov 1987, P. Mockapetris

   [NAMEPREP] "Preparation of Internationalized Host Names",
              draft-ietf-idn-nameprep-03.txt, Feb 2001, P Hoffman,
              M Blanchet

   [RACE]     "RACE: Row-based ASCII Compatible Encoding for IDN",
              draft-ietf-idn-race-03.txt, Nov 2000, P Hoffman

   [BRACE]    "BRACE: Bi-mode Row-based ASCII-Compatible Encoding for
              IDN version 0.1.2", draft-ietf-idn-brace-00.txt,
              Sep 2000, A Costello

   [LACE]     "LACE: Length-based ASCII Compatible Encoding for IDN",
              draft-ietf-idn-lace-01.txt, Jan 2001, M Davis, P Hoffman

   [UTF6]     "UTF-6 - Yet Another ASCII-Compatible Encoding for IDN",
              draft-ietf-idn-utf6-00.txt, Nov 2000, M Welter,
              B Spolarich

   [DUDE]     "Differential Unicode Domain Encoding (DUDE)",
              draft-ietf-idn-dude-02.txt, Jun 2001, M Welter,
              B Spolarich, A Costello

   [AMCACEM]  "AMC-ACE-M version 0.1.0",
              draft-ietf-idn-amc-ace-m-00.txt, Feb 2001, A Costello

   [AltDUDE]  "AltDUDE version 0.0.2",
              draft-ietf-idn-altdude-00.txt, Mar 2001, A Costello

   [AMCACEO]  "AMC-ACE-O version 0.0.3",
              draft-ietf-idn-amc-ace-o-00.txt, Mar 2001, A Costello

   [AMCACER]  "AMC-ACE-R version 0.2.1",
              draft-ietf-idn-amc-ace-r-01.txt, May 2001, A Costello

   [AMCACEV]  "AMC-ACE-V version 0.1.0",
              draft-ietf-idn-amc-ace-v-00.txt, May 2001, A Costello

   [AMCACEW]  "AMC-ACE-W version 0.1.0",
              draft-ietf-idn-amc-ace-w-00.txt, May 2001, A Costello

   [MACE]     "MACE: Modal ASCII Compatible Encoding for IDN",
              draft-ietf-idn-mace-00.txt, Jun 2001, M Ishisone,
              Y Yoneya

   [MDNKIT]   "Multilingual Domain Name tool Kit",
              http://www.nic.ad.jp/jp/research/idn/mdnkit/download/

6. Acknowledgements

   Japan Registry Service Co., Ltd. provided actually registered
   Japanese Domain Names.  The JPNIC mDNkit development team swiftly
   provided me with an implementation of each new ACE when it was
   published as an Internet Draft.  Tomoyuki Hasei created fundamental
   tools and data for the evaluation.
   JPNIC IDN-TF members gave me a lot of advice.

7. Author's Address

   Yoshiro Yoneya
   Japan Network Information Center
   Fuundo Bldg 1F, 1-2 Kanda-ogawamachi
   Chiyoda-ku Tokyo 101-0052, Japan
   yone@nic.ad.jp