Internet Draft                                                 M. Duerst
                                                    University of Zurich
Expires 31 August 1997                                  28 February 1997


                 Ruby in the Hypertext Markup Language

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   Distribution of this document is unlimited. Please send comments to
   the author (see Author's Address).

   This document is intended to become an informational RFC, and its
   contents are designed for adoption in other standards and
   specifications.

   The present version 01 of this draft is a minor update of version
   00. It contains additions about the possibility of displaying ruby
   outside the document, and about the problem of ruby on elements
   using fixed-width display.

Abstract

   The Hypertext Markup Language (HTML) is a markup language used to
   create hypertext documents that are platform independent.
   Initially, HTML was designed primarily for Western European
   languages; most of the basic internationalization issues that
   limited its use for other languages have since been addressed.
   Ruby are important phonetic annotations used mainly for ideographic
   characters in East Asia. This document proposes markup for ruby in
   HTML and explains its usage.

Table of contents

   1. Introduction
      1.1 General
      1.2 Notational Conventions
   2. Syntax
      2.1 The RUBY Attribute
      2.2 Usage Limitations
      2.3 Changes to the DTD
      2.4 Nested Attributes
   3. Guidelines for Implementation
   4. Design Considerations
   Bibliography
   Author's Address

1. Introduction

1.1 General

   The Hypertext Markup Language (HTML) [RFC1866] is a simple markup
   language used to create hypertext documents that are platform
   independent. The main features for full international use of HTML
   are described in [HTML-I18N]. This draft describes markup for an
   additional feature needed for international HTML, namely ruby.

   Ruby are short phonetic annotations for ideographic characters used
   throughout East Asia. Ruby are placed to the right of their base
   characters in vertical text, and above them in horizontal text.
   They are rendered at about half the size of their base characters.
   Ruby are used frequently in Japan in most kinds of publications,
   such as books and magazines, but also in China, especially in
   schoolbooks.

   With the increasing international use of the WWW, new and very
   beneficial uses of ruby can also appear. In texts stored
   electronically and enriched with structural markup, ruby can be
   very convenient for applications other than rendering. In
   particular, they should be of immense value for searching,
   indexing, and text-to-speech conversion.

   The name "ruby" is the name of the 5.5 point type size in British
   terminology; this was the size most used for ruby. In Japan, the
   term "furigana" is also used.

1.2 Notational Conventions

   In the examples in this document, ideographic characters are
   denoted as space-separated strings (words) of uppercase letters,
   and annotation characters as words of lowercase letters.

2. Syntax

2.1 The RUBY Attribute

   A ruby annotation is a string of ruby characters associated with a
   string of characters from the base text. This association is
   expressed by introducing an attribute RUBY to the inline elements
   of HTML, such as SPAN. SPAN is the generic phrase-level element.
   Other than carrying attributes, it does not have any particular
   semantics. As ruby usually are not combined with other kinds of
   markup, SPAN will be used most of the time to place ruby on base
   characters. This is an example:

      <SPAN RUBY="...">KO HAYASHI</SPAN>

2.2 Usage Limitations

   Neither the length of a group of base characters nor the number of
   ruby characters per base character is limited by this
   specification. However, authors and tools are requested to keep
   these numbers reasonably low. Otherwise, it will be very difficult
   even for a sophisticated renderer to construct a nice display.
   Also, this specification does not limit the types of base
   characters to which ruby can be attached, or the types of
   characters that can be used as ruby.

   The length of a group of base characters, in the case of Japanese,
   will have an average of about two, with four or five characters
   still being common. For the number of ruby per base character, five
   is a number for which examples are known, but here also the average
   will be close to two. For both linguistic and typographic reasons,
   it is not possible to restrict ruby to being associated with single
   base characters.

   For Chinese texts annotated with Pinyin romanization, the average
   number of ruby per base character is closer to four; for Chinese
   texts with bopomofo annotations, the average is again around two.
   For other combinations of base characters and ruby, these numbers
   can be different.

   According to the formal changes to the DTD in Section 2.3, the use
   of the RUBY attribute is not limited to specific elements. However,
   content providers should take into account that ruby added to
   elements such as PRE, CODE, SAMP, TT, and KBD, which are typically
   rendered with a fixed-width font, may distort the intended layout,
   may be handled differently than in other elements, or may even be
   ignored. The same applies to elements such as SUP, SUB, and IMG.

2.3 Changes to the DTD

   This section describes the changes to the HTML DTD necessary to
   include the RUBY attribute. The description is based on the DTD in
   [HTML-I18N]. In this case, the only change necessary is to add the
   following text to the "attrs" DTD "Macro":

      RUBY CDATA #IMPLIED -- phonetic annotation for ideographs --

   For other versions of HTML, other changes may be necessary.

2.4 Nested Attributes

   If RUBY attributes are present on several levels of nested inline
   elements, then these attributes are to be considered as
   alternatives, and not as cumulative. Thus, for example,

      <SPAN RUBY="..."><SPAN RUBY="...">KO</SPAN>
      <SPAN RUBY="...">HAYASHI</SPAN></SPAN>

   could be interpreted as

      <SPAN RUBY="...">KO HAYASHI</SPAN>

   to distribute the ruby evenly over the base characters, or as

      <SPAN RUBY="...">KO</SPAN> <SPAN RUBY="...">HAYASHI</SPAN>

   to allow the ruby to be split correctly when a line is broken
   between KO and HAYASHI.

   NOTE -- The above is designed to allow extremely sophisticated
   renderers to do high-quality line breaking, and to avoid any future
   ambiguities as to the semantics of RUBY attributes on nested
   elements. However, the author of this draft does not know of any
   display algorithm or software that is currently able to perform
   this function, and therefore suggests that authors not use this
   feature.

3. Guidelines for Implementation

   This document does not specify any particular implementation for
   the rendering of ruby. The following are some possibilities, listed
   in order of increasing typographic quality, with some comments.

   - Display ruby in-line, after their base characters, in
     parentheses. In this case, an option to switch off ruby display
     is almost mandatory, because texts with many ruby will otherwise
     be difficult to read. For other implementations, an option to
     switch off ruby display may also be a good idea, but it is not as
     necessary as here.

   - In particular for low-resolution displays and for documents that
     contain a large number of ruby for educational purposes, do not
     display ruby directly, but make them available in a pop-up label,
     or at the top or bottom of the displayed text, following cursor
     movement.

   - Place ruby above their base characters, with half the height of
     the base characters. Use fixed spacing. In case the ruby are
     longer than their corresponding base characters, leave some space
     blank after the base characters. Always keep a group of base
     characters and their ruby on the same line.

   - Same as the previous solution, but expand ruby proportionally in
     case they are shorter than their associated base characters.

   - In case the ruby are longer than their associated base
     characters, test whether the preceding or following characters of
     the base text have associated ruby. If this is not the case
     (particularly if these characters are not ideographic), let the
     ruby overlap those neighboring base characters to avoid blank
     space.

   - Use nested RUBY attributes for highest-quality rendering
     including line breaks (very difficult to implement).
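
   The first possibility above (ruby in-line, in parentheses, with an
   option to switch them off) can be sketched as follows. This is only
   a minimal illustration and not part of this specification. The
   names InlineRubyRenderer and render_inline, the sample annotation
   "ko bayashi", the list of empty elements, and the decision to keep
   only the outermost RUBY attribute of a nested group are all
   assumptions made for the example. The sketch relies on the
   HTMLParser class of the Python standard library.

```python
from html.parser import HTMLParser

# Elements without an end tag; an illustrative (assumed) subset.
EMPTY_ELEMENTS = {"base", "br", "hr", "img", "input", "link", "meta"}

# Hypothetical sample; the annotation value "ko bayashi" is an assumption.
SAMPLE = '<P>Mr. <SPAN RUBY="ko bayashi">KO HAYASHI</SPAN> arrived.</P>'


class InlineRubyRenderer(HTMLParser):
    """Flatten HTML to plain text, appending ruby in parentheses."""

    def __init__(self, show_ruby=True):
        super().__init__()
        self.show_ruby = show_ruby   # the "switch off ruby display" option
        self.pieces = []             # output text fragments
        self.open_elements = []      # stack of (tag, ruby or None)

    def handle_starttag(self, tag, attrs):
        if tag in EMPTY_ELEMENTS:
            return  # no content to annotate; ruby on IMG etc. is ignored
        # HTMLParser lowercases attribute names, so RUBY arrives as "ruby".
        ruby = dict(attrs).get("ruby")
        # Nested RUBY attributes are alternatives, not cumulative. This
        # sketch keeps the outermost annotation and drops the inner ones.
        if any(outer is not None for _, outer in self.open_elements):
            ruby = None
        self.open_elements.append((tag, ruby))

    def handle_endtag(self, tag):
        if all(open_tag != tag for open_tag, _ in self.open_elements):
            return  # stray end tag: ignore it
        # Pop up to the matching element; elements above it were left
        # unclosed, and their ruby is not displayed.
        while True:
            open_tag, ruby = self.open_elements.pop()
            if open_tag == tag:
                break
        if ruby is not None and self.show_ruby:
            self.pieces.append("(" + ruby + ")")

    def handle_data(self, data):
        self.pieces.append(data)


def render_inline(html, show_ruby=True):
    """Return the text of `html` with ruby placed after their base text."""
    renderer = InlineRubyRenderer(show_ruby)
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.pieces)


if __name__ == "__main__":
    print(render_inline(SAMPLE))
    print(render_inline(SAMPLE, show_ruby=False))
```

   With the sample input, the first call yields the base text followed
   by "(ko bayashi)", and the second call yields the base text alone.
   A renderer that prefers the innermost RUBY attributes of a nested
   group would be an equally valid reading of Section 2.4.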

   Stricter implementation specifications with examples can be found
   in [JIS95]. Please note that for high typographic quality, fonts
   specially designed for ruby should be used.

4. Design Considerations

   Besides the solution proposed in this document, various
   alternatives for ruby markup were discussed. They all turned out to
   be more complex than having ruby as an attribute, without
   significant additional benefits. For some more details about these
   proposals, please see [DUR96].

   Some solutions, defining one or more elements for base characters
   and ruby, would have made ruby visible even in browsers not aware
   of the new markup. However, to provide reasonable rendering in
   these cases, complicated rules about the removal of parentheses
   would have had to be introduced.

   Using an attribute to indicate ruby also has the disadvantage that
   only the whole string of ruby, but not individual characters in it,
   can be given a special appearance. As it is highly unlikely that
   such a feature has ever been used anywhere, this is not really a
   problem.

Acknowledgements

   I am grateful in particular to the following persons for their
   advice and help: Junichiro Kida, Yasuo Kida, Tatsuo L. Kobayashi,
   Francois Yergeau, Gavin Nicol, Martin Brian, Rick Jellife, Taro
   Yamamoto; the organizers of the 8th Unicode conference and the
   participants of the I18N workshop at the 1996 WWW conference in
   Paris.

Bibliography

   [DUR96]      M.J. Duerst, "Ruby in HTML", May 1996.

   [GOLD90]     C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
                Oxford University Press, 1990.

   [RFC1866]    T. Berners-Lee and D. Connolly, "Hypertext Markup
                Language - 2.0" (RFC1866), MIT/W3C, November 1995.

   [HTML-I18N]  F. Yergeau, G. Nicol, G. Adams, and M. Duerst,
                "Internationalization of the Hypertext Markup
                Language", Work in progress
                (draft-ietf-html-i18n-05.txt), August 1996.

   [JIS95]      Japanese Industrial Standards Committee, "Line
                composition rules for Japanese documents", Japanese
                Industrial Standard JIS X 4051-1995, October 1995 (in
                Japanese).

Author's Address

   Martin J. Duerst
   Multimedia-Laboratory
   Department of Computer Science
   University of Zurich
   Winterthurerstrasse 190
   CH-8057 Zurich
   Switzerland

   Tel: +41 1 257 43 16
   Fax: +41 1 363 00 35
   E-mail: mduerst@ifi.unizh.ch

   NOTE -- Please write the author's name with u-Umlaut wherever
   possible, e.g. in HTML as D&uuml;rst.