Internet Draft                                               M. Duerst
<draft-duerst-ruby-00.txt>                        University of Zurich
Expires 28 February 1997                                30 August 1996


                 Ruby in the Hypertext Markup Language


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working doc-
   uments of the Internet Engineering Task Force (IETF), its areas, and
   its working groups. Note that other groups may also distribute work-
   ing documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet-
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   Distribution of this document is unlimited.  Please send comments to
   the author at <mduerst@ifi.unizh.ch>. This document is intended to
   become an informational RFC, and its contents are designed for adop-
   tion in other standards and specifications.


Abstract

   The Hypertext Markup Language (HTML) is a markup language used to
   create hypertext documents that are platform independent.  Initially
   HTML was designed primarily for Western European languages; most of
   the basic internationalization issues that made HTML less usable for
   other languages have since been addressed.  Ruby are important
   phonetic annotations used mainly for ideographic characters in East
   Asia. This document proposes markup for ruby in HTML and explains its
   usage.








                        Expires 28 February 1997        [Page 1]

Internet Draft                Ruby in HTML                28 August 1996


Table of contents

   1. Introduction
     1.1 General
     1.2 Notational Conventions
   2. Syntax
     2.1 The RUBY Attribute
     2.2 Usage Limitations
     2.3 Changes to the DTD
     2.4 Nested Attributes
   3. Guidelines for Implementation
   4. Design Considerations
   Acknowledgements
   Bibliography
   Author's Address



1. Introduction


1.1 General


   The Hypertext Markup Language (HTML) [HTML] is a simple markup
   language used to create hypertext documents that are platform inde-
   pendent.  The main features for full international use of HTML are
   described in [HTML-I18N]. This draft describes markup for an addi-
   tional feature needed for international HTML, namely ruby. Ruby are
   short phonetic annotations for ideographic characters used throughout
   East Asia.

   Ruby are placed to the right of their base characters in vertical
   text, and above them in horizontal text. They are rendered at about
   half the size of their base characters.

   Ruby are used frequently in Japan in most kinds of publications,
   such as books and magazines, but also in China, especially in
   schoolbooks.  With the increasing international use of the WWW, new
   and highly beneficial uses of ruby may also emerge.

   In texts that are stored electronically and enriched with structural
   markup, ruby can be very useful for applications other than
   rendering. In particular, they can be of great value for searching,
   indexing, and text-to-speech conversion.

   The name "ruby" comes from the British term for the 5.5-point type
   size, which was the size most commonly used for these annotations.
   In Japan, the term "furigana" is also used.


1.2 Notational Conventions

   In the examples in this document, ideographic characters are denoted
   as space-separated strings of uppercase letters.  Annotation charac-
   ters are denoted by lowercase letters.


2. Syntax


2.1 The RUBY Attribute

   A ruby annotation is a string of ruby characters associated with a
   string of characters from the base text. This association is
   expressed by introducing an attribute RUBY to the inline elements of
   HTML.  Examples of inline elements are <EM>, <STRONG>, <Q>, and
   <SPAN>.  <SPAN> is the generic phrase-level element; apart from car-
   rying attributes, it has no particular semantics. As ruby are usually
   not combined with other kinds of markup, <SPAN> will be used most of
   the time to place ruby on base characters. This is an example:

   <SPAN RUBY="kobayashi">KO HAYASHI</SPAN>
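
   The association between a RUBY attribute and its base text can be
   illustrated with a short sketch, for instance as a first step toward
   indexing or searching.  The following Python code is a minimal
   example and not part of this specification: it assumes well-formed,
   properly nested inline markup, uses only the standard html.parser
   module, and the names RubyExtractor and extract_ruby are
   hypothetical.

```python
from html.parser import HTMLParser

# Elements without end tags; skipped so the element stack stays aligned.
VOID_ELEMENTS = {"br", "img", "hr", "input", "wbr"}


class RubyExtractor(HTMLParser):
    """Collect (base text, ruby) pairs from elements with a RUBY attribute."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one [ruby_or_None, text_parts] per open element
        self.pairs = []  # (base_text, ruby), in order of element end

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # html.parser reports tag and attribute names in lowercase.
        self.stack.append([dict(attrs).get("ruby"), []])

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        ruby, parts = self.stack.pop()
        if ruby is not None:
            # Collapse whitespace runs (e.g. line breaks between nested
            # elements) into single spaces.
            base = " ".join("".join(parts).split())
            self.pairs.append((base, ruby))

    def handle_data(self, data):
        # Character data belongs to every element that is currently open.
        for _, parts in self.stack:
            parts.append(data)


def extract_ruby(html_text):
    """Return the (base text, ruby) pairs found in an HTML fragment."""
    parser = RubyExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.pairs
```

   Applied to the example above, extract_ruby returns
   [("KO HAYASHI", "kobayashi")].  For nested markup as described in
   Section 2.4, every level is reported, inner elements first; treating
   the levels as alternatives is left to the application.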



2.2 Usage Limitations

   The length of a group of base characters or the number of ruby char-
   acters per base character are not limited by this specification.
   However, authors and tools are requested to keep these numbers rea-
   sonably low. Otherwise, it will be very difficult even for a sophis-
   ticated renderer to construct an nice display. Also, this specifica-
   tion does not limit the types of base characters to which ruby can be
   attached, or of the types of characters that can be used as ruby.

   For Japanese, the length of a group of base characters averages about
   two, with groups of four or five characters still being common.
   Examples with as many as five ruby per base character are known, but
   here too the average is close to two. For both linguistic and typo-
   graphic reasons, it is not possible to restrict ruby to single base
   characters. For Chinese texts annotated with Pinyin romanization, the
   average number of ruby per base character is closer to four; for
   Chinese texts with bopomofo annotations, it is again around two. For
   other combinations of base characters and ruby, these numbers can be
   different.
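
   An authoring tool could follow the request above by warning about
   unusually long groups.  The Python sketch below is illustrative only:
   it uses the notation of Section 1.2 (one space-separated uppercase
   token per ideograph), counts each character of the ruby string as one
   ruby character, and its default limits of five are assumptions drawn
   from the figures above, not limits set by this specification.  The
   function name check_ruby_group is hypothetical.

```python
def check_ruby_group(base, ruby, max_base=5, max_ratio=5.0):
    """Return warnings for one base/ruby group; an empty list means OK.

    base: ideographs in the notation of Section 1.2, one space-separated
          uppercase token per ideograph.
    ruby: annotation string; each character counts as one ruby character.
    The default limits are illustrative assumptions, not spec limits.
    """
    n_base = len(base.split())
    if n_base == 0:
        return ["empty base text"]
    warnings = []
    if n_base > max_base:
        warnings.append(f"{n_base} base characters in one group (> {max_base})")
    ratio = len(ruby) / n_base
    if ratio > max_ratio:
        warnings.append(
            f"{ratio:.1f} ruby characters per base character (> {max_ratio})"
        )
    return warnings
```

   With these defaults, check_ruby_group("KO HAYASHI", "kobayashi")
   returns an empty list.  Because Pinyin averages about four ruby per
   base character, a tool for Chinese might choose a higher max_ratio.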


2.3 Changes to the DTD

   This section describes the changes to the HTML DTD necessary to
   include the RUBY attribute. The description is based on the DTD in
   [HTML-I18N].  For that DTD, the only change necessary is to add the
   following line to the "attrs" parameter entity (macro):

        RUBY CDATA     #IMPLIED   -- phonetic annotation for ideographs --

   For other versions of HTML, other changes may be necessary.


2.4 Nested Attributes

   If RUBY attributes are present on several levels of nested inline
   elements, these attributes are to be considered as alternatives, not
   as cumulative. Thus, for example,

   <SPAN RUBY="kobayashi">
        <SPAN RUBY="ko">KO</SPAN>
        <SPAN RUBY="bayashi">HAYASHI</SPAN>
   </SPAN>

   could be interpreted as

   <SPAN RUBY="kobayashi">
        <SPAN>KO</SPAN>
        <SPAN>HAYASHI</SPAN>
   </SPAN>

   to distribute the ruby evenly over the base characters, or as

   <SPAN>
        <SPAN RUBY="ko">KO</SPAN>
        <SPAN RUBY="bayashi">HAYASHI</SPAN>
   </SPAN>

   to allow the ruby to be split correctly when a line break falls
   between KO and HAYASHI.


        NOTE -- the above is designed to allow extremely sophisti-
        cated renderers to do high-quality line breaking. However,
        the author of this draft is not aware of any display
        algorithm or software that can currently perform this
        function, and therefore suggests that authors not use this
        feature.
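
   The rule that nested RUBY attributes are alternatives can be
   sketched as a selection function.  The Python code below is an
   illustration only: the policy it applies (use the outer ruby unless a
   line break falls inside the group, otherwise the inner ruby) is one
   possible choice and is not mandated by this draft, and the name
   resolve_nested_ruby is hypothetical.

```python
def resolve_nested_ruby(outer_ruby, parts, break_index=None):
    """Choose ONE level of nested RUBY attributes; never combine levels.

    outer_ruby:  RUBY value of the enclosing element, e.g. "kobayashi".
    parts:       (base, inner_ruby) for each nested element, in order.
    break_index: index of the part after which a line break falls, or None.
    Returns a list of lines, each a list of (base, ruby) groups.
    """
    if break_index is None or not 0 <= break_index < len(parts) - 1:
        # No break inside the group: the outer ruby spans all base text,
        # so it can be distributed evenly over the base characters.
        base = " ".join(b for b, _ in parts)
        return [[(base, outer_ruby)]]
    # A break splits the group: each line keeps the inner ruby of its part.
    return [list(parts[: break_index + 1]), list(parts[break_index + 1 :])]
```

   For the example above, resolve_nested_ruby("kobayashi",
   [("KO", "ko"), ("HAYASHI", "bayashi")], break_index=0) places
   ("KO", "ko") on the first line and ("HAYASHI", "bayashi") on the
   next.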


3. Guidelines for Implementation

   This document does not specify any particular implementation for the
   rendering of ruby.  The following are some possibilities, listed by
   increasing typographic quality, with some comments.

   -  Display ruby inline, after their base characters, in parentheses.
      In this case, an option to switch off ruby display is almost
      mandatory, because texts with many ruby would otherwise be diffi-
      cult to read. For other implementations, an option to switch off
      ruby display may also be a good idea, but it is less critical than
      here.

   -  Place ruby above their base characters, at half the height of the
      base characters. Use fixed spacing. In case the ruby are longer
      than their corresponding base characters, leave some space blank
      after the base characters.  Always keep a group of base characters
      and their ruby on the same line.

   -  Same as the previous solution, but expand ruby proportionally
      when they are shorter than their associated base characters.

   -  In case the ruby are longer than their associated base characters,
      test if previous or following characters of the base text have
      associated ruby. If this is not the case (particularly if these
      characters are not ideographic), let the ruby overlap the base
      characters to avoid blank space.

   -  Use nested RUBY attributes for highest-quality rendering,
      including line breaks (very difficult to implement).

   Stricter implementation specifications, with examples, can be found
   in [JIS95].
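
   The first three possibilities can be sketched in a few lines.  The
   Python code below is a rough illustration under stated assumptions,
   not a reference implementation: base characters are taken to be
   monospaced and 1 em wide, ruby are set at half that size, ASCII
   parentheses stand in for whatever brackets a real renderer would use,
   and the function names render_inline and layout_group are
   hypothetical.

```python
def render_inline(segments, show_ruby=True):
    """First possibility: ruby in parentheses after their base text.

    segments: (text, ruby_or_None) pairs in reading order; text carries
    its own spacing. show_ruby=False is the switch to hide ruby.
    """
    out = []
    for text, ruby in segments:
        if show_ruby and ruby:
            out.append(f"{text}({ruby})")
        else:
            out.append(text)
    return "".join(out)


def layout_group(n_base, n_ruby, ruby_scale=0.5, expand_short_ruby=True):
    """Second and third possibilities: widths in ems for one group.

    Returns (group_width, blank_after_base, extra_space_per_ruby_char).
    The group is assumed to stay on one line, as required above.
    """
    base_width = n_base * 1.0
    ruby_width = n_ruby * ruby_scale
    if ruby_width > base_width:
        # Ruby longer than base: leave blank space after the base.
        return ruby_width, ruby_width - base_width, 0.0
    if expand_short_ruby and n_ruby > 0:
        # Third possibility: spread the ruby over the full base width,
        # giving each ruby character an equal share of the extra space.
        return base_width, 0.0, (base_width - ruby_width) / n_ruby
    return base_width, 0.0, 0.0
```

   For example, two base characters with six ruby characters give
   layout_group(2, 6) == (3.0, 1.0, 0.0): the ruby need 3 em, so 1 em
   is left blank after the base characters.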


4. Design Considerations

   Besides the solution proposed in this document, various alternatives
   for ruby markup were discussed. They all turned out to be more com-
   plex than having ruby as an attribute, without significant additional
   benefits. For some more details about these proposals, please see
   [DUR96].

   Some alternatives, which defined one or more elements for base char-
   acters and ruby, would have made ruby visible even in browsers not
   aware of the new markup. However, to provide reasonable rendering in
   these cases, complicated rules about the removal of parentheses would
   have had to be introduced.

   Using an attribute to indicate ruby also has the disadvantage that
   only the ruby string as a whole, not individual characters within it,
   can be given a special appearance. As it is highly unlikely that such
   a feature has ever been used, this is not a real problem.


Acknowledgements

   I am grateful in particular to the following persons for their advice
   and help: Junichiro Kida, Literary Critic, Japan; Yasuo Kida, Apple
   Japan; Tatsuo L. Kobayashi, Just Systems, Japan; Francois Yergeau,
   Alis Technology, Canada; Gavin Nicol, ETB, Tokyo; Martin Brian, The
   SGML Centre, UK; the organizers of the 8th Unicode conference; the
   participants of the I18N workshop at the 1996 WWW conference in
   Paris.




Bibliography

   [DUR96]        M.J. Duerst, "Ruby in HTML", <http://www.ifi.unizh.ch/
                  groups/mml/people/mduerst/ruby/ruby.html>, May 1996.

   [GOLD90]       C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
                  Oxford University Press, 1990.

   [HTML]         T. Berners-Lee and D. Connolly, "Hypertext Markup Lan-
                  guage - 2.0" (RFC1866), MIT/W3C, November 1995.

   [HTML-I18N]    F. Yergeau, G. Nicol, G. Adams, and M. Duerst, "Inter-
                  nationalization of the Hypertext Markup Language",
                  Work in progress (draft-ietf-html-i18n-05.txt), August
                  1996.

   [JIS95]        Japanese Industrial Standards Committee, "Line compo-
                  sition Rules in Japanese", Japanese Industrial Stan-
                  dard JIS X 4051-1995 (in Japanese).


Author's Address

   Martin J. Duerst
   Multimedia-Laboratory
   Department of Computer Science
   University of Zurich
   Winterthurerstrasse 190
   CH-8057 Zurich
   Switzerland

   Tel: +41 1 257 43 16
   Fax: +41 1 363 00 35
   E-mail: mduerst@ifi.unizh.ch


     NOTE -- Please write the author's name with u-Umlaut wherever
     possible, e.g. in HTML as D&uuml;rst.
