INTERNET-DRAFT
Expiration date: September 2002.

Network Working Group                                          C. Kostin
Request for Comments: nnnn
Category: Standards Track                               Date: March 2002

             Language Property For Character References
               draft-kostin-language-property-01.txt

Status of this Memo

         This document is an Internet-Draft and is subject to
         all provisions of Section 10 of RFC 2026 (even though
         that section is so large that I have probably not yet
         read it in full).

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as
     Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-
     Drafts as reference material or to cite them other than as
     "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/1id-abstracts.html

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html

________________________________________________________________________
Copyright Notice

     Copyright (C) The Internet Society (2002).  All Rights Reserved.

________________________________________________________________________
Abstract

     (Standard Backus-Naur Form (BNF) is used hereinafter.)
     This document describes an enhancement of the symbolic-name form of
character references in HTML: an optional language property is added to
the character's symbolic name in the following manner:
"&<character symbolic name>[+<character language property>];". This
language property shows which language a referenced character belongs
to.

Kostin                        Standards Track                   [Page 1]

RFC NNNN      Language Property For Character References      March 2002


________________________________________________________________________
Introduction

     Dear Sirs,

     It seems likely that the methods of encoding characters in
operating systems are becoming more and more similar to the character
entity references implemented in HTML by the notation &<character
symbolic name>; (see ref. [1] below). This notation probably originated
in SGML (for SGML, see ref. [3] below). The process is especially
apparent in UTF-8 sequences. At the time of writing, the symbolic name
has not yet become a method used by operating systems. The way to
implementing this method inside operating systems probably lies through
a prior trial in HTML. This document describes a first step on that
way: an enhancement of the symbolic name notation that adds to it an
optional language property, which must show which language an encoded
character belongs to.


________________________________________________________________________
CHAPTER 1 - NOTATIONAL CONVENTIONS


(001.001.001.0001 - Chapter 1, division 1, section 1, paragraph 1)

     A standard Backus-Naur Form (BNF) notation is used in this
document.
<...> - means that the enclosed expression is a definition of some
        element.
[...] - means that the enclosed expression is an optional element.
{...} - means that the enclosed expression is repeated 0 (zero) or more
        times.
<ABL> - defines the left angle bracket, i.e. "<".
<ABR> - defines the right angle bracket, i.e. ">".
<SBL> - defines the left square bracket, i.e. "[".
<SBR> - defines the right square bracket, i.e. "]".
<braceL> - defines the left brace, i.e. "{".
<braceR> - defines the right brace, i.e. "}".


________________________________________________________________________
CHAPTER 2 - LANGUAGE PROPERTY FOR CHARACTER REFERENCES


                          CHAPTER 2, DIVISION 1
                        LANGUAGE PROPERTY SYNTAX


(002.001.001.0001 - Chapter 2, division 1, section 1, paragraph 1)

     This document introduces an optional language property for
character symbolic names, the names by which characters are encoded in
HTML (see ref. [1] below). The language property is a short name of a
language, i.e. an abbreviation of the name of some language, which
specifies which language the character encoded by a symbolic name
belongs to.

(002.001.001.0002 - Chapter 2, division 1, section 1, paragraph 2)

     The language property is added after the specified symbolic name
and is separated from the symbolic name (or from another character
property) by the language property separator, a plus ("+") sign. (The
hyphen, or minus ("-") sign, is reserved for use inside the language
property (the abbreviated language name), for example to separate
subtags as in RFC 3066 (see ref. [5] below), and also for use inside
other possible character properties and perhaps inside the symbolic
names too.)

(002.001.001.0003 - Chapter 2, division 1, section 1, paragraph 3)

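     The language property has a higher priority than the language tag
described in RFC 3066 (ref. [5] below): where both apply to a character
and differ, the language property determines the character's language.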
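
     The priority rule in this paragraph can be sketched as a small
Python function. This is only an illustration: the function name and
the use of None for "absent" are assumptions of the sketch, not part of
this document.

```python
def effective_language(language_property, language_tag):
    """Pick the language of one referenced character.

    language_property: the "+..." part of the reference, or None if absent.
    language_tag: the RFC 3066 tag in force around it, or None if absent.
    Both parameter names are hypothetical.
    """
    # The language property, when present, wins over the language tag.
    if language_property is not None:
        return language_property
    return language_tag
```

     For instance, effective_language("lat", "en") returns "lat", while
effective_language(None, "en") falls back to "en".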

(002.001.001.0004 - Chapter 2, division 1, section 1, paragraph 4)

     BNF: &<symbolic name>{<separator><another property>}[+<language 
property>]{<separator><another property>};

     The language property, as the BNF shows and as mentioned above
(paragraph 1, abstract, introduction), is an optional parameter.
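
     A minimal Python sketch of a parser for this notation follows. It
handles only the "+" separator defined in this document and ignores the
other, as yet undefined, character properties. The shape of a symbolic
name (an ASCII letter followed by letters or digits) and the function
name are assumptions of the sketch.

```python
import re

# Assumption: symbolic names are ASCII letters/digits starting with a letter.
# Only the "+" language property separator is recognised here.
_REFERENCE = re.compile(
    r"&(?P<name>[A-Za-z][A-Za-z0-9]*)"      # symbolic name
    r"(?:\+(?P<lang>[A-Za-z0-9_'-]+))?"     # optional +<language property>
    r";"
)


def parse_reference(text):
    """Return (symbolic_name, language_property or None), or None if no match."""
    match = _REFERENCE.fullmatch(text)
    if match is None:
        return None
    return match.group("name"), match.group("lang")
```

     With this sketch, parse_reference("&e+lat;") gives ("e", "lat"),
and a reference without a language property, such as "&amp;", gives
("amp", None).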

(002.001.001.0005 - Chapter 2, division 1, section 1, paragraph 5)

     Exempli gratia:
"&e+lat;&x+lat; &g+lat;&r+lat;&a+lat;&t+lat;&i+lat;&a+lat;" - defines
     the Latin expression "ex gratia" ("as a favour");
"&p+serb;&a+serb;&r+serb;&k+serb;" - defines the Serbo-Croatian word
     "park", which means "a park".
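
     The examples above can be decoded with a short Python sketch. It
is not a definitive implementation: treating a one-letter symbolic name
as the letter itself is an assumption made only so that the examples
can be decoded, and the function name is hypothetical. Known HTML 4
entity names are resolved through the standard html.entities module.

```python
import re
from html.entities import name2codepoint

_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(?:\+([A-Za-z0-9_'-]+))?;")


def decode(text):
    """Return (plain_text, languages); languages is the set of properties seen."""
    languages = set()

    def replace(match):
        name, lang = match.group(1), match.group(2)
        if lang is not None:
            languages.add(lang)
        if name in name2codepoint:          # a known HTML 4 entity name
            return chr(name2codepoint[name])
        if len(name) == 1:                  # assumption: a letter names itself
            return name
        return match.group(0)               # unknown name: leave untouched

    return _REF.sub(replace, text), languages
```

     Applied to the second example, decode returns the text "park"
together with the set {"serb"}.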


002.001.002 - CHARACTERS USED TO COMPOSE THE LANGUAGE PROPERTY ITSELF


(002.001.002.0001 - Chapter 2, division 1, section 2, paragraph 1)

     The language property (the abbreviation of a language name) is
composed of ASCII characters, which form a complete and perhaps the most
stable and reliable set of characters so far (for ASCII, see ref. [4]
below), excluding:
1. ASCII characters used as character property separators (to avoid
    confusion with the beginning of another character property), e.g.
    the plus ("+") sign, which is the language property separator,
2. the "&" ASCII character (used as the delimiter of the beginning of
    another character reference (ref. [1] below)),
3. the ";" ASCII character (used as the delimiter of the end of the
    character reference (ref. [1] below)).

(002.001.002.0002 - Chapter 2, division 1, section 2, paragraph 2)

     So the list of characters that are certainly allowed, for now, in
language properties is as follows:
     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     a b c d e f g h i j k l m n o p q r s t u v w x y z
     0 1 2 3 4 5 6 7 8 9 - _ ' 
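
     A check against the list above might look like the following
Python sketch. The function name is hypothetical, and rejecting the
empty string is an assumption: this document does not state a minimum
length.

```python
import string

# The characters listed above: ASCII letters, digits, "-", "_" and "'".
ALLOWED = frozenset(string.ascii_letters + string.digits + "-_'")


def is_valid_language_property(value):
    """True if value is non-empty and uses only the allowed characters."""
    # Assumption: an empty language property is not meaningful.
    return bool(value) and all(ch in ALLOWED for ch in value)
```

     Under this sketch "eng-US" is accepted, while "eng+US" is rejected
because the plus sign is the language property separator.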

(002.001.002.0003 - Chapter 2, division 1, section 2, paragraph 3)

     There is no restriction on the length of the word representing the
language property. The word should be just long enough (or short
enough) for a reader to identify intuitively, just by reading it, the
language that it represents. (A simple rule for abbreviations.)


                          CHAPTER 2, DIVISION 2
                ABOUT LIST OF SHORT NAMES FOR LANGUAGES


(002.002.001.0001 - Chapter 2, division 2, section 1, paragraph 1)

     To guarantee that a language is identified unambiguously by its
short name, and to prevent the confusion that arises when different
languages are given the same short name or, vice versa, one language is
given different short names, a list of the short words (abbreviations)
already chosen as names for languages is maintained. This list is
permanently open for free reading, copying and distribution, and for
adding new language names (the last subject to a few restrictions).

(002.002.001.0002 - Chapter 2, division 2, section 1, paragraph 2)

     Please take into consideration that this document does not contain
any list of language names.

(002.002.001.0003 - Chapter 2, division 2, section 1, paragraph 3)

     An example of a list of language names:
bopo         - Bopomofo
eng          - English
eng-US       - English, American
kling        - Klingon
lat          - Latin
slav-old     - Old Slavonic/Slavic
rus          - Russian
serb         - Serbo-Croatian
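
     The uniqueness that such a list is meant to guarantee (one short
name per language and one language per short name) can be sketched in
Python as follows. The class and method names are hypothetical, and the
sketch says nothing about how the real list is maintained. It is filled
with a few entries from the example above.

```python
class ShortNameRegistry:
    """Keeps short names and languages in one-to-one correspondence."""

    def __init__(self):
        self._by_short = {}
        self._by_language = {}

    def register(self, short_name, language):
        # Refuse a short name that already denotes another language.
        if short_name in self._by_short:
            raise ValueError(f"short name already taken: {short_name}")
        # Refuse a second short name for an already listed language.
        if language in self._by_language:
            raise ValueError(f"language already named: {language}")
        self._by_short[short_name] = language
        self._by_language[language] = short_name

    def language_of(self, short_name):
        """Return the language for a short name, or None if unknown."""
        return self._by_short.get(short_name)


registry = ShortNameRegistry()
for short, language in [
    ("eng", "English"),
    ("lat", "Latin"),
    ("rus", "Russian"),
    ("serb", "Serbo-Croatian"),
]:
    registry.register(short, language)
```

     Here registry.language_of("lat") returns "Latin", and an attempt to
register "lat" again for another language raises ValueError.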


Security Considerations

     This document raises no security issues.


Informative References

[1]  HTML 4.01 Specification, 5.3.2 Character entity references 
     - http://www.w3.org/TR/html4/charset.html#h-5.3.2 .

[2]  The Unicode Standard, (Version 3.0 - ISBN 0-201-61633-5) 
     - http://www.unicode.org/unicode/uni2book/u2.html .

[3]  ISO/IEC 8879 - ISO (International Organization for 
     Standardization). ISO/IEC 8879-1986 (E). Information processing -
     Text and Office Systems - Standard Generalized Markup Language 
     (SGML). First edition - 1986-10-15. [Geneva]: International 
     Organization for Standardization, 1986. 

[4]  Information Systems. Coded Character Sets. 7-Bit American National
     Standard Code for Information Interchange (7-Bit ASCII). - ANSI 
     X3.4-1986. 
     - ($$) http://webstore.ansi.org/ansidocstore/product.asp?sku=ANSI+
       X3%2E4%2D1986+%28R1997%29 .

[5]  "Tags for the Identification of Languages", H. Alvestrand,
     RFC 3066, (Obsoletes RFC 1766,) Cisco Systems, January 2001.
     - http://www.ietf.org/rfc/rfc3066.txt


   Author's Address

     Cyril Kostin, 
     Ural'skaya ul, 1, 118.
     107241  Moscow, Russia     
     Voice: (+7 095) 462-3260 (It is in Moscow.)
     E-mail: cyril@chat.ru,
             cyril2@mail.ru, 
             cyril@aha.ru


________________________________________________________________________
Full Copyright Statement

      Copyright (C) The Internet Society (2002).  All Rights Reserved.

      This document and translations of it may be copied and furnished
      to others, and derivative works that comment on or otherwise
      explain it or assist in its implementation may be prepared,
      copied, published and distributed, in whole or in part, without
      restriction of any kind, provided that the above copyright notice
      and this paragraph are included on all such copies and derivative
      works.  However, this document itself may not be modified in any
      way, such as by removing the copyright notice or references to the
      Internet Society or other Internet organizations, except as needed
      for the purpose of developing Internet standards in which case the
      procedures for copyrights defined in the Internet Standards
      process must be followed, or as required to translate it into
      languages other than English.

      The limited permissions granted above are perpetual and will not
      be revoked by the Internet Society or its successors or assigns.

      This document and the information contained herein is provided on
      an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
      ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
      IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
      THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
      WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

