Internet Engineering Task Force                                 M. Davis
Internet-Draft                                                    Google
Intended status: BCP                                         A. Phillips
Expires: July 19, 2010                                            Lab126
                                                               Y. Umaoka
                                                                     IBM
                                                        January 15, 2010


BCP 47 Extension U
draft-davis-u-langtag-ext-00

Abstract

This document specifies an Extension to BCP 47 which provides subtags that specify language and/or locale-based behavior or refinements to language tags, according to work done by the Unicode Consortium.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on July 19, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.



Table of Contents

1.  Introduction
    1.1.  Requirements Language
2.  BCP47 Required Information
    2.1.  Summary
        2.1.1.  Canonicalization
    2.2.  Registration Form
3.  Acknowledgements
4.  IANA Considerations
5.  Security Considerations
6.  References
    6.1.  Normative References
    6.2.  Informative References
§  Authors' Addresses





1.  Introduction

[BCP47] permits the definition and registration of language tag extensions "that contain a language component and are compatible with applications that understand language tags". This document defines an extension for identifying Unicode locale-based variations using language tags. The "singleton" identifier for this extension is 'u'.




1.1.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.




2.  BCP47 Required Information

Language tags, as defined by [BCP47], are useful for identifying the language of content. They are also used as locale identifiers (or can be mapped to locales) in many operating environments and APIs. However, most such locale identifiers also provide additional "tailorings" or options for specific values within a language, culture, region, or other variation. This extension provides a mechanism for using these additional tailorings within language tags for general interchange.

The maintaining authority for this extension's registry is the Unicode Consortium. Unicode defines common locale data and identifiers for this data:

Item                   Value
Name                   Unicode Consortium
Contact Email          cldr@unicode.org
Discussion List Email  cldr-users@unicode.org
URL Location           cldr.unicode.org
Specification          Unicode Technical Standard #35 Unicode Locale Data Markup Language (LDML), http://unicode.org/reports/tr35/
Section                Section 3.2 BCP 47 Tag Conversion

The specification of extension subtags is provided by Section 3 of Unicode Technical Standard #35: Unicode Locale Data Markup Language [LDML]. As required by BCP 47, subtags follow the language tag ABNF and other rules for the formation of language tags and subtags, are restricted to the ASCII letters and digits, are not case sensitive, and do not exceed eight characters in length.
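The syntactic constraints above can be checked mechanically. The following Python sketch is illustrative only and is not part of the specification: the function names are hypothetical, and the lower bound of two characters is an assumption taken from the extension ABNF in RFC 5646 rather than from the text above, which states only the upper bound of eight.

```python
import re

# Assumption: extension subtags are 2 to 8 ASCII letters or digits.
# The upper bound of 8 is stated above; the lower bound of 2 is taken
# from the extension production in the RFC 5646 ABNF.
_EXTENSION_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")


def is_valid_extension_subtag(subtag: str) -> bool:
    """Return True if the subtag is 2 to 8 ASCII letters or digits."""
    return _EXTENSION_SUBTAG.fullmatch(subtag) is not None


def same_subtag(a: str, b: str) -> bool:
    """Compare two subtags without regard to case."""
    return a.lower() == b.lower()
```

Under these assumptions, "phonebk" and "co" are accepted, while a nine-character subtag or one containing a non-ASCII letter is rejected.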

[LDML] specifies a canonical representation. LDML is available over the Internet at no cost, under a royalty-free license at http://unicode.org/copyright.html. LDML is versioned, and each version of LDML is numbered, dated, and stable. Extension subtags, once defined by LDML, are never retracted and never change in meaning in a substantial way.




2.1.  Summary

The subtags available for use in the 'u' extension consist of a set of attributes, keys, and types. Attributes, keys, types, and their respective meanings are defined in Section 3 (Unicode Language and Locale Identifiers) of [LDML]. The following is a summary of that definition (for details see Section 3 of [LDML]):

For example, the language tag "de-DE-u-attr-co-phonebk" consists of:

With successive versions of [LDML], additional attributes, keys, and types MAY be defined. Once defined, attributes, keys, and types will never be removed. Machine-readable files listing the valid attributes, keys, and types are available in the CLDR repository for each version. For example, for version 1.7.2, the files are located at http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/. These files can also contain aliases that were used in previous versions of [LDML].
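As an illustration of how a tag such as "de-DE-u-attr-co-phonebk" breaks down into attributes, keys, and types, the following Python sketch splits the 'u' extension out of a language tag. It is a sketch under stated assumptions, not a definitive parser: the function name parse_u_extension is hypothetical, and the rule that a key is exactly two characters while attributes and types are longer is an assumption based on the syntax in [LDML], not on text in this document. The sketch also assumes a well-formed tag and does not validate subtags against the CLDR data files.

```python
def parse_u_extension(tag: str):
    """Split the 'u' extension of a language tag into attributes and keywords.

    Returns (attributes, keywords), where keywords is a list of
    (key, [type subtags]) pairs in the order in which they appear.
    """
    subtags = tag.lower().split("-")  # case is not significant

    # Locate the 'u' singleton, ignoring anything after a private-use 'x'.
    start = None
    for i, subtag in enumerate(subtags):
        if subtag == "x":
            break
        if subtag == "u":
            start = i
            break
    if start is None:
        return [], []

    attributes, keywords = [], []
    for subtag in subtags[start + 1:]:
        if len(subtag) == 1:
            break  # another singleton starts a different extension
        if len(subtag) == 2:
            keywords.append((subtag, []))  # assumed: two characters = key
        elif keywords:
            keywords[-1][1].append(subtag)  # type subtag of the current key
        else:
            attributes.append(subtag)  # before any key: attribute
    return attributes, keywords
```

Under these assumptions, parse_u_extension("de-DE-u-attr-co-phonebk") yields the attribute "attr" and one keyword whose key is "co" and whose type is "phonebk".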




2.1.1.  Canonicalization

As required by [BCP47], case is not significant. The canonical form for all subtags in the extension is lowercase. The canonical order of attributes is in [US-ASCII] order (that is, numbers before letters, with letters sorted as lowercase US-ASCII code points). The canonical order of keywords is in [US-ASCII] order by key. The order of subtags within a keyword is significant; the meaning of this extension is altered if those subtags are rearranged. Thus, the canonical form of the extension never reorders the subtags within a keyword.
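The canonicalization rules above can be sketched in Python as follows. This is an illustrative sketch, not a normative algorithm: the function name canonicalize_u_extension is hypothetical, the input is taken to be the list of subtags that follow the 'u' singleton, and the rule that a two-character subtag is a key (with longer subtags being attributes or types) is an assumption based on the syntax in [LDML].

```python
def canonicalize_u_extension(ext_subtags):
    """Return the subtags that follow 'u' in canonical case and order."""
    attributes, keywords = [], []
    for subtag in (s.lower() for s in ext_subtags):  # canonical form is lowercase
        if len(subtag) == 2:
            keywords.append([subtag])  # assumed: two characters = key
        elif keywords:
            keywords[-1].append(subtag)  # type subtag; its position is kept
        else:
            attributes.append(subtag)  # before any key: attribute

    # Lowercase ASCII strings sort by code point, so digits precede letters.
    attributes.sort()
    # Sort keywords by key only; subtags within a keyword are never reordered.
    keywords.sort(key=lambda keyword: keyword[0])

    return attributes + [subtag for keyword in keywords for subtag in keyword]
```

For example, with the hypothetical attributes "xyz" and "9ab" and the hypothetical key "zz", the input ["xyz", "9ab", "zz", "typeb", "typea", "CO", "PhoneBk"] becomes ["9ab", "xyz", "co", "phonebk", "zz", "typeb", "typea"]: attributes and keywords are sorted, while "typeb" stays ahead of "typea" because order within a keyword is significant.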




2.2.  Registration Form

Per [RFC5646], Section 3.7:

%%
Identifier: u
Description: Unicode Locale
Comments: Subtags for the identification of language and cultural
    variations. Used to set behavior in locale APIs.
Added: 2009-mm-dd
RFC: [TBD]
Authority: Unicode Consortium
Contact_Email: cldr@unicode.org
Mailing_List: cldr-users@unicode.org
URL: http://cldr.unicode.org
%%



3.  Acknowledgements

Thanks to John Emmons and the rest of the Unicode CLDR Technical Committee for their work in developing the BCP 47 subtags for LDML.




4.  IANA Considerations

This document will require IANA to insert the record in Section 2.2 (Registration Form) into the Language Extensions Registry, according to Section 3.7 (Extensions and the Extensions Registry) of "Tags for Identifying Languages" in [BCP47]. There might be occasional maintenance of this record. This document does not require IANA to create or maintain a new registry or otherwise impact IANA.




5.  Security Considerations

The security considerations for this extension are the same as those for [RFC5646] (or its successors). See Section 6 (Security Considerations) of [RFC5646].




6.  References




6.1. Normative References

[BCP47] Davis, M., Ed., “Tags for the Identification of Language (BCP47),” September 2009.
[LDML] Davis, M., “Unicode Technical Standard #35: Locale Data Markup Language (LDML),” December 2007.
[RFC5646] Phillips, A. and M. Davis, “Tags for Identifying Languages,” BCP 47, RFC 5646, September 2009.
[US-ASCII] International Organization for Standardization, “ISO/IEC 646:1991, Information technology -- ISO 7-bit coded character set for information interchange.,” 1991.



6.2. Informative References

[ldml-registry] “Registry for Common Locale Data Repository tag elements,” September 2009.



Authors' Addresses

   Mark Davis
   Google
   Email: mark@macchiato.com

   Addison Phillips
   Lab126
   Email: addison@inter-locale.com

   Yoshito Umaoka
   IBM
   Email: yoshito_umaoka@us.ibm.com