Internet DRAFT - draft-hoehrmann-cp-collation


Network Working Group                                       B. Hoehrmann
Internet-Draft                                        September 15, 2010
Intended status: Informational
Expires: March 19, 2011

                       The i;codepoint collation


   This memo describes the "i;codepoint" collation.  Character strings
   are compared based on the Unicode scalar values of the characters.
   The collation supports equality, substring, and ordering operations.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 19, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   ( in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Hoehrmann                Expires March 19, 2011                 [Page 1]
Internet-Draft          The i;codepoint collation         September 2010

1.  Introduction

   The i;codepoint collation operates on Unicode strings and treats any
   and all differences between two strings as significant.  Ordering of
   different strings is determined by the Unicode scalar values of the
   characters.  It produces usable results where further information is
   unavailable.  In that it is suitable as default collation.

   The equality operation determines if two strings are identical.  This
   makes the collation suitable for use with strings known to be in some
   canonical form.  Similarily, applications that require strings to be
   in a canonical but otherwise arbitrary order may find this collation
   the most efficient as it requires no transformations.

2.  Definition

   The i;codepoint collation is the same as the i;octet collation except
   that it operates on sequences of Unicode scalar values, not octets.
   Note that by definition the set of Unicode scalar values excludes the
   surrogate code points and as such they do not occur in valid input.

3.  Security Considerations

   None beyond those in RFC 4790 [RFC4790].

4.  IANA Considerations

   This document defines the i;codepoint collation as per [RFC4790].

5.  Registration document

   <?xml version='1.0'?>
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
   <collation rfc="XXXX" scope="global" intendedUse="common">
     <title>Unicode identity</title>
     <operations>equality order substring</operations>
     <specification>RFC XXXX</specification>

6.  References

   [RFC4790]  Newman, C., Duerst, M., and A. Gulbrandsen, "Internet
              Application Protocol Collation Registry", RFC 4790,
              March 2007.

Hoehrmann                Expires March 19, 2011                 [Page 2]
Internet-Draft          The i;codepoint collation         September 2010

Author's Address

   Bjoern Hoehrmann
   Mittelstrasse 50
   39114 Magdeburg


   Note: Please write "Bjoern Hoehrmann" with o-umlaut (U+00F6) wherever
   possible, e.g., as "Bj&#246;rn H&#246;hrmann" in HTML and XML.

Hoehrmann                Expires March 19, 2011                 [Page 3]