Network Working Group                                       B. Hoehrmann
Internet-Draft                                              May 25, 2009
Intended status: Informational
Expires: November 26, 2009


                       The i;codepoint collation
                    draft-hoehrmann-cp-collation-00

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 26, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This memo describes the "i;codepoint" collation. Character strings
   are compared based on the Unicode scalar values of the characters.
   The collation supports equality, substring, and ordering operations.

1.  Introduction

   The i;codepoint collation operates on Unicode strings and treats any
   and all differences between two strings as significant. Ordering of
   different strings is determined by the Unicode scalar values of the
   characters.
   It produces usable results where no further information is
   available, which makes it suitable as a default collation.

   The equality operation determines if two strings are identical. This
   makes the collation suitable for use with strings known to be in some
   canonical form. Similarly, applications that require strings to be in
   a canonical but otherwise arbitrary order may find this collation the
   most efficient, as it requires no transformations.

2.  Definition

   The i;codepoint collation is the same as the i;octet collation except
   that it operates on sequences of Unicode scalar values, not octets.
   Note that by definition the set of Unicode scalar values excludes the
   surrogate code points, so they do not occur in valid input.

3.  Security Considerations

   None beyond those in RFC 4790 [RFC4790].

4.  IANA Considerations

   The i;codepoint collation should be added to the registry [RFC4790].

5.  Registration document

   identifier:     i;codepoint
   title:          Unicode identity
   operations:     equality order substring
   specification:  RFC XXXX
   owner:          IETF
   submitter:      bjoern@hoehrmann.de

6.  References

   [RFC4790]  Newman, C., Duerst, M., and A. Gulbrandsen, "Internet
              Application Protocol Collation Registry", RFC 4790,
              March 2007.

Author's Address

   Bjoern Hoehrmann
   Mittelstrasse 50
   39114 Magdeburg
   Germany

   EMail: mailto:bjoern@hoehrmann.de
   URI:   http://bjoern.hoehrmann.de

   Note: Please write "Bjoern Hoehrmann" with o-umlaut (U+00F6) wherever
   possible, e.g., as "Björn Höhrmann" in HTML and XML.
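
Appendix A.  Illustrative Sketch (Informative)

   The definition in Section 2 can be sketched in Python as follows.
   This is a minimal illustration under stated assumptions, not part of
   the registration. The function names (scalar_values, codepoint_order,
   codepoint_equal, codepoint_substring) are hypothetical, and the
   integer and boolean return values are a simplification chosen for
   this sketch rather than the result values of RFC 4790. Python strings
   may contain lone surrogate code points, so the sketch rejects them
   explicitly as invalid input.

```python
def scalar_values(s):
    """Return the Unicode scalar values of s as a list of integers.

    Hypothetical helper: raises ValueError for surrogate code points,
    which are not Unicode scalar values and so are invalid input.
    """
    values = [ord(ch) for ch in s]
    for v in values:
        if 0xD800 <= v <= 0xDFFF:
            raise ValueError("surrogate code point U+%04X in input" % v)
    return values


def codepoint_order(a, b):
    """Ordering: return -1, 0, or 1 by comparing scalar value sequences.

    Lists compare element by element; a proper prefix sorts first,
    mirroring an octet-wise comparison applied to scalar values.
    """
    x, y = scalar_values(a), scalar_values(b)
    return (x > y) - (x < y)


def codepoint_equal(a, b):
    """Equality: True only if both scalar value sequences are identical."""
    return scalar_values(a) == scalar_values(b)


def codepoint_substring(needle, haystack):
    """Substring: True if needle occurs as a contiguous run in haystack."""
    scalar_values(needle)    # validate only
    scalar_values(haystack)  # validate only
    return needle in haystack
```

   With these definitions, codepoint_order("\uFF5E", "\U00010000")
   yields -1, because U+FF5E is a smaller scalar value than U+10000. A
   comparison of UTF-16 code units would order the two strings the other
   way, since U+10000 is encoded with surrogates starting at 0xD800.
   Likewise, codepoint_equal("\u00E9", "e\u0301") is False: no
   normalization is applied, so every difference is significant.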