Network Working Group Y. YONEYA
Internet-Draft JPRS
Intended status: Informational October 24, 2011
Expires: April 26, 2012

Mapping characters for PRECIS classes
draft-yoneya-precis-mappings-00

Abstract

Preparation and comparison of internationalized strings ("PRECIS") Framework [I-D.ietf-precis-framework] is defining several classes of strings for preparation and comparison. In the document, case mapping is defined because many of protocols handle case sensitive or case insensitive string comparison and therefore preparation of string is mandatory. As described in IDNA mapping [RFC5895] and PRECIS problem statement [I-D.ietf-precis-problem-statement], mappings in internationalized strings are not limited to case, but also width and/or delimiters are taken into consideration. This document considers mappings other than case mapping in PRECIS context.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 26, 2012.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

In many cases, user input of internationalized strings is generated by input method editor ("IME") or copy-and-paste from free text. Usually users do not care case and/or width of input characters because they are identical for users' eyes. Further, users rarely switch IME state to input special characters such as protocol elements. For Internationalized Domain Names ("IDNs"), IDNA Mapping [RFC5895] describes methods to treat these issues. For PRECIS strings, case mapping is defined as a process in PRECIS Framework [I-D.ietf-precis-framework], but width mapping and/or delimiter mapping are not defined. Handling of mappings other than case is also important to increase chance of strings match as users expect. This document considers such mappings in PRECIS context.

2. Type of mappings

2.1. Width mapping

Fullwidth and halfwidth characters (those defined with Decomposition Types <wide> and <narrow>) are mapped to their decomposition mappings as shown in the Unicode character database [Unicode]. This mapping should be performed before case mapping because fullwidth/halfwidth characters includes both upper case and lower case letters.

Width mapping will increase backward compatibility with Stringprep [RFC3454] and PRECIS Framework [I-D.ietf-precis-framework]. Because in a Stringprep profile which specifies Unicode normalization form KC (NFKC) for normalization method, fullwidth/halfwidth characters are mapped into its compatible form. If a PRECIS Framework profile specified NFKC (which is not recommended), width mapping might not be useful.

2.2. Delimiter mapping

Definitions of delimiters in certain protocols are differ from each other. Therefore, delimiter mapping should be based on well defined mapping table for each protocols. This mapping should be performed after width mapping because some punctuations have fullwidth form.

One of the most useful case of delimiter mapping is when FULL STOP character (U+002E) is a delimiter as well as domain name. Some of IME generates FULL STOP compatible characters such as IDEOGRAPHIC FULL STOP (U+3002) when users type FULL STOP on the keyboard.

3. Discussion

There are several points for discussion on this topic.

4. IANA Considerations

TBD.

5. Security Considerations

TBD.

6. References

[RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008", RFC 5895, September 2010.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ("stringprep")", RFC 3454, December 2002.
[I-D.ietf-precis-framework] Blanchet, M and P Saint-Andre, "PRECIS Framework: Handling Internationalized Strings in Protocols", Internet-Draft draft-ietf-precis-framework-01, October 2011.
[I-D.ietf-precis-problem-statement] Blanchet, M and A Sullivan, "Stringprep Revision Problem Statement", Internet-Draft draft-ietf-precis-problem-statement-03, July 2011.
[Unicode] The Unicode Consortium, , "The Unicode Standard, Version 6.0.0", http://www.unicode.org/versions/Unicode6.0.0/, 2010.

Author's Address

Yoshiro YONEYA JPRS Chiyoda First Bldg. East 13F 3-8-1 Nishi-Kanda Chiyoda-ku, Tokyo 101-0065 Japan Phone: +81 3 5215 8451 EMail: yoshiro.yoneya@jprs.co.jp

Table of Contents