Internet Draft D. Crocker
draft-crocker-idn-idn-00.txt Brandenburg InternetWorking
Expires in six months 23 June 2002
Internationalized Domain Names (IDN)
Status of this Memo
This document is an Internet-Draft and is in full
conformance with all provisions of Section 10 of
RFC2026.
Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas and its working
groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum
of six months and may be updated, replaced, or obsoleted
by other documents at any time. It is inappropriate to
use Internet-Drafts as reference material or to cite
them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be
accessed at http://www.ietf.org/shadow.html.
Abstract
Globalization of the Internet requires that domain names
be able to use characters outside the ASCII repertoire.
This document specifies internationalized domain names
(IDNs) and defines initial domain name constructs in
which IDNs can be used. IDNs use characters drawn from
a large repertoire (Unicode).
0. Document Change Notes --
This is a revision to draft-ietf-idn-idna-09.txt. It is
being distributed independently to facilitate
discussion.
The goal is to gain consensus about revisions to the IDN
working group document, specifically for the following
changes:
a. Split the document into two, one for defining
Internationalized Domain Names (IDN) and the other for
defining an encoding method for IDNs, namely IDNA using ACE.
b. Distinguish general IDN from its specific use for host
names (IDN-Host). Use for host names is specified more
precisely, in terms of a specific syntax BNF rule from the
relevant existing DNS specification, so that IDN-Host will
apply precisely to all DNS record fields and protocol units
conforming to that BNF.
1. Introduction
Until now, there has been no standard method for domain
names to use characters outside the ASCII repertoire.
This document defines enhancements to the definition of
domain names, to support internationalized domain names
(IDN). The details for doing protocol encoding of IDNs
are specified separately.
2. Terminology
The key words "MUST", "SHALL", "REQUIRED", "SHOULD",
"RECOMMENDED", and "MAY" in this document are to be
interpreted as described in RFC 2119 [RFC2119].
"ASCII"
means US-ASCII [USASCII], a coded character set
containing 128 characters associated with code
points in the range 0..7F. Unicode is an extension
of ASCII: it includes all the ASCII characters and
associates them with the same code points.
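As a minimal illustration of the definition above, the following Python sketch tests whether a string stays within the ASCII repertoire. The function name is_ascii is hypothetical and not part of any IDN specification. It relies only on the fact that Unicode assigns the ASCII characters the same code points, 0..7F.

```python
def is_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character's code point is in 0..7F."""
    # Unicode keeps the ASCII code points unchanged, so ord() yields the
    # ASCII value directly for ASCII characters.
    return all(ord(ch) <= 0x7F for ch in text)


if __name__ == "__main__":
    print(ord("A"))                 # 65, the same value in ASCII and Unicode
    print(is_ascii("example.com"))  # True
    print(is_ascii("b\u00fccher"))  # False: U+00FC is outside 0..7F
```

Python 3.7 and later also provide the built-in str.isascii(), which performs the same test.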
Code point
refers to an integral value associated with a
character in a coded character set.
Domain name
is used as a general term for strings conforming to
[STD13]. [STD13] talks about "domain names" and
"host names", but many people use the terms
interchangeably. Further, because [STD13] was not
terribly clear, many people who are sure they know
the exact definitions of each of these terms
disagree on the definitions. This document treats the
two terms as distinct.
Domain name slot
refers to a protocol element or a function argument
or a return value (and so on) explicitly designated
for carrying a domain name. Examples of domain name
slots include: the QNAME field of a DNS query; the
name argument of the gethostbyname() library
function; the part of an email address following
the at-sign (@) in the From: field of an email
message header; and the host portion of the URI in
the src attribute of an HTML <IMG> tag. General
text that just happens to contain a domain name is
not a domain name slot; for example, a domain name
appearing in the plain text body of an email
message is not occupying a domain name slot.
Host name
is a domain name conforming to STD13, with the
naming character set limited to LDH.
Internationalized host name (IDN-Host)
is an IDN conforming to STD13, except that it
also supports non-ASCII characters from Unicode.
Internationalized domain name (IDN)
is a domain name that has characters drawn from the
restricted set of Unicode defined in <??>>
Internationalized label
is a label composed of characters from the Unicode
character set; note, however, that not every string
of Unicode characters can be an internationalized
label.
IDN-native
is a domain name slot specified to hold an
internationalized domain name. The designation may
be static (for example, in the specification of the
protocol or interface) or dynamic (for example, as
a result of negotiation in an interactive session).
Label
is an individual part of a domain name. Labels are
usually shown separated by dots; for example, the
domain name "www.example.com" is composed of three
labels: "www", "example", and "com". (The zero-
length root label described in [STD13], which can
be explicit as in "www.example.com." or implicit as
in "www.example.com", is not considered a label in
this specification.) Throughout this document the
term "label" is shorthand for "text label", and
"every label" means "every text label". In IDNA,
not all text strings can be labels.
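The label rule can be sketched in Python as follows, assuming the name uses only the ASCII full stop (U+002E) as separator; Section 3.2 lists additional separators for IDN. The helper split_labels is hypothetical. It omits the zero-length root label whether that label is written explicitly or left implicit.

```python
def split_labels(name: str) -> list[str]:
    """Hypothetical helper: split a domain name into its (text) labels."""
    if name.endswith("."):
        name = name[:-1]  # explicit root label: drop the trailing dot
    # An empty remainder means the name was just the root, which has no labels.
    return name.split(".") if name else []


if __name__ == "__main__":
    print(split_labels("www.example.com"))   # ['www', 'example', 'com']
    print(split_labels("www.example.com."))  # ['www', 'example', 'com']
```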
LDH code points
is defined to mean the code points associated with
ASCII letters, digits, and the hyphen-minus; that
is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is an
abbreviation for "letters, digits, hyphen".
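A sketch of the LDH test in Python, using exactly the code points listed above. The names LDH_RANGES, is_ldh, and has_host_name_charset are hypothetical. The host name check applies only the character-set restriction from the "Host name" definition, plus rejection of empty labels (an assumption made here); other STD13 and STD3 constraints, such as label length, are deliberately not checked.

```python
# LDH code points as listed in this document: U+002D (hyphen-minus),
# 30..39 (digits), 41..5A (upper-case letters), 61..7A (lower-case letters).
LDH_RANGES = ((0x2D, 0x2D), (0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A))


def is_ldh(ch: str) -> bool:
    """Hypothetical helper: True if the single character ch is an LDH code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LDH_RANGES)


def has_host_name_charset(name: str) -> bool:
    """Hypothetical check: every label is non-empty and uses only LDH code points."""
    if name.endswith("."):
        name = name[:-1]  # drop an explicit root label
    labels = name.split(".")
    # Assumption: empty labels are rejected; all other STD13/STD3 rules are ignored.
    return all(label != "" and all(is_ldh(ch) for ch in label) for label in labels)


if __name__ == "__main__":
    print(has_host_name_charset("www.example.com"))     # True
    print(has_host_name_charset("b\u00fccher.example"))  # False: U+00FC is not LDH
```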
Unicode
is a coded character set [UNICODE] containing tens
of thousands of characters. A single Unicode code
point is denoted by "U+" followed by four to six
hexadecimal digits, while a range of Unicode code
points is denoted by two hexadecimal numbers
separated by "..", with no prefixes.
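The notation conventions above can be expressed as two small Python helpers; format_code_point and format_range are hypothetical names used only for illustration. Because the largest Unicode code point is U+10FFFF, at most six hexadecimal digits are ever needed.

```python
def format_code_point(cp: int) -> str:
    """Hypothetical helper: 'U+' followed by four to six hex digits, e.g. U+002D."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return f"U+{cp:04X}"  # pad to at least four digits; larger values use 5 or 6


def format_range(lo: int, hi: int) -> str:
    """Hypothetical helper: two hex numbers separated by '..', with no prefixes."""
    return f"{lo:X}..{hi:X}"


if __name__ == "__main__":
    print(format_code_point(0x3002))  # U+3002
    print(format_range(0x41, 0x5A))   # 41..5A
```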
3. Internationalized Domain Names (IDN)
3.1. Data representation
This specification extends the set of valid domain name
label values from the restricted ASCII repertoire
specified in [STD3] to include characters from [UNICODE].
Mechanisms for encoding Unicode values in domain names
are specified separately. Hence this specification
provides no detail for IDNs in "native" binary form
(IDN-native) or for "encoded" Unicode-based IDNs.
3.2. Dot as label separator
For systems supporting IDN, wherever dot is permitted as
a label separator, the following characters MUST be
recognized as dots: U+002E (full stop), U+3002
(ideographic full stop), U+FF0E (fullwidth full stop),
U+FF61 (halfwidth ideographic full stop).
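A minimal Python sketch of this separator rule: the four code points above are mapped to U+002E before the name is split into labels. LABEL_SEPARATORS and split_idn_labels are hypothetical names, and the sketch covers only this lexical step, not any encoding of the resulting labels.

```python
# The four code points that MUST be recognized as dots (Section 3.2).
LABEL_SEPARATORS = "\u002E\u3002\uFF0E\uFF61"

# str.maketrans with two equal-length strings maps each character of the
# first string to the character at the same position in the second.
_TO_FULL_STOP = str.maketrans(LABEL_SEPARATORS, "." * len(LABEL_SEPARATORS))


def split_idn_labels(name: str) -> list[str]:
    """Hypothetical helper: split an IDN on any of the four dot characters."""
    normalized = name.translate(_TO_FULL_STOP)
    if normalized.endswith("."):
        normalized = normalized[:-1]  # explicit root label, in any dot form
    return normalized.split(".") if normalized else []


if __name__ == "__main__":
    print(split_idn_labels("www\u3002example\uff0ecom"))  # ['www', 'example', 'com']
```

This sketch does not settle the open question, recorded in the editorial note of this section, of whether the mapping belongs in the protocol or only in user interfaces.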
<< // Are there also multiple Unicode characters
permitted for at-sign? What about for slash ("/")?
If not, then why is the domain name lexical
analyzer now required to look for 4 characters
rather than only one?
This appears to be a case of putting into the
protocol something that is, in fact, entirely a
user-interface issue. That some user interfaces
will choose to map U+3002 to ASCII dot does not
mean that it needs to be in the protocol. // /Dave
>>
4. References
4.1. Normative references
[STD3] Bob Braden, "Requirements for Internet Hosts --
Communication Layers" (RFC 1122) and "Requirements for
Internet Hosts -- Application and Support" (RFC 1123),
STD 3, October 1989.
[STD13] Paul Mockapetris, "Domain names - concepts and
facilities" (RFC 1034) and "Domain names -
implementation and specification" (RFC 1035), STD 13,
November 1987.
4.2. Informative references
[DNSSEC] Don Eastlake, "Domain Name System Security
Extensions", RFC 2535, March 1999.
[RFC2119] Scott Bradner, "Key words for use in RFCs to
Indicate Requirement Levels", RFC 2119, March 1997.
[UAX9] Unicode Standard Annex #9, The Bidirectional
Algorithm.
[UNICODE] The Unicode Standard, Version 3.1.0: The
Unicode Consortium. The Unicode Standard, Version 3.0.
Reading, MA, Addison-Wesley Developers Press, 2000. ISBN
0-201-61633-5, as amended by: Unicode Standard Annex
#27: Unicode 3.1.
[USASCII] Vint Cerf, "ASCII format for Network
Interchange", RFC 20, October 1969.
5. Security Considerations
Security on the Internet partly relies on the DNS. Thus,
any change to the characteristics of the DNS can change
the security of much of the Internet.
The encoding mechanism, which is specified separately,
converts characters that are not valid according to STD3
and STD13 into octet values that are valid. Neither the
encoding process nor the use of the encoded values
introduces security issues such as string length
increases or new allowed values, apart from those
introduced by the ACE encoding itself.
Users rely on domain names to connect to Internet
servers. The security of the Internet would be
compromised if a user entering a single
internationalized name could be connected to different
servers based on different interpretations of the
internationalized domain name.
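To make this risk concrete, the following Python sketch uses two hypothetical parsers that disagree about label separators: one recognizes only U+002E, the other all four dots listed in Section 3.2. Given the same input they produce different label sequences, and so would look up different names. The names split_with, ASCII_DOT_ONLY, and ALL_IDN_DOTS are illustrative only.

```python
ASCII_DOT_ONLY = "."                        # a parser unaware of Section 3.2
ALL_IDN_DOTS = "\u002E\u3002\uFF0E\uFF61"   # a parser following Section 3.2


def split_with(name: str, separators: str) -> list[str]:
    """Hypothetical parser: treat every character in separators as a dot."""
    table = str.maketrans(separators, "." * len(separators))
    normalized = name.translate(table)
    if normalized.endswith("."):
        normalized = normalized[:-1]  # drop an explicit root label
    return normalized.split(".") if normalized else []


if __name__ == "__main__":
    name = "www\u3002example.com"  # uses U+3002 as its first separator
    print(len(split_with(name, ASCII_DOT_ONLY)))  # 2: U+3002 stays inside a label
    print(len(split_with(name, ALL_IDN_DOTS)))    # 3: www, example, com
```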
6. Authors' Addresses
Patrik Faltstrom
Cisco Systems
Arstaangsvagen 31 J
S-117 43 Stockholm Sweden
paf@cisco.com
Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
phoffman@imc.org
Adam M. Costello
University of California, Berkeley
idna-spec.amc @ nicemice.net