INTERNET DRAFT                                       Yngve N. Pettersen
<draft-pettersen-subtld-structure-00.txt>            Opera Software ASA
Expires: August 2006                                      February 2006

               The TLD Subdomain Structure Protocol
             and its use for Cookie domain validation


Status of this Memo

By submitting this Internet-Draft, each author represents that any 
applicable patent or other IPR claims of which he or she is aware have 
been or will be disclosed, and any of which he or she becomes aware will 
be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html


Abstract

This document defines a protocol that can be used by a client to
discover how a Top Level Domain (TLD) is organized in terms of what
subdomains are used to place closely related but independent domains,
e.g. commercial domains in country code TLDs (ccTLD) like .uk are placed
in the .co.uk subTLD domain.

This information is then used to limit which domains an Internet service
can set cookies for, strengthening the rules already defined by the
cookie specifications.

1. Introduction

The Domain Name System [DNS] used to name Internet hosts allows a wide
range of hierarchical names to be used to indicate what a given host is.
Some of these names are implemented by the owners of a domain, such as
subdomains created for certain tasks or functions.  Others are defined
by the Top Level Domain registry owner to indicate what kind of service
the domain provides (e.g. commercial, educational, or government) or its
geographic location (e.g. city or state).

Pettersen                                                       [Page 1]
draft-pettersen-subtld-structure-00.txt                   February 2006

While this system makes it relatively easy for TLD administrators to
organize online services, and for the user to locate and recognize
relevant services, this flexibility causes various security and privacy
related problems when services located at different hosts are allowed to
share data through functionality administrated by the client, e.g. HTTP
state management cookies [RFC2965], [NETSC].

Most information sharing mechanisms make the process of sharing easy,
perhaps too easy, since in many cases there is no mechanism to ensure
that the servers receiving the information really want it, and it is
often difficult to determine the source of the information being shared.

To some extent [RFC2965] addresses these concerns for cookies, in that
clients that support [RFC2965]-style cookies send the target domain for
the cookie along with the cookie so that the recipient can verify that
the cookie has the correct domain.  Unfortunately, [RFC2965] is not
widely deployed in clients or on servers.

The recipient(s) can make inappropriate information sharing more
difficult by requiring the information to contain data identifying the
source and assuring the integrity of the data, e.g. by use of
cryptographic technologies.  These techniques tend, however, to be
computationally costly.

There are two problem areas:

  * Incorrect sharing of information between non-associated services,
  e.g. example1.com and example2.com or example1.co.uk and
  example2.co.uk.  That is, the information may be distributed to all
  services within a given Top Level Domain.

  * Undesirable information sharing within a single service.  This is,
  in particular, a problem for services that sell hosting services to
  many different customers, such as webhotels, where the service itself
  has little or no control of the customers' actions.

While both these problems are in some ways similar, they call for 
different solutions.  This specification will only propose a solution 
for the first problem area.  The second problem area must be handled 
separately.

This specification will first define a TLD Subdomain Structure Protocol
that can be used to discover the actual structure of a Top Level Domain,
e.g. that the TLD has several subTLDs such as co.tld, ac.tld, and
org.tld.  It will then show how this information can be used to
determine when information sharing through cookies is not desirable.

1.1 Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.


2. The TLD Subdomain Structure Protocol

The TLD Subdomain Structure Protocol is an HTTP service, managed by the
TLD owner, and located at a well known URI location that, when queried,
returns information about a TLD's domain structure.  The client can then
use this information to decide what actions are permitted for the
protocol data the client is processing.

Procedure for use:

 - The client should retrieve the domain list for the domain tld from
 https://www.subdomains.tld/domainlist .  [The actual location must be
 decided by ICANN and/or IANA; this section contains the author's
 suggestion.  Due to security considerations it should be considered
 whether or not an https URL or at least a signed file should be used.]

 - The Content-Type of the returned list MUST be
 application/subdomain-structure

 - The retrieved specification SHOULD be cached for at least 30 days

 - The TLD owner SHOULD update the list at least 90 days before a new
 sub-domain becomes active.

 - If no specification can be retrieved the user agent MAY fall back to
 alternative methods, depending on the profile.
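
The retrieval steps above could be sketched as follows.  This is only an
illustration under stated assumptions: the function names are
hypothetical, the URL pattern is the author's suggested location (which
is not final), and treating the 30 day minimum cache period as the
refresh interval is one possible reading of the caching rule.

```python
from datetime import datetime, timedelta

# Minimum cache lifetime suggested by the procedure (SHOULD, 30 days).
MIN_CACHE_LIFETIME = timedelta(days=30)

# Media type the returned list MUST carry.
EXPECTED_CONTENT_TYPE = "application/subdomain-structure"


def domainlist_url(tld: str) -> str:
    """Build the suggested domain list URL for a single TLD label."""
    label = tld.strip(".").lower()
    if not label or "." in label:
        raise ValueError("expected a single top level label, e.g. 'uk'")
    return f"https://www.subdomains.{label}/domainlist"


def needs_refresh(fetched_at: datetime, now: datetime) -> bool:
    """Return True once a cached list is at least 30 days old (assumption)."""
    return now - fetched_at >= MIN_CACHE_LIFETIME


def content_type_ok(header_value: str) -> bool:
    """Compare the media type, ignoring parameters and letter case."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == EXPECTED_CONTENT_TYPE
```

A client would call domainlist_url("uk") once per TLD and consult
needs_refresh() before reusing a cached copy.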

2.1 Securing the domain information

Individuals with malicious intent may wish to modify the domain list
served by the service location to either classify a domain incorrectly
as a subTLD or to hide a subTLD's classification.

Besides the obvious need to secure the hosting locations, this also
means that the content served will have to be secured.

There are two primary methods to secure the specification:

  1. Digitally sign the specification, using one of the available
  message signature methods, e.g. S/MIME [SMIME].  This will secure the
  content during storage both at the client and the server, as well as
  during transit.  The drawback is that the client must implement
  decoding and verification of the message format which it may not
  already support, which may be problematic for clients having limited
  resources.

  2. Using an encrypted connection, such as HTTP over TLS [HTTP-TLS], 
  which is supported by many clients already.  Unfortunately, this 
  method does not protect the content when stored by the client.

This specification recommends using HTTP over TLS, and the client MUST
use the non-anonymous cipher suites, to secure the transport of the
specification.  The client MUST ensure that the hostname in the
certificate matches the hostname used in the request.
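
A client written in Python might satisfy the two requirements above
roughly as shown below.  This is a sketch, not a reference
implementation: the function names are hypothetical, and it relies on
the standard library's default TLS context, which verifies the server
certificate and checks it against the requested hostname.

```python
import ssl
import urllib.request


def make_tls_context() -> ssl.SSLContext:
    """Create a client context that authenticates the server."""
    context = ssl.create_default_context()
    # Both settings are already the defaults of create_default_context();
    # they are repeated here to make the requirements visible.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def fetch_domainlist(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a domain list over HTTPS and reject unexpected media types."""
    if not url.lower().startswith("https://"):
        raise ValueError("the domain list must be retrieved over HTTPS")
    with urllib.request.urlopen(url, timeout=timeout,
                                context=make_tls_context()) as response:
        media_type = response.headers.get_content_type()
        if media_type != "application/subdomain-structure":
            raise ValueError(f"unexpected Content-Type: {media_type}")
        return response.read()
```

A failed certificate or hostname check surfaces as an exception from
urlopen(), in which case the client has no specification and falls back
as described in section 2.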

2.2 Domainlist format

The domainlist file can contain a list of subdomains that are considered
top level domains, as well as a special list of names that are not top
level domains.

None of the domain lists specify the TLD name, since that is implicit 
from the request URI.

The domain names listed MUST be encoded in Punycode, according to [IDNA].

2.2.1 Domainlist BNF

ABNF syntax as defined by [RFC2616]

   domainlist = domainspecification-list
                  *([CRLF] domainspecification-list)
   domainspecification-list = (#domain-specification
                  [";" domain-qualifiers]) | comment
   comment = ("#") *<ANY CHAR>
   domain-specification = subdomain | non-tlddomain
   non-tlddomain = "!" subdomain
   subdomain = wildcard | (*(subdomain ".") namecomponent )
   wildcard = "*" levels
   levels = *DIGIT
   domain-qualifiers = #domain-qualifier
   domain-qualifier = qualifier-name "=" qualifier-info
   qualifier-name = token
   qualifier-info = token | quoted-string


A domainspecification-list may contain whitespace between its
components.

comment SHOULD be UTF-8 encoded.
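
A parser for this format could look like the sketch below.  It is an
illustration under assumptions: the Rule record and the function names
are hypothetical, and a wildcard is accepted in the "*N.name" form used
by the examples in section 4.  Qualifiers are dropped, since none are
defined, and malformed items are skipped.

```python
import re
from typing import List, NamedTuple, Optional

# One Punycode (ASCII letter, digit, hyphen) label.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


class Rule(NamedTuple):
    negated: bool   # True for a "!"-prefixed non-tlddomain
    wildcard: bool  # True when the item starts with "*"
    levels: int     # digits after "*", 0 when absent
    base: str       # remaining name relative to the TLD, "" if none


def parse_item(text: str) -> Optional[Rule]:
    """Parse one domain-specification; return None if it is malformed."""
    item = text.strip().lower()
    negated = item.startswith("!")
    if negated:
        item = item[1:].strip()
    if item.startswith("*"):
        digits, dot, base = item[1:].partition(".")
        if not re.fullmatch(r"[0-9]*", digits) or (dot and not base):
            return None
        rule = Rule(negated, True, int(digits or "0"), base)
    else:
        rule = Rule(negated, False, 0, item)
    labels = rule.base.split(".") if rule.base else []
    if not rule.wildcard and not labels:
        return None
    if not all(_LABEL.fullmatch(label) for label in labels):
        return None
    return rule


def parse_domainlist(text: str) -> List[Rule]:
    """Parse a whole domain list, ignoring comments and bad items."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        specs = line.split(";", 1)[0]  # drop domain-qualifiers
        for item in specs.split(","):
            rule = parse_item(item)
            if rule is not None:
                rules.append(rule)
    return rules
```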

2.2.2 Domainlist interpretation

Each item in the list, unless it is a non-tlddomain, specifies a domain
which MUST be considered a top level-like domain (a subTLD), which also
applies to the parent(s) of the domain (if any).

A non-tlddomain means that a domain is not used for top level-like
purposes, and can be assigned to a third party, even if the policies
for that level specify otherwise.

A wildcard means that all domains at that level, and the specified
number of levels below it, have the same status with respect to being a
top level-like domain.  This means that for the specification
"*1.example", x.example.tld and y.x.example.tld, for any x and y, are
considered top level-like domains, while z.y.x.example.tld is not such
a domain.

If the wildcard occurs in a non-tlddomain specification, all domains at
that level and below are not considered to be top level-like domains.
The level number MUST be ignored in such cases.

The optional domain-qualifiers may provide additional information about
the domain(s) in the preceding domain-specification.  Currently no
qualifiers are defined.

Comments, incorrect specifications and unknown domain-qualifiers must
always be ignored.
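
The interpretation rules could be evaluated as in the following sketch.
The Rule record and function names are hypothetical, names are given
relative to the TLD (e.g. "co" for co.uk), and the code assumes that a
non-tlddomain entry takes precedence over any other entry regardless of
order, which is how the examples in section 4 read.

```python
from typing import Iterable, List, NamedTuple


class Rule(NamedTuple):
    negated: bool   # "!" prefix: the name is not a subTLD
    wildcard: bool  # item starts with "*"
    levels: int     # extra levels covered by a wildcard (0 if absent)
    base: str       # name the rule is anchored at, relative to the TLD


def _labels(name: str) -> List[str]:
    return name.lower().split(".") if name else []


def _matches(rule: Rule, labels: List[str]) -> bool:
    base = _labels(rule.base)
    extra = len(labels) - len(base)  # levels below the rule's base name
    if extra < 0:
        # A parent of a listed subTLD is itself a subTLD; exceptions
        # never extend upwards.
        return not rule.negated and base[-len(labels):] == labels
    if labels[extra:] != base:
        return False                 # not at or below the base name
    if not rule.wildcard:
        return extra == 0
    if rule.negated:
        return extra >= 1            # the level number is ignored
    return extra <= rule.levels + 1  # base, wildcard level, N levels below


def is_subtld(name: str, rules: Iterable[Rule]) -> bool:
    """Return True if name (relative to the TLD) is top level-like."""
    labels = _labels(name)
    if not labels:
        return True                  # the TLD itself
    rules = list(rules)
    if any(r.negated and _matches(r, labels) for r in rules):
        return False
    return any(not r.negated and _matches(r, labels) for r in rules)
```

With the single rule "*1.example", this classifies x.example and
y.x.example as subTLDs and z.y.x.example as an ordinary domain.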


3. A TLD Subdomain Structure Protocol profile for Cookies

HTTP State management cookies are one area where it is important, both
for security and privacy reasons, to ensure that unauthorized services
cannot set cookies for another service.  Inappropriate cookies can
affect the functionality of a service, and may also be used to track
users across services in an undesirable fashion.

Both the original Netscape cookie specification [NETSC] and [RFC2965]
specify rules for how servers may set cookies, but these rules are not
adequate in many cases.

The [NETSC] rules require only that the target domain must have one
internal dot (e.g. example.com) if the TLD belongs to a list of generic
TLDs (gTLD), while for all other TLDs the domain must contain two
internal dots (e.g. example.co.uk).  The latter rule was never properly
implemented, in particular due to the many flat ccTLD domain structures
that are in use.

[RFC2965] sets the requirement that cookies can only be set for the
server's parent domain.  Unfortunately, this still leaves open the
possibility of setting cookies for a subTLD by setting the cookie from a
host named example.subtld.tld to the domain subtld.tld.  This is by
itself legal, but not desirable, because it means that the cookie can
be sent to numerous websites, either revealing sensitive information or
interfering with those other websites without authorization.

As can be seen, these rules do not work satisfactorily, especially when
applied to ccTLDs, which may have a flat domain structure similar to the
one used by the generic .com TLD, a hierarchical subTLD structure like
the one used by the .uk ccTLD (e.g. .co.uk), or a combination of both.
But there are also gTLDs, such as .name, for which cookies should not be
allowed for the second level domains, as these are generally family
names shared between many different users, not service names.

A partially effective method for distinguishing service names from
subTLDs by using DNS has been defined in [DNSCOOKIE].  However, this
method is not immune to TLD registries that use subTLDs as directories,
or to services that do not define an IP address for the domain name.

Using the TLD Subdomain Structure Protocol to retrieve a list of all
subTLDs in a given TLD will solve both those problems.


3.1 Procedure for using the TLD Subdomain Structure Protocol for cookies

When receiving a cookie the client must first perform all the checks
required by the relevant specification.  Upon completion of these checks
the client then performs the following additional verification checks if
the cookie is being set for the server's parent or grandparent domain
(or higher):

  1. If the domain structure of the TLD is not known already, or the 
  structure information has expired, the client should retrieve or 
  validate the structure specification from the server hosting the 
  specification, according to section 2.  If retrieval is unsuccessful, 
  and no copy of the specification is known, the client MAY use 
  alternative methods to decide the domain's status, e.g. those 
  described in [DNSCOOKIE], or other heuristics.

  2. Evaluate the specification as specified in section 2.  If the 
  target domain is part of the subTLD structure the cookie MUST be 
  discarded.

  3. If the target domain is not a subTLD, the cookie is accepted.
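
The three steps above can be sketched as a single check.  All names here
are hypothetical.  The classifier argument stands for an evaluation of
the retrieved specification on a full domain name, and is None when no
specification is available; the fallback argument stands for an
alternative method such as [DNSCOOKIE].  Accepting the cookie when
neither is available is an assumption, since the draft leaves that case
to the client.

```python
from typing import Callable, Optional

Classifier = Callable[[str], bool]  # True if the domain is a subTLD


def passes_subtld_check(request_host: str,
                        cookie_domain: str,
                        is_subtld: Optional[Classifier] = None,
                        fallback: Optional[Classifier] = None) -> bool:
    """Apply the additional checks; the base cookie checks come first."""
    host = request_host.lower().rstrip(".")
    target = cookie_domain.lower().strip(".")
    if target == host:
        # The cookie is not set for a parent domain, so the additional
        # checks do not apply.
        return True
    # Step 1: use the specification if known, else an alternative method.
    classifier = is_subtld if is_subtld is not None else fallback
    if classifier is None:
        return True  # assumption: no information, no extra restriction
    # Steps 2 and 3: discard cookies targeting a subTLD, accept others.
    return not classifier(target)
```

For a host www.example.co.uk and a classifier that knows co.uk is a
subTLD, a cookie for .example.co.uk passes while a cookie for .co.uk is
discarded.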

3.2 Unverifiable transactions

Use of HTTP Cookies, combined with HTTP requests to resources that are
located in domains other than the one the user actually wants to visit,
has caused widespread privacy concerns.  The reason is that multiple
websites can link to the same independent website, e.g. an advertiser,
which may then use cookies to build a profile of the visitor that can be
used to select advertisements that are of interest to the user.

[RFC2965] specified that if the name of the host of an included resource
does not domain match the domain reach (defined as the parent domain of
the host) of the URL of the document the user started loading, loading
the resource is considered an unverifiable transaction, and in such
third party transactions cookies should not be sent or accepted.  The
latter point is not widely implemented, except when selected by
especially interested users.

This means that server1.example.com and server2.example.com can share 
cookies, and either can be referenced automatically (e.g. by including 
an image) by the other without being considered an unverifiable 
transaction, while requests to server3.example2.com would be considered 
an unverifiable transaction.

However, like the normal domain matching rule for cookies, this rule
opens up some holes.  If the host example.co.uk requests a resource from
server4.example3.co.uk, the request to the example3.co.uk server would
not be considered an unverifiable transaction because example.co.uk's
reach is co.uk, which domain matches server4.example3.co.uk.  To a human
with some knowledge of the .uk domain structure, this conclusion is
obviously incorrect.


To avoid such misclassifications clients SHOULD apply the procedure
specified in 3.1 to the reach domain used to decide if a request is an
unverifiable transaction.  If the reach domain is a subTLD, the reach of
the original host must be changed to become the same as the name of the
host itself, and requests that do not domain match the original host's
name must be considered unverifiable transactions.  That is, the reach
for example.co.uk becomes example.co.uk, not co.uk, and example3.co.uk
will therefore not domain match the resulting reach.
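
The adjusted reach computation could be sketched as follows.  The
function names are hypothetical and the reach and domain match logic is
simplified: the reach is taken to be the parent domain when that parent
contains a dot and the host name itself otherwise, and special cases in
[RFC2965] such as IP addresses are ignored.  The is_subtld argument
stands for the check from section 3.1.

```python
from typing import Callable


def reach(host: str) -> str:
    """Simplified reach: the parent domain if it contains a dot."""
    _, _, parent = host.partition(".")
    return parent if "." in parent else host


def adjusted_reach(host: str, is_subtld: Callable[[str], bool]) -> str:
    """Use the host itself as its reach when the parent is a subTLD."""
    candidate = reach(host)
    return host if is_subtld(candidate) else candidate


def is_unverifiable(origin_host: str, resource_host: str,
                    is_subtld: Callable[[str], bool]) -> bool:
    """Return True if the resource does not domain match the reach."""
    limit = adjusted_reach(origin_host.lower(), is_subtld)
    resource = resource_host.lower()
    return not (resource == limit or resource.endswith("." + limit))
```

Without subTLD knowledge, a request from example.co.uk to
server4.example3.co.uk is wrongly treated as verifiable.  Once co.uk is
known to be a subTLD, the same request is classified as unverifiable.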

4. Examples

The following examples demonstrate how the TLD Subdomain Structure
Protocol can be used to decide cookie domain permissions.

Specification example 1

   *, !example

   This specification means that all names at the top level are subTLDs,
   except "example" for which cookies are allowed.  Cookies are also
   implicitly allowed for any y.x.tld domains.


Specification example 2

   *1.example1, *1.example2

   This specification means that example1 and example2 and the two
   subdomain levels immediately below are subTLDs for which cookies are
   not allowed.  For all other domains under the TLD, cookies are
   allowed (e.g. for example.tld).

Specification example 3

   *1.example1, *1.example2, !example3.example2

   This specification has the same meaning as Specification 2, with the
   exception that cookies are allowed for example3.example2.tld.

Specification example 4

   *1.example1, *1.example2, !*.example3.example2

   This specification has the same meaning as Specification 2, with the
   exception that cookies are allowed for all domains below
   example3.example2.tld (but not example3.example2.tld).


5. IANA Considerations

This specification requires that the domain list is retrievable from a
well-known location.  This means that a hostname or group of hostnames
must be assigned to serve the domain list.  Suggestions for where to
locate the service are described in section 5.1.

The specification also requires that responses are served with a 
specific media type.  Section 5.2 provides the registration of this 
media type.

5.1 Location of the TLD Subdomain Structure specification

The domain list must be placed at a location that can easily be deduced
by the client from the name of the TLD.  Several possibilities exist:

   1. A reserved domain name in the TLD's name space e.g.
   https://www.subdomains.tld/domainlist or
   https://subdomains.nictld.tld/domainlist .

   2. A common repository managed by the IANA or another Internet
   governance body, e.g. https://subdomains.example.org/tld/domainlist

The benefit of the first alternative is that the data are not located at
a single repository which makes it more difficult to shut down the
system completely.  On the other hand the TLD registries may find the
overhead of maintaining such a service burdensome, and therefore avoid
implementing it, or let the service lapse.

The second alternative creates a common repository, which may increase 
adoption.  On the other hand, a single location makes it more 
susceptible to denial of service attacks.


5.2 Registration of the application/subdomain-structure Media Type

 Type name : application
 Subtype name: subdomain-structure

 Required parameters: none
 Optional parameters: none

 Encoding considerations:

   The content of this media type is always transmitted in binary form.

 Security considerations:

   See section 6

 Interoperability considerations: none

 Published specification: This document

 Additional information:

     Magic number(s): none
     File extension(s):
     Macintosh file type code(s):

   Person & email address to contact for further information:

     Yngve N. Pettersen
     Email: yngve@opera.com

   Intended usage: common

   Restrictions on usage: none

   Author/Change controller:

     Yngve N. Pettersen
     Email: yngve@opera.com


6. Security considerations

Retrieval of the specifications is vulnerable to denial of service
attacks or loss of network connection.  Hosting the specifications at a
single location can increase this vulnerability, although the exposure
can be reduced by using mirrors with the same name, but hosted at
different network locations.

This protocol is as vulnerable to DNS security problems as any other
[RFC2616] HTTP based service.  Requiring the specifications to be
digitally signed or transmitted over an authenticated TLS connection
reduces this vulnerability.


Section 3 of this document describes using the domain list defined in
section 2 as a method of increasing security.  The effectiveness of the
domain list for this purpose, and the resulting security for the client,
depend both on the integrity of the list and on its correctness.

The integrity of the list depends on how securely it is stored
at the server, and how securely it is transmitted.  This specification
mandates downloading the domain list using HTTP over TLS, which makes
the transmission as secure as the message authentication mechanism used
(encryption is not required), and the servers should be configured to
use the strongest available key lengths and authentication mechanisms.

The correctness of the list depends on how well the TLD registry defined
it.  A list that does not include some subTLDs may expose the client to
potential privacy and security problems, but not any worse than the
situation would be without this protocol and profile, while a subdomain
incorrectly classified as a subTLD can lead to denial of service for the
affected services.  Both of the problems can be prevented by careful
construction and auditing of the lists, both by the TLD registry, and by
interested third parties.


7. References:

[RFC2965]: D. Kristol, L. Montulli, "HTTP State Management Mechanism",
RFC 2965

[RFC2616]: R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P.
Leach, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC
2616

[IDNA]: P. Faltstrom, P. Hoffman, A. Costello, "Internationalizing
Domain Names in Applications (IDNA)", RFC 3490

[DNS]: P. Mockapetris, "DOMAIN NAMES - CONCEPTS AND FACILITIES", STD 13,
RFC 1034

[SMIME]: S. Dusse, P. Hoffman, B. Ramsdell, L. Lundblade, L. Repka,
"S/MIME Version 2 Message Specification", RFC 2311

[HTTP-TLS]: E. Rescorla, "HTTP Over TLS", RFC 2818

[NETSC]: "Persistent Client State HTTP Cookies",
http://wp.netscape.com/newsref/std/cookie_spec.html

[DNSCOOKIE]: Yngve N. Pettersen, "Enhanced validation of domains for
HTTP State Management Cookies using DNS". Work in progress.
draft-pettersen-dns-cookie-validate-00.txt


Author's Address

   Yngve N. Pettersen
   Opera Software ASA
   yngve@opera.com

Comments

   Comments are solicited, and should be sent to the author

Full Copyright Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at ietf-
   ipr@ietf.org.

Copyright Notice

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

