INTERNET DRAFT                                      Yngve N. Pettersen
                                                    Opera Software ASA
Expires: August 2006                                     February 2006

The TLD Subdomain Structure Protocol and its use for Cookie domain validation

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

This document defines a protocol that can be used by a client to discover how a Top Level Domain (TLD) is organized in terms of what subdomains are used to place closely related but independent domains, e.g. commercial domains in country code TLDs (ccTLD) like .uk are placed in the .co.uk subTLD domain. This information is then used to limit which domains an Internet service can set cookies for, strengthening the rules already defined by the cookie specifications.

1. Introduction

The Domain Name System [DNS] used to name Internet hosts allows a wide range of hierarchical names to be used to indicate what a given host is. Some names are implemented by the owners of a domain, such as creating subdomains for certain tasks or functions; others are implemented by the Top Level Domain registry owner to indicate what kind of service the domain is, e.g.
commercial, educational, government or geographic location, e.g. city or state.

Pettersen [Page 1] draft-pettersen-subtld-structure-00.txt February 2006

While this system makes it relatively easy for TLD administrators to organize online services, and for the user to locate and recognize relevant services, this flexibility causes various security and privacy related problems when services located at different hosts are allowed to share data through functionality administered by the client, e.g. HTTP state management cookies [RFC2965], [NETSC].

Most information sharing mechanisms make the process of sharing easy, perhaps too easy, since in many cases there is no mechanism to ensure that the servers receiving the information really want it, and it is often difficult to determine the source of the information being shared. To some extent [RFC2965] addresses these concerns for cookies, in that clients that support [RFC2965]-style cookies send the target domain for the cookie along with the cookie so that the recipient can verify that the cookie has the correct domain. Unfortunately, [RFC2965] is not widely deployed in clients or on servers.

The recipient(s) can make inappropriate information sharing more difficult by requiring the information to contain data identifying the source and assuring the integrity of the data, e.g. by use of cryptographic technologies. These techniques tend, however, to be computationally costly.

There are two problem areas:

* Incorrect sharing of information between non-associated services, e.g. example1.com and example2.com or example1.co.uk and example2.co.uk. That is, the information may be distributed to all services within a given Top Level Domain.

* Undesirable information sharing within a single service. This is, in particular, a problem for services that sell hosting services to many different customers, such as webhotels, where the service itself has little or no control of the customers' actions.
While both these problems are in some ways similar, they call for different solutions. This specification will only propose a solution for the first problem area. The second problem area must be handled separately.

This specification will first define a TLD Subdomain Structure Protocol that can be used to discover the actual structure of a Top Level Domain, e.g. that the TLD has several subTLDs co.tld, ac.tld, org.tld. It will then show how this information can be used to determine when information sharing through cookies is not desirable.

1.1 Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

2. The TLD Subdomain Structure Protocol

The TLD Subdomain Structure Protocol is an HTTP service, managed by the TLD owner, and located at a well known URI location that, when queried, returns information about a TLD's domain structure. The client can then use this information to decide what actions are permitted for the protocol data the client is processing.

Procedure for use:

- The client should retrieve the domain list for the domain tld from https://www.subdomains.tld/domainlist . [The actual location must be decided by ICANN and/or IANA; this section contains the author's suggestion. Due to security considerations it should be considered whether a https URL, or at least a signed file, should be used.]

- The Content-Type of the returned list MUST be application/subdomain-structure

- The retrieved specification SHOULD be cached for at least 30 days

- The TLD owner SHOULD update the list at least 90 days before a new sub-domain becomes active.

- If no specification can be retrieved the user agent MAY fall back to alternative methods, depending on the profile.
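The client side of the procedure above can be sketched as follows. This is a minimal illustration, not part of the protocol: the function names, the in-memory cache, the injectable opener and clock, and the reuse of an expired copy when retrieval fails are all assumptions made for the example, and the URL is only the author's suggested location.

```python
import time
import urllib.request

CACHE_SECONDS = 30 * 24 * 3600   # "SHOULD be cached for at least 30 days"
MEDIA_TYPE = "application/subdomain-structure"
_cache = {}                      # hypothetical cache: tld -> (fetched_at, body)


def domainlist_url(tld):
    # Suggested location only; the actual location is still to be decided.
    return "https://www.subdomains.%s/domainlist" % tld.lower().strip(".")


def get_domainlist(tld, opener=urllib.request.urlopen, now=time.time):
    """Return the domain list body for tld, or None if none is available."""
    tld = tld.lower().strip(".")
    cached = _cache.get(tld)
    if cached and now() - cached[0] < CACHE_SECONDS:
        return cached[1]                      # cached copy is still fresh
    stale = cached[1] if cached else None     # assumption: reuse an old copy
    try:
        with opener(domainlist_url(tld)) as response:
            ctype = response.headers.get("Content-Type", "")
            if ctype.split(";")[0].strip().lower() != MEDIA_TYPE:
                return stale                  # wrong media type: not a list
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # Network or HTTP failure (urllib errors derive from OSError).
        # With no copy at all the caller may fall back to other methods.
        return stale
    _cache[tld] = (now(), body)
    return body
```

A caller would treat a None result as "no specification can be retrieved" and apply whatever fallback its profile allows.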
2.1 Securing the domain information

Individuals with malicious intent may wish to modify the domain list served by the service location to either classify a domain incorrectly as a subTLD or to hide a subTLD's classification. Besides obviously securing the hosting locations, this also means that the content served will have to be secured.

There are two primary methods to secure the specification:

1. Digitally sign the specification, using one of the available message signature methods, e.g. S/MIME [SMIME]. This will secure the content during storage both at the client and the server, as well as during transit. The drawback is that the client must implement decoding and verification of a message format which it may not already support, which may be problematic for clients having limited resources.

2. Use an encrypted connection, such as HTTP over TLS [HTTP-TLS], which is supported by many clients already. Unfortunately, this method does not protect the content when stored by the client.

This specification recommends using HTTP over TLS, and the client MUST use the non-anonymous cipher suites, to secure the transport of the specification. The client MUST ensure that the hostname in the certificate matches the hostname used in the request.

2.2 Domainlist format

The domainlist file can contain a list of subdomains that are considered top level domains, as well as a special list of names that are not top level domains. None of the domain lists specify the TLD name, since that is implicit from the request URI. The domain names listed MUST be encoded in punycode, according to [IDNA].

2.2.1 Domainlist BNF

ABNF syntax as defined by [RFC2616]

   domainlist = domainspecification-list *([CRLF] domainspecification-list)
   domainspecification-list = (#domain-specification [";" domain-qualifiers]) | comment
   comment = ("#") *
   domain-specification = subdomain | non-tlddomain
   non-tlddomain = "!" subdomain
   subdomain = wildcard | (*(subdomain ".") namecomponent )
   wildcard = "*" levels
   levels = *DIGIT
   domain-qualifiers = #domain-qualifier
   domain-qualifier = qualifier-name "=" qualifier-info
   qualifier-name = token
   qualifier-info = token | quoted-string

domainspecification-list may contain whitespace between the components.

comment SHOULD be UTF-8 encoded.

2.2.2 Domainlist interpretation

Each item in the list, unless it is a non-tlddomain, specifies a domain which MUST be considered a top level-like domain (a subTLD), which also applies to the parent(s) of the domain (if any).

A non-tlddomain means that a domain is not used for top level-like purposes, and can be assigned to a third party, even if the policies for that level specify otherwise.

A wildcard means that all domains at that level, and the specified number of levels below it, have the same status with respect to being a top level-like domain. This means that for the specification "*1.example", y.x.example.tld, for any x and y, is considered a top level-like domain, while z.y.x.example.tld is not.

If the wildcard occurs in a non-tlddomain specification, all domains at that level and below are not considered to be top level-like domains. The level number MUST be ignored for such cases.

The optional domain-qualifiers may provide additional information about the domain(s) in the preceding domain-specification. Currently no qualifiers are defined.

Comments, incorrect specifications and unknown domain-qualifiers must always be ignored.

3. A TLD Subdomain Structure Protocol profile for Cookies

HTTP State management cookies are one area where it is important, both for security and privacy reasons, to ensure that unauthorized services cannot set cookies for another service. Inappropriate cookies can affect the functionality of a service, but may also be used to track the users across services in an undesirable fashion.
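The domainlist format and interpretation rules of section 2.2 can be sketched in code as below. This is one possible reading, not a normative algorithm: the names Entry, parse_entry, parse_domainlist and is_subtld are hypothetical, an empty level count is read as 0, qualifiers are dropped because none are defined, and non-tlddomain entries are assumed to take precedence over subTLD entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    labels: tuple    # fixed labels, leftmost first, TLD omitted
    negated: bool    # True for a "!"-prefixed non-tlddomain
    wildcard: bool   # True when the entry starts with "*"
    levels: int      # digits after "*"; empty is read as 0 (assumption)

def parse_entry(text):
    """Parse one domain-specification; return None if it is malformed."""
    text = text.strip().lower()
    negated = text.startswith("!")
    if negated:
        text = text[1:].strip()
    labels = text.split(".") if text else []
    wildcard, levels = False, 0
    if labels and labels[0].startswith("*"):
        digits = labels[0][1:]
        if any(c not in "0123456789" for c in digits):
            return None
        wildcard, levels, labels = True, int(digits or "0"), labels[1:]
    if (not wildcard and not labels) or any(not l or "*" in l for l in labels):
        return None  # incorrect specifications are ignored
    return Entry(tuple(labels), negated, wildcard, levels)

def parse_domainlist(body):
    entries = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                     # blank line or comment
        specs = line.split(";", 1)[0]    # drop qualifiers: none are defined
        for item in specs.split(","):
            entry = parse_entry(item)
            if entry is not None:
                entries.append(entry)
    return entries

def is_subtld(name, entries):
    """name is given relative to the TLD, e.g. "co" for co.uk."""
    labels = tuple(name.lower().split("."))

    def depth_below(e):
        # Number of labels in `labels` to the left of e.labels, or -1.
        extra = len(labels) - len(e.labels)
        return extra if extra >= 0 and labels[extra:] == e.labels else -1

    for e in entries:                    # non-tlddomain entries win (assumption)
        if e.negated:
            d = depth_below(e)
            # The level number of a negated wildcard is ignored.
            if (e.wildcard and d >= 1) or (not e.wildcard and d == 0):
                return False
    for e in entries:
        if e.negated:
            continue
        # The listed domain itself and its parents are subTLDs.
        if len(labels) <= len(e.labels) and e.labels[len(e.labels) - len(labels):] == labels:
            return True
        # "*N" covers the wildcard level plus N levels below it.
        if e.wildcard and 1 <= depth_below(e) <= 1 + e.levels:
            return True
    return False
```

For instance, with the list "*1.example" this sketch classifies y.x.example as a subTLD and z.y.x.example as an ordinary domain, matching the wildcard example in section 2.2.2.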
Both the original Netscape cookie specification [NETSC] and [RFC2965] specify rules for how servers may set cookies, but these rules are not adequate in many cases.

The [NETSC] rules require only that the target domain must have one internal dot (e.g. example.com) if the TLD belongs to a list of generic TLDs (gTLD), while for all other TLDs the domain must contain two internal dots (e.g. example.co.uk). The latter rule was never properly implemented, in particular due to the many flat ccTLD domain structures that are in use.

[RFC2965] sets the requirement that cookies can only be set for the server's parent domain. Unfortunately, this still leaves open the possibility of setting cookies for a subTLD by setting the cookie from a host name example.subtld.tld to the domain subtld.tld. This is by itself legal, but not desirable, because it means that the cookie can be sent to numerous websites, either revealing sensitive information or interfering with those other websites without authorization.

As can be seen, these rules do not work satisfactorily, especially when applied to ccTLDs, which may have a flat domain structure similar to the one used by the generic .com TLD, a hierarchical subTLD structure like the one used by the .uk ccTLD (e.g. .co.uk), or a combination of both. But there are also gTLDs, such as .name, for which cookies should not be allowed for the second level domains, as these are generally family names shared between many different users, not service names.

A partially effective method for distinguishing service names from subTLDs by using DNS has been defined in [DNSCOOKIE]. However, this method is not immune to TLD registries that use subTLDs as directories, or to services that do not define an IP address for the domain name. Using the TLD Subdomain Structure Protocol to retrieve a list of all subTLDs in a given TLD will solve both those problems.
3.1 Procedure for using the TLD Subdomain Structure Protocol for cookies

When receiving a cookie the client must first perform all the checks required by the relevant specification. Upon completion of these checks the client then performs the following additional verification checks if the cookie is being set for the server's parent or grandparent domain (or higher):

1. If the domain structure of the TLD is not known already, or the structure information has expired, the client should retrieve or validate the structure specification from the server hosting the specification, according to section 2. If retrieval is unsuccessful, and no copy of the specification is known, the client MAY use alternative methods to decide the domain's status, e.g. those described in [DNSCOOKIE], or other heuristics.

2. Evaluate the specification as specified in section 2. If the target domain is part of the subTLD structure the cookie MUST be discarded.

3. If the target domain is not a subTLD, the cookie is accepted.

3.2 Unverifiable transactions

Use of HTTP Cookies, combined with HTTP requests to resources that are located in domains other than the one the user actually wants to visit, has caused widespread privacy concerns. The reason is that multiple websites can link to the same independent website, e.g. an advertiser, who may then use cookies to build a profile of the visitor that can be used to select advertisements that are of interest to the user.

[RFC2965] specified that if the name of the host of an included resource does not domain match the domain reach (defined as the parent domain of the host) of the URL of the document the user started loading, loading the resource is considered an unverifiable transaction, and in such third party transactions cookies should not be sent or accepted. The latter point is not widely implemented, except when selected by especially interested users.
This means that server1.example.com and server2.example.com can share cookies, and either can be referenced automatically (e.g. by including an image) by the other without being considered an unverifiable transaction, while requests to server3.example2.com would be considered an unverifiable transaction.

However, like the normal domain matching rule for cookies, this rule opens up some holes. If the host example.co.uk requests a resource from server4.example3.co.uk, the request to the example3.co.uk server would not be considered an unverifiable transaction because example.co.uk's reach is co.uk, which domain matches server4.example3.co.uk. To a human with some knowledge of the .uk domain structure, this conclusion is obviously incorrect.

To avoid such misclassifications clients SHOULD apply the procedure specified in 3.1 to the reach domain used to decide if a request is unverifiable. If the reach domain is a subTLD, the reach of the original host must be changed to become the same as the name of the host itself, and requests that do not domain match the original host's name must be considered unverifiable transactions. That is, the reach for example.co.uk becomes example.co.uk, not co.uk, and example3.co.uk will therefore not domain match the resulting reach.

4. Examples

The following examples demonstrate how the TLD Subdomain Structure Protocol can be used to decide cookie domain permissions.

Specification example 1

   *, !example

This specification means that all names at the top level are subTLDs, except "example" for which cookies are allowed. Cookies are also implicitly allowed for any y.x.tld domains.

Specification example 2

   *1.example1, *1.example2

This specification means that example1 and example2 and the two subdomain levels immediately below are subTLDs for which cookies are not allowed; for all other domains under the TLD, cookies are allowed (e.g.
for example.tld).

Specification example 3

   *1.example1, *1.example2, !example3.example2

This specification has the same meaning as Specification 2, with the exception that cookies are allowed for example3.example2.tld

Specification example 4

   *1.example1, *1.example2, !*.example3.example2

This specification has the same meaning as Specification 2, with the exception that cookies are allowed for all domains below example3.example2.tld (but not example3.example2.tld)

5. IANA Considerations

This specification requires that the domain list is retrievable from a well-known location. This means that a hostname or group of hostnames must be assigned to serve the domain list. Suggestions for where to locate the service are described in section 5.1.

The specification also requires that responses are served with a specific media type. Section 5.2 provides the registration of this media type.

5.1 Location of the TLD Subdomain Structure specification

The domain list must be placed at a location that can easily be deduced by the client from the name of the TLD. Several possibilities exist:

1. A reserved domain name in the TLD's name space, e.g. https://www.subdomains.tld/domainlist or https://subdomains.nictld.tld/domainlist .

2. A common repository managed by the IANA or another Internet governance body, e.g. https://subdomains.example.org/tld/domainlist

The benefit of the first alternative is that the data are not located at a single repository, which makes it more difficult to shut down the system completely. On the other hand the TLD registries may find the overhead of maintaining such a service burdensome, and therefore avoid implementing it, or let the service lapse.

The second alternative creates a common repository, which may increase adoption. On the other hand, a single location makes it more susceptible to denial of service attacks.
5.2 Registration of the application/subdomain-structure Media Type

Type name: application
Subtype name: subdomain-structure
Required parameters: none
Optional parameters: none
Encoding considerations: The content of this media type is always transmitted in binary form.
Security considerations: See section 6
Interoperability considerations: none
Published specification: This document
Additional information:
   Magic number(s): none
   File extension(s):
   Macintosh file type code(s):
Person & email address to contact for further information:
   Yngve N. Pettersen
   Email: yngve@opera.com
Intended usage: common
Restrictions on usage: none
Author/Change controller:
   Yngve N. Pettersen
   Email: yngve@opera.com

6. Security considerations

Retrieval of the specifications is vulnerable to denial of service attacks or loss of network connection. Hosting the specifications at a single location can increase this vulnerability, although the exposure can be reduced by using mirrors with the same name, but hosted at different network locations.

This protocol is as vulnerable to DNS security problems as any other [RFC2616] HTTP based service. Requiring the specifications to be digitally signed or transmitted over an authenticated TLS connection reduces this vulnerability.

Section 3 of this document describes using the domain list defined in section 2 as a method of increasing security. The effectiveness of the domain list for this purpose, and the resulting security for the client, depend both on the integrity of the list and on its correctness. The integrity of the list depends on how securely it is stored at the server, and how securely it is transmitted.
This specification mandates downloading the domain list using HTTP over TLS, which makes the transmission as secure as the message authentication mechanism used (encryption is not required), and the servers should be configured to use the strongest available key lengths and authentication mechanisms.

The correctness of the list depends on how well the TLD registry defined it. A list that does not include some subTLDs may expose the client to potential privacy and security problems, but not any worse than the situation would be without this protocol and profile, while a subdomain incorrectly classified as a subTLD can lead to denial of service for the affected services. Both of these problems can be prevented by careful construction and auditing of the lists, both by the TLD registry and by interested third parties.

7. References

[RFC2965]: Kristol, Montulli, "HTTP State Management Mechanism", RFC 2965

[RFC2616]: R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616

[IDNA]: P. Faltstrom, P. Hoffman, A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490

[DNS]: P. Mockapetris, "DOMAIN NAMES - CONCEPTS AND FACILITIES", STD 13, RFC 1034

[SMIME]: S. Dusse, P. Hoffman, B. Ramsdell, L. Lundblade, L. Repka, "S/MIME Version 2 Message Specification", RFC 2311

[HTTP-TLS]: E. Rescorla, "HTTP Over TLS", RFC 2818

[NETSC]: "Persistent Client State HTTP Cookies", http://wp.netscape.com/newsref/std/cookie_spec.html

[DNSCOOKIE]: Yngve N. Pettersen, "Enhanced validation of domains for HTTP State Management Cookies using DNS". Work in progress. draft-pettersen-dns-cookie-validate-00.txt

Author's Address

Yngve N. Pettersen
Opera Software ASA
yngve@opera.com

Comments

Comments are solicited, and should be sent to the author.

Full Copyright Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Copyright Notice

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.