Internet Engineering Task Force                             S. Ruby, Ed.
Internet-Draft                                                       IBM
Intended status: Informational                               L. Masinter
Expires: June 20, 2015                                             Adobe
                                                       December 17, 2014


                         Problem Statement: URL
                       draft-ruby-url-problem-00

Abstract

   This document lays out the problem space of possibly conflicting
   standards between multiple organizations for URLs and things like
   them, and proposes some actions to resolve the conflicts.

   From a user or developer point of view, it makes no sense for there
   to be a proliferation of definitions of URL nor for there to be a
   proliferation of incompatible implementations.  This shouldn't be a
   competitive feature.  Therefore there is a need for the organizations
   involved to update and reconcile the various Internet Drafts,
   Recommendations, and Standards in this area.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 20, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include
   Simplified BSD License text as described in Section 4.e of the Trust
   Legal Provisions and are provided without warranty as described in
   the Simplified BSD License.

Table of Contents

   1.  Brief History of URL standards
   2.  Current Organizations and Specs in Development
     2.1.  IETF
     2.2.  WHATWG
     2.3.  W3C
     2.4.  WebPlatform
     2.5.  Unicode Consortium
   3.  Problem Statements
   4.  Outline of Potential Solution
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  Informative References
   Authors' Addresses

1.  Brief History of URL standards

   This section contains a very compressed history of URL standards, in
   sufficient detail to set some context.

   The first standards-track specification for URLs was [RFC1738] in
   1994.  (That spec contains more background material.)  It defined
   URLs as ASCII only.  Although it was quickly determined that it was
   desirable to allow non-ASCII characters, shoehorning UTF-8 into
   ASCII-only systems was unacceptable; at the time Unicode was not so
   widely deployed.  The tack was taken to leave "URI" alone and define
   a new protocol element, "IRI"; [RFC3987] was published in 2005 (in
   sync with the [RFC3986] update to the URI definition).

   The IRI-to-URI transformation specified in [RFC3987] had options; it
   wasn't a deterministic path.
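
   As a minimal sketch of one such option (not text from [RFC3987]
   itself): our reading of Section 3.1 of [RFC3987] is that the host
   part of an IRI may either be percent-encoded as UTF-8 or be converted
   with the IDNA ToASCII operation, so a single IRI can map to two
   different URIs.  The function names and the sample host below are
   illustrative assumptions, and Python's built-in "idna" codec follows
   the older [RFC3490] rules rather than [UTS-46].

```python
from urllib.parse import quote


def host_to_uri_percent(host: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character.

    Hypothetical helper: unreserved ASCII characters (letters, digits,
    "_", ".", "-", "~") pass through unchanged.
    """
    return quote(host, safe="")


def host_to_uri_idna(host: str) -> str:
    """Convert each label with IDNA ToASCII (Punycode with "xn--" prefix).

    Hypothetical helper: relies on Python's stdlib "idna" codec,
    which implements RFC 3490 (IDNA 2003).
    """
    return host.encode("idna").decode("ascii")


# Illustrative host only; "\u00fc" is LATIN SMALL LETTER U WITH DIAERESIS.
iri_host = "b\u00fccher.example"

print(host_to_uri_percent(iri_host))  # b%C3%BCcher.example
print(host_to_uri_idna(iri_host))     # xn--bcher-kva.example
```

   Both outputs are plausible URI forms of the same host, which is one
   way two implementations could each follow the specification and
   still disagree.
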
   The URI-to-IRI transformation was also heuristic, since there was no
   guarantee that %xx-encoded bytes in the URI were actually meant to be
   the percent-encoded bytes of a UTF-8 encoding of a Unicode string.

   To address these issues and to fix URL for HTML5, a new IRI working
   group was established in IETF in 2009.  Despite years of development,
   the IRI group was closed in 2014, with the consolation that the
   documents that were being developed in the IRI working group could be
   updated as individual submissions or within the "applications area"
   working group.  In particular, one of the IRI working group items was
   to update [appsawg-uri-scheme-reg], which is currently under
   development in IETF's application area.

   Independently, the HTML specifications in the WHATWG and W3C
   redefined "URL" in an attempt to match what some of the browsers were
   doing.  This definition was moved out into the "URL - Living
   Standard" [URL-LS].

   The world has also moved on.  ICANN has approved non-ASCII top level
   domains, but IDNA specs ([RFC3490] and [RFC5895]) did not fully
   address IRI processing.  Subsequently, the Unicode Consortium
   produced [UTS-46].

2.  Current Organizations and Specs in Development

   There are multiple umbrella organizations which have produced
   multiple documents, and it's unclear whether there's a trajectory to
   make them consistent.  This section tries to enumerate currently
   active organizations and specs.

   Organizations include the IETF [2], the WHATWG [3], the W3C [4],
   WebPlatform.org [5], and the Unicode Consortium [6].

   Relevant specs under development in each organization include:

2.1.  IETF

   [appsawg-uri-scheme-reg] and [kerwin-file-scheme] are under active
   development.  The IRI working group closed, but work can continue in
   the Applications Area working group.
   Three drafts that were originally intended to obsolete [RFC3987] now
   sit abandoned and in need of update: [iri-3987bis],
   [iri-comparison], and [iri-bidi-guidelines].

   In addition, there's quite a bit of activity around URNs and library
   identifiers in the URN working group, including some expressions of
   desire to update [RFC3986] to better accommodate desired URN
   semantics.

2.2.  WHATWG

   The [URL-LS] is being developed as a living standard [7].  It
   primarily focuses on specifying what is important for browsers.  The
   means by which new schemes might be registered is not yet defined.
   This work is based on [UTS-46], and is intended to obsolete both
   [RFC3986] and [RFC3987].

2.3.  W3C

   The Web Applications Working Group [8], in conjunction with the W3C
   TAG [9], has sporadically been republishing the WHATWG work with no
   technical content differences as [W3C-URL].  There is a
   [url-workmode] proposal to formalize this relationship.

2.4.  WebPlatform

   [WP-URL] is being developed on a "develop" [10] GitHub branch based
   on [URL-LS].  It currently contains work that has yet to be folded
   back into the [URL-LS], primarily to rewrite the parser logic in a
   way that is more understandable and approachable.  The intent is to
   merge this work once it is ready, and to actively work to keep the
   two versions in sync.

2.5.  Unicode Consortium

   [UTS-46] defines parameterized functions for mapping domain names.
   [URL-LS] builds upon this work, specifying particular values to be
   used for these parameters.

3.  Problem Statements

   The main problem is conflicting specifications that overlap but don't
   match each other.  Additionally, the following are issues that need
   to be resolved to make URL processing unambiguous and stable.

   o  Nomenclature: over the years, a number of different sets of
      terminology have been used.  URL / URI / IRI is not the only
      difference.
      [tantek-slice] chronicles a number of differences.

   o  Parameterization: standards in this area need to define such
      matters as normalization forms and values for parameters such as
      UseSTD3ASCIIRules.

   o  Interoperability: even after accounting for the above, there is a
      demonstrable lack of interoperability across popular libraries and
      browsers.  [whatwg-interop] identifies a number of such
      differences.

   o  Specific scheme definitions: some UR* scheme definitions are
      woefully out of date, incomplete, or don't correspond to current
      practice, but how to update their definitions is unclear.  This
      includes "file:", for which there is a current effort, but there
      are others which need review (including "ftp:" and "data:").

4.  Outline of Potential Solution

   This problem clearly requires a cross-organizational solution,
   specifically:

   o  Build a plan to update or obsolete [RFC3986], [RFC3987],
      [RFC5895], and [kerwin-file-scheme] to be consistent with [URL-LS]
      and [UTS-46].  This may involve working to get the other
      specifications updated, if only to clarify nomenclature.

   o  Change the [URL-LS] goals to only obsolete specifications listed
      above that are not updated.  Presuming that [RFC3986] is updated,
      explicitly state that canonical URLs (i.e., the output of the URL
      parser) not only round trip, but also are valid URIs.

   o  Reconcile how [appsawg-uri-scheme-reg] and [URL-LS] handle
      currently unknown schemes, update [appsawg-uri-scheme-reg] to
      state that registration applies to both URIs and URLs, and update
      [URL-LS] to indicate that [appsawg-uri-scheme-reg] is how you
      register schemes.

   o  Have the W3C adopt [url-workmode].

   o  Other than responding to any feedback that may be provided, no
      changes to any Unicode Consortium product are required.

5.  Acknowledgements

   Helpful comments and improvements to this document have come from
   Anne van Kesteren and Graham Klyne.

6.
IANA Considerations

   This memo currently includes no request to IANA, although an updated
   [appsawg-uri-scheme-reg] might add some additional requirements and
   information to the IANA URI scheme registry [11] to make clear that
   the schemes serve as URL schemes and IRI schemes as well as URI
   schemes.

7.  Security Considerations

   In addition to the security exposures created when URLs work
   differently in different systems, all of the security considerations
   defined in [RFC3490], [RFC3986], [RFC3987], and [RFC5895] apply to
   URLs.

8.  Informative References

   [RFC1738]  Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
              Resource Locators (URL)", RFC 1738, December 1994.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              July 2003.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC5895]  Resnick, P. and P. Hoffman, "Mapping Characters for
              Internationalized Domain Names in Applications (IDNA)
              2008", RFC 5895, September 2010.

   [URL-LS]   van Kesteren, A. and S. Ruby, "URL Living Standard", 2014.

   [UTS-46]   Davis, M. and M. Suignard, "Unicode IDNA Compatibility
              Processing", 2014.

   [W3C-URL]  van Kesteren, A. and S. Ruby, "URL Working Draft", 2014.

   [WP-URL]   van Kesteren, A. and S. Ruby, "URL Standard", 2014.

   [appsawg-uri-scheme-reg]
              Thaler, D., Hansen, T., Hardie, T., and L. Masinter,
              "Guidelines and Registration Procedures for New URI
              Schemes", 2014.

   [iri-3987bis]
              Duerst, M., Suignard, M., and L. Masinter,
              "Internationalized Resource Identifiers (IRIs)", 2012.

   [iri-bidi-guidelines]
              Duerst, M., Masinter, L., and A. Allawi, "Guidelines for
              Internationalized Resource Identifiers with Bi-directional
              Characters (Bidi IRIs)", 2012.

   [iri-comparison]
              Masinter, L. and M. Duerst, "Comparison, Equivalence and
              Canonicalization of Internationalized Resource
              Identifiers", 2012.

   [kerwin-file-scheme]
              Kerwin, M., "The file URI Scheme", 2014.

   [tantek-slice]
              Celik, T., "How many ways can you slice a URL and name the
              pieces?", 2011.

   [url-workmode]
              Ruby, S., "URL WorkMode", 2014.

   [whatwg-interop]
              Ruby, S., "URL test results", 2014.

Authors' Addresses

   Sam Ruby (editor)
   IBM
   Raleigh
   USA

   Email: rubys@intertwingly.net
   URI:   http://intertwingly.net/


   Larry Masinter
   Adobe
   345 Park Ave
   San Jose, CA  95110
   USA

   Email: masinter@adobe.com
   URI:   http://larry.masinter.net/