Internet Draft Tim Kindberg Document: draft-kindberg-tag-uri-03.txt Hewlett-Packard Corporation Expires: February 1, 2003 Sandro Hawke World Wide Web Consortium August 2002 The 'tag' URI scheme and URN namespace STATUS OF THIS MEMO This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. This document replaces draft-kindberg-tag-uri-02.txt. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-draft will expire on February 1, 2003. Copyright Notice Copyright (C) The Internet Society (2001). All Rights Reserved. DISCLAIMER. The views and opinions of authors expressed herein do not necessarily state or reflect those of the World Wide Web Consortium, and may not be used for advertising or product endorsement purposes. This proposal has not undergone technical review within the Consortium and must not be construed as a Consortium recommendation. ABSTRACT This document describes the "tag" Uniform Resource Identifier (URI) scheme and the "tag" Uniform Resource Name (URN) namespace, for identifiers that are unique across space and time. Tag URIs (also Kindberg Informational - Expires February 1, 2003 [Page 1] Internet-Draft tags August 2002 known as "tags") are distinct from most other URIs in that they are intended for uses that are independent of any particular method for resource location or name resolution. A tag URI may be used purely as an entity identifier. It may also be presented to services for resolution into a web resource or into one or more further URIs, but no particular resolution scheme is implied or preferred by a tag URI itself. Unlike UUIDs or GUIDs such as "uuid" URIs and "urn:oid" URIs, which also have some of the above properties, tag URIs are designed to be tractable to humans. Furthermore, they have many of the desirable properties that "http" URLs have when used as identifiers, but none of the drawbacks. 0. TERMINOLOGY The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. 1. INTRODUCTION A tag is a type of Uniform Resource Identifier (URI) [RFC2396] designed to meet the following requirements: 1) Identifiers are unique across space and time and come from a practically inexhaustible supply; 2) identifiers are relatively convenient for humans to mint (create), read, type, remember etc.; 3) zero registration cost, at least to holders of domain names or email addresses; and negligible cost to mint new identifiers; 4) independence of any particular resource-location or identifier- resolution scheme. For example, the above requirements may apply in the case of a user who wants to place identifiers on their documents: A) They want to be sure that the identifier is unique. Global uniqueness is valuable because it guarantees that no identifier can conflict with another, no matter how resources become shared. B) It is useful for the identifier to be tractable to humans: they should be able to mint new identifiers conveniently, and to type them into emails and forms. C) They do not want to have to communicate with anyone else in order to mint identifiers for their documents. D) As a good net citizen, the user does not want to use an identifier that might be assumed by software to imply the existence of a corresponding resource in a default binding scheme – so that an attempt to retrieve that resource is likely but doomed to failure. Kindberg Informational - Expires February 1, 2003 [Page 2] Internet-Draft tags August 2002 Of course, this leaves them free to exploit the identifier in particular applications and services, where the context is clear. Existing identification schemes satisfy some but not all of the general requirements 1-4. For example: UUIDs [UUID, ISO11578] are hard for humans to read. OIDs [OID, RFC3061] and Digital Object Identifiers [DOI] require naming authorities to register themselves, even if they already hold a domain name registration. URLs (in particular, "http" URLs) are sometimes used as identifiers that satisfy most of our requirements. Many users and organisations have already registered a domain name, and the use of the domain name to mint identifiers comes at no additional cost. But there are drawbacks to URLs-as-identifiers: 1) Software might try to dereference a URL-as-identifier, even though there is no resource at the "location". 2) The new holder of a domain name can't be sure that they are minting new names. If Smith registers champignon.net and then Jones registers it, how can Jones know, in general, whether Smith has already used http://champignon.net/99? 1.1 OUTLINE Section 2 gives a specification for tags: their syntax and the rules governing their creation and comparison. Section 3 revisits the requirements outlined above and shows that the tag specification meets them. Section 4 explains how tags are embedded in the URN namespace, by adopting "tag" as a URN namespace identifier. Section 5 covers security considerations. Appendix A contains a URN namespace registration request for "tag". 1.2 FURTHER INFORMATION Information about the tag URI scheme additional to this document -- motivation, genesis and discussion -- can be obtained from http://www.taguri.org. Kindberg Informational - Expires February 1, 2003 [Page 3] Internet-Draft tags August 2002 2. THE 'TAG' URI SCHEME Examples of tag URIs are: tag:hpl.hp.com,2001:tst.1234567890 tag:hp.com,2000-12-30:tst.1234567890 tag:exploratorium.edu,2001-06:pi.99 tag:fred@flintstone.biz,2001-07-02:rock.123 tag:sandro@w3.org,2001:Sandro tag:myIDs.com,2001-09-01:TimKindberg/doc.101 Each tag consists of a "tag authority" followed, optionally, by a specific identifier. The tag authority consists of an "authority name" -- a fully qualified domain name or an email address containing a fully qualified domain name -- followed by a date. The tag authority is globally unique because domain names and email addresses are assigned to at most one entity at a time. That entity can be sure of minting unique identifiers. The date specifies, according to the Gregorian calendar, any particular day on which the authority name was assigned to the minting entity at 00:00 UTC (the start of the day). The date is specified using one of the "YYYY", "YYYY-MM" and "YYYY-MM-DD" formats allowed by RFC 3339 [RFC3339]. The tag specification permits no other formats. In the interests of brevity, the month and day default to "01". A day value of 01 MAY be omitted; a month value of 01 MAY be omitted unless it is followed by a day value other than 01. For example, "2001-07" is the date 2001-07-01 and "2000" is the date 2000-01-01. All dates specify a moment (00:00) of a single day; they MUST NOT be taken as periods of a day or more, such as "the whole of July 2001" or "the whole of 2000". It is RECOMMENDED that tag authorities use only one formulation for a given date, since alternative formulations of the same date will be counted as distinct and hence tags containing them will be unequal. For example, tags beginning "tag:hp.com,2000:" are never equal to those beginning "tag:hp.com,2000-01-01:", even though they refer to the same date. A tag authority mints specific identifiers that are unique within its context, in accordance with any internal scheme that uses only URI characters. Some tag authorities (e.g. corporations, mailing lists) consist of many people, in which case group decision-making and record-keeping procedures are required to achieve uniqueness. Kindberg Informational - Expires February 1, 2003 [Page 4] Internet-Draft tags August 2002 Entities that were assigned an authority name at 00:00 UTC -- the start -- of a given date MAY mint tags rooted at that date-qualified name. An entity MUST NOT mint tags under an authority name that was assigned to a different entity on the given date, and it MUST NOT mint tags under a future date. An entity that acquires an authority name immediately after a period during which the name was unassigned MAY mint tags as if the entity was assigned the name during the unassigned period. This practice has considerable potential for error and MUST NOT be used unless the entity has substantial evidence that the name was unassigned during that period. The authors are currently unaware of any mechanism that would count as evidence, other than daily polling of the "whois" registry. For example, Hewlett-Packard holds the domain registration for hp.com and may mint any tags rooted at that name with a current or past date when it held the registration. It must not mint tags such as "tag:champignon.net,2001:" under domain names not registered to it. It must not mint tags dated in the future, such as "tag:hp.com,2999:". If it obtains assignment of "extremelyunlikelytobeassigned.org" on 2001-05-01, then it must not mint tags under "extremelyunlikelytobeassigned.org,2001-01-01" unless it has evidence proving that that name was continuously unassigned between 2001-01-01 and 2001-05-01. The general syntax of a tag URI, in ABNF, is: tagURI = "tag:" tagAuthority ":" [specific] Where: tagAuthority = authorityName "," date authorityName = DNSname / emailAddress date = 4*dig ["-" 2*dig ["-" 2*dig ]] ; [RFC3339] DNSname = DNScomp / DNSname "." DNScomp ; [RFC1035] DNScomp = lowAlphaNum [*(lowAlphaNum /"-") lowAlphaNum] emailAddress = 1*(lowAlphaNum /"-"/"."/"_") "@" DNSname lowAlphaNum = dig / lowAlpha specific = 1*(URIchars) ; URIchars defined in [RFC2396] lowAlpha = %x61-7A ; any char in the range "a" through "z" dig = %x30-39 ; any char in the range "0" through "9" The component "tagAuthority" is the name space part of the URI. To avoid ambiguity, this MUST be expressed in lower case; the domain name in "authorityName" (whether an email address or a simple domain name) MUST be fully qualified. Kindberg Informational - Expires February 1, 2003 [Page 5] Internet-Draft tags August 2002 Authority names could, in principle, belong to any syntactically distinct namespaces whose names are assigned to a unique entity at a time. Those include, for example, certain IP addresses, certain MAC addresses, and telephone numbers. However, to simplify the tag scheme, we restrict authority names to be domain names and email addresses. Future standards efforts may allow use of other authority names following syntax that is disjoint from this syntax. To allow for such developments, software that processes tags MUST NOT reject them on the grounds that they are outside the syntax defined above. The component "specific" is the name-space-specific part of the URI: it is any string of valid URI characters [RFC2396] chosen by the minter of the URI. It is RECOMMENDED that specific identifiers should be human-friendly. 2.1 EQUALITY OF TAGS As with URIs in general, different tags may identify the same thing. Tags themselves, however, are simply strings of characters and are considered equal if and only if they are completely indistinguishable in their machine representations. That is, one can compare tags for equality by comparing the numeric codes of their characters, in sequence, for numeric equality. This equality-criterion allows for simplification of tag-handling software, which does not have to transform tags in any way to compare them. However, it puts the onus on minting authorities not to inadvertently create tags that were intended to be identical but are unequal, such as "tag:sandro@w3.org,2001-01-01:Sandro" and "tag:sandro@w3.org,2001:Sandro". Organizations may choose to define equivalances of tags, saying for instance that "tag:sandro@w3.org,2001-01-01:Sandro" and "tag:sandro@w3.org,2001:Sandro" should be treated as synonyms, but the two tags themselves remain distinct. Conventions for asserting equivalences like this are beyond the scope of this document. 3. MEETING REQUIREMENTS 1-4 Requirement 2 of Section 1 -- convenience for humans -- is met by the URL-like syntax for tag authorities. However, the onus is on individual naming authorities to use human-friendly specific identifiers. Requirement 3 -- negligible costs -- follows from use of domain names and email addresses. Those identifiers are already held by many Kindberg Informational - Expires February 1, 2003 [Page 6] Internet-Draft tags August 2002 individuals and organisations and are cheap to obtain. Specific identifiers may be minted without communication with any other entity. Requirement 4 -- independence of resolution schemes -- is asserted by definition. However, this state of affairs is subject to actual usage conventions. Requirement 1 specifies uniqueness over space and time. Tag URIs meet that requirement by using uniquely assigned authority names and by handling transfers of their assignment, e.g. the transfer of a domain name's registration from one entity to another. The date is used to guarantee uniqueness of "tagAuthority" across assignments of the authority name. For example, suppose that on November 2, 2001, the champignon.net domain registration becomes assigned to a new entity. That entity must qualify the domain name with a date on which it is or was assigned to it, to ensure that its tag authority is and will remain unique. In particular, it must take care not to use defaults in such a way as to specify an earlier date. For example, the new assignee of champignon.net may use 2001-11-02, 2001-12 or 2002 (assuming it retains the assignment) but not 2001 or 2001-11. 4. TAG URNS The tag namespace lends itself to embedding in the URN namespace [RFC2141] by adopting "tag" as a URN namespace identifier, to form a set of tag-based URIs that inherit the URN property of denoting one resource persistently in any given naming context. The syntax for tag URNs is thus: "urn:tag:" tagAuthority ":" [specific] with syntactic components as defined in Section 2. For example, urn:tag:timothy@hpl.hp.com,2001:fred is a tag URN. Any binding of that URI must be guaranteed to be persistent. Appendix A contains a completed URN namespace identifier registration template, requesting assignment of "tag". 5. SECURITY CONSIDERATIONS Minting a tag, by itself, is an operation internal to the minting entity with no external consequences. The consequences of using an Kindberg Informational - Expires February 1, 2003 [Page 7] Internet-Draft tags August 2002 improperly minted tag (due to malice or error) in an application depends on the application, and must be considered in the design of any application that uses tags. There is a significant possibility of minting errors by people who fail to apply the rules governing minting dates, or who use a shared (organizational) authority-name without prior organization-wide agreement. Tag-aware software MAY help catch and warn against these errors. (As stated in Section 2, however, to allow for future expansion, software MUST NOT reject tags which do not conform to the syntax specified in Section 2. It MAY however have a "lint" mode, where it checks URIs for conformance to particular specifications and notifies the person requesting the check of abnormalities.) REFERENCES [DOI] Norman Paskin (1997). Information Identifiers. Learned Publishing, Vol. 10, No. 2, pp. 135-156, April. See also www.doi.org. [ISO11578] ISO (International Organization for Standardization). ISO/IEC 11578:1996. "Information technology - Open Systems Interconnection - Remote Procedure Call (RPC)" [OID] ITU-T recommendation X.208 (ASN.1). See also RFC 1778. [RFC822] David H. Crocker (1982). Standard for the format of ARPA Internet text messages. [RFC1035] P. Mocapetris (1987). Domain Names - implementation and specification. [RFC2141] R. Moats (1997). URN syntax. [RFC2396] T. Berners-Lee, R. Fielding, L. Masinter (1998). Uniform Resource Identifiers (URI): Generic Syntax. [RFC3061] M. Mealling (2001). A URN Namespace of Object Identifiers. [RFC3339] G. Klyne, C. Newman (2002). Date and Time on the Internet: Timestamps. [UUID] Paul Leach, Rich Salz (1997). UUIDs and GUIDs. Internet- Draft Draft-leach-uuids-01. Kindberg Informational - Expires February 1, 2003 [Page 8] Internet-Draft tags August 2002 AUTHORS' ADDRESSES Tim Kindberg Hewlett-Packard Laboratories 1501 Page Mill Road, MS 1138 Palo Alto, CA 94304, USA Tel: +1 650 857-5609 Email: timothy@hpl.hp.com Sandro Hawke World Wide Web Consortium 200 Technology Square Cambridge, MA 02139, USA Tel: +1 617 253-7288 Email: sandro@w3.org APPENDIX A: REQUEST FOR REGISTRATION OF 'TAG' URN NAMESPACE IDENTIFIER Namespace ID: tag Registration Information: Version 3 Date: 1 August 2002 Declared registrant of the namespace: Name: Tim Kindberg E-mail: timothy@hpl.hp.com Affiliation: Hewlett-Packard Company Address: 1501 Page Mill Rd Palo Alto, Ca 94304, USA Declared registrant of the namespace: Name: Sandro Hawke E-mail: sandro@w3.org Affiliation: World Wide Web Consortium Address: 200 Technology Square Cambridge, MA 02139, USA Declaration of structure: Kindberg Informational - Expires February 1, 2003 [Page 9] Internet-Draft tags August 2002 The identifier structure is as follows: "urn:tag:" authorityName "," date ":" [specific] (see Section 2) Relevant ancillary documentation: See Sections 1-4 above and References section below. Identifier uniqueness considerations: Guaranteed by uniqueness of assignment of a domain name or email address on a given date -- as long as the "specific" string is locally unique. See Section 3 above. Identifier persistence considerations: A tag URN persistently designates the resource to which the minter of the tag bound it. This Draft does not mandate any particular protocol for effecting that binding. Process of identifier assignment: Assignment of these URNs is delegated to entities that hold domain names or email addresses (See Section 2). Process for identifier resolution: No resolution procedures are mandated. Rules for Lexical Equivalence: Character-by-character equality. See section 2.1 above. Conformance with URN Syntax: No special considerations. Validation mechanism: None specified. Scope: Global. Kindberg Informational - Expires February 1, 2003 [Page 10]