Network Working Group Sam X. Sun INTERNET-DRAFT CNRI Expires May 19, 1998 November 14, 1997 draft-sun-handle-system-00.txt Handle System: A Persistent Global Naming Service Overview and Syntax Status of this Memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Abstract The Handle System (r) is a comprehensive system for assigning, managing, and resolving persistent identifiers, known as ''handles,'' for digital objects and other resources on the Internet. Handles can be used as Uniform Resource Names (URNs). The Handle System includes:(a) an open set of protocols, (b) a namespace, and (c) an implementation of the protocols. The protocols enable a distributed computer system to store handles of digital resources and resolve those handles into the information necessary to locate and access the resources. This associated information can be changed as needed to reflect the current state of the identified resource without changing the handle, thus allowing the name of the item to persist over changes of location and other state information. Combined with a centrally administered naming authority registration service, the Handle System provides a general purpose, distributed global naming service for the reliable management of information on networks over long periods of time. (Note that in this document we do not attempt to distinguish between the terms 'name' and 'indentifier' and will use them interchangably.) The Handle System has been implemented and is currently in use in a number of prototype projects, including efforts with the Library of Congress, the Association of American Publishers, the Defense Technical Information Center, and the United States Information Agency. This is the first of a series of planned documents that will specify the handle protocol and services, and relate the Handle System to other IETF activities in URN/URI/URL working groups. This document provides a concise overview of the system and the syntax of handles. Additional information can be found on CNRI and related project web sites [4, 5, 6, 8, 16, 17, 18, 19]. 1. Objectives The Handle System is a distributed information system that provides a persistent naming service for use on networks such as the Internet. Handles can be used to identify any network resources. The Handle System resolves handles into state information about the named objects, e.g., location. This state information may of course be constrained by its intended application, but is otherwise entirely determined by the party responsible for naming the object. The system ensures that every handle is unique within the context of the Handle System and can be retained and resolved over long time periods. The resolution information associated with each handle can be changed as needed, allowing the handle to persist over changes in location and other states of the named resource. Specifically, the Handle System was designed to address the following problems in network resource identification: * Persistence A named resource can outlast any specific computer system or organization. Any resource name which is inextricably linked with a specific system or name of an organization will not be able to survive the demise or radical change of that computer system or organization. By separating the object's name from location, ownership, and other state information, the Handle System allows that identifier to persist over time. * Location independence With handles, the name of the item is unrelated to the location of the item. This allows easy reorganization of information. Handles make it possible to transfer resources from one organization to another without affecting or breaking the existing user references (i.e., handles) to those collections. This is not possible using location based references. * Multiple instances of an item A single handle can refer to more than one instance of an item. Care must be taken, of course, to ensure that these are multiple instances of the same item. That said, the issue of what constitutes multiple instances of an item as opposed to multiple items is beyond the scope of this document. 2. Handle Syntax Every handle in the Handle System is defined in two parts: its naming authority, otherwise know as its prefix, and a unique handle string under that naming authority, otherwise known as its suffix. The naming authority identifies the administrative unit of the underlying handles. It is globally unique and persistent once obtained. Naming authorities are defined using a subset of ASCII characters, including alphanumeric characters, and some special characters listed below. Characters in the naming authority segment are case insensitive. The handle string is Unicode based and uses UTF8 [1,2] character set encoding. Handles can consist of any characters of any national language, and the Unicode based encoding ensures its global uniqueness and no ambiguity. ACSII characters within the handle string are case insensitive. The following is the handle syntax described in Extended Backus-Naur Form (EBNF) notation: ::= "/" ::= ( | "." | "-" | "_" ) + ::= ( < octet from UTF-8 encoding > ) + ::= [a-zA-Z0-9] And here are some examples of valid handles: cnri.dlib/july95-arms 10.1002/0002-8231(199601)47:1<1:SPOTEO>2.3.TX;2-K any-printable-characters/a-zA-Z0-9!@#$%^&*()_"<>,.?/`~|\ handles-in-germany/Universität-Karlsruhe It will be useful in some contexts (for example, when used in URL) to specify operations, or service requests, on the objects named by handles, e.g., designating a fragment. There will be various approaches to this, depending on context, but one that is compatible with many current Internet applications and which influences syntax is to associate a modifier string directly with a handle string. For this we specify placing the modifier before the handle and using the "@" character as delimeter: ::= [ "@" ] For example, a handle in a URL context could be combined with a modifier as follows: HREF="hdl:action=verify@4263537/certificate-1234567" Here, the is used to describe the desired operations on the digital object specified by the handle. The syntax of the may be context specific and driven by application domains. Meta-level formats will be described in a future document. Note that the is positioned in front of the naming authority of the underlying . This is to avoid defining any reserved characters in the namespace of , by taking advantage of the already highly constrained character set of segment. It is important to distinguish from , which is a Handle_with_Modifier. The is a mechanism allowing specification of customized usage of the underlying handles. It is not part of the name of the underlying resource, nor is it part of the handle resolution process. The implementation and interpretation of is left to the client software using the Handle System. 3. Using Handles with the World Wide Web 3.1 URL representation of handles Handles can be used in place of URLs in Web browsers or in HTML mark-up documents to refer to persistent Internet resources. The direct approach is to use the handle preceded with proposed URL scheme "hdl:". This is called Handle URL Syntax: ::= "hdl:" The Handle URL Syntax has two reserved characters, % and ", from the native handle namespace. The character % is used for hex encoding, which is necessary to allow any handles specified from the standard keyboard. And the character " is reserved to allow handles to be separated from the surrounding text in HTML documents. Reserved characters must be hex encoded when used in the URL context. The choice of % and hex encoding is also compatible with the current URL practice. Because some browser implementations drop the # character when processing the URL of any scheme, hex encoding of character # is also strongly recommended. Examples of handles using Handle URL Syntax are: hdl:cnri.dlib/july95-arms (which refers to handle "cnri.dlib/july95-arms") and hdl:handle-with-hex-encoding/handle%25abc (which refers to handle "handle-with-hex-encoding/handle%abc") Handles can also be resolved using a URL that references a proxy service. This is called Handle Proxy Syntax: ::= "http://" "/" In this case, the is the domain name of a proxy server that will be responsible for resolving the and directing the resolution results back to the client. Since this is a URL any excluded characters in the URL specification [3] will have to be hex encoded when used in the handle name. For example, a handle in a URL context could be combined with a modifier as follows: http://hdl.handle.net/cnri.dlib/july95-arms (which refers to handle "cnri.dlib/july95-arms") and http://hdl.handle.net/10.1000/14 (which refers to handle "10.1000/14") Here "hdl.handle.net" is a global proxy service provided by CNRI to allow handles to be resolved using the "http:" protocol. It's worth noting that the handle namespace by itself does not impose any hex or escape encoding, nor does the underlying Handle System. The reserved characters and hex encoding are introduced only when handles are used in the URL context. It is the client software's responsibility to decode any hex encoding in the handle URL before sending the handles out for resolution. And on systems where other character set encoding is used, it is also the client software's responsibility to convert a natively displayed handle to its UTF8 encoding before sending it out for resolution. 3.2 Handle Resolution service from Web browsers Handles specified using Handle URL Syntax (ie, hdl:) can be resolved from a Web browser directly using the Handle System Resolver [4]. The resolver is a freely available extension to the current popular Web browsers. It resolves handles into corresponding URLs, which are then retrieved by the browser in the normal fashion. This is the suggested way to resolve the handles in the future, because it provides better performance, is more scalable, and is locally configurable. Handles can also be resolved using proxy services using Handle Proxy Syntax (ie, http:/). In this case, the proxy server performs the handle resolution task, and sends the resulting URL to the client browser for processing. Currently, CNRI provides global handle proxy service through "hdl.handle.net", and "dx.doi.org". It is worth noting that even though using the proxy server approach is straight-forward and doesn't require any customer software customization, it has the effect of connecting the handles with the proxy server's URL location. Hence the selection of a proxy server should be made with care; this is another reason for using the Handle System Resolver instead of the proxy server. 3.3 Creating handles for network resources The Handle System allows handles to be created in a distributed fashion. Organizations in need of providing a naming service for their persistent internet resources will be able to contact CNRI or other organizations to register for their own handle naming authority, as well as their own local handle services. This will enable them to create handles for their own organizational use. Policies and procedures for Naming Authority registration are currently under development. As an initiative for general public service, CNRI has established a public handle registration service for the IETF community. This service provides an open channel to allow individuals to create handles and experiment with the handle system.  The service is provided for testing purposes only. Future availability of this service is not guaranteed. Details on how to use this service, as well as its terms and conditions can be obtained from http://www.handle.net/ietf/handle/register_handle.html. 4. Distributed service model The Handle System is distributed, scalable, and designed for widespread deployment. The current implementation consists of one global service and many local handle services. Each handle service consists of one of more physically distributed handle servers. (Currently, the global service consists of two servers in Virginia and two in California. A European location is planned.) And each handle server can have one or more secondary servers for mirroring. In addition, handle caching servers are provided for faster resolution service for a local environment, and they can also be used to provide proxy service through firewalls. 4.1 Handle services The Handle System consists of many services. Each service is responsible for part of the handle namespace. One specific service, called the Global Handle Registry, is globally unique, and has a special function, which is to know of the existence, location, and namespace responsibilities of all other public services, or local handle services. There can be an unlimited number of local handle services, managed by various organizations. In the current implementation each local handle service is registered with the Global Handle Registry to ensure efficient resolution. Policies and procedures for disconnected local handle services are under development. The primary issue here is to guarantee identifier uniqueness in disconnected systems. 4.2 Handle servers within a service Each handle service consists of one or more handle servers. Typically, each handle server runs on a separate computer but multiple handle servers can run on a single computer. Within a handle service, the distribution of handles across its constituent servers is determined by a hash table such that each of N servers within a service will be responsible for 1/N handles. The number of servers can be adjusted as required to meet the needs of a service. 4.3 Server replication Additionally, it may be desirable to mirror the contents of any of the handle servers within a service, presumably on a separate computer. This is referred to as replication and is accomplished by creating one or more additional servers whose sole purpose is to mirror the contents of the original server. Within each set of replicated servers, the initial server is called the primary server and all others are called secondary servers. The creation and administration of handles always takes place on the primary server, but resolution can use either the primary or any of its secondaries. This provides fault tolerance, as well as the potential for performance improvement. 4.4 Caching Server The Handle System Caching Server has been built to reduce the network traffic between handle clients and handle services and its use is strongly encouraged. Caching handle data or routing information on the caching server allows some handle resolution to be performed within an organization's local area network. 4.5 Proxy Server The Handle System Proxy Server has been developed to act as a client to the Handle System, allowing handles to be resolved using Handle Proxy Syntax (ie, http:/). Using this syntax, the browser passes a handle to the proxy server, which in turn passes the handle to the appropriate handle service for resolution. If the handle can be resolved into one or more URLs, a URL is returned from the handle server to the proxy, and from the proxy to the client browser. 4.6 Handle System Resolver The Handle System Resolver [4] is a software component which extends Netscape or Microsoft Web browsers, and allows handles to be resolved using Handle URL Syntax (ie, hdl:). Using this syntax, the browser passes the handle directly to the appropriate handle service for resolution. If the handle can be resolved into one or more URLs, one of the URLs is returned to the browser which then transparently retrieves and displays the intended content. 5. Handle data A handle within the Handle System is associated with, and can be resolved to, one or more elements of typed data. Examples of data types in use include URLs, object request brokers, and other URNs. Other examples might include e-mail addresses or public key certificates. There is a controlled set of named types accepted by the system. This list can be extended as needed at the system level. Additionally, for ad hoc use, any numeric value will be accepted as an undefined data type. Administrative data, e.g., permissions to create handles or edit handle data, is also associated with each handle. This data is not returned as part of the handle resolution but is used for handle administration only. Other than the relationship between the Global Handle Registry and local handle services described above, there are no hierarchical relationships assumed among handle records. Note, however, that handles can include in their associated data references to other handles, thus allowing hierarchical or other relationships to be constructed as needed. 6. Handle resolution Handle clients and handle services use the Handle Resolution Protocol [5] to conduct resolution transactions. The Handle Resolution protocol uses registered port number 2641. By default, a handle resolution request will be answered with all of the typed data associated with a handle, with the exception of the administrative data. It is also possible to request data only of a certain type. Handle clients that do not know which handle service to query for a given handle start with the Global Handle Registry, which is guaranteed to know which service contains a given handle. Within a given service, a client uses the hash table specific to the service to discover the individual server, or set of replicated servers, which can resolve the given handle. A number of handle resolution clients have been constructed, all of which utilize the Handle Client Library [6], which is currently implemented as a C library. The clients include a Web proxy server, the Handle System Resolver [4], and the Grail Web browser [7]. 7. Handle administration Handle System administration is carried out using the Handle System Administration Protocol [8]. This protocol allows the creation and administration of handles and their associated data within the Handle System. A series of APIs currently under construction on top of this protocol will be made publicly available. 8. Security Consideration The Handle System has been designed to enable secure transactions between clients and servers and to allow secure and stable storage of handle data. Development and documentation of secure practices and policies is underway. A handle does not in itself pose a security threat. When specified or used in URL context, it is subject to all the security considerations in the URL specification [3]. 9. Handle System and URL/URN/URI While the Handle System is designed to be usable in many contexts and is not a subset or extension of current UR* schemes, it can be used in conjunction with those schemes. When used within those schemes it is, of course, subject to their constraints. The Handle System is designed to provide all the fundamental requirements outlined in the URN/URI specifications [9,10]. On the other hand, the Handle System differs from the current proposed URN implementations [11,12,13] discussed in the IETF URN working group in the following ways. First of all, the Handle System defines a namespace independent of URL, and is not subject to the current namespace constraints of URL. The namespace of handles is Unicode based, and imposes no reserved or excluded characters on the handle string. This allows handles to be specified in any national language natively in a globally unique and unambiguous manner. The elimination of any reserved characters also allows any legacy naming system, such as SICI [14], to be used with no or minimum change. The Handle System is designed to support, instead of exclude, the use of user friendly names in any native language. There are situations in which using descriptive names may hurt the persistence of the name once the identified object changes its association. Objects of this nature may be better served using non-descriptive names; for example, digits only. On the other hand, there are objects for which descriptive names are desirable. The current URN/URI was defined "generally to be for machine, rather than human, consumption" [20]. It uses a subset of ASCII character set, and requires a set of reserved/excluded characters. A Human Friendly Name Service is expected to work with it. Finally, Handle URL representation is not restricted to the particular URL scheme "urn:", but can use other URL schemes such as "hdl:". The system further provides a distributed, scalable implementation of a globally unique, persistent naming service independent of DNS service. 10. History and Acknowledgment The initial design and implementation of the Handle System was part of the Computer Science Technical Reports project, funded by the Defense Advanced Projects Agency (DARPA) under Grant No. MDA-972-92-J-1029. One aspect of this project was to develop a framework for the underlying infrastructure of digital libraries. It is described in a paper by Robert Kahn and Robert Wilensky [15]. The first implementation was created at CNRI in the fall of 1994. Subsequent work on the Handle System has been supported in part by the Advanced Research Projects Agency under Grant No. MDA972-92-J-1029. The following people have contributed to the Handle System design and implementation: David Ely, William Arms, Navjeet Chabbewal, Judith Grass, Robert Kahn, Timothy Kendall, Connie McLindon, Charles Orth, Ed Overly, Varna Puvvada, John Stewart, Allison Yu-McNamara, Ron Ely, Catherine Rey, Jane Euler, Larry Lannom, and Sam Sun. We also want to acknowledge the contribution of the other members of the Computer Science Technical Reports project. 11. Author's Address Sam X. Sun 1895 Preston White Dr. Suite 100 Reston, VA 20191-5434 (703) 620-8990 ssun@cnri.reston.va.us 12. References [1] The Unicode Consortium, "The Unicode Standard, Version 2.0", Addison-Wesley Developers Press, 1996. ISBN 0-201-48345-9 [2] Yergeau, Francois, "UTF-8, A Transform Format for Unicode and ISO10646", RFC2044, October 1996. http://ds.internic.net/rfc/rfc2044.txt [3] Berners-Lee, T., Masinter, L., McCahill, M., et al., "Uniform Resource Locators (URL)", RFC1738, December 1994. http://ds.internic.net/rfc/rfc1738.txt [4] Handle System Resolver. http://www.handle.net/resolver/ [5] Handle System Client Library download site. http://www.handle.net/download.html [6] Handle Resolution Protocol. http://www.handle.net/client_spec.html [7] The Grail Internet Browser. http://grail.cnri.reston.va.us/grail/ [8] Handle Administration Protocol. http://www.handle.net/handle_admin.html [9] Sollins, K. and L. Masinter, "Functional Requirements for Uniform Resource Names", RFC 1737, December 1994. http://ds.internic.net/rfc/rfc1737.txt [10] Berners-Lee, T., "Universal Resource Identifiers in WWW" RFC 1630, June 1994. http://ds.internic.net/rfc/rfc1630.txt [11] Daniel, Ron and Michael Mealling, "Resolution of Uniform Resource Identifiers using the Domain Name System", RFC 2168, June 1997. http://ds.internic.net/rfc/rfc2168.txt [12] Daniel, Jr., Ron, "A Trivial Convention for using HTTP in URN Resolution", RFC-2169, June 1997. http://ds.internic.net/rfc/rfc2169.txt [13] Moats, Ryan, "URN Syntax", RFC-2141, May 1997. http://ds.internic.net/rfc/rfc2141.txt [14] Serial Item and Contribution Identifier (SICI) Standard. http://sunsite.berkeley.edu/SICI/ [15] Kahn, Robert and Wilensky, Robert. "A Framework for Distributed Digital Object Services", May, 1995. http://www.cnri.reston.va.us/tmp_hp/k-w.html [16] Digital Object Identifier System. http://hdl.handle.net/10.1000/1 [17] National Digital Library Program. http://hdl.handle.net/4263537/003 [18] The CNRI Registry. http://hdl.handle.net/4263537/001 [19] Defense Virtual Library. http://hdl.handle.net/4263537/002 [20] Sollins, K., "Architectural Principles of Uniform Resource Name Resolution", September 26, 1997, Work in Progress. ftp://ftp.ietf.org/internet-drafts/ draft-ietf-urn-req-frame-04.txt INTERNET-DRAFT draft-ietf-handle-system-00.txt Expires May 19, 1998