Internet Draft                                               Lewis Girod
draft-girod-urn-res-require-00.txt                      Karen R. Sollins
                                                                 MIT LCS
Expires December 13, 1996                                  June 13, 1996


                Requirements for URN Resolution Systems


Status of this draft

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

This Internet Draft expires December 13, 1996.

Abstract

This paper presents a set of requirements for systems that resolve URNs into hints for locating resources. Hints are a type of metadata that provide information about locating a resource, including but not limited to URLs and pointers to other resolution systems. The requirements fall into three broad areas: usability, security, and evolvability.

With those requirements in mind, the paper reviews what we have been able to learn about the NAPTR proposal. The NAPTR proposal has grown out of several different URN resolution proposals; it uses the existing DNS infrastructure to store and serve resolution information. To this end it introduces a new kind of DNS entry called a NAPTR, or Naming Authority PoinTeR. Our review of the NAPTR proposal includes a wish list of extensions and modifications that would add evolutionary paths to the design without sacrificing ease of implementation.

The paper then sketches a more extensive URN resolution system in order to demonstrate the need for evolution in any URN resolution proposal.
   1 Introduction
   2 The goals and issues
     2.1 Usability and Feature Set Requirements
       2.1.1 The Publisher
       2.1.2 The Client
       2.1.3 The Management
     2.2 Security and Privacy Requirements
     2.3 Evolution
   3 Assessment of the NAPTR Scheme
     3.1 Brief Explanation
     3.2 Definition of the New DNS Records
     3.3 Assessment
       3.3.1 Assessing Usability
         3.3.1.1 Long Term Effects of Rewrite Rules
         3.3.1.2 Usability for Clients
         3.3.1.3 Usability for Publishers
         3.3.1.4 Usability for Naming Authorities
       3.3.2 Assessing Security and Privacy
         3.3.2.1 Security
         3.3.2.2 Privacy
         3.3.2.3 Resistance to Attack
       3.3.3 Assessing Evolutionary Requirements
         3.3.3.1 Ease of Deployment
         3.3.3.2 Naming Without Semantics
         3.3.3.3 Extensibility
   4 An Alternate Model
     4.1 High Level System Architecture
       4.1.1 Clients
       4.1.2 Distributed Client Interfaces
       4.1.3 Name Registries
       4.1.4 Other Parts
     4.2 The Architecture of a Global Namespace
       4.2.1 Marketing vs. Distribution Models
       4.2.2 Conceptual Extension of the Root Namespace Registry
       4.2.3 Economic and Political Implications of Registries
     4.3 Architecting Distributed Client Interfaces
       4.3.1 Distributing and Replicating a Flat Namespace
       4.3.2 Resolution in a Distributed System
       4.3.3 Support for Unofficial Hint Information
       4.3.4 Conclusion
   5 A Wish List: Some Useful Additions to the NAPTR Proposal
   6 Notes
   7 References
   8 Contact Information

1 Introduction

This document sets out some requirements that apply to URN resolution systems. By URN resolution system we mean the whole process of resolution, beginning with a URN with no hints as to how to resolve it, and ending with one or more hints describing, with reasonable certainty and possibly with verifiable authenticity, the location of the named resource. We also include in such a system the mechanism by which entities publish named resources.

It is important to note that this system is not the only method of resolving URNs. In fact, documents containing URN references would typically be bundled with a collection of hints when they arrive from the server. In most cases these hints would provide a direct resolution; however, in some cases the hints will be out of date. It is in these cases, along with those in which a URN is entered directly by the user, that the system we are concerned with here is used.

The requirements applying to a URN resolution system center around three important design goals.
While it may be neither feasible nor necessary for initial implementations to support every requirement, every implementation must support evolution to systems that do support every requirement.

* USABILITY: URNs are long-lived identifiers. It is not sufficient for a URN resolution system merely to make it _possible_ for URNs to have long lifespans; a URN resolution system must _encourage_ the maintenance of long-lived names by virtue of its design, its performance, and the economic structure it imposes. This is a very broad and openly interpretable requirement.

* SECURITY: The amount of information published electronically is growing today, and this growth is predicted to continue. An acceptable security model must extend beyond the individual servers that provide such published information. It must support distributed security policies not only for the published information itself but also for hint information. This extends to authenticity, protection from unauthorized modification, and privacy.

* EVOLVABILITY: It is imperative that any deployed URN system have extensibility built into its design at many levels. At the lowest levels, this means that systems must be designed to take advantage of new or improved transport protocols. At the middleware levels, where this work falls, functionality may be increased or improved over time. Thus within a URN resolution system there must be a path for evolution of the resolution model. This may include both the co-existence of earlier and newer resolution models and the graceful phasing out of older schemes. The Internet is beyond the point where a globally deployed system with broad usage can support a ``flag day'', on which a transition occurs abruptly from one model to another. At the highest levels, the level of applications themselves, evolution may place changing requirements on the middleware levels.
For example, if applications store, retrieve, and exchange files of a limited number of types, such as text and binary, then there is no need for an extensible typing system to support them. In contrast, if applications define and exchange objects of new and evolving types, the middleware substrate will be required to support that exchange. We see evolution occurring at all levels of the network.

This paper proceeds as follows. First, we expand on the set of goals listed above, describing them in more detail. This is followed by a discussion of two URN resolution system proposals. The first is the NAPTR proposal, which came out of follow-up discussions from the now defunct URI working group of the IETF and a collection of proposals for URN resolution that were developed in that context. The second is our own proposal. We view the first as an interim solution that does not address all the issues, but can be made at least to support evolution. It has the advantage of a probably shorter deployment path. The second addresses more of the issues more effectively, but probably has a longer deployment path, and could therefore be viewed as the planned successor to the NAPTR scheme.

2 The goals and issues

A URN resolution system must provide the following components of functionality:

* Publication of pieces of information, which implies some mechanism or set of mechanisms for making the identity of a piece of information known to an audience.

* Hint publication, which implies the ability to make known one or more paths to locating a piece of information.

* Hint discovery, which implies the ability to learn hints by more or less trustworthy mechanisms.

There are some auxiliary functions that are important to identify as well, in particular authentication and access control for hint information, and pricing mechanisms for the storage and management of hint information.
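The three components listed above, together with the access control named as an auxiliary function, can be pictured as one small interface. The following Python is a minimal in-memory sketch under stated assumptions: the class `ResolutionService`, its methods `publish`, `publish_hint`, and `discover_hints`, the principal names, and the `URN:example:` URNs are all hypothetical, and a trusted principal string stands in for whatever real authentication mechanism a deployed, distributed service would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a principal tries to modify hints it does not own."""


@dataclass
class ResolutionService:
    # Hypothetical in-memory store; principals are trusted strings here,
    # standing in for an actual authentication mechanism.
    owners: dict[str, str] = field(default_factory=dict)       # URN -> owning principal
    hints: dict[str, list[str]] = field(default_factory=dict)  # URN -> hints

    def publish(self, principal: str, urn: str) -> None:
        """Publication: make a URN known and record who controls its hints."""
        if self.owners.get(urn, principal) != principal:
            raise AccessDenied(urn)
        self.owners[urn] = principal
        self.hints.setdefault(urn, [])

    def publish_hint(self, principal: str, urn: str, hint: str) -> None:
        """Hint publication: only the owning principal may add a hint."""
        if self.owners.get(urn) != principal:
            raise AccessDenied(urn)
        self.hints[urn].append(hint)

    def discover_hints(self, urn: str) -> list[str]:
        """Hint discovery: anyone may read the hints; unknown URNs yield []."""
        return list(self.hints.get(urn, []))


service = ResolutionService()
service.publish("alice", "URN:example:doc-1")
service.publish_hint("alice", "URN:example:doc-1", "http://repository.example.org/doc-1")
print(service.discover_hints("URN:example:doc-1"))
# ['http://repository.example.org/doc-1']
```

Separating the write path (`publish`, `publish_hint`) from the read path (`discover_hints`) mirrors the asymmetry the requirements describe: modification is restricted to a responsible principal, while discovery is open to any client.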
If the hint resolution mechanism is to be useful, it must be credible, and to achieve that there must be some guarantee that the information stored, managed, and distributed is correct. This will require an authentication and access control mechanism. In addition, there may be problems related to the volume and rate of change of hint information: the more information there is and the more frequently it changes, the more difficult it will be to provide a valuable, well-functioning service. Hence a pricing mechanism may be needed to put some negative pressure on updates to the central information sources. Both the security and pricing issues are discussed further below.

2.1 Usability and Feature Set Requirements

Usability can be evaluated from three distinct perspectives: that of a publisher wishing to make a piece of information public, that of a client requesting URN resolution, and that of the provider or manager of resolution information. We address the usability requirements from each of these three perspectives separately.

2.1.1 The Publisher

The publisher must be able to make URNs known to potential customers. From the perspective of a publisher, it is of primary importance that URNs be correctly and efficiently resolvable by potential clients. Publishers also stand to gain from long-lived URNs, since they increase the chance that references continue to point to their published resources.

The publisher must also be able to choose easily among a variety of potential services that might translate URNs to location information. To allow for this mobility among resolution services, the architecture for resolution services specified within the IETF should not make changing from one resolution service to another an expensive operation.

The publisher should be able to arrange for multiple access points to a published resource.
For this to be useful, resolution services should be prepared to provide different resolution or hint information to different clients, based on a variety of information including the client's location and its access privileges. For example, companies might arrange for locally replicated copies of popular resources, and would like to provide access to the local copies only for their own employees. This is distinct from access control on the resource as a whole, and may be applied differently to different copies.

The publisher should be able to provide both long and short term information about accessing the resource. Long term information is likely to include the long term location of the resource, or the location or identity of a resolution service with which the publisher has a long term relationship. One can imagine that the arrangement with such a long term ``authoritative'' resolution service might include a guarantee of reliability, resiliency to failure, and atomic updates. Shorter term information is useful for short term changes in service or to avoid short-lived congestion or failures. For example, if the actual repository of the resource is temporarily inaccessible, the resource might be made available from another repository. This short term information can be viewed as a temporary refinement of the longer term information, and as such should be easier and quicker to make available, but may be less reliable.

Lastly, publishers will be the source of much of the hint information that will be stored and served by the manager of the infrastructure. Since many publishers will not understand the details of the URN resolution mechanism, it must be easy and straightforward to install hint information. The publisher must be able not only to express hints, but also to verify that what is being served by the manager is correct.
Furthermore, to the extent that there are security constraints on hint information, the publisher must be able both to express them and to verify compliance with them easily.

2.1.2 The Client

From the perspective of the client, simplicity and usability are paramount. Of critical importance to serving clients effectively is an efficient protocol through which the client can acquire hint information. Since resolving the name is only the first step toward gaining access to a resource, the time spent on it must be minimized.

Furthermore, it will be important to be able to build simple, standard interfaces to the resolution service so that both the client and applications acting on the client's behalf can interpret hints and subsequently make informed choices. The client, perhaps with the assistance of the application, must be able to specify preferences and priorities and then apply them. If the ordering of hints is only partial, the client may become directly involved in choosing among and interpreting them, and hence the hints must be understandable to that client. On the other hand, in general it should be possible to configure default preferences, with individual preferences overriding any defaults.

2.1.3 The Management

Finally, we must address the usability concerns with respect to the management of the hint infrastructure itself. What we are terming ``management'' is a service distinct from publishing. It involves the storage and provision of hints to clients, so that they can find published resources. It also provides security to the extent that there is a commitment to provide such security; this is addressed below.

The management of hints must be as unobtrusive as possible. First, its infrastructure (hint storage servers and distribution protocols) should have as little impact as possible on other network activities. It must be remembered that this is an auxiliary activity and must remain in the background.
Second, to make hint management feasible, there will need to be a system of economic incentives and disincentives. Recovering the cost of running the system is only one reason for levying charges; the introduction of payments often has a beneficial impact on social behavior. It may be necessary to discourage certain forms of behavior that, when out of control, have a serious negative impact on the whole community. At the same time, payment policies should encourage behavior that benefits the community as a whole. For example, a small one-time charge for authoritatively storing a hint will encourage conservative use of hints. If we assume that there is a fixed cost for managing a hint, then the broader its applicability across the URN space, the more cost effective it is. That is, when one hint can serve a whole collection of URNs, there is an incentive to submit one general hint rather than a large number of more specific hints. Similar policies can be instituted to discourage frequent changes of hints. In these and other ways, cost-effective behavior can be encouraged.

Lastly, symmetric to the usability issues for publishers, it must also be simple for the management to configure the mapping of URNs to hints. It must be easy both to understand the configuration and to verify that it is correct. With respect to management, this requirement may affect not only the information itself but also how it is partitioned among the network servers that collaboratively provide the management service. For example, it should be straightforward to bring up a server and verify that the data it is managing is correct. Since we are discussing a global and probably growing service, encouraging volunteer participants requires that, as with the DNS, such volunteers can feel confident about the service they are providing and its benefit to both themselves and the rest of the community.
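The cost argument of this section, that one general hint is cheaper than many specific ones under a fixed per-hint charge, can be made concrete. The sketch below is illustrative only: indexing hints by URN prefix with a longest-prefix match is our assumption, not a mechanism the draft specifies, and the class `HintTable`, the constant `CHARGE_PER_HINT`, and the `URN:example:pubco/` URNs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical one-time charge for authoritatively storing one hint.
CHARGE_PER_HINT = 1.0


@dataclass
class HintTable:
    # URN prefix -> hint. One entry can cover every URN sharing the prefix.
    entries: dict[str, str] = field(default_factory=dict)

    def store(self, prefix: str, hint: str) -> float:
        """Store (or replace) a hint for a prefix and return the charge levied."""
        self.entries[prefix] = hint
        return CHARGE_PER_HINT

    def lookup(self, urn: str) -> str | None:
        """Return the hint whose prefix is the longest match for `urn`, if any."""
        matches = [prefix for prefix in self.entries if urn.startswith(prefix)]
        if not matches:
            return None
        return self.entries[max(matches, key=len)]


# One general hint covering a whole collection of URNs...
general = HintTable()
general_cost = general.store("URN:example:pubco/", "http://archive.pubco.example/")

# ...versus one specific hint per URN for the same 1000 documents.
specific = HintTable()
specific_cost = sum(
    specific.store(f"URN:example:pubco/doc-{i}", f"http://archive.pubco.example/doc-{i}")
    for i in range(1000)
)

print(general_cost, specific_cost)  # 1.0 1000.0
print(general.lookup("URN:example:pubco/doc-42"))  # http://archive.pubco.example/
```

With longest-prefix matching, storing a more specific prefix later overrides the general hint only for the URNs it covers, which is one possible way to express the short-term refinements of long-term information described for publishers. A flat table also keeps the configuration easy to inspect and verify, in the spirit of the management requirement above.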
2.2 Security and Privacy Requirements

Although much of the information we are discussing in this document might be considered ``meta-information'', there are some important security and privacy concerns that must be addressed by a service supporting that information. By first considering the sorts of attacks that are of concern, we can then focus on the security and privacy issues that are important.

The reader will notice that integrity plays less of a role here than might be expected. To the extent that servers provide access control, the information they manage will have certain integrity guarantees. Beyond that, we must recognize that we are dealing merely with hint information about the location of possibly interesting resources. We therefore believe that the benefit of providing integrity guarantees beyond those provided by the servers themselves does not outweigh the cost.

Because most of the activity will be the distribution of hint information, the threats of concern are those affecting the maintenance of correct information to distribute and the availability of the sources of that information. Although it may not be completely centralized, hint information of the sort discussed here will clearly need to be concentrated in order to facilitate its discovery by potential customers. Hence the vulnerable points are the sources of the information and the distribution network among them.

If one assumes that there will be principals of some sort that are responsible for the information about each URN entry in the URN resolution service, then one major threat is an attacker that masquerades as a valid principal and inserts incorrect information into the service. A second threat results from the fact that the service itself will be implemented by a set of servers that collaborate and share the hint information critical to their activities.
By masquerading as a valid server in this pool, an attacker can provide incorrect information both to clients and to other servers, which those servers will then distribute. A third threat is that if the resolution service is too centralized, service can be denied by a variety of network attacks, ranging from flooding the service with queries to causing network problems that reduce access to the service. We can turn each of these threats into a security goal.

* ACCESS CONTROL ON HINTS: There needs to be an authoritative version of each hint, and the right to modify it must be limited to those principals authorized to do so.

* SERVER AUTHENTICITY: Servers and clients must be able to learn the identity of the servers with which they communicate. This will be a matter of degree: there may be more trustworthy but less accessible servers, supported by a larger cluster of less authenticatable servers that are more widely available. In the worst case, if the client receives what appears to be invalid information, it should assume that the hint may be inaccurate and seek confirmation from the more reliable but less accessible servers.

* SERVER AVAILABILITY: Broad availability will provide resistance to denial of service. It is only to the extent that the services are available that they provide any degree of trustworthiness.

_Ensuring_ privacy for clients and publishers is in some respects essentially impossible. Fortunately, assuring a reasonable degree of privacy is possible.

The privacy of clients is primarily threatened by packet sniffers and by servers that log requests. A server or a packet sniffer can without much difficulty record the contents of queries as they pass by and compile the information into a relation between URNs and clients. This can be combatted in two ways: by anonymizing queries through a gateway and by encryption.
The gateway solution will be the most effective protection but will involve an extra step and another potential bottleneck. The encryption solution will work to a degree, but because the queries will probably be processed by widely distributed systems, the decryption key will need to be widely known, seriously diminishing the protection afforded by encryption. The reason is that the client's query will probably have to be parsed by several different servers in the system, and the client does not know beforehand which ones will be involved; hence, for query encryption to work, the servers must all share a single key. On the other hand, to the degree that the search process is distributed, packet sniffing at a single point is less likely to reveal data about a specific person, and is hence less threatening to privacy. Furthermore, if clients have flexibility in choosing the specific services they use, they can regularly switch services in the hope of foiling a packet sniffer watching their usual access point.

The privacy of publishers is much easier to safeguard. Since publishers are trying to publish something, in some situations privacy is probably not desired. However, publishers do have information that they might like to keep private: who their clients are, and what names exist in their namespace. Information about who their clients are may be difficult to collect, depending on the implementation of the resolution system. For example, if the resolution information relating to a given publisher is widely replicated, the hits to _each_ replicated copy will need to be recorded. Of course, whether a specific client is requesting a given name can also be determined from the other direction, by watching the client as described above. The other privacy issue for publishers has to do with access control over URN resolution. This issue depends on the implementation of the publisher's authoritative URN server.
URN servers can be designed to require proof of identity before issuing resolution information; if the client does not have permission to access the requested URN, the service denies that such a URN exists. An encrypted protocol can also be used so that both the request and the response are obscured. Encryption is possible in this case because the identity of the final recipient (i.e., the URN server) is known.

2.3 Evolution

One of the lessons of the Internet that we must incorporate into the development of mechanisms for resolving URNs is that we must be prepared for change. Such changes may happen slowly enough to be considered evolutionary modifications of existing services, or dramatically enough to be considered revolutionary. They may permeate the Internet universe bit by bit, living side by side with earlier services, or they may take the Internet by storm, causing an apparently complete transformation over a short period of time. There are several directions in which we can predict the need for evolution even now, prior to the deployment of any such service. At the very least, the community and the proposed mechanisms should be prepared for these.

First, we expect there to be additions and changes to the mechanisms. The community already understands that there must be a capacity for new URN schemes. A URN scheme will define URNs that meet the URN requirements document [Sollins94], but may place further constraints on the internal structure of the URN. The requirements document allows for an overall plan in which URN schemes are free to specify parts of the URN that are left opaque in the larger picture. In fact, a URN scheme may choose to make public the algorithms for any such ``opaque'' part of the URN. For example, although it may be unnecessary to know the structure of an ISBN, the algorithm for understanding the structure of an ISBN has been made public.
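As an illustration of a publicly documented structure inside an ``opaque'' part, the ISBN-10 check-digit rule can be applied by anyone who knows it: the ten characters are weighted 10 down to 1, a final 'X' counts as 10, and the weighted sum must be divisible by 11. The sketch below assumes, hypothetically, a URN of the form `URN:ISBN:<isbn>`; neither that form nor the function name `isbn10_is_valid` comes from this draft.

```python
def isbn10_is_valid(opaque: str) -> bool:
    """Validate an ISBN-10 with its published check-digit rule.

    Weights 10, 9, ..., 1 are applied to the ten characters; the weighted sum
    must be divisible by 11. A final 'X' stands for the value 10, and hyphens
    are ignored.
    """
    chars = opaque.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch in "0123456789":
            value = int(ch)
        elif ch == "X" and position == 9:  # 'X' is only legal as the check digit
            value = 10
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


# Hypothetical URN whose opaque part is an ISBN; split off "URN:" and the NSI.
_, nsi, opaque = "URN:ISBN:0-306-40615-2".split(":", 2)
print(nsi, isbn10_is_valid(opaque))  # ISBN True
print(isbn10_is_valid("0-306-40615-3"))  # False
```

A resolver that knows this rule can reject a mistyped name before issuing any query, which is the kind of scheme-specific advantage that a specialized resolution service could exploit.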
Other schemes may choose either not to make their algorithms public, or a structure in which knowledge of the scheme does not convey any significant semantics to the user. In any case, we must be prepared for a growing number of URN schemes.

Often in conjunction with a new URN scheme, but possibly independently of any particular URN scheme, new resolution services may evolve. For example, one can imagine a specialized resolution service based on the particular structure of ISBNs that improves the efficiency of finding documents given their ISBNs. Alternatively, one can imagine a general purpose resolution service that trades performance for generality; although it exhibits only average performance resolving ISBNs, it makes up for this weakness by understanding all existing URN schemes, so that its clients can use the same service to resolve URNs regardless of naming scheme. In this context, there will always be room for improvement of services, through improved performance, better adaptability to new URN schemes, or lower cost. In any case, new models for URN resolution will evolve, and we must be prepared to allow for their participation in the overall resolution of URNs.

If we begin with one overall plan for URN resolution, into which the enhancements described above may fit, we must also be prepared for an evolution in the authentication schemes that will be considered either useful or necessary in the future. There is no single globally accepted authentication scheme, and there may never be one. Even if one does exist at some point in time, there will always be threats to it, and so we must always be prepared to move on to newer and better schemes as the old ones become too easily spoofed or guessed.

Lastly, in terms of mechanism, although we may develop and deploy a global model supported by a global scheme, we must be prepared for that top level model to evolve.
Thus, if the top level model supports an apparently centralized (from a policy standpoint) scheme for inserting and modifying authoritative information, over time we must be prepared to evolve to a different model, perhaps one with a more distributed model of authority and authenticity. If the model has no core but rather a cascaded partial discovery of information, we may find that it becomes unmanageable as scale increases. Whatever the core of the model, we must be prepared for it to evolve with changes in scale, performance, and policy constraints such as security and cost.

In addition to the evolution of resolution mechanisms, we expect that the community will follow an evolutionary path towards the separation of semantics from identification. The URN requirements document suggested this path as well, and there has been general agreement in much of the community that such a separation is desirable. This is a problem that the public at large has generally not understood. Today we see the problem most clearly with the use of URLs for identification. When a web page moves, its URL becomes invalid. Suppose such a URL is embedded in some page kept in long term storage. There are three possible outcomes to this scenario. One possibility is that the client is left high and dry with a message saying that the page cannot be found. Alternatively, a ``forwarding pointer'' may be left behind, in the form of an explicit page asking the client to click on a new URL. Although this allows the client to find the intended page, the broken link cannot be fixed because the URL is embedded in a file outside of the client's control. A third alternative is that the target server supplies an HTTP redirect so that the new page is provided to the client automatically. In this case, the client may not even realize that the URL is no longer correct.
The real problem with the latter two outcomes is that they only work as long as the forwarding pointer can be found at the old URL. Semantics, in this case location information, was embedded in the identifier, and the resolution system was designed to depend on the semantics being correct.[1] There are few cases in which we can expect semantics of any sort to remain valid for a long time, but in many cases references need to have long lifespans. Most documents are only useful while their references still function.

We expect the evolution toward separation of semantics from identification to move along at least three paths. The first will be to develop temporary aliases to capture the semantics currently embedded in identifiers. This will require additional translation, but it will allow for the development of semantics-free URNs. Second, we expect locally shared or private aliases to arise, again supported by a translation mechanism and allowing for the long-term storage of global, semantics-free URNs. Such an aliasing scheme may be used to permit local aliases for named resources as well as to present these aliases to users in lieu of the URNs themselves. Lastly, we expect that global aliases may develop. These will be more user-friendly ``names'' that would be shared on a much larger scale, and might be defined in some global registry. This may include trademarked names as well as names in extremely common use. As with the other alias systems, a translation facility is needed. However, since this system of aliases is of global scope, the translation facility will be very slow if each translation requires a query to a centralized or even reasonably distributed global registry. To achieve acceptable speeds, the translation facility will need to maintain a local cache, possibly in cooperation with other nearby alias caches.
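The local cache in front of a global alias registry described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the class `AliasCache`, the `registry_lookup` callable (a hypothetical stand-in for a slow registry query), and the sample alias and URN are ours, not part of any proposal discussed here.

```python
from __future__ import annotations

from collections.abc import Callable


class AliasCache:
    """Translate global aliases to URNs, keeping a local cache of answers.

    `registry_lookup` is a hypothetical stand-in for a slow query to a global
    alias registry; it returns a URN, or None if the alias is unknown.
    """

    def __init__(self, registry_lookup: Callable[[str], str | None]) -> None:
        self._registry_lookup = registry_lookup
        self._cache: dict[str, str] = {}
        self.registry_queries = 0  # how many times the global registry was consulted

    def translate(self, alias: str) -> str | None:
        if alias in self._cache:
            return self._cache[alias]  # served locally, no registry round trip
        self.registry_queries += 1
        urn = self._registry_lookup(alias)
        if urn is not None:
            self._cache[alias] = urn  # only successful translations are cached
        return urn


registry = {"acme annual report": "URN:example:acme/report-1996"}
cache = AliasCache(registry.get)
print(cache.translate("acme annual report"))  # URN:example:acme/report-1996
print(cache.translate("acme annual report"))  # URN:example:acme/report-1996
print(cache.registry_queries)  # 1
```

The sketch never expires cache entries and does not cooperate with nearby caches; a real translation facility would need some way to learn that an alias had been reassigned.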
Clearly these alias paths are all postulation at present, but they are described here to demonstrate some of the scope of evolution for which we must be prepared.

A third evolutionary requirement is even more mechanical than the others. At any point in time, the community is likely to be supporting a compromise position with respect to resolution. We will probably be operating in a situation balanced between feasibility and the ideal, perhaps with policy controls used to help stabilize the service. Ideally, the service would provide exactly what the customers wanted, and they in turn would not request more support than they need. Since some service provision resources will always be in short supply, some form of policy control will always be necessary. For example, suppose hint entries are being submitted in such volume that the hint servers are using up their excess capacity and need more disk space. An effective solution to this problem would be a mechanism such as a pricing policy, which has the dual effect of encouraging conservative use of resources and collecting revenue for the improvement and maintenance of the system. As technology changes and the balance of which resources are in short supply shifts, the mechanisms and policies for controlling their use must evolve as well.

To conclude, we find that there are three broad areas in which we have requirements for URN resolution: usability, security and privacy, and evolution. Usability can be viewed from three perspectives: that of the publisher of a resource, that of the client wishing to obtain access to the resource, and that of the manager of the information needed to make the published material accessible to the client. With respect to security, we find requirements for control of access to URN resolution information for purposes of storing and modifying it, for privacy of the information, and for resistance to denial of service attacks.
Lastly, a URN resolution service must be prepared for several sorts of evolutionary development. At the most abstract level, the architecture itself may evolve. At a more concrete level, the evolution toward the goal of separation of semantics from identification may lead to functional changes. Finally, with the progress of supporting technologies, we expect that changes will be necessary in the realization of the service. Any scheme must be prepared to co-exist with revisions of itself or with new approaches. With these requirements in mind, we will investigate several alternative proposals.

3 Assessment of the NAPTR Scheme

What we are calling the ``NAPTR Proposal'' is a proposal for constructing a top-level URN resolution system. The term ``NAPTR'' stands for Naming Authority PoinTeR, a type of database record around which the proposed system revolves. This proposal represents an approximate consensus of the people working on URN resolution in the IETF at the time the URI Working Group was dissolved.

As of the writing of this document, we have not seen a full written description of the NAPTR proposal. The following description is therefore based on conversations and email with various members of the NAPTR group, together with some pseudo-code they provided. It represents our best understanding of that proposal, but we make no guarantees of its correctness.

3.1 Brief Explanation

A URN has been defined to be of the format:

   URN:[NSI]:[OS]

where NSI is a valid NameSpace Identifier and OS is an Opaque String that is interpretable only within the context defined by the NSI. The proposed model is that NSIs are allocated by some central authority to identify naming schemes. Within each naming scheme there would be many naming authorities, each charged with distributing globally unique names from within its individual namespace. Each naming scheme would also have a global plan for the delegation of naming authorities and the namespaces they manage.
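Splitting a URN into the two parts described above is the first thing any resolver has to do. The minimal sketch below parses `URN:[NSI]:[OS]` and hands back the OS uninterpreted, since only the naming scheme may give it meaning. The draft does not define an NSI syntax. The letters-digits-hyphens rule and the case-insensitive ``URN'' prefix used here are assumptions.

```python
import re
from typing import NamedTuple

# Assumed NSI syntax (letters, digits, hyphens); the draft does not define one.
_NSI_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


class ParsedURN(NamedTuple):
    nsi: str  # NameSpace Identifier: selects the naming scheme
    os: str   # Opaque String: meaningful only within that scheme


def parse_urn(urn: str) -> ParsedURN:
    """Split a URN of the form URN:[NSI]:[OS] into its NSI and OS parts."""
    parts = urn.split(":", 2)  # split at most twice, so the OS may itself contain ':'
    if len(parts) != 3 or parts[0].upper() != "URN":
        raise ValueError(f"not a URN: {urn!r}")
    nsi, opaque = parts[1], parts[2]
    if not _NSI_PATTERN.fullmatch(nsi):
        raise ValueError(f"invalid NSI: {nsi!r}")
    if not opaque:
        raise ValueError("empty opaque string")
    return ParsedURN(nsi, opaque)  # the OS is returned untouched
```

For example, `parse_urn("URN:INET:mit.edu/lcs/ana/mesh/paper.ps")` yields NSI `INET` and OS `mit.edu/lcs/ana/mesh/paper.ps`.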
Given that a client has a URN that needs to be resolved, the resolution process proceeds as follows:

1. The client pulls the NSI token out of the URN and issues a DNS lookup on ``[NSI].urn.net''. This query returns an NAPTR record or an SRV record.
   * An NAPTR record tells how to find a Naming Authority and lists the services offered there. To determine the Naming Authority for a given URN, an NAPTR contains a rewrite rule that rewrites the URN into a string identifying the naming authority.
   * An SRV record describes an authoritative URN resolution server and supplies the information necessary to contact it.

2. If the client received an SRV record, it connects to the described service on the appropriate port and attempts to resolve the URN. If this attempt is successful, the resolution is complete.

3. If the client received an NAPTR record, it applies the enclosed rewrite rule to the URN. This produces a string that is issued in another DNS lookup. That lookup returns one or more NAPTR and SRV records, which are processed with steps (2) and (3). There is an algorithm for preventing looping and for selecting which rewrites to try first.

3.2 Definition of the New DNS Records

These are the structures involved, taken from the Perl code implementing the client side of this system as of April 1, 1996:

   # This structure represents one of the NAPTR records returned from the registry
   $naptr_rec = {
       services => $string,   # List of resolution services offered?
       pattern  => $string,   # regexp substitution pattern
   };

   # This structure represents one of the SRV records returned from DNS
   $srv_rec = {
       priority => $int,      # preference
       weight   => $int,      # weight if preference is equal
       port     => $int,      # protocol port
       target   => $string    # host to contact
   };

   # This structure represents the parsed values in the Service field of the
   # NAPTR record.
   $service_rec = {
       flag         => $string,   # flag if not a real value
                                  #   0 = I don't know, keep going
                                  #   1 = I know all
                                  #   2 = normal case
       protocol     => $string,   # network protocol
       ServiceClass => [ @list ]  # list of service classes
   };

   # This is a builtin list that contains the valid Services that this
   # client speaks. The API to the potential library should have some way
   # for the client to add/remove things from this list.
   %valid_ServiceClasses = ("n2l"  => 1,   # URN to URL
                            "n2ls" => 1,   # URN to URLs
                            "n2r"  => 1,   # URN to resolver
                            "n2rs" => 1,   # URN to resolvers
                            "n2c"  => 1    # URN to URC
                           );

3.3 Assessment

The NAPTR scheme has the very useful property of being implementable right away using existing infrastructure, i.e. the DNS. This ease of implementation and deployment makes it imperative that there be an evolutionary path toward satisfying any requirements the scheme does not yet fully meet. We will proceed through the various requirements in turn. For each, we assess the extent to which it is satisfied and examine key areas in which it is either not satisfied or not addressed by the NAPTR specification.

3.3.1 Assessing Usability

From the client's perspective, the NAPTR system seems to meet most of the requirements. Since its protocol is based on DNS lookups, it is well known and already widely implemented. Building a user interface for the client side is not too difficult. It essentially involves one question: how does the user control the parts of the resolution process that involve decisions? For example, the NAPTR system allows the client to select sites that return certain types of information, such as whole resources versus only their URCs, and this choice could be passed on to the user.
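The client-side procedure of section 3.1, together with the service-class filtering just discussed, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the NAPTR group's Perl client:

* DNS is simulated by an in-memory dictionary.
* A rewrite rule is taken to be a regular expression that must match the whole URN, plus a replacement template.
* Loop prevention is a simple visited set with a lookup limit.
* The final ordering stands in for the unspecified selection algorithm. It puts the lowest priority value first, as in the later SRV specification, then the larger weight.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Union


@dataclass(frozen=True)
class NAPTR:
    service_classes: FrozenSet[str]  # classes named in the Service field, e.g. {"n2l"}
    regex: str                       # must match the whole URN for the rule to apply
    replacement: str                 # template (with \1, \2, ...) for the next DNS name


@dataclass(frozen=True)
class SRV:
    priority: int                    # assumed: lower value = more preferred
    weight: int                      # tie-breaker among equal priorities
    port: int
    target: str


Record = Union[NAPTR, SRV]

# Mirrors %valid_ServiceClasses in the Perl client of section 3.2.
VALID_SERVICE_CLASSES = frozenset({"n2l", "n2ls", "n2r", "n2rs", "n2c"})


def resolve(urn: str, dns: Dict[str, List[Record]],
            wanted: FrozenSet[str] = VALID_SERVICE_CLASSES,
            max_lookups: int = 16) -> List[SRV]:
    """Follow NAPTR rewrites starting at '[NSI].urn.net'; return the SRV records found."""
    nsi = urn.split(":", 2)[1]
    pending = [f"{nsi}.urn.net"]            # step 1: query the NSI registry
    visited = set()
    found: List[SRV] = []
    while pending and len(visited) < max_lookups:
        name = pending.pop(0)
        if name in visited:                 # loop prevention: each name is looked up once
            continue
        visited.add(name)
        for rec in dns.get(name, []):       # stands in for a DNS lookup of `name`
            if isinstance(rec, SRV):
                found.append(rec)           # step 2: a server the client may contact
            elif rec.service_classes & wanted:
                match = re.fullmatch(rec.regex, urn)
                if match:                   # step 3: rewrite the URN and look up the result
                    pending.append(match.expand(rec.replacement))
    # Simplified ordering: lowest priority first, heavier weight first among ties.
    return sorted(found, key=lambda s: (s.priority, -s.weight))
```

Usage follows the MIT example of section 3.3.1.1. With hypothetical records leading from ``INET.urn.net'' to ``mit.edu.urn.net'' and then to ``lcs.mit.edu.urn.net'', the URN ``URN:INET:mit.edu/lcs/ana/mesh/paper.ps'' resolves after three lookups. Narrowing `wanted` is one way the choice of service classes could be handed to the user.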
Another issue is that the result of the resolution will likely be a collection of SRV records. Heuristics in the browser might select the order in which to try these records, taking into account locality or other specialized information. In some cases it may be desirable that the client make this decision.

When we consider performance, however, it is hard to tell how fast the system will operate. In part, this is because the specification is not clear about how the system will be organized in typical situations, even though performance will be tightly coupled to the structure of the namespaces. Certainly when the service is first deployed, performance ought to be reasonably good regardless of structure. The topmost level of dispatch (i.e. the registry of NSIs) can be cached locally for long periods of time, so at first no more than one or two DNS lookups should be necessary to locate an authoritative service.

3.3.1.1 Long Term Effects of Rewrite Rules

The effect of the rewrite rules in the NAPTR system is not easily characterized. In general, systems based on rewrite rules tend to be difficult to configure and are not well understood by ``mere mortals''. There is no question that the rules are flexible enough to accomplish the tasks required of a top-level URN resolution system. On the other hand, the desire of publishers and users to maintain long-lived URNs may be compromised by the quirkiness of rewrite rule databases and the difficulty of managing them.

Initially, we expect that most of the URNs within a namespace will be found using a simple, unified set of rules. However, as time goes on we can expect the namespace to become increasingly fragmented, as sets of long-lived URNs come to be served by different authoritative services.
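The cost of fragmentation can be made concrete with a toy count. It compares the two ways of absorbing exceptions described in the MIT example of this section. Method 1 installs exception records at the lab's name. Method 2 pushes the generalization back to the parent level, so that every department needs its own SRV record. The model assumes one server per lab and one separate server per exception, and it ignores the rewrite rules at the parent level. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutCost:
    largest_answer: int  # most records returned by any single DNS lookup on the path
    lookups: int         # worst-case lookups below the naming-authority level (e.g. mit.edu)
    stored_records: int  # records kept for the lab's part of the namespace


def exception_layout(exceptions: int) -> LayoutCost:
    """Method 1: the lab's name returns its SRV plus one NAPTR per exception."""
    return LayoutCost(
        largest_answer=1 + exceptions,         # lab SRV + exception NAPTRs in one answer
        lookups=1 + (1 if exceptions else 0),  # lab name, then the exception's own name
        stored_records=1 + 2 * exceptions,     # lab SRV, plus NAPTR + SRV per exception
    )


def pushed_back_layout(departments: int) -> LayoutCost:
    """Method 2: the parent rule selects 'lab/department'; each department holds an SRV."""
    return LayoutCost(largest_answer=1, lookups=1, stored_records=departments)
```

With 3 exceptions, method 1 returns 4 records in one answer and needs 2 lookups, storing 7 records. With 20 departments, method 2 always returns a single record after 1 lookup but stores 20 records, many of them pointing redundantly at the same lab server.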
Fragmentation will have a direct impact on the performance of this system, because it will mean sending additional records back to the client and, in many cases, additional DNS lookups. We expect that these performance problems will not affect the users of the system. Rather than let performance suffer, the administrators of the system will simply discourage the sorts of activities that cause the problems. While this may solve the performance issues, it has the net effect of binding naming authority more tightly to name resolution and of restricting the publisher's mobility.

This inability to respond gracefully to fragmentation is a result of the way the rewrite rules work. The rewrite rules are a very flexible technique for _generalizing_ classes of URNs; however, in this system all _discrimination_ of URNs is done on the client side.

For example, consider the URN ``URN:INET:mit.edu/lcs/ana/mesh/paper.ps''. Suppose that MIT's URN resolution system is set up so that each lab runs its own server; LCS would then run one to cover its collection of URNs. The resolution process would select out ``mit.edu'' as the naming authority and do the DNS lookup ``mit.edu.urn.net''. This would return another record with a rewrite rule that selects out the lab name, in this case ``lcs'', leading perhaps to a lookup of ``lcs.mit.edu.urn.net''. That lookup returns an SRV record indicating the authoritative URN server for LCS. (Keep in mind that this is only one way of setting up these rules!)

Now suppose that the ANA group decides to run an experimental new kind of server. There are two ways to make the top-level system point to this new server, and both of them cost some additional performance.
First, a new record could be added to the system so that a lookup of ``lcs.mit.edu.urn.net'' returns two records. One is an SRV record that points to the authoritative LCS server; the other is a new NAPTR record that rewrites ANA URNs to ``ana.lcs.mit.edu.urn.net''. When that name is looked up in the DNS, an SRV record pointing to the experimental service is returned. This is the process of installing an _exception_ into the top-level resolution mechanism (i.e. all LCS URNs _except_ the ANA URNs go to server A, while the ANA URNs go to server B). One exception is not necessarily a problem, but if many exceptions are installed, a large number of records will be sent back in response to the DNS lookup.

To get around this problem, the second method must be used. It pushes the generalization back a level, changing the rewrite rule at the ``mit.edu'' level to select out the whole ``lcs/ana'' token rather than simply the ``lcs'' token. Depending on how the namespace is set up, this is not always possible. In some cases two separate rewrite rules will be required at the ``mit.edu'' level, one covering labs that do not have sub-departments and one covering labs that do. To make the system work, each department under LCS might need a separate SRV record, many of them redundantly pointing to the generic LCS server. Then, when the ``lcs/ana'' token is matched, the URN is rewritten into ``ana.lcs.mit.edu.urn.net'', and the correct server is located with a single additional DNS lookup.

In summary, the cost of exceptions can be borne in one of two ways:
* in additional records passed back, additional client-side computation, and additional DNS lookups;
* in a considerable expansion of the size of the DNS database for that particular part of the namespace.

3.3.1.2 Usability for Clients

From the perspective of a client, the system should work well.
The client side of the protocol is fairly easy to implement, the only difficulties being loop avoidance and heuristics for choosing which servers to try first. The performance of the system could become poor, but as we speculated above, other positive attributes are more likely to be sacrificed before performance is.

One way the system could be improved from the client's perspective would be to implement some security features. For example, the resolution system could set aside room for authentication information in the SRV and NAPTR records. The client could then verify the authenticity of the records using whatever security infrastructure is available. The authentication protocol itself should probably remain separate from the resolution system.

3.3.1.3 Usability for Publishers

From the perspective of a publisher, the NAPTR system leaves many important issues unspecified. Each naming authority is left to select or invent its own policy.

* Publishers who want to use a different URN resolution service will need to communicate this somehow to the naming authority that gave them their names, so that the authority can modify the NAPTR and SRV records in its registry. Nothing has been specified about how this is done or how easy it is. In theory, entities can run their own naming authority registries, but that has the net effect of slowing down the system.

* Publishers who acquire names from a given naming authority are forced to contract with that authority to ensure that it continues to support the publisher's namespace in its registry. This must be done for as long as the names remain valid. If for some reason the naming authority ceases to function, someone else must take over that service. Except for the owners of the top-level namespaces (i.e. those located directly from the first rewrite rule, and not delegated), this arrangement offers no guarantee that long-lived names will be maintained.
* Publishers do not in general have direct control of the top-level resolution of their names, and, as with the DNS, there is no authentication model for validating information obtained from the system. Since the system is intended to be global, it stands to reason that publishers will have no way to maintain tight control over the process of resolution. With the right design, however, an authentication protocol can make it very difficult to misdirect clients with forged hint information. Unfortunately, a facility for authentication is not built into the NAPTR system and will not be easy to add.

* Publishers have no means at their disposal to make temporary changes to the resolution process. Any change must be registered with their parent Name Authority. That process may take time, may be subject to a lengthy review to ensure that the change is legal (i.e. not a security violation), and may cost money. All of these potential hassles are undesirable when a change need only be made for a brief time. The NAPTR system does not specify how temporary changes would be handled; it leaves the definition of a policy to the discretion of the NA Registry management.

3.3.1.4 Usability for Naming Authorities

From the perspective of a naming authority, exceptions will be expensive for several reasons. First, they increase the size of the registry. Second, and more importantly, they increase the complexity of the registry. It will be very important for naming authorities to understand the tangle of rewrite rules that make up their registries. Naming authorities will likely need to set down very specific guidelines delineating a rewrite rule policy and a security policy in order to keep their registries working. But, as we understand it, the NAPTR system neither specifies nor suggests the structure of these policies.
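One small aid a naming authority might build for understanding its registry is an overlap audit. The sketch below assumes a hypothetical registry layout mapping each DNS name to its (regular expression, replacement) rewrite rules. For each sample URN, it reports every name at which more than one rule matches, that is, where the registry returns several competing rewrites. The example rules are the two ``mit.edu''-level rules suggested in section 3.3.1.1, one for labs without sub-departments and one for labs with them. They overlap unless further constrained.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical registry layout: DNS name -> list of (regex, replacement) rewrite rules.
Rules = Dict[str, List[Tuple[str, str]]]


def find_overlaps(rules: Rules, sample_urns: Iterable[str]) -> List[Tuple[str, str, List[str]]]:
    """Report (urn, name, rewritten targets) wherever several rules at one name match a URN."""
    report = []
    for urn in sample_urns:
        for name, entries in sorted(rules.items()):
            targets = []
            for regex, replacement in entries:
                match = re.fullmatch(regex, urn)
                if match:
                    targets.append(match.expand(replacement))
            if len(targets) > 1:  # the client would receive competing rewrites here
                report.append((urn, name, targets))
    return report


rules: Rules = {
    "mit.edu.urn.net": [
        (r"URN:INET:mit\.edu/([^/]+)/.*", r"\1.mit.edu.urn.net"),               # lab only
        (r"URN:INET:mit\.edu/([^/]+)/([^/]+)/.*", r"\2.\1.mit.edu.urn.net"),   # lab/department
    ],
}
```

Running `find_overlaps(rules, ["URN:INET:mit.edu/lcs/ana/mesh/paper.ps"])` flags both rewrites, to ``lcs.mit.edu.urn.net'' and to ``ana.lcs.mit.edu.urn.net''. A URN with only one path segment after the lab (a hypothetical ``mit.edu/ai/x.ps'') matches just the first rule.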
It is unclear whether rewrite rules provide increased flexibility or merely an increased need for security. Given a system of arbitrary rewrite rules, it is difficult to determine the layout of the namespace (i.e. who has authority to specify which rewrite rules) unless the rules are constrained in some systematic way. Such constraints may make the rules no more flexible than a simpler system, but this is difficult to assess because the necessary constraints have not been suggested.

Many naming authorities may proceed on the assumption that exceptions, and other things that make their lives more complex, are simply not that important. However, easy implementation of exceptions is vital to the maintenance of long-lived URNs (by long-lived we mean perhaps ten or a hundred years). Exceptions are likely to arise in a number of ways:
* naming authorities split up into pieces;
* naming authorities go out of business and are split up;
* corporations merge and split;
* students at universities take collections of URNs with them.

It is hard to predict the sorts of uses to which these systems will be put, but fragmentation over the course of time is a given. In the NAPTR system, exceptions can be installed, but doing so will be difficult enough that in many cases URNs will be thrown away rather than maintained at that cost and hassle. This would negate their distinctive feature of being immutable over the lifetime of the named resource.

3.3.2 Assessing Security and Privacy

As with many systems designed for the Internet, security has not been given a full treatment in the design of the NAPTR system. Part of the difficulty stems from the fact that it is built on top of the DNS, which is well known to lack ``security considerations''.

3.3.2.1 Security

Because the NAPTR system is designed around the DNS, issues of security have not been addressed very thoroughly.
The systems of rewrite rules in the NAPTR system also have the side effect of muddying the waters surrounding security. In part, this is because security restrictions are defined separately by each name authority, and hence do not imply a global security policy.

The NAPTR system does not clearly and securely specify delegation of naming authority, except at the uppermost level of the NSI registry; presumably some authoritative entity regulates new entries to the NSI registry. At all other levels, collections of rewrite rules crafted by individual name authorities specify the delegation of authority, if any. In some sense authority is never actually delegated. Any name authority higher up in the resolution path can revoke the authority of the sub-authorities below it by inserting or deleting the appropriate rewrite rules. Furthermore, the specification of delegation is at best difficult to understand, because the systems of rewrite rules can be arbitrarily complex. In summary:
(1) the ``higher level'' name authorities have too much power, which raises the question of whether hierarchies of naming authority are likely to arise at all;
(2) the rewrite rules make the delegation of name authority difficult to understand for the _operators_ of the name authority registry, as well as for the publisher.

3.3.2.2 Privacy

In general, the NAPTR system does a good job of providing privacy to publishers. The information on URN resolution servers is entirely private and need not be known at any of the higher levels in the resolution process. Name allocation by publishers within their designated block of namespace is unconstrained and private. The data stored outside the publisher's physical control is kept to a minimum, and represents a minimal invasion of privacy.

The privacy of NAPTR clients is reasonably well protected.
In general, all of the network traffic associated with the client will be the result of DNS lookups. Provided those lookups are relayed through a shared local name server, as is typical, a traffic analysis performed at a point outside the client's LAN will not be able to determine which principal on the LAN is performing them. However, there is no way to foil a traffic analysis performed on the LAN itself.

3.3.2.3 Resistance to Attack

Because it is based on the DNS, the NAPTR system inherits the DNS's susceptibility to many kinds of attack. Falsified information can be provided by fake servers, and no authentication model has been proposed. Furthermore, incorrect information can be provided by real servers that have been corrupted. Falsified information can be injected at any level of the name authority hierarchy, potentially denying service to both publishers and clients.

Denial of service is one issue that the NAPTR system does not address effectively, because no real alternatives exist to the resolution pathways defined by the rewrite rules. The NAPTR system implements a model of distributed authority that is intended to make it difficult to corrupt the system from above. That is, the power to alter an NA Registry is held by the maintainer of the registry alone. However, if a registry is corrupted, the publisher whose data has been corrupted has no recourse or means to repair the damage other than complaining to the maintainer of that registry. If the maintainer caused the problem deliberately, this may not be effective.

Finally, packet sniffing can be used to make records of queries, although this is true of a wide variety of protocols.

3.3.3 Assessing Evolutionary Requirements

The most critical requirements for a URN resolution system are those that make it possible to update and upgrade the system in the future.
It is unreasonable to expect the first implementation of a URN resolution system to have all of the features that will eventually be desired; in many cases the need for such features has not yet arisen. Soon after a URN resolution system is accepted, a large base of compatible client software will be developed. After this initial implementation effort, it will be much more difficult to update the client software to understand new protocols, because the need is much less immediate. If the capacity to evolve is not built into the initial client protocol, future systems will be much harder to deploy.

The requirements for URN resolution systems focus on three key areas in which evolvability is important:
* ease of deployment;
* evolution toward the separation of semantics from naming;
* extensibility at a variety of levels.

The NAPTR system's strong suit is ease of deployment: it uses the DNS as the backbone of its distributed registry and requires only modifications to the DNS software.

3.3.3.1 Ease of Deployment

There is no question that the NAPTR system can be deployed quickly.

* The client-side protocol uses a modified version of existing DNS software to locate the proper URN resolution server.

* The construction of Name Authority registries requires a modified version of existing DNS server software. Initially, the registries will be extremely simple, requiring the addition of only a few new records to existing DNS server databases.

* The new records will be located in their own portion of the DNS namespace, so they need not interfere with the existing DNS hierarchy. (They are not required to reside in the ``*.urn.net'' domain; this depends on the rewrite rule configuration.)

With the pressures of cost recovery, commercialization, and social management, economic mechanisms are coming to the Internet. Although the NAPTR proposal suggests no specific economic model, it is easy to see how a tenable one would arise in its context.
Given the NAPTR system, it seems likely that Name Authorities would hand out blocks of their namespaces and charge, per unit of time, for keeping the registry information for each block valid. Additional charges might be levied for complicated exceptions and the like. This model seems likely because it parallels the incremental cost of maintaining the DNS servers and registry databases. It also makes being a Name Authority a lucrative business, since a publisher whose URNs reside there must contract with that specific Authority or else change its URNs. Under such a model, the naming authority market would be relatively monopolistic. Although it would encourage rapid deployment, it would at the same time tend to discourage long-lived URNs. In contrast, the market in URN resolution servers should be highly competitive under the NAPTR system.

3.3.3.2 Naming Without Semantics

In order to meet the long-term goal of long-lived URNs, it is important that semantics be separated from naming to as great a degree as possible. This is not a new concept. The original design of the DNS called for three distinct layers of naming:
* a routing-conscious layer (i.e. the IP address space);
* a unique, long-lived naming layer (i.e. the current DNS namespace);
* a layer of user-friendly aliases, which was never implemented.

The NAPTR system as specified neither encourages nor discourages semantics-laden names. We do not suggest that restrictions should be placed on URNs; certainly publishers must be allowed to choose whatever names they want. However, providing support for user-friendly names that cannot be used as embedded references would at least encourage the separation of semantics from naming. The issues surrounding aliases are considerable, and will not be solved right away.
There are plans to include support for user-friendly names in the NAPTR system, but at present it is unclear exactly what support will be provided.

3.3.3.3 Extensibility

Several kinds of extensibility are required of a URN resolution system.

* A URN resolution system must be able to support future naming schemes. The NAPTR system exhibits a great deal of flexibility in supporting new naming schemes through the use of arbitrary rewrite rules based on regular expressions.

* A URN resolution system must be able to support new authentication protocols. The NAPTR system does not specify any form of authentication. To add authentication protocols later, either the client support would need to be modified, or the protocols would have to be implemented as a separate system with distinct client support. In either case, modifications to the client will be required.

* A URN resolution system must be able to support new URN resolution services. The NAPTR system provides this extensibility through its records. SRV records specify the host and port of a service, while the Service field of the NAPTR record names the protocol and the classes of service offered (see section 3.2). This design is highly extensible to new services.

* A URN resolution system must include a gateway protocol that allows other ``top level'' URN resolution systems to be used. Without such a feature designed into client-side implementations, it will be very difficult to upgrade to a system that offers better features. Its absence would also make it impossible for clients to choose among multiple competing top-level systems. As currently specified, the NAPTR system does not require clients to support a gateway protocol. Unless such a protocol is designed in from the start, it will be extremely difficult to phase in new URN resolution systems. In part this is because the client side of the NAPTR system uses DNS queries exclusively to locate resolution information.
To make a NAPTR client proxy to another system, the DNS would need to be egregiously hacked. Fixing this problem is simple. It should be specified that clients implement, in addition to the normal NAPTR client protocol, a very simple gateway protocol that sends the URN to a proxy and gets back a list of hints, perhaps in the form of SRV records.

In summary, the NAPTR system is extensible in many of the required modes, but it presently lacks the most important form of extensibility: a protocol for using another top-level system through a gateway. It also fails to leave room for authentication protocols, which may become desirable. This problem could easily be corrected; for example, SRV and NAPTR records could contain a field for authentication information.

4 An Alternate Model

Not all the requirements outlined in this draft are of immediate concern. However, many of them will become more important as the extent and importance of URN resolution grows. Since URNs derive their utility from being long-lived, it stands to reason that one or more usable URN systems will be needed for as long as the URNs are used, and that entirely new systems may evolve as well. In this section we present a different URN resolution model as an example of a system that provides more of the flexibility called for in the requirements. In section 5 we assess exactly what kinds of extensibility are needed to implement such a system.

4.1 High Level System Architecture

In this section we lay out a high-level architecture that is open and extensible. It is based on five types of entities, connected to each other by flexible gateway protocols. New developments in any of these parts should entail a fairly smooth transition.

* Authoritative URN resolution servers. These servers are operated by or on behalf of publishers. They provide authoritative resolution of specific collections of URNs, and they are the endpoints of the system we are describing here.
The implementation and protocols involved are not discussed in this document.

* Name authority registries. These are servers that maintain the authoritative map of ownership over a given root namespace. For each owned namespace, authoritative URN resolution servers can be specified to handle the URNs in that namespace. There would likely be separate registries for ISBN, INET, and ISO.

* Distributed client interface systems. These are distributed systems that serve the information stored in name authority registries, specifically hints for finding appropriate authoritative URN resolution servers. Through distribution and replication, they provide an efficient interface through which clients can locate the right authoritative URN server. Clients may also inject hints into these distributed systems, for reasons explained later. Such hints are returned to clients along with the official hints from the registry. When a distributed client interface receives a URN that falls outside the namespace it serves, it may be a good idea to require it to pass the URN on to a system that does serve that namespace. That way the client need not choose a distribution system based on the URN.

* Top Level ``URN:*'' Registry. To maintain order in the ``URN:*'' namespace, there will need to be a registry that allocates non-conflicting naming schemes. For each naming scheme, this registry would also identify the official name authority registry for that scheme, as well as a list of the distributed client interfaces that serve it.

* Clients. These are users with URN-enabled applications, such as browsers. The application sends a query containing the URN to a local distributed client interface, and eventually a collection of hint information is returned. This hint information typically points to URN resolution servers that may resolve the URN in question.
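The forwarding rule for distributed client interfaces can be sketched with a minimal gateway protocol of the ``URN in, hints out'' kind recommended for NAPTR clients in section 3.3.3.3. In this Python sketch, an interface answers from its own hint table when it serves the URN's naming scheme. Otherwise it consults a hypothetical ``URN:*'' registry object and proxies the query to the first interface listed for that scheme. The class names, the string form of hints, and the hop limit that guards against proxy loops are all illustrative assumptions.

```python
from typing import Dict, List, Set


class TopLevelRegistry:
    """Hypothetical 'URN:*' registry: naming scheme (NSI) -> interfaces that serve it."""

    def __init__(self) -> None:
        self.serving: Dict[str, List["ClientInterface"]] = {}

    def register(self, nsi: str, interface: "ClientInterface") -> None:
        self.serving.setdefault(nsi.upper(), []).append(interface)


class ClientInterface:
    """A distributed client interface reduced to a minimal 'URN in, hints out' gateway."""

    def __init__(self, name: str, registry: TopLevelRegistry) -> None:
        self.name = name
        self.registry = registry
        self.schemes: Set[str] = set()
        self.hints: Dict[str, List[str]] = {}

    def serve(self, nsi: str, hints: Dict[str, List[str]]) -> None:
        """Start serving a naming scheme and announce it to the 'URN:*' registry."""
        self.schemes.add(nsi.upper())
        self.hints.update(hints)
        self.registry.register(nsi, self)

    def resolve(self, urn: str, hops: int = 2) -> List[str]:
        """Answer directly if the scheme is served here, otherwise proxy the query."""
        nsi = urn.split(":", 2)[1].upper()
        if nsi in self.schemes:
            return list(self.hints.get(urn, []))
        candidates = self.registry.serving.get(nsi, [])
        if hops == 0 or not candidates:
            return []  # unknown scheme or hop limit reached
        return candidates[0].resolve(urn, hops - 1)
```

A client that always queries its local interface thus gets hints for any registered scheme without knowing which system serves it. Choosing the first listed interface is the simplest possible policy; a real system might prefer a nearby or lightly loaded one.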
The system we are describing here is primarily concerned with the name registries and the distributed client interfaces. The other elements, the clients and the authoritative URN resolution services, are more on the fringe of the system. In the next few sections we go into more detail about how the name registries and client interfaces might work.

In section 4.2, we specify a new architecture for a global namespace. It builds on the decisions already reached in the URN community and presents an expanded architecture for delineating naming authority. This architecture maintains compatibility with existing ``legacy'' namespaces while adding expressiveness and power. The name authority registries will implement it.

In section 4.3, we sketch the workings of one possible implementation of a distributed client interface system. There we discuss how hints are intended to work, why they are a good idea, and what additional protocols are necessary to implement them.

In the remainder of this section, we lay out the overall architecture of this alternate model. The effect we want to achieve is to decouple the keeping of registries from the production and maintenance of distributed client interfaces. That way, new types of distributed systems can be installed more easily, and new registries (i.e. new namespaces) can be added more easily. Toward this end, several protocols need to be implemented between the parts of the system.

4.1.1 Clients

In this system, clients typically send their queries to the distributed system specified in their application's configuration. This configuration may specify a backup system in case the usual system is down or sluggish. The queries contain URNs to resolve, and may also specify the preferred type or format of the data returned. If the system a client uses does not directly serve the URN it wants, that system is obliged to proxy the query to a system that does.
Information about which systems serve which naming schemes is available from the central ``URN:*'' registry. When the hints are returned to the client, the client may authenticate them to make sure they are genuine. The hints are then used to continue the resolution process by looking up the specified authoritative URN resolution servers and querying them about the URN in question.

It is important to note here that we are assuming that all of these systems are used fairly infrequently. They form the last resort, consulted only after all other hint information has failed. We assume that servers of documents would generally maintain collections of hints to supply with each document. These hints would as a rule be sufficient to locate the document: even if the hint giving the document's location is out of date, the hint naming its most recent authoritative resolver might not be. However, when a URN is discovered but all its associated hints are outdated, the systems we are discussing here are necessary to make sense of it.

When more up-to-date hint information comes back to the client, the client must then use that information to complete the resolution process. This typically involves contacting the suggested authoritative URN resolution services, using whatever protocol is convenient. As a rule, the ``official'' hint information will be sufficient. An authenticator included with each hint can be used to verify the identity of the hint's author. In the event that the official information fails or is for some reason suspect, the unofficial hint information can be used. In some cases, there will be unofficial hints that are authentically written by the owner of the namespace or the author of a document; these would be the most likely to be correct. The authentication can proceed via whatever infrastructure is convenient. At the moment authentication is not in wide use, but hints can easily be specified to have a signature or certificate field.
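One possible client-side policy for choosing which returned hints to try first is sketched below in Python: authentic official hints first, then authentic unofficial hints (those from the namespace owner ahead of those from a document author ahead of anyone else), and finally hints that fail verification. The ranking, the Author and Hint types, and the verify callback (standing in for whatever authentication infrastructure is available) are all assumptions, not part of this proposal.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Author(IntEnum):
    """Who wrote a hint; lower values are tried first (an assumed ranking)."""
    NAMESPACE_OWNER = 0
    DOCUMENT_AUTHOR = 1
    OTHER = 2


@dataclass(frozen=True)
class Hint:
    resolver: str            # e.g. address of an authoritative URN resolver
    author: Author
    official: bool           # True if the hint came from the name registry
    signature: bytes = b""   # placeholder for a signature/certificate field


def order_hints(hints: list[Hint], verify: Callable[[Hint], bool]) -> list[Hint]:
    """Order hints for trial: authentic before inauthentic, official before
    unofficial, then by author. sorted() is stable, so ties keep their order."""
    def rank(h: Hint) -> tuple[int, int, int]:
        return (0 if verify(h) else 1, 0 if h.official else 1, int(h.author))
    return sorted(hints, key=rank)
```

An application could try the first hint automatically and merely flag the rest, which matches the default-to-official behavior suggested in section 4.3.2.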
4.1.2 Distributed Client Interfaces

In this system, distributed client interfaces maintain their internal organization through their own private protocols, and beyond that support three gateway interfaces with the rest of the world:

* Standard Client Interface. All distributed client systems should understand the same minimal gateway protocol, for example, sending a URN and getting back a collection of hints. This does not rule out the possibility that a distributed system would also understand a specialized client interface (with extra features, etc.). A common minimal protocol is what makes proxying possible.

* Hint Passing Interface. This interface is necessary so that hints injected into one distributed system migrate to the others. When a submitted hint is found to lie within a given naming scheme, that hint is forwarded to all other distributed client systems that also serve that naming scheme. The information about which other systems serve a given naming scheme is retrieved from the ``URN:*'' registry.

* Bulk Transfer Interface. This interface makes it possible to receive a name registry's database and subsequent updates in bulk. The data can optionally be translated so that its hierarchy branches in a particular direction, allowing naming schemes with incompatible hierarchy conventions to coexist.

4.1.3 Name Registries

In this system, the name registries support the other half of the bulk transfer interface. Some registries may want to select the systems that serve their databases. Because transferring the database consumes a considerable amount of bandwidth, it will probably be necessary to avoid sending it to impostors. The database also has value as information in its own right, so it may even be worthwhile to encrypt the bulk transfers and updates.

The name registry for ``URN:*'' is special; it is the meta-registry. There is exactly one of these and it is analogous to the NSI registry in the NAPTR proposal.
The main difference is that a client never accesses it except when it needs to locate a distributed client interface to use. In general, its main purpose is to store, for each registry, information about which systems are serving that registry's database. The registry also contains contact information for the systems and registries and the ports on which they support the various standard gateway interfaces.

4.1.4 Other Parts

There are other pieces of infrastructure that we don't address here. First, we don't specify very much about how interfaces to authoritative URN services work. In part the reason for this is that there are many unresolved questions about how to handle metadata, versioning, format selection, content negotiation, etc., which are all essentially irrelevant to the top level of URN resolution. Second, we don't specify how authentication works. We are designing room for authentication protocols into the system, but we leave open what the security infrastructure does with the authentication certificates. A workable security infrastructure would be highly useful, and hopefully one will be developed.

4.2 The Architecture of a Global Namespace

Our alternative system centers around a more sophisticated model for designating naming authority. Currently, models of name delegation are implemented and managed by neutral parties and standards organizations. For example, delegation in the DNS namespace is administered by IANA, ISO administers the ISO namespace, and ISBN is administered by an association of publishers. While each namespace has its own set of rules and policies, the data they maintain is structurally very similar.

Unfortunately, all three of these current models have shortcomings. The DNS is the namespace most integrated into popular computer culture, but this sudden success has begun to undermine its elegant distribution model.
The ISO naming scheme is flexible, powerful, and has a good distribution model, but is underutilized, in part because the names are formed from numbers and do not have a simple mnemonic form. The ISBN namespace is very effective at naming books but is rarely used outside that domain.

4.2.1 Marketing vs. Distribution Models

There is a need for an effective model of name delegation, and that effectiveness must be maintained in the context of market forces. The DNS is a good example of a well thought-out system that is currently succumbing to the desires of an increasingly commercial Internet. Its distribution model depends on subdelegation and the formation of hierarchies; as the root of the DNS namespace grows disproportionately to the levels beneath it, an increasing fraction of the namespace is replicated monolithically rather than distributed. Unfortunately, subdelegation turns out to be unpopular in the ``*.com'' domain, for example, which has resulted in a very flat namespace. There are several reasons for this:

* The desire for semantically significant names, the need for complete control over the ``look and feel'' of the names (except, that is, for the ``.com'' part), and the threat of competitors stealing them

* The lack of any obvious organizational hierarchy

* Reluctance to share a common root (other than ``.com'') with a competitor, and thereby to become dependent on a DNS server over which authority is shared

Given these reasons for the failure of the DNS to adapt to changing situations, how can a new system be built that avoids these pitfalls? One key to answering this question is the recognition that the present demand for semantic nicknames is not going to disappear. Furthermore, as long as names have a mnemonic or semantic component, it will be unrealistic to expect the namespace to have any particularly convenient structure.
We therefore must resign ourselves to keeping a big flat namespace that must be broken up into convenient-sized pieces, distributed, and replicated.

It is important to note that this assumption does not belittle the importance of semantics-free namespaces! On the contrary, semantics-free namespaces are important to the goal of having long-lived names. However, for a semantics-free namespace to be popular there must also be a semantics-laden layer of aliases or nicknames that serves as an ephemeral and dynamic human interface to the semantics-free namespace. These nicknames would never be included in machine-processable references; whenever a link or bookmark is kept, the permanent semantics-free name is substituted for the nickname.

In conclusion, we need to bite the bullet and produce a distributed client interface system capable of satisfying lookup queries against a large catalog of names that does not necessarily exhibit a dependable hierarchical structure. Such a system must be both distributed and replicated, preferably automatically. To the extent that the system is based on a large, centrally maintained master list, alternate maintenance processes are needed to guarantee that the system continues to operate despite failures of its centralized portion. Such a system is more complex than the DNS but, especially in light of recent improvements in cross-platform compatibility, it is within the realm of possibility.

4.2.2 Conceptual Extension of the Root Namespace Registry

We are also suggesting an extension and standardization of the current notion of ``root'' namespace delegation.
Rather than have central registries that record only a few top-level naming authorities and expect those authorities to support sub-delegation with their own resources, registries compliant with this extended model would be designed not only to accommodate a large number of delegations at the top level, but also to provide means of ``officially'' registering sub-delegations that fit into the standard delegation model. That is, when a naming authority _officially_ delegates a portion of its namespace to another entity, that entity must register its authority over that portion with the central registry. Note that official delegation is never required, but the option always exists.

For example, suppose MIT owns the ``URN:INET:edu.mit/*'' namespace, which in turn contains the namespaces of its various labs and departments, including ``URN:INET:edu.mit/lcs/*'' and ``URN:INET:edu.mit/eecs/*''. MIT has the option either to officially delegate those parts of its namespace to the labs that use them, or to formulate its own internal security policy to determine who has permission to alter which portions of its URN resolution information.

A system set up this way has a number of advantages:

1. It makes it possible for existing portions of a naming authority's namespace to be transferred to a new owner with no strings attached. When ownership is transferred officially, the original owner is no longer in control of any part of the resolution process for that space of URNs; in return the original owner is absolved of all responsibility for that space. When naming authority is delegated in this model, queries are referred to the current owner of a namespace directly from the top level rather than being proxied by the previous owner, as would be the case with the DNS.

2. Naming authority is clearly and securely delineated.
The mapping of URNs to URN resolvers is performed in a two-step search: first determining which entity owns a given URN, and second using whatever information is provided by that entity to locate a resolution service. This guarantees that resolution information provided by an official naming authority cannot interfere with URNs owned by other official naming authorities.

3. Registering namespaces ``officially'' with the root namespace registry has the effect of flattening the namespace. However, as we have argued previously, semantics in naming will tend to keep namespaces relatively flat anyway. We expect that the additional entries introduced when the namespace is flattened in this way will not increase the size of the root namespace by a significant factor.

Standing alone, this model is insufficient. There are two major objections to it that must be addressed.

First, there is an implicit assumption that naming authority is to be designated along a certain ``shape'' of boundary. This shape is analogous to a hierarchical file system, where lexical prefixes such as ``URN:INET:*'' are analogues of subdirectories, and write permission on a directory and write permission on the files contained within it can belong to different owners. That is, IANA might own the subspace ``URN:INET:*'' but would delegate authority over subspaces such as ``URN:INET:edu.mit/*'' to other entities. Most legacy namespaces seem to be easily transformed into this form (for example, the DNS names we have used have had their domain names reversed). If some new namespace arises that uses an incompatible model for delegation of authority, a gateway protocol can be employed to resolve those URNs.

The second objection is not so easily answered. This objection has to do with the potential danger of having a large centralized registry. Even if distributed systems can be built to serve it efficiently, there remains the danger that the centralized portion breaks or becomes corrupt.
We believe that these problems are mitigated by the hint system described later in this document. Essentially, if the centralized registry breaks, the distributed client interface systems will continue to serve the most recently obtained data. By injecting authenticatable hints into the distributed systems, incorrect data can be corrected and missing data can be supplied. Since the system allows hints to be corrected and updated quickly and easily, incorrect behavior at a central service can be mitigated, as will be seen below.

4.2.3 Economic and Political Implications of Registries

In order to ensure consistency, some authority will need to be in charge of maintaining each registry. The implementation of a distributed client interface can be entirely decoupled from the maintenance of the registry itself, and can furthermore be done by multiple competing services. The choice of which top-level system to use would be offered to the user, making new technology and competing services available to existing clients. Different namespaces (e.g. DNS vs. ISBN) might have separate official registries, but in general the distributed client interface systems would either distribute all the data from each of the registries or proxy the queries they do not handle to a different distributed system that does.

The authority that keeps the registry is not required to be the authority that maintains the root of that namespace. For example, in the Internet today IANA manages the root DNS namespace (that is, acts as a naming authority), while the NIC maintains the root DNS servers (that is, acts as a client interface). Today, in order to allocate a new domain name, an applicant must pay $50 per year to IANA, which then enters that new domain into its databases. Further names within that sub-namespace are free, but the owner of the namespace is responsible for maintaining the servers that resolve them.
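The two-step search of section 4.2.2 (first determine which entity owns a URN, then use only that entity's information) can be sketched as below. This is a minimal illustration, assuming registered namespaces are written as prefix patterns in the style of the ``URN:INET:edu.mit/*'' examples above; the function names owner_of and resolvers_for and the sample owners are hypothetical, and a linear scan stands in for whatever index a real registry would use.

```python
from __future__ import annotations


def owner_of(urn: str, registry: dict[str, str]) -> str | None:
    """Step 1: find which entity owns `urn`.

    `registry` maps officially registered namespaces, written as prefix
    patterns like "URN:INET:edu.mit/*", to their owners. The longest
    matching prefix wins, so an official sub-delegation takes precedence
    over the namespace it was carved out of.
    """
    best_prefix, best_owner = "", None
    for pattern, owner in registry.items():
        prefix = pattern.removesuffix("*")  # requires Python 3.9+
        if urn.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_owner = prefix, owner
    return best_owner


def resolvers_for(urn: str, registry: dict[str, str],
                  owner_info: dict[str, list[str]]) -> list[str]:
    """Step 2: consult only the owner's own information, so one naming
    authority's data cannot interfere with URNs owned by another."""
    owner = owner_of(urn, registry)
    if owner is None:
        return []
    return owner_info.get(owner, [])
```

With ``URN:INET:*'' registered to IANA, ``URN:INET:edu.mit/*'' to MIT, and only ``URN:INET:edu.mit/lcs/*'' officially delegated to LCS, a URN under edu.mit/eecs still falls to MIT, which can handle it under its own internal policy.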
In this new system, the owner of the root namespace (suppose it is IANA) can sub-delegate subspaces of the root to other entities. Those entities must then register the subspaces with the ``URN:INET'' registry, possibly paying a registration fee at that time. Once a sub-namespace is registered, any names falling in that subspace are resolved using the information supplied by the owner of the namespace; registration is therefore only necessary when a namespace is delegated to a different owner, or when an owner wants to change its information in the registry. The creation of new names in a registered namespace is not regulated in any way by the registry.

To make it simpler to construct new distributed systems and to add registries for new namespaces, it makes sense to define a standard protocol between registries and distributed systems. This protocol would allow bulk access to the database, returning the data in a canonical format. It would also provide updates to subscribers using the same canonical format. This canonical format would present hierarchy in a standardized way by making all the trees branch in a uniform direction. Once the hierarchies all branch the same way, the entries in the database can simply be listed in lexical order, and the process of breaking the database up for distribution can begin. For example, DNS names would need to be rearranged so that the hostname tokens are reversed, i.e. ``lcs.mit.edu'' -> ``edu.mit.lcs''. Similar techniques can be found that allow most legacy namespaces to fit this canonical form. Many namespaces, such as ISO, need no transformation since they are already ordered hierarchically in that direction. Many distributed systems might choose to maintain their own replicated copy of the registries that they distribute and serve, in order to determine how best to organize the distribution.

4.3 Architecting Distributed Client Interfaces

As we have indicated in the previous sections, in our model we assume an essentially flat namespace.
Parts of the namespaces we serve may exhibit a useful form of hierarchy; however, since we cannot depend on this, we assume the worst case. Given this assumption, in order to provide scalable access to the naming system, we must build a system of cooperating name servers that distribute and replicate the database. We call systems providing this function (i.e. an interface between the client and the name authority registries) distributed client interface systems.

There may be many distributed client interfaces to the same namespace. This will pave the way for new client interfaces and new distribution mechanisms to be put into place. Each distributed client interface system acquires the registry databases in bulk using the bulk transfer interface described in section 4.1.2. It is advisable that all client interface systems provide access to the entire space of URNs; for those spaces in which authoritative information is not served directly, resolution requests should be proxied to another service.

While building these systems is not trivial, recent improvements in cross-platform compatibility mean that the largest impediment to constructing such a system in the near future will be the existing infrastructure, should it prove too inflexible to accommodate a new resolution mechanism. We have been working out some of the details of a distributed client interface, and hope to have a testbed implementation in the next few months. The next section sketches out our basic strategy for distribution and replication.

4.3.1 Distributing and Replicating a Flat Namespace

Our distribution system relies on the fact that the database to be distributed can be collected in one place temporarily.[2] Once the database is assembled, it is broken up into pieces of approximately equal size.
These ``chunks'' of the database must be of a reasonable size, since each participating server will be responsible for storing and serving one or more chunks, as well as for having a certain amount of excess capacity to store updates and hints. Once the database is broken up, new nodes are constructed to form an n-ary search tree. In order to make the nodes more uniformly sized, ``leaf elements'' may be shifted upwards into the nodes forming the search tree.

Once the search tree is planned out, it is installed on a set of servers in the system. Once it is installed and running, clients begin to use it. As the installed servers become loaded, those parts of the tree experiencing the heaviest loads are replicated on other available servers. This process of replication results in the ``growth'' of a robust distributed system from a planted seedling.

Actually getting this growth process to work requires careful analysis and simulation, as well as the discovery and implementation of a working set of protocols. However, there are benefits:

* If a major rebalancing is needed, a new system can be ``grown''. This means that the update procedures can be made simpler (e.g. there is no need to redistribute the data in the field).

* The system as a whole must be robust. In some sense, it is always in a state of dynamic balance, with some parts trying to grow while others shrink. Perturbations on the scale of individual servers are part of the day-to-day process.

* All configuration should be as automated as possible.

Furthermore, in terms of security, in a highly dynamic and self-configuring system composed of relatively untrusted nodes, the dynamism counteracts many of the security weaknesses introduced by the fact that the cooperating servers are untrusted. The reason is that it is impossible to predict which portion of the database a given server will be serving, which reduces the likelihood of gaining advantage from serving a particular portion.
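The planning steps above (put the database into canonical form, list it in lexical order, cut it into chunks of roughly equal size, then build an n-ary search tree over the chunks) can be sketched in Python as follows. This is a minimal, single-machine illustration under stated assumptions: the ``URN:INET:<host>/<path>'' input form, the function names, and the fixed chunk size and fanout are hypothetical, and the optional shifting of leaf elements upward into interior nodes is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def canonical_inet(urn: str) -> str:
    """Reverse the DNS tokens of a hypothetical "URN:INET:<host>/<path>" name
    so that hierarchy branches left to right: lcs.mit.edu -> edu.mit.lcs."""
    tag, scheme, rest = urn.split(":", 2)
    host, slash, path = rest.partition("/")
    return f"{tag}:{scheme}:{'.'.join(reversed(host.split('.')))}{slash}{path}"


def make_chunks(keys: list[str], chunk_size: int) -> list[list[str]]:
    """Sort the canonical keys lexically and cut them into near-equal chunks."""
    ordered = sorted(keys)
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


@dataclass
class Node:
    # Leaf: the chunk's entries. Interior: the first key under each child,
    # which is what lets a query choose which branch to follow.
    keys: list[str]
    children: list[Node] = field(default_factory=list)


def build_tree(chunks: list[list[str]], fanout: int) -> Node:
    """Build an n-ary search tree bottom-up, grouping `fanout` nodes per parent."""
    if not chunks or fanout < 2:
        raise ValueError("need at least one chunk and fanout >= 2")
    level = [Node(keys=chunk) for chunk in chunks]
    while len(level) > 1:
        level = [
            Node(keys=[child.keys[0] for child in level[i:i + fanout]],
                 children=level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0]
```

Each node of the resulting tree is a unit that could be installed on a server and later replicated as load demands; nothing in the sketch addresses installation or replication themselves.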
Replication is handled using a two-tiered structure. Groups of nearby replicated servers form _replication_groups_. A leader is chosen from each replication group to join a _supergroup_, which represents the totality of the system responsible for serving one particular part of the database. Updates to the database are flooded to the members of each replication group in a way similar to floodd [Danzig]. The administration of the system is accomplished by the leaders of the groups and supergroups. The agent responsible for generating the system in the first place (i.e. collecting the data, etc.) would also maintain the supergroup leaders. Note that the system can continue to function despite the loss of the ``leaders'', since their primary duty is to maintain the load balance.

In fact, the hard part of implementing a system like this is devising a protocol for load balancing. From an abstract perspective, the system has the logical shape of a search tree (i.e. supergroups point to other supergroups), but in reality individual servers link to other individual servers. In the event that a server becomes overloaded, servers that link to that server should move their links to point to less loaded servers. To implement this, there must be ways of discovering the underutilized servers of a given target supergroup (i.e. other servers in a given replication group or in a neighboring group in the same supergroup). In the event that all servers of a given supergroup are heavily loaded, new servers should be assigned to that supergroup, possibly after being removed from a less loaded supergroup.

4.3.2 Resolution in a Distributed System

The resolution process starts when a client sends a query containing a URN to be resolved to one of the participating servers. There may be a requirement that all clients who want access to the system do so through their own participating server, in order to push the cost of running the system out to the users.
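To make the handling of such a query concrete, here is a minimal Python sketch of the decision each participating server makes: forward the query to its parent if the URN lies outside the server's chunk, otherwise follow the most specific matching entry, which is either a pointer to a lower server or a record of namespace ownership. The Server fields, the prefix-keyed entry table, and the function names are hypothetical, and a loop over an in-memory dictionary stands in for the server-to-server passing of search state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical state of one participating server."""
    low: str             # smallest canonical key covered by this server's chunk
    high: str            # keys must be < high; "" means unbounded
    parent: str | None   # parent server in the search tree (None at the root)
    # prefix -> ("server", server_id) for a pointer, or ("owner", owner_name)
    entries: dict[str, tuple[str, str]] = field(default_factory=dict)

    def covers(self, urn: str) -> bool:
        return self.low <= urn and (self.high == "" or urn < self.high)


def longest_match(urn: str, table: dict[str, tuple[str, str]]) -> tuple[str, str] | None:
    best = max((p for p in table if urn.startswith(p)), key=len, default=None)
    return table[best] if best is not None else None


def resolve(urn: str, servers: dict[str, Server], start: str,
            max_hops: int = 32) -> str | None:
    """Pass the query from server to server until the URN's owner is found."""
    current = start
    for _ in range(max_hops):  # guard against routing loops
        server = servers[current]
        if not server.covers(urn):
            if server.parent is None:
                return None
            current = server.parent  # outside this chunk: route to the parent
            continue
        match = longest_match(urn, server.entries)
        if match is None:
            return None              # no owner registered for this URN
        kind, target = match
        if kind == "owner":
            return target            # ownership record: owner determined
        current = target             # pointer: route down the search tree
    return None
```

Once the owner is known, the server would go on to search that owner's official hints (and any unofficial ones) as described next; that step is omitted here.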
When a server receives a query, it first checks whether the part of the database it stores covers the URN in the query. If not, it routes the query to its parent in the search tree. If the URN does fall within the bounds of the server's database chunk, then it matches either a pointer to a server lower in the search tree or a record of namespace ownership. If the URN matches a pointer, the query is routed according to that pointer. If the URN falls into one of the URN spaces listed in the database chunk, then the owner of that URN has been determined.

Next, the official hints provided by the owner of the namespace are searched, hopefully determining which authoritative URN resolution service handles that URN. Typically, the amount of information provided by the owner is fairly limited, and is authenticated by the owner of the namespace and possibly by the root name authority. This information can also be augmented by unofficial hints, as described in the next section. The most apparently relevant hints (i.e. those relevant to the smallest URN space containing the URN in question) are then returned to the client.

Depending on the naming scheme, policy may require that official hints be authenticated by the root name registry. If there are unofficial hints as well, some of them may be authentically issued by the owner of the namespace (for example, to correct a temporary outage), some may be authentically issued by the owner of the resource being named, and some may be completely inauthentic and misleading. The application will need to decide which hints to try first; for example, it may automatically use the official information by default, but raise a flag when other (possibly useful) information is available. If the user suspects that unofficial or inauthentic hints might be helpful, it should be easy to try them.

The servers in the distributed system use a protocol in which the state of the search is passed from one server to the next.
When a query is received, it is processed, and either an answer is sent back to the client or the query is passed off to the next server in the search path. This passing can be done with acknowledged UDP packets, which means that some state about the transaction must be kept from the time the query is received until the returning ACK is received.

4.3.3 Support for Unofficial Hint Information

It is advisable that these distribution systems include the facility to store short-lived unofficial hint information along with the official hints from the name registry database, for the following reasons:

1. The distributed systems and the official registry may be slow and/or expensive to change. Hence in many instances it may be desirable to inject hints directly into the various distributed resolution systems. The idea is that injected hints are relatively fast-acting and cheap or free, but that their guaranteed quality of service is low. Probably a shared system can be constructed, in which users run a server in order to have an access point and that server is used in the distribution process. However, unlike in the DNS, a user's personal information would not necessarily be stored on their node.

2. Topological variation can be implemented by injecting special localized hints that are not copied onto topologically distant servers.

3. By providing an alternate pathway for information to be stored and served, the system becomes more tolerant of errors and outages in the registry. It also provides a way for publishers to fight denial of service; that is, if the registry revokes a publisher's authority or changes registered information without authorization, hints can be submitted that correct this. In order to make this work, authentication protocols must be implemented so that the hints can be verified as produced by the publisher.[3]

It is important to remember that this hint information is a best-effort service.
No guarantees are made about the reliability of the hint system. The policies governing hints, such as how much room there is to store hints, whether hints are allowed at all, whether hints cost money, and whether hints require authentication, are all at the discretion of the designer of the distributed system. The only necessary agreement between designers is a standard hint-passing protocol for disseminating new hints to the distributed systems that accept them.

In general, once a distributed system of the sort we describe has been implemented, hints are fairly easy to manage. Since the database is already distributed in the structure of a search tree, it is a simple operation to determine which servers a given hint belongs on: simply trace the same route as a client's query. Once the right supergroup has been found, new hints can be distributed by a flooding protocol similar to the one by which updates to the database propagate. Of course, hints are given much lower priority in terms of consistency, but this suggests that they can be deployed more rapidly.

One of the important benefits of a hint system is that while it is less careful and less certain, it is much harder to subvert on a global scale. There is no centralized source for hints and no consistent distribution path. Interfering with a particular hint requires either gaining access to each individual server and deleting the hint, or finding a clever way to subvert the hint policy, such as flooding the system with bogus hints in the same part of the database. Hint policies, such as requiring payment for hint submission, can be constructed to make these tricks less appealing.

4.3.4 Conclusion

This concludes a rough sketch of the sort of distributed system we would need to serve a large flat namespace. Our objective here is not to lay out a detailed and polished design.
We merely wish to suggest that something along these lines might have promise; we hope that whatever systems are put in place to answer the needs of the present include sufficient flexibility and extensibility to allow new and interesting systems to be tested.

5 A Wish List: Some Useful Additions to the NAPTR Proposal

Specifically, what kinds of extensibility would be most useful? If we are to assume that something like the NAPTR proposal is going to be implemented, it would be useful to add several points of extensibility to its design.

1. In addition to the current client protocol, a simple gateway protocol that sends the URN to a configured address and receives in return a collection of hint records in some format. These records might be similar to the SRV and NAPTR records, but would include additional fields such as owner, date, authentication, etc. The user could choose between the gateway protocol and the regular client protocol.

2. An authentication protocol could be built into the NAPTR system as well. Even if it is rarely used at first, building parts of it in now would make it easier to integrate future security infrastructure.

3. The rewrite rules in the NAPTR system make it confusing. With unconstrained rules it quickly becomes difficult to specify and verify which URNs are whose property. One thing that would mitigate this problem would be to formulate a scenario for the typical or expected way that the system would be used (i.e. how the rules are set up). Another helpful piece of work would be the formulation of sample policies for restricting rewrite rules and for managing name authority servers.

6 Notes

[1] In some systems, semantics in the names is used during the initial attempt at resolution, and in cases where the semantics is found to be invalid a slower fallback method is used. While broken semantics will not cause the system to fail, such designs will degrade over time as an increasing fraction of the resolutions invoke the ``special'' case.
If names are intended to last well beyond the lifetime of the semantic content, this type of design will not perform well in the long run.

[2] A more complex algorithm might do away with this requirement, but for simplicity we will assume it here.

[3] A careful analysis of the issues involved with denial of service to publishers reveals a great deal of complexity. However, a system in which authenticated hints may be injected for free, and limits are placed on the number produced by any one entity, provides a reasonable level of protection.

7 References

[Sollins94] Sollins, K. and Masinter, L., "Functional Requirements for Uniform Resource Names", RFC 1737, December 1994.

[Danzig] Danzig, P., DeLucia, D. and Obraczka, K., "Massively Replicating Services in Wide-Area Internetworks", Computer Science Department, University of Southern California.

8 Contact Information

Lewis D. Girod
Research Programmer
(617) 253-3440
girod@lcs.mit.edu

Karen R. Sollins
Research Scientist
(617) 253-6006
sollins@lcs.mit.edu

Expires December 13, 1996