Network Working Group M.M. Mealling Internet-Draft Network Solutions, Inc. Expires: March 20, 2001 September 19, 2000 Dynamic Delegation Discovery System (DDDS) draft-ietf-urn-ddds-01 Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on March 20, 2001. Copyright Notice Copyright (C) The Internet Society (2000). All Rights Reserved. Abstract This document describes a the Dynamic Delegation Discovery System or DDDS which, when applied to a unique string will produce a series of rules that describe the various delegations that may exist based on the syntax of that string. Since the DDDS is an abstract algorithm, this specification does not define either an application or a rule database. Mealling Expires March 20, 2001 [Page 1] Internet-Draft DDDS September 2000 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 5 3.1 Components of a Rule . . . . . . . . . . . . . . . . . . . . . 5 3.2 Substitution Expression Syntax . . . . . . . . . . . . . . . . 5 3.3 The Complete Algorithm . . . . . . . . . . . . . . . . . . . . 7 4. Specifying An Application . . . . . . . . . . . . . . . . . . 9 5. Specifying A Database . . . . . . . . . . . . . . . . . . . . 11 6. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 6.1 An Automobile Parts Identification System . . . . . . . . . . 12 6.2 A Document Identification Service . . . . . . . . . . . . . . 13 References . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Author's Address . . . . . . . . . . . . . . . . . . . . . . . 15 Full Copyright Statement . . . . . . . . . . . . . . . . . . . 16 Mealling Expires March 20, 2001 [Page 2] Internet-Draft DDDS September 2000 1. Introduction The Dynamic Delegation Discovery System is used to map some unique string to data stored within the DDDS by iteratively applying string transformation rules until a terminal condition is reached. This document describes the general algorithm, not any particular application or usage scenario. It is up to other documents to describe how they use the DDDS algorithms. The DDDS's history is an evolution from work done by the Uniform Resource Name Working Group. When Uniform Resource Names[1] were originally formulated there was the desire to locate an authoritative server for a URN which by design contained no information about network locations. A system was formulated that could use a database of rules that could be applied to a URN to find out information about specific chunks of syntax. This system was originally called the Resolver Discovery System[2] and only applied to URNs. Over time other systems began to apply this same algorithm and infrastructure to other, non-URN related, systems. This caused some of the underlying assumptions to change and need clarification. This document, which is one of a series, is an update of those original URN specifications in order to allow new applications and rule databases to be developed in a standardized manner. A direct result of these clarifications and generalizations is that this document, along with [4] and [5], obsoletes RFC 2168[6] and updates RFC 2276[2]. Mealling Expires March 20, 2001 [Page 3] Internet-Draft DDDS September 2000 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. Application Unique String A string that is the target of the rewrite rules. Each possible value of the string must be unique within the space for which it is valid in order for the rewrite rules to apply. Rewrite Rule An object containing several pieces of data that, when combined and applied to an Application Unique String, produces a new key that exists in the Rule Database. Also simply known as a Rule. First Well Known Rule This is a rewrite rule that is defined by the application and not actually in the Rule Database. It is used to produce the first valid key. Terminal Rule This Rule is one where the Flags specify that the iterative process is over and that the output of applying this Rule to the Application Unique String will be the intended output of the entire process. Application A set of protocols and specifications that specify actual values for the various generalized parts of the DDDS algorithm. An Application must define the syntax and semantics of the Application Unique String, the First Well Known Rule, and which Databases are valid for the Application. Database Any store of Rules such that a unique key can identify a set of Rules that specify the delegation step used when that particular Key is used. Mealling Expires March 20, 2001 [Page 4] Internet-Draft DDDS September 2000 3. The Algorithm The DDDS algorithm is based on the concept of Rewrite Rules. These rules are given unique Keys that are collected into a database that is known as a DDDS Rule Database. A given Rule, when applied to an Application Unique String, transforms that String into new Key that can be used to retrieve a new Rule from the Rule Database. This new rule is then re-applied to the original Application Unique String and the cycle repeats itself until a terminating condition is reached. 3.1 Components of a Rule A Rule is made up of 4 pieces of information: A Priority Simply a number used to show which of two otherwise equal rules may have precedence. This allows the database to express rules that are equivalent but weighted for load balancing reasons. A set of Flags Flags are used to specify attributes of the rule that determine if this rule is the last one to be applied. The last rule is called the terminal rule and its output should be the intended result for the application. A description of Services Services are used to specify attributes of a particular delegation branch. For example, two rules may equally apply to a specific delegation decision for a string. One rule can lead to a terminal rule that produces information for use in high availability environments while another may lead to an archival service that may be slower but is more stable over long periods of time. A Substitution Expression This is the actual string modification part of the rule. It is a combination of a POSIX Extended Regular Expression[3] and a replacement string similar to SED search-replace function. See Section 3.2. 3.2 Substitution Expression Syntax The syntax of the Substitution Expression part of the rule is a sed-style substitution expression. True sed(1) substitution expressions are not appropriate for use in this application for a variety of reasons, therefore the contents of the regexp field MUST follow this grammar: Mealling Expires March 20, 2001 [Page 5] Internet-Draft DDDS September 2000 subst_expr = delim-char ere delim-char repl delim-char *flags delim-char = "/" / "!" / ... (Any non-digit or non-flag character. All ocurances of a delim_char in a subst_expr must be the same character.) ere = POSIX Extended Regular Expression repl = string / backref / repl string / repl backref string = anychar escapeddelim / escapeddelim anychar anychar = ; any character other than delim-char escapeddelim = "\" delim-char backref = "\" POS_DIGIT flags = "i" POS_DIGIT = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" The result of applying the substitution expression to the String MUST result in a key which obeys the rules of the Database. Since it is possible for the regular expression to be improperly specified, such that a non-conforming key can be constructed, client software SHOULD verify that the result is a legal database key before using it. Backref expressions in the repl portion of the substitution expression are replaced by the (possibly empty) string of characters enclosed by '(' and ')' in the ERE portion of the substitution expression. N is a single digit from 1 through 9, inclusive. It specifies the N'th backref expression, the one that begins with the N'th '(' and continues to the matching ')'. For example, the ERE (A(B(C)DE)(F)G) has backref expressions: \1 = ABCDEFG \2 = BCDE \3 = C \4 = F \5..\9 = error - no matching subexpression The "i" flag indicates that the ERE matching SHALL be performed in a case-insensitive fashion. Furthermore, any backref replacements MAY be normalized to lower case when the "i" flag is given. The first character in the substitution expression shall be used as the character that delimits the components of the substitution expression. There must be exactly three non-escaped occurrences of the delimiter character in a substitution expression. Since escaped occurrences of the delimiter character will be interpreted as occurrences of that character, digits MUST NOT be used as delimiters. Backrefs would be confused with literal digits were this allowed. Similarly, if flags are specified in the substitution Mealling Expires March 20, 2001 [Page 6] Internet-Draft DDDS September 2000 expression, the delimiter character must not also be a flag character. The character set(s) that the substitution expression is in and can act on are dependent both on the Application and on the Database beind used. An Application must define what the allowed character sets are for the Application Unique String. A DDDS Database specification must define what character sets are required for producing its keys and for how the substitution expression itself is encoded. The grammar required characters from above only have meaning once a specific character set is defined for the Database or Application. 3.3 The Complete Algorithm The following is the exact DDDS algorithm: 1. The First Well Known Rule is applied to the Application Unique String which produces a Key 2. That Application asks the Database for the ordered set of Rules that are bound to that Key 3. The Substitution Expression of each Rule in the list of Rules is applied to the String in the order in which they were received. In some applications and/or databases the result set can express the case where two or more Rules are considered equal. These Rules are treated as the same Rule, each one possibly having a Priority which is used to weight a random selection among the equivalent Rules (this allows for Rules to 'load balance' themselves). 4. The first/next Rule with a Substitution Expression that produces anything other than the empty string is examined to see if the parameters in the Services part of the Rule meet the requirements of the Application. 5. If the parameters in the Service part of the Rule do not match those required by the Application then go back to Step 4. 6. If the Flags part of the Rule designate that this Rule is Terminal, then apply the Substitution Expression to the String and then go to Step 8. 7. Apply the Substitution Expression to the String. The output of this rewrite becomes the new Key. To begin the next iteration, return to Step 2 and use this new Key as the Key in that step. 8. Notify the Application that the process has finished and provide Mealling Expires March 20, 2001 [Page 7] Internet-Draft DDDS September 2000 the Application with the Flags and Services part of the Rule along with the output of the last Substitution Expression. Mealling Expires March 20, 2001 [Page 8] Internet-Draft DDDS September 2000 4. Specifying An Application In order for this algorithm to have any usefulness, a specification must be written describing an application and one or more databases. In order to specify an application the following pieces of information are required: Application Unique String: This is the original string that the rewrite rules will apply to. The string must have some regular structure and be unique within the application such that anyone applying Rules taken from the same Database will end up with the same Keys. For example, the URI Resolution application would define the Application Unique String to be a URI. No application is allowed to define an Application Unique String such that the Key obtained by a rewrite rule is treated as the Application Unique String for input to a new rule. This leads to sendmail style rewrite rules which are fragile and error prone. First Well Known Rule: This is the first rule that, when applied to the Application Unique String, produces the first valid Key. It can be expressed in the same form as a Rule or it can be something more complex. For example, the URI Resolution application might specify that the rule is that the sequence of characters in the URI up to but not including the first colon (the URI scheme) is the first Key. Valid Databases: The application can define which Databases are valid. For each Database the Application must define how the First Well Known Rule's output (the first Key) is turned into something that is valid for that Database. For example, the URI Resolution application could use the Domain Name System (DNS) as a Database. The operation for turning this first Key into something that was valid for the database would be to to turn it into some DNS-valid domain-name. Additionally, for each Database an Application defines, it must also specify what the valid character sets are that will produce the correct Keys. In the URI Resolution example, the character set of a URI is 7 bit ASCII which matches fairly well with DNS's 8 bit limitation on characters in its zone files. Expected Output: The Application must define what the expected output of the Terminal Rule should be. For example, the URI Resolution application is concerned with finding servers that contain authoritative data about a given URI. Thus the output of the terminal rule would be information (hosts, ports, protocols, etc) that would be used to contact that authoritative server. Mealling Expires March 20, 2001 [Page 9] Internet-Draft DDDS September 2000 Mealling Expires March 20, 2001 [Page 10] Internet-Draft DDDS September 2000 5. Specifying A Database Additionally, any Application must have at least one corresponding Database from which to retrieve the Rules. A Database specification must include the following pieces of information: General Specification: The Database must have a general specification. This can reference other standards (SQL, DNS, etc) or it can fully specify a novel database system. This specification must be clear as to what allowed character sets exist in order to know in which character set the Rules and subsequence substitution expression are encoded in. Lookup Procedure: This specifies how a query is formulated and submitted to the database. In the case of databases that are used for other purposes (such as DNS), the specification must be clear as to how a query is formulated specifically for the database to be a DDDS database. For example, a DNS based Database must specify which Resource Records or Query Types are used. Key Format: If any operations are needed in order to turn a Key into something that is valid for the database then these must be clearly defined. For example, in the case of a DNS database, the Keys must be constructed as valid domain-names. Rule Format: The specification for the output format of a rule. Rule Insertion Procedure: A specification for how a Rule is inserted into the database. This can include policy statements about whether or not a Rule is allowed to be added. Mealling Expires March 20, 2001 [Page 11] Internet-Draft DDDS September 2000 6. Examples The examples given here are for pedagogical purposes only. They are specifically taken from fictious applications that have not been specified in any published document. 6.1 An Automobile Parts Identification System In this example imagine a system setup where by all automobile manufacturers come together and create a standardized part numbering system for the various parts (nuts, bolts, frames, instruments, etc) that make up the automobile manufacturing and repair process. The problem with such a system is that the auto industry is a very distributed system where parts are built by various third parties distributed around the world. In order to find information about a given part a system must be able to find out who makes that part and contact them about it. To facilitate this distributed system the identification number assigned to a part is assigned hierarchically such that the first 5 digits make up a parts manufacturer ID number. The next 3 digits are an auto line identifier (Ford, Toyota, etc). The rest of the digits are assigned by the parts manufacturer according to rules that the manufacturer decides. The auto industry decides to use the DDDS to create a distributed information retrieval system that routes queries to the actual owner of the data. The industry specifies a database and a query syntax for retrieving rewrite rules (the APIDA Network) and then specifies the Auto Parts Identification DDDS Application (APIDA). The APIDA specification would define the following: o Application Unique String: the part number o First Well Known Rule: take the first 5 digits (the manufacturers ID number) and use that as the Key o Valid Databases: The APIDA Network o Expected Output: EDIFAC information about the part The APIDA Network Database specification would define the following: o General Specification: a network of EDI enabled databases and services that, when given a subcomponent of a part number will return an XML encoded rewrite rule o Lookup Procedure: following normal APIDA Network protocols, ask Mealling Expires March 20, 2001 [Page 12] Internet-Draft DDDS September 2000 the network for a rewrite rule for the Key. o Key Format: no conversion is required o Rule Format: see APIDA Network documentation for the XML DTD o Rule Insertion Procedure: determined by the authority that has control over each section of the part number. I.e. in order to get a manufacturer ID you must be a member of the Auto Parts Manufacturers Association In order to illustrate how the system would work, imagine the part number "4747301AB7D". The system would take the first 5 digits, '47473' and ask the network for that Rewrite Rule. This Rule would be provided by the parts manufacturers database and would allow the manufacturer to either further sub-delegate the space or point the querier directly at the EDIFAC information in the system. In this example lets suppose that the manufacturer returns a Rule that states that the next 3 digits should be used as part of a query to their service in order to find a new Rule. This new Rule would allow the parts manufacturer to further delegate the query to their parts factories for each auto line. In our example part number the number '01A' denotes the Toyota line of cars. The Rule that the manufacturer returns further delegates the query to a supply house in Japan. This rule also denotes that this Rule is terminal and thus the result of this last query will be the actual information about the part. 6.2 A Document Identification Service This example is very similar to the last since the documents in this system can simply be thought of as the auto part in the last example. The difference here is that the information about the document is kept very close to the author (usually on their desktop). Thus there is the probability that the number of delegations can be very deep. Also, in order to keep from having a large flat space of authors, the authors are organized by organizations and departments. Let's suppose that the Application Unique String in this example looks like the following: The Application specification would look like this: o Application Unique String: the Document ID string given above o First Well Known Rule: the characters up to but not including the first '-' is treated as the first Key. Mealling Expires March 20, 2001 [Page 13] Internet-Draft DDDS September 2000 o Valid Databases: the DIS LDAP Directory o Expected Output: a record from an LDAP server containing bibliographic information about the document in XML The Database specification for the DIS LDAP Directory would look like this: o General Specification: the Database uses the LDAP directory service. Each LDAP server has a record that contains the Rewrite Rule. Rules refer to other LDAP servers using the LDAP URL scheme. o Lookup Procedure: using standard LDAP queries, the client asks the LDAP server for information about the Key. o Key Format: no conversion is necessary. o Rule Format: See the LDAP Rewrite Rule specification o Rule Insertion Procedure: See the procedures published by the entity that has authority over that section of the DIS tree. The first section, the organization, is owned by the DIS Agency. In this example, the first lookup is for the organization's Rule. At that point the organization may point the client directly at some large, organization wide database that contains the expected output. Other organizations may decentralize this process so that Rules end up delegating the query all the way down to the authors document management environment of choice. Mealling Expires March 20, 2001 [Page 14] Internet-Draft DDDS September 2000 References [1] Moats, R., "URN Syntax", RFC 2141, May 1997. [2] Sollins, K., "Architectural Principles of Uniform Resource Name Resolution", RFC 2276, January 1998. [3] The Institute of Electrical and Electronics Engineers, "IEEE Standard for Information Technology - Portable Operating System Interface (POSIX) - Part 2: Shell and Utilities (Vol. 1)", IEEE Std 1003.2-1992, ISBN 1-55937-255-9, January 1993. [4] Mealling, M.M., "A DDDS Database Using The Domain Name System", Internet-Draft draft-ietf-urn-dns-ddds-database-00.txt, May 2000. [5] Mealling, M.M., "URI Resolution using the Dynamic Delegation Discovery System", Internet-Draft draft-ietf-urn-uri-res-ddds-00.txt, July 2000. [6] Danie1, R. and M. Mealling, "Resolution of Uniform Resource Identifiers using the Domain Name System", RFC 2168, June 1997. Author's Address Michael Mealling Network Solutions, Inc. 505 Huntmar Park Drive Herndon, VA 22070 US Phone: +1 770 935 5492 EMail: michaelm@netsol.com URI: http://www.netsol.com Mealling Expires March 20, 2001 [Page 15] Internet-Draft DDDS September 2000 Full Copyright Statement Copyright (C) The Internet Society (2000). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implmentation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgement Funding for the RFC editor function is currently provided by the Internet Society. Mealling Expires March 20, 2001 [Page 16]