Network Working Group                                        M. Mealling
draft-ietf-urn-naptr-rr-00.txt                   Network Solutions, Inc.
Category: Standards Track                                      R. Daniel
Expires: May, 1999                                      DATAFUSION, Inc.
                                                                Nov 1998

        The Naming Authority Pointer (NAPTR) DNS Resource Record

Status of this Memo
===================

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe),
ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract:
=========

This document describes a DNS Resource Record which specifies a rewrite
rule that, when applied to an existing string, will produce a new
domain. Reasons for rewriting a domain vary from URN Resource Discovery
Systems to moving out-of-date services to new domains. This document
updates the portions of RFC2168 specifically dealing with the definition
of the NAPTR record.

Introduction:
=============

This RR was originally produced by the URN [3] Working Group as a way to
encode rule-sets in DNS so that the delegated sections of a URI could be
decomposed in such a way that they could be changed and re-delegated
over time. The result was a Resource Record that included a regular
expression which would be used by a client program to rewrite a string
into a domain-name.
Regular expressions were chosen for their compactness to expressivity
ratio, allowing a great deal of information to be encoded in a rather
small DNS packet.

The function of rewriting a string according to the rules in the record
is useful in several different applications. This document defines the
basic assumptions that all of those applications must adhere to. It does
not define the reasons why the rewrite is used, what the expected
outcomes are, or what they are used for. Those are specified by
applications that define how they use the NAPTR record and algorithms
within their contexts.

Mealling & Daniel                                              [Page X1]

RFC nnnn            The NAPTR Resource Record Type         November 1998

Flags and other fields are also specified in the RR to control the
rewrite procedure in various ways or to provide information on how to
communicate with the host at the domain-name that was the result of the
rewrite.

The final result is a RR that has several fields which interact in a
non-trivial but implementable way. This document specifies those fields
and their values.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

NAPTR RR Format
===============

The format of the NAPTR RR is given below. The DNS type code for NAPTR
is 35. [1] [2]

    Domain TTL Class Order Preference Flags Service Regexp Replacement

Domain
    The domain name this resource record refers to. This is the 'key'
    for this entry in the rule database. This value will either be the
    first well known key (.uri.net for example) or a new key that is the
    output of a replacement or regexp rewrite. Beyond this it has the
    standard DNS requirements. [1]

TTL
    Standard DNS meaning. [1]

Class
    Standard DNS meaning. [1]

Order
    A 16-bit integer specifying the order in which the NAPTR records
    MUST be processed to ensure the correct ordering of rules.
    Low numbers are processed before high numbers, and once a NAPTR is
    found whose rule "matches" the target, the client MUST NOT consider
    any NAPTRs with a higher value for order (except as noted below for
    the Flags field).

Preference
    A 16-bit integer which specifies the order in which NAPTR records
    with equal "order" values SHOULD be processed, low numbers being
    processed before high numbers. This is similar to the preference
    field in an MX record, and is used so domain administrators can
    direct clients towards more capable hosts or lighter weight
    protocols. A client MAY look at records with higher preference
    values if it has a good reason to do so, such as not understanding
    the preferred protocol or service.

Flags
    A string giving flags to control aspects of the rewriting and
    interpretation of the fields in the record. Flags are single
    characters from the set [A-Z0-9]. The case of the alphabetic
    characters is not significant.

    At this time only three flags, "S", "A", and "P", are defined. "S"
    means that the next lookup should be for SRV [4] records instead of
    NAPTR records. "A" means that the next lookup should be for A
    records. The "P" flag says that the remainder of the application
    side algorithm shall be carried out in a Protocol-specific fashion.
    The new set of rules is identified by the Protocol specified in the
    Services field. The record that contains the 'P' flag is the last
    record that is interpreted by the rules specified in this document.
    The new rules are dependent on the application for which they are
    being used and the protocol specified. For example, if the
    application is a URI RDS and the protocol is WIRE then the new set
    of rules is governed by the algorithms surrounding the WIRE HTTP
    specification and not this document [17].

    The remaining alphabetic flags are reserved for future versions of
    the NAPTR specification. The numeric flags may be used for local
    experimentation.
    The S, A, and P flags are all mutually exclusive, and resolution
    libraries MAY signal an error if more than one is given.
    (Experimental code and code for assisting in the creation of NAPTRs
    would be more likely to signal such an error than a client such as a
    browser.) We anticipate that multiple flags will be allowed in the
    future, so implementers MUST NOT assume that the flags field can
    only contain 0 or 1 characters.

    Finally, if a client encounters a record with an unknown flag, it
    MUST ignore it and move to the next record. This test takes
    precedence even over the "order" field. Since flags can control the
    interpretation placed on fields, a novel flag might change the
    interpretation of the regexp and/or replacement fields such that it
    is impossible to determine if a record matched a given target.

    The "S" and "A" flags are called 'terminal' flags since they halt
    any looping rewrite algorithms. If those flags are not present then
    clients may assume that another NAPTR RR exists at the domain-name
    produced by the current rewrite rule. Since the "P" flag specifies a
    new algorithm, it may or may not be terminal; thus the client cannot
    assume that another NAPTR exists, since this case is determined
    elsewhere.

    DNS servers MAY interpret these flags and values and use that
    information to include appropriate SRV and A records in the
    additional information portion of the DNS packet. Clients are
    encouraged to check for additional information but are not required
    to do so.

Service
    Specifies the service(s) available down this rewrite path. It may
    also specify the particular protocol that is used to talk with a
    service. A protocol MUST be specified if the flags field states that
    the NAPTR is terminal. If a protocol is specified, but the flags
    field does not state that the NAPTR is terminal, the next lookup
    MUST be for a NAPTR. The client MAY choose not to perform the next
    lookup if the protocol is unknown, but that behavior MUST NOT be
    relied upon.

    The service field may take any of the values below (using the
    Augmented BNF of RFC 2234 [5]):

        service_field = [ [protocol] *("+" rs)]
        protocol      = ALPHA *31ALPHANUM
        rs            = ALPHA *31ALPHANUM
        ; The protocol and rs fields are limited to 32
        ; characters and must start with an alphabetic.

    i.e. an optional protocol specification followed by 0 or more
    resolution services. Each resolution service is indicated by an
    initial '+' character.

    Note that the empty string is also a valid service field. This will
    typically be seen at the beginning of a series of rules, when it is
    impossible to know what services and protocols will be offered by a
    particular service.

    The actual format of the service request and response will be
    determined by the resolution protocol, and is the subject for other
    documents. Protocols need not offer all services. The labels for
    service requests shall be formed from the set of characters
    [A-Z0-9]. The case of the alphabetic characters is not significant.

    The list of "valid" protocols for any given NAPTR record is any
    protocol that implements some or all of the services defined for a
    NAPTR application. At the time of publication, THTTP [6] is the only
    protocol known to make that claim. Any other protocol that is to be
    used must have documentation specifying:

    * how it implements the services of the application
    * how it is to appear in the NAPTR record (i.e., the string id of
      the protocol)

    The list of valid Resolution Services is defined by the documents
    that specify individual NAPTR based applications. One example is
    RFCXXXX, "Resolution of Uniform Resource Identifiers using the
    Domain Name System" [7], from which this document was extracted.

    It is worth noting that the interpretation of this field is subject
    to being changed by new flags, and that the current specification is
    oriented towards telling clients how to talk with a URN resolver.
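
    As an illustration only, the service field grammar above can be
    checked with a few lines of Python. This is a minimal sketch, not
    part of the specification: the function name parse_service_field is
    hypothetical, and upper-casing the labels is simply one way of
    honoring the rule that case is not significant.

```python
import re

# ALPHA followed by at most 31 alphanumerics, per the ABNF above.
_LABEL = r"[A-Za-z][A-Za-z0-9]{0,31}"
_SERVICE_RE = re.compile(rf"({_LABEL})?((?:\+{_LABEL})*)")


def parse_service_field(field):
    """Split a service field into (protocol, [resolution services]).

    Returns (None, []) for the empty string and raises ValueError when
    the field does not fit the grammar. Labels are upper-cased because
    the case of alphabetic characters is not significant.
    """
    match = _SERVICE_RE.fullmatch(field)
    if match is None:
        raise ValueError(f"malformed service field: {field!r}")
    protocol, services = match.group(1), match.group(2)
    names = [name.upper() for name in services.split("+") if name]
    return (protocol.upper() if protocol else None, names)
```

    For the field "http+N2L+N2C+N2R" this sketch yields the protocol
    HTTP and the services N2L, N2C and N2R; the empty string yields no
    protocol and no services.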

Regexp
    A STRING containing a substitution expression that is applied to the
    original string held by the client in order to construct the next
    domain name to lookup. The grammar of the substitution expression is
    given in the next section.

    The regular expressions MUST NOT be used in a cumulative fashion,
    that is, they should only be applied to the original string held by
    the client, never to the domain name produced by a previous NAPTR
    rewrite. The latter is tempting in some applications but experience
    has shown such use to be extremely fault sensitive, very error
    prone, and extremely difficult to debug.

Replacement
    The next NAME to query for NAPTR, SRV, or A records depending on the
    value of the flags field. This MUST be DNS compressed.

Substitution Expression Grammar:
================================

The content of the regexp field is a substitution expression. True
sed(1) substitution expressions are not appropriate for use in this
application for a variety of reasons, therefore the contents of the
regexp field MUST follow the grammar below:

    subst_expr   = delim-char  ere  delim-char  repl  delim-char  *flags
    delim-char   = "/" / "!" / ... (Any non-digit or non-flag character
                   other than backslash '\'. All occurrences of a
                   delim-char in a subst_expr must be the same
                   character.)
    ere          = POSIX Extended Regular Expression (see [8],
                   section 2.8.4)
    repl         = dns_str  /  backref  /  repl dns_str  /  repl backref
    dns_str      = 1*DNS_CHAR
    backref      = "\" 1POS_DIGIT
    flags        = "i"
    DNS_CHAR     = "-" / "0" / ... / "9" / "a" / ... / "z" / "A" / ... / "Z"
    POS_DIGIT    = "1" / "2" / ... / "9"  ; 0 is not an allowed backref
                                          ; value
    domain name (see RFC-1123 [9]).

The result of applying the substitution expression to the original URI
MUST result in a string that obeys the syntax for DNS host names [9].
Since it is possible for the regexp field to be improperly specified,
such that a non-conforming host name can be constructed, client software
SHOULD verify that the result is a legal host name before making queries
on it.

Backref expressions in the repl portion of the substitution expression
are replaced by the (possibly empty) string of characters enclosed by
'(' and ')' in the ERE portion of the substitution expression. N is a
single digit from 1 through 9, inclusive. It specifies the N'th backref
expression, the one that begins with the N'th '(' and continues to the
matching ')'. For example, the ERE

    (A(B(C)DE)(F)G)

has backref expressions:

    \1      = ABCDEFG
    \2      = BCDE
    \3      = C
    \4      = F
    \5..\9  = error - no matching subexpression

The "i" flag indicates that the ERE matching SHALL be performed in a
case-insensitive fashion. Furthermore, any backref replacements MAY be
normalized to lower case when the "i" flag is given.

The first character in the substitution expression shall be used as the
character that delimits the components of the substitution expression.
There must be exactly three non-escaped occurrences of the delimiter
character in a substitution expression. Since escaped occurrences of the
delimiter character will be interpreted as occurrences of that
character, digits MUST NOT be used as delimiters. Backrefs would be
confused with literal digits were this allowed. Similarly, if flags are
specified in the substitution expression, the delimiter character must
not also be a flag character.

The Basic NAPTR Algorithm
=========================

The behavior and meaning of the flags and services assume an algorithm
where the output of one rewrite is a new key that points to another
rule. This looping algorithm allows NAPTR records to incrementally
specify a complete rule.
These incremental rules can be delegated, which allows other entities to
specify rules so that one entity does not need to understand _all_
rules.

The algorithm starts with a string and some known key (domain). NAPTR
records for this key are retrieved, those with unknown Flags or
inappropriate Services are discarded, and the remaining records are
sorted by their Order field. Within each value of Order, the records are
further sorted by the Preference field.

The records are examined in sorted order until a matching record is
found. A record is considered a match iff:

    1) it has a Replacement field value instead of a Regexp field
       value, or

    2) the Regexp field matches the string held by the client.

The first match MUST be the match that is used. Once a match is found,
the Services field is examined for whether or not this rule advances
toward the desired result. If so, then the rule is applied to the target
string. If not, the process halts. The domain that results from the
regular expression is then used as the domain of the next loop through
the NAPTR algorithm. Note that the same target string is used throughout
the algorithm.

This looping is extremely important since it is the method by which
complex rules are broken down into manageable delegated chunks. The
flags fields simply determine at which point the looping should stop (or
other specialized behavior).

Since flags are valid at any level of the algorithm, the degenerate case
is to never loop but to look up the NAPTR and then stop. In many
specialized cases this is all that is needed. Implementors should be
aware that the degenerate case should not become the common case.

Application Specifications
==========================

It should be noted that the NAPTR algorithm is the basic assumption
about how NAPTR works. The reasons for the rewrite and the expected
output and its use are specified by documents that define what
applications the NAPTR record and algorithm are used for.
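
The record selection step of the basic algorithm (discard, sort, take
the first match) can be sketched in Python as follows. This is a hedged
illustration rather than a reference implementation: the Naptr class,
first_match, regexp_matches and the service_ok callback are hypothetical
names, Python's re module stands in for POSIX EREs (the two dialects are
close but not identical), and the delimiter is assumed never to appear
escaped inside the expression.

```python
import re
from dataclasses import dataclass

KNOWN_FLAGS = {"S", "A", "P"}  # the three flags defined in this document


@dataclass
class Naptr:
    order: int
    preference: int
    flags: str
    service: str
    regexp: str       # substitution expression, "" when unused
    replacement: str  # next domain name, "." when unused


def regexp_matches(subst_expr, target):
    """Report whether the ERE part of a substitution expression matches."""
    delim = subst_expr[0]
    # Assumption: the delimiter is never escaped inside the expression.
    ere, _repl, flags = subst_expr[1:].split(delim)
    options = re.IGNORECASE if "i" in flags else 0
    return re.search(ere, target, options) is not None


def first_match(records, target, service_ok=lambda service: True):
    """Return the first match in (order, preference) order, or None."""
    usable = [
        rec for rec in records
        if all(flag.upper() in KNOWN_FLAGS for flag in rec.flags)
        and service_ok(rec.service)
    ]
    for rec in sorted(usable, key=lambda r: (r.order, r.preference)):
        # A record with a Replacement and no Regexp always matches.
        if not rec.regexp or regexp_matches(rec.regexp, target):
            return rec
    return None
```

Because the function returns as soon as a match is found, records with a
higher order value are never considered after that point, and records
carrying an unknown flag are dropped before sorting.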
Any document that defines such an application must define the following:

    * The first known key or how to build it
    * The valid Services and Protocols
    * What the expected use is for the output of the last rewrite
    * The validity and/or behavior of any 'P' flag protocols.
    * The general semantics surrounding why and how NAPTR and its
      algorithm are being used.

Currently the only example of such a document is RFCXXXX, "Resolution of
Uniform Resource Identifiers using the Domain Name System" [7].

Examples
========

NOTE: These are examples only. They are taken from ongoing work and may
not represent the end result of that work. They are here for pedagogical
reasons only.

Example 1
---------

NAPTR was originally specified for use with a Uniform Resource Name
Resolver Discovery System. This example details how a particular URN
would use the NAPTR record to find a resolver service.

Consider a URN namespace based on MIME Content-Ids. The URN might look
like this:

    urn:cid:199606121851.1@mordred.gatech.edu

(Note that this example is chosen for pedagogical purposes, and does not
conform to the CID URL scheme.)

The first step in the resolution process is to find out about the CID
namespace. The namespace identifier [3], cid, is extracted from the URN
and prepended to urn.net. 'cid.urn.net' then becomes the first 'known'
key in the NAPTR algorithm. The NAPTR for cid.urn.net is looked up and
returns a record:

    cid.urn.net
    ;;       order pref flags service regexp                replacement
    IN NAPTR 100   10   ""    ""  "/urn:cid:.+@([^\.]+\.)(.*)$/\2/i"  .

We have only one NAPTR response, so ordering the responses is not a
problem. The replacement field is empty, so we check the regexp field
and use the pattern provided there. We apply that regexp to the entire
URN to see if it matches, which it does. The \2 part of the substitution
expression returns the string "gatech.edu".
Since the flags field does not contain "s" or "a", the lookup is not
terminal and our next probe to DNS is for more NAPTR records where the
new domain is 'gatech.edu' and the string is the same string as before.

Note that the rule does not extract the full domain name from the CID;
instead it assumes the CID comes from a host and extracts its domain.
While all hosts, such as mordred, could have their very own NAPTR,
maintaining those records for all the machines at a site as large as
Georgia Tech would be an intolerable burden. Wildcards are not
appropriate here since they only return results when there are no
exactly matching names already in the system.

The record returned from the query on "gatech.edu" might look like:

    gatech.edu IN NAPTR
    ;;       order pref flags service            regexp replacement
    IN NAPTR 100   50   "s"   "z3950+N2L+N2C"    ""  z3950.tcp.gatech.edu
    IN NAPTR 100   50   "s"   "rcds+N2C"         ""  rcds.udp.gatech.edu
    IN NAPTR 100   50   "s"   "http+N2L+N2C+N2R" ""  http.tcp.gatech.edu

Continuing with our example, we note that the values of the order and
preference fields are equal in all records, so the client is free to
pick any record. The flags field tells us that these are the last NAPTR
patterns we should see, and after the rewrite (a simple replacement in
this case) we should look up SRV records to get information on the hosts
that can provide the necessary service.

Assuming we prefer the Z39.50 protocol, our lookup might return:

    ;;                          Pref Weight Port Target
    z3950.tcp.gatech.edu IN SRV 0    0      1000 z3950.gatech.edu
                         IN SRV 0    0      1000 z3950.cc.gatech.edu
                         IN SRV 0    0      1000 z3950.uga.edu

telling us three hosts that could actually do the resolution, and giving
us the port we should use to talk to their Z39.50 server.

Recall that the regular expression used \2 to extract a domain name from
the CID, and \. for matching the literal '.' characters separating the
domain name components.
Since '\' is the escape character, literal occurrences of a backslash
must be escaped by another backslash. For the case of the cid.urn.net
record above, the regular expression entered into the zone file should
be "/urn:cid:.+@([^\\.]+\\.)(.*)$/\\2/i". When the client code actually
receives the record, the pattern will have been converted to
"/urn:cid:.+@([^\.]+\.)(.*)$/\2/i".

Example 2
---------

Even if URN systems were in place now, there would still be a tremendous
number of URLs. It should be possible to develop a URN resolution system
that can also provide location independence for those URLs. This is
related to the requirement that URNs be able to grandfather in names
from other naming systems, such as ISO Formal Public Identifiers,
Library of Congress Call Numbers, ISBNs, ISSNs, etc.

The NAPTR RR could also be used for URLs that have already been
assigned. Assume we have the URL for a very popular piece of software
that the publisher wishes to mirror at multiple sites around the world:

    http://www.foo.com/software/latest-beta.exe

We extract the prefix, "http", and look up NAPTR records for
http.uri.net. This might return a record of the form

    http.uri.net IN NAPTR
    ;; order pref flags service regexp                  replacement
       100   90   ""    ""      "!http://([^/:]+)!\1!i" .

This expression returns everything after the first double slash and
before the next slash or colon. (We use the '!' character to delimit the
parts of the substitution expression. Otherwise we would have to use
backslashes to escape the forward slashes, and would have a regexp in
the zone file that looked like "/http:\\/\\/([^\\/:]+)/\\1/i".)

Applying this pattern to the URL extracts "www.foo.com".
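
The two rewrites shown in Examples 1 and 2 can be reproduced with a
short Python sketch. It rests on several assumptions that are not part
of this document: the function name apply_subst_expr is hypothetical,
Python's re module is used in place of a POSIX ERE engine, the delimiter
is assumed never to appear escaped, and the result is taken to be the
expanded repl portion alone, which is what the stated results of both
examples imply.

```python
import re


def apply_subst_expr(subst_expr, target):
    """Apply a substitution expression to the original string.

    Returns the expanded replacement, or None when the ERE does not
    match the target.
    """
    delim = subst_expr[0]
    # Assumption: the delimiter is never escaped inside the expression,
    # so the remainder splits into exactly ere, repl and flags.
    parts = subst_expr[1:].split(delim)
    if len(parts) != 3:
        raise ValueError("expected exactly three delimiter characters")
    ere, repl, flags = parts
    match = re.search(ere, target, re.IGNORECASE if "i" in flags else 0)
    if match is None:
        return None
    # Expand backrefs \1..\9; a group that did not participate in the
    # match expands to the empty string.
    return re.sub(
        r"\\([1-9])",
        lambda ref: match.group(int(ref.group(1))) or "",
        repl,
    )


cid_rule = r"/urn:cid:.+@([^\.]+\.)(.*)$/\2/i"
url_rule = r"!http://([^/:]+)!\1!i"
print(apply_subst_expr(cid_rule, "urn:cid:199606121851.1@mordred.gatech.edu"))
print(apply_subst_expr(url_rule, "http://www.foo.com/software/latest-beta.exe"))
```

The two calls print gatech.edu and www.foo.com, matching the results
given in the examples. A backref to a subexpression that does not exist
raises an IndexError in this sketch, which corresponds to the "error"
case in the backref table.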
Looking up NAPTR records for that might return:

    www.foo.com
    ;;       order pref flags service    regexp replacement
    IN NAPTR 100   100  "s"   "http+L2R" ""     http.tcp.foo.com
    IN NAPTR 100   100  "s"   "ftp+L2R"  ""     ftp.tcp.foo.com

Looking up SRV records for http.tcp.foo.com would return information on
the hosts that foo.com has designated to be its mirror sites. The client
can then pick one for the user.

Example 3
---------

A non-URI example is where a NAPTR is used to specify the available
mappings from a domain-name to telephony based endpoints. In this
example the regular expression field is not used since the important
information is encoded within the services field.

    0.0.0.4.6.2.6.5.8.6.4.e164.int.
        IN NAPTR 100 10 "s" "h323call+N2R" "" tele2.se.
        IN NAPTR 102 10 "s" "potscall+N2R" "" tele2.se.
        IN NAPTR 102 10 "s" "smtp+N2R"     "" tele2.se.

In this example the domain is an encoded E164 telephone number. The
services field specifies that, for this particular telephone number, the
services that are available are h323call, potscall and smtp; and that
"tele2.se" is the target that provides those services. Since the flag is
"s", the next step should be a query for an SRV record which will
contain specific information about the "tele2.se" domain.

Advice to domain administrators:
================================

Beware of regular expressions. Not only are they a pain to get correct
on their own, but there is the previously mentioned interaction with
DNS. Any backslashes in a regexp must be entered twice in a zone file in
order to appear once in a query response. More seriously, the need for
double backslashes has probably not been tested by all implementors of
DNS servers.

The "a" flag allows the next lookup to be for A records rather than SRV
records. Since there is no place for a port specification in the NAPTR
record, when the "A" flag is used the specified protocol must be running
on its default port.
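
The backslash doubling mentioned in the first paragraph of this advice
can be illustrated with a small Python sketch. The helper names
zone_file_escape and zone_file_unescape are hypothetical, and the sketch
covers only the backslash rule described here; real zone file parsers
handle other escape sequences as well.

```python
def zone_file_escape(regexp_field):
    """Double every backslash so that one survives zone file parsing."""
    return regexp_field.replace("\\", "\\\\")


def zone_file_unescape(zone_text):
    """Collapse each doubled backslash back into a single backslash."""
    return zone_text.replace("\\\\", "\\")


wanted = r"/urn:cid:.+@([^\.]+\.)(.*)$/\2/i"   # what the client should see
entered = zone_file_escape(wanted)             # what goes in the zone file
print(entered)
```

Running it prints the zone file form of the cid.urn.net expression from
Example 1, with each backslash doubled, and zone_file_unescape(entered)
gives back the original expression.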
The URN Syntax draft defines a canonical form for each URN, which
requires %encoding characters outside a limited repertoire. The regular
expressions MUST be written to operate on that canonical form. Since
international character sets will end up with extensive use of %encoded
characters, regular expressions operating on them will be essentially
impossible to read or write by hand.

Notes:
======

- A client MUST process multiple NAPTR records in the order specified
  by the "order" field; it MUST NOT simply use the first record that
  provides a known protocol and service combination.

- When multiple RRs have the same "order", the client should use the
  value of the preference field to select the next NAPTR to consider.
  However, because of preferred protocols or services, estimates of
  network distance and bandwidth, etc., clients may use different
  criteria to sort the records.

- If the lookup after a rewrite fails, clients are strongly encouraged
  to report a failure, rather than backing up to pursue other rewrite
  paths.

- Note that SRV RRs impose additional requirements on clients.

Acknowledgments:
================

The editors would like to thank Keith Moore for all his consultations
during the development of this draft. We would also like to thank Paul
Vixie for his assistance in debugging our implementation, and his
answers to our questions. Finally, we would like to acknowledge our
enormous intellectual debt to the participants in the Knoxville series
of meetings, as well as to the participants in the URI and URN working
groups.

References:
===========

[1]  Mockapetris, P., "Domain names - implementation and specification",
     STD 13, RFC 1035, November 1987.

[2]  Mockapetris, P., "Domain names - concepts and facilities", STD 13,
     RFC 1034, November 1987.

[3]  Moats, R., "URN Syntax", RFC 2141, May 1997.

[4]  Gulbrandsen, A. and P. Vixie, "A DNS RR for specifying the location
     of services (DNS SRV)", RFC 2052, October 1996.

[5]  Crocker, D. and P. Overell,
     "Augmented BNF for Syntax Specifications: ABNF", RFC 2234,
     November 1997.

[6]  Daniel, R., "A Trivial Convention for using HTTP in URN
     Resolution", RFC 2169, June 1997.

[7]  Mealling, M. and R. Daniel, "Resolution of Uniform Resource
     Identifiers using the Domain Name System", RFCXXXX, November 1998.

[8]  IEEE Standard for Information Technology - Portable Operating
     System Interface (POSIX) - Part 2: Shell and Utilities (Vol. 1);
     IEEE Std 1003.2-1992; The Institute of Electrical and Electronics
     Engineers; New York; 1993. ISBN:1-55937-255-9

[9]  Braden, R., "Requirements for Internet Hosts - Application and
     Support", RFC 1123, Oct. 1989.

IANA Considerations
===================

The only registration function that impacts the IANA is for the values
that are standardized for the Services and Flags fields. To extend the
valid values of the Flags field beyond what is specified in this
document requires a published specification that is approved by the
IESG.

The values for the Services field will be determined by the application
that makes use of the NAPTR record. Those values must be specified in a
published specification and approved by the IESG.

Security Considerations
=======================

The interactions with DNSSEC are currently being studied. It is expected
that NAPTR records will be signed with SIG records once the DNSSEC work
is deployed.

The rewrite rules make identifiers from other namespaces subject to the
same attacks as normal domain names. Since they have not been easily
resolvable before, this may or may not be considered a problem.

Regular expressions should be checked for sanity, not blindly passed to
something like PERL.

This document has discussed a way of locating a service, but has not
discussed any detail of how the communication with that service takes
place. There are significant security considerations attached to the
communication with a service.
Those considerations are outside the scope of this document, and must be
addressed by the specifications for particular communication protocols.

Author Contact Information:
===========================

    Michael Mealling
    Network Solutions
    505 Huntmar Park Drive
    Herndon, VA 22070
    voice: (703) 742-0400
    fax: (703) 742-9552
    email: michaelm@netsol.com
    URL: http://www.netsol.com/

    Ron Daniel Jr.
    DATAFUSION, Inc.
    139 Townsend Street, Ste. 100
    San Francisco, CA 94107
    415.222.0100
    fax 415.222.0150
    rdaniel@datafusion.net
    http://www.datafusion.net