Network Working Group                                    Roland Hedberg
Internet Draft                                          Bruce Greenblatt
                                                              Ryan Moats
Expires in six months                                          Mark Wahl

     A Tagged Index Object for use in the Common Indexing Protocol

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months.
Internet-Drafts may be updated, replaced, or made obsolete by other
documents at any time. It is not appropriate to use Internet-Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress".

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Distribution of this document is unlimited.

Abstract

This document defines a mechanism by which information servers can
exchange indices of information from their databases by making use of
the Common Indexing Protocol (CIP). This document defines the
structure of the index information being exchanged, as well as the
appropriate meanings for the headers that are defined in the Common
Indexing Protocol. It is assumed that the structures defined here can
be used by X.500 DSAs, LDAP servers, whois++ servers, CCSO servers and
many others.

Hedberg, Greenblatt, Moats, Wahl                                [Page 1]

Internet Draft                                                March 1997

1. Introduction

The Common Indexing Protocol (CIP) as defined in [1] proposes a
mechanism for distributing searches across several instances of a
single type of search engine with a view to creating a global
directory. CIP provides a scalable, flexible scheme to tie individual
databases into distributed data warehouses that can scale gracefully
with the growth of the Internet.
CIP provides a mechanism for meeting these goals that is independent
of the access method that is used to access the actual data that
underlies the indices. Separate from CIP is the definition of the
Index Object that is used to contain the information that is exchanged
among Index Servers. One such Index Object that has already been
defined is the Centroid that is derived from the Whois++ protocol [2].

The Centroid does not meet all of the requirements for the exchange of
index information amongst information servers. For example, it does
not support the notion of incremental updates natively. For
information servers that contain millions of records in their
database, constant exchange of complete dredges of the database is
bandwidth intensive. The Tagged Index Object is specifically designed
to support the exchange of index update information. This design comes
at the cost of an increase in the size of the index object being
exchanged.

The Centroid is also not tailored to always be able to give boolean
answers to queries. In the Centroid Model, "an index server will take
a query in standard whois++ format, search its collections of
centroids and other forward information, determine which servers hold
records which may fill that query, and then notifies the user's client
of the next servers to contact to submit the query." [2] Thus, the
exchange of Centroids amongst index servers allows hints to be given
as to which information server actually contains the information. The
Tagged Index Object labels the various pieces of information with
identifiers that tie the individual object attributes back to an
object as a whole. This "tagging" of information allows an index
server to be more capable of directing a specific query to the
appropriate information server. Again, this feature is added to the
Tagged Index Object at the expense of an increase in the size of the
index object.

2. Background

The Lightweight Directory Access Protocol (LDAP) is defined in [3],
and it defines a mechanism for accessing a collection of information
arranged hierarchically in such a manner as to provide a globally
distributed database which is normally called the Directory
Information Tree (DIT). A distinguishing characteristic of LDAP
servers is that several servers normally cooperate to manage a common
subtree of the DIT. LDAP servers are expected to respond to requests
that pertain to portions of the DIT for which they have data, as well
as for those portions for which they have no information in their
database. For example, the LDAP server for a portion of the DIT in the
United States (c=US) must be able to provide a response to a Search
operation that pertains to a portion of the DIT in Sweden (c=se).
Normally, the response given will be a referral to another LDAP server
that is expected to be more knowledgeable about the appropriate
subtree. However, there is no mechanism that currently enables these
LDAP servers to refer the LDAP client to the supposedly more
knowledgeable server. Typically, an LDAP server is configured with the
name of exactly one other LDAP server to which to refer all LDAP
clients when their requests fall outside the subtree of the DIT for
which that LDAP server has knowledge. This specification defines a
mechanism whereby LDAP servers can exchange index information that
will allow referrals to point towards a clearly accurate destination.

The X.500 series of recommendations defines the Directory Information
Shadowing Protocol (DISP) [4], which allows X.500 DSAs to exchange
actual information in the DIT. Shadowing allows information from
various portions of the DIT to be replicated amongst participating
DSAs.
The design point of DISP is optimized for the exchange of entire
portions of the DIT, whereas the design point of CIP and the Tagged
Index Object is optimized for the exchange of structural index
information about the DIT, and for improving the performance of tree
navigation amongst various information servers. The Tagged Index
Object is more appropriate for the exchange of index information than
is DISP. DISP is more targeted at DIT distribution and fault
tolerance. DISP is thus more appropriate for the exchange of the
actual data in order to spread the load amongst several information
servers. DISP is tailored specifically to X.500 (and other
hierarchical directory systems), while the Tagged Index Object and CIP
can be used in a wide variety of information server environments.

While DISP allows an individual directory server to collect
information about large parts of the DIT, it would require a huge
database to collect all of the replicas for a meaningful portion of
the DIT. Furthermore, as X.525 states: "Before shadowing can occur, an
agreement, covering the conditions under which shadowing may occur is
required. Although such agreements may be established in a variety of
ways, such as policy statements covering all DSAs within a given DMD
...", where a DMD is a Directory Management Domain. This is because
the actual data in the DIT is being exchanged amongst DSAs rather than
only the information required to maintain an Index. In many
environments such an agreement is not appropriate, and in order to
collect information for a meaningful portion of the DIT, a large
number of agreements may need to be arranged.

3. Object

What is desired is to have an information server (or network of
information servers) that can quickly respond to real world requests,
like:

- What is Tim Howes' email address? This is much harder than "What is
  Tim Howes at Netscape's email address?"
- What is the X.509 certificate for Fred Smith at compuserve.com? One
  certainly doesn't want to search CompuServe's entire directory tree
  to find out this one piece of information. I also don't want to have
  to shadow the entire CompuServe directory subtree onto my server. If
  this request is being made because Fred is trying to log into my
  server, I'd certainly want to be able to respond to the BIND in real
  time.
- Who are all of the people in the o=Novell container that have a
  title of programmer?

All of these requests translate reasonably straightforwardly into
LDAP, whois++, etc. They can also be serviced in a straightforward
manner by the user's home information server if it has the appropriate
reference information into the DIT that contains the source data.
Alternatively, a precise referral could be returned. If the home
information server wants to service the request based on the index
information that it has on hand, this servicing could be done by any
number of means:

- issuing LDAP operations to the remote directory server
- issuing DSP operations to the remote directory server
- issuing DAP operations to the remote directory server
- issuing Whois++ operations to the remote Whois++ server
- ...

4. The Tagged Index Object

This section defines a Tagged Index Object that can be exchanged by
Information Servers using CIP. While in many cases it is acceptable
for Information Servers to make use of the Centroid construct (as
defined in [2]) to exchange index information, the goals in defining a
new construct are multi-pronged:

- When the Information Server receives a search request that warrants
  that a referral be returned, allow the server to return a referral
  to the client that is almost guaranteed to allow the client's next
  request to be sent to the correct Information Server.
- When the Information Server receives a search request that is not
  operating against local data, allow the Information Server itself to
  "chain" the request to the appropriate remote Information Server.
  Note that LDAP itself does not define how Chaining works, but X.500
  does. This seems very similar to the first "prong".
- Finally, when a collection of Information Servers are operating
  against a large distributed directory, allow them to distribute
  index information amongst themselves (a la CIP) so that their own
  searches can be carried out with some degree of efficiency.

4.1. The Agreement

Before a Tagged Index Object can be exchanged, the organization which
administers the object supplier and the organization which administers
the object consumer must reach an agreement on how the servers will
communicate. This agreement contains the following:

- "version": The version of the agreement and the index type. This
  specification describes the index type "x-tagged-index-1".
- "baseobject": The Distinguished Name of the prefix entry of the
  supplier's subtree. This field is not explicitly necessary, as it
  may appear in the "base-uri" field below.
- "scope": The subset of information in the supplier's subtree that
  the update information will index. For this version of the
  specification, the scope is always "subtree": the base object and
  all entries down to the leaves of the tree, including any
  subordinate naming contexts. This field is not explicitly necessary,
  as it may appear in the "base-uri" field below.
- "dsi": An OID which uniquely identifies the subtree and scope. This
  field is not explicitly necessary, as it may not provide information
  beyond that which is contained in the "base-uri" below.
- "base-uri": One or more URIs which will form the base of any
  referrals created based upon the index object that is governed by
  this agreement.
- "supplier": The hostname and listening port number of the supplier
  server, as well as any alternative servers holding the same naming
  contexts, in case the supplier is unavailable.
- "consumeraddr": This is a URI of the "mailto:" form, with the RFC
  822 email address of the consumer server. Subsequent versions of
  this draft may allow other forms of URI, so that the consumer may
  retrieve the update via the WWW, FTP or CIP.
- "updateinterval": The maximum duration in seconds between
  occurrences of the supplier server generating an update. If the
  consumer server has not received an update from the supplier server
  after waiting this long since the previous update, it is likely that
  the index information is now out of date. A typical value for a
  server with frequent updates would be 604800 seconds, or every week.
  Servers whose DITs are only modified annually could have a much
  longer update interval.
- "securityoption": Whether and how the supplier server should sign
  and encrypt the update before sending it to the consumer server.
  Options for this version of the specification are:

    "none": the update is sent in plaintext
    "PGP/MIME": the update is digitally signed and encrypted using
       PGP [ref]
    "S/MIME": the update is digitally signed and encrypted using
       S/MIME [ref]
    "SSLv3": the update is digitally signed and encrypted using an
       SSLv3 connection [ref]
    "Fortezza": the update is digitally signed and encrypted using
       Fortezza [5]

  It is recommended that the "PGP/MIME" option be used when sensitive
  information is exchanged across public networks and both the
  supplier and consumer have PGP keys. The "Fortezza" option is
  intended for use in environments where security protocols are based
  on Fortezza-compatible devices. The "S/MIME" option can be used when
  both the supplier and consumer have RSA keys and can make use of the
  PKCS protocols defined in the S/MIME specification.
  The "SSLv3" option can be used when both the supplier and consumer
  have access to SSL services, have server certificates, and can
  mutually authenticate each other.

  Should these be IANA registered things???
- Security Credentials: The long-term cryptographic credentials used
  for key exchange and authentication of the consumer and supplier
  servers, if a security option was selected. For "PGP/MIME", this
  will be the trusted public keys of both servers. For "Fortezza",
  this will be the certificate paths of both servers to a common point
  of trust. For "S/MIME" and "SSLv3" these will be the certificates of
  the supplier and consumer.

4.2. Content Type

The update consists of a MIME object of type
application/cip-index-object. The parameters are:

  "type": this has value "x-tagged-index-1".
  "dsi": the DSI (if any) from the agreement.
  "base-uri": a set of URIs, separated by spaces. In each URI, the
     hostname/portno must be distinct, and based on the "supplier"
     part of the agreement.

The payload is mostly textual data but may include bytes with the high
bit set. The quoted-printable content-transfer-encoding is recommended
if there are any bytes with the high bit set; otherwise no transfer
encoding is needed.

This object may be encapsulated in a wrapper content (such as
multipart/signed) or be encrypted as part of the security procedures.
The resulting content can then be distributed, for example via
electronic mail.

For example,

From: supplier@sup.com
Date: Thu, 16 Jan 1997 13:50:37 -0500
Message-Id: <199701161850.NAA29295@sup.com>
To: consumer@consumer.com   <<-- from consumer server address
Reply-to: supplier-admin@sup.com
MIME-Version: 1.0
Content-Type: application/cip-index-object;
   type=x-tagged-index-1;
   dsi=1.3.6.1.4.1.1466.85.85.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16;
   base-uri="ldap://sup.com/dc=sup,dc=com ldap://alt.com/dc=sup,dc=com"

The payload is a series of CRLF-terminated lines.
Each line is in the UTF-8 encoding of the Unicode (ISO-10646 BMP)
character set. No other character sets are permitted by this version
of the specification. Some supplier servers may only be able to
generate the printable US-ASCII subset, but all consumer servers must
be able to handle the full range of Unicode characters.

4.3. Tagged Index BNF

The Tagged Index object has the following grammar, expressed in
modified BNF format:

  index-object = 1*(io-part SEP) io-part
  io-part      = header-line | schema-spec | index-info
  header-line  = version-spec | update-type | this-update
                 | last-update | context-size
  version-spec = "version:" *SPACE "x-tagged-index-1"
  update-type  = "updatetype:" *SPACE ( "total" | "incremental")
  this-update  = "thisupdate:" *SPACE TIMESTAMP
  last-update  = "lastupdate:" *SPACE TIMESTAMP
  context-size = [ "contextsize:" *SPACE 1*DIGIT ]
  schema-spec  = "BEGIN IO-Schema" SEP 1*(schema-line) SEP
                 "END IO-Schema"
  schema-line  = attr-name ":" token-type
  token-type   = "FULL" | "TOKEN" | "RFC822" | "UUCP" | "DNS"
  index-info   = "BEGIN Index-Info" SEP 1*(index-block) SEP
                 "END Index-Info"
  index-block  = first-line 0*(SEP cont-line)
  first-line   = attr-name ":" *SPACE taglist "/" attr-value
  cont-line    = "-" taglist "/" attr-value
  taglist      = tag 0*("," tag)
  tag          = "*" | 1*DIGIT ["-" 1*DIGIT]
  attr-value   = 1*(UTF8)
  attr-name    = 1*(UTF8)
  UTF8         = ASCII | "%" HEX HEX
  TIMESTAMP    = 1*DIGIT
  ASCII        = DIGIT | UPPER | LOWER | OTHER
  SPACE        = <ASCII space, hex 20>
  SEP          = (CR LF | LF)
  CR           = <ASCII CR, carriage return, hex 0D>
  LF           = <ASCII LF, line feed, hex 0A>
  HEX          = "A" | "B" | "C" | "D" | "E" | "F"
                 | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT
  DIGIT        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
                 | "8" | "9"
  UPPER        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H"
                 | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P"
                 | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X"
                 | "Y" | "Z"
  LOWER        = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h"
                 | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p"
                 | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x"
                 | "y" | "z"
  OTHER        = "(" | ")" | "+" | "," | "-" | "." | "/" | ":"
                 | "=" | "?" | "@" | ";" | "$" | "_" | "!" | "~"
                 | "*" | "'" | " "[" | "]" | "^" | "`" | "{" | "|"
                 | "}"

4.3.1. Header Descriptions

The header section consists of one or more "header lines". The
following header lines are defined:

"version": This line must always be present, and have the value
   "x-tagged-index-1" for this version of the specification.

"updatetype": This line must always be present. It takes as the value
   either "total" or "incremental". The first update sent by a
   supplier server to a consumer server for a DSI must be a "total"
   update (why?).

"thisupdate": This line must always be present. The value is the
   number of seconds from 00:00:00 UTC January 1, 1970 at which the
   supplier constructed this update.

"lastupdate": This line must be present if the "updatetype" line has
   the value "incremental". The value is the number of seconds from
   00:00:00 UTC January 1, 1970 at which the supplier constructed the
   previous update sent to the consumer. This field allows the
   consumer to determine if a previous update was missed.

"contextsize": This line may be present at the supplier's option. The
   value is a number, which is the approximate total number of entries
   in the subtree. This information is provided for statistical
   purposes only.

4.3.2. Tokenization Types

The Tagged Index Object inherits the "TOKEN" scheme for tokenization
as specified in [2]. In addition, there are several other tokenization
schemes defined for the Tagged Index Object. The following table
presents these schemes and what character(s) are used to delimit
tokens. Should these be IANA registered things???

  Token Type   Tokenization Characters
  FULL         none
  TOKEN        white space, "@"
  RFC822       white space, ".", "@"
  UUCP         white space, "!"
  DNS          any character not a number, letter, or "-"

5. Example

As an example, the following LDIF [6] entries and the resulting Tagged
Index Object are presented.

dn: cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Barbara Jensen
cn: Barbara J Jensen
cn: Babs Jensen
sn: Jensen
uid: bjensen
telephonenumber: +1 408 555 1212
description: A big sailing fan.

dn: cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Bjorn Jensen
sn: Jensen
telephonenumber: +1 408 555 1212

dn: cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Gern Jensen
cn: Gern O Jensen
sn: Jensen
uid: gernj
telephonenumber: +1 408 555 1212

dn: cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Horatio Jensen
cn: Horatio N Jensen
sn: Jensen
uid: hjensen
telephonenumber: +1 408 555 1212

The Tagged Index Object for this example would be:

version: x-tagged-index-1
updatetype: total
thisupdate: 855938804
BEGIN IO-Schema
dn: FULL
ou: TOKEN
o: TOKEN
c: TOKEN
objectclass: FULL
cn: TOKEN
sn: FULL
uid: FULL
title: TOKEN
END IO-Schema
BEGIN Index-Info
dn: 1/cn=Barbara Jensen,ou=Product Development,o=Ace Industry,c=US
-2/cn=Bjorn Jensen,ou=Accounting,o=Ace Industry,c=US
-3/cn=Gern Jensen,ou=Product Testing,o=Ace Industry,c=US
-4/cn=Horatio Jensen,ou=Product Testing,o=Ace Industry,c=US
ou: 1,3-4/Product
-1/Development
-2/Accounting
-3-4/Testing
o: */Ace
-*/Industry
c: */US
objectclass: */top
-*/person
-*/organizationalPerson
cn: 1/Barbara
-1/J
-1/Babs
-*/Jensen
-2/Bjorn
-3/Gern
-3/O
-4/Horatio
-4/N
sn: */Jensen
uid: 1/bjensen
-3/gernj
-4/hjensen
title: 1/product
-1/manager
-1/rod
-1/and
-1/reel
-1/division
END Index-Info

In this next example, consider an LDIF file containing a series of
change records and comments:

# Add a new entry
dn: cn=Fiona Jensen, ou=Marketing, o=Ace Industry, c=US
changetype: add
objectclass: top
objectclass: person
objectclass: organizationalPerson
cn: Fiona Jensen
sn: Jensen
uid: fiona
telephonenumber: +1 408 555 1212
jpegphoto:< /usr/local/directory/photos/fiona.jpg

# Delete an existing entry
dn: cn=Robert Jensen, ou=Marketing, o=Ace Industry, c=US
changetype: delete

# Modify an entry's relative distinguished name
dn: cn=Paul Jensen, ou=Product Development, o=Ace Industry, c=US
changetype: modrdn
newrdn: cn=Paula Jensen
deleteoldrdn: 1

# Rename an entry and move all of its children to a new location in
# the directory tree (only implemented by LDAPv3 servers).
dn: ou=PD Accountants, ou=Product Development, o=Ace Industry, c=US
changetype: modrdn
newrdn: ou=Product Development Accountants
deleteoldrdn: 0
newsuperior: ou=Accounting, o=Ace Industry, c=US

# Modify an entry: add an additional value to the postaladdress
# attribute, completely delete the description attribute, replace the
# telephonenumber attribute with two values, and delete a specific
# value from the facsimiletelephonenumber attribute
dn: cn=Paula Jensen, ou=Product Development, o=Ace Industry, c=US
changetype: modify
add: postaladdress
postaladdress: 123 Anystreet $ Sunnyvale, CA $ 94086
-
delete: description
-
replace: telephonenumber
telephonenumber: +1 408 555 1234
telephonenumber: +1 408 555 5678
-
delete: facsimiletelephonenumber
facsimiletelephonenumber: +1 408 555 9876
-

The Tagged Index Object for this example would be:

version: x-tagged-index-1
updatetype: incremental
thisupdate: 855938804
lastupdate: 855912345
BEGIN IO-Schema
dn: FULL
ou: TOKEN
o: TOKEN
c: TOKEN
objectclass: FULL
cn: TOKEN
sn: FULL
uid: FULL
title: TOKEN
END IO-Schema
BEGIN Index-Info
changetype: 1/add
-2/delete
-3/modrdn
-4/modify
ou: 1-2/Marketing
-3-4/Product
-3-4/Development
o: */Ace
-*/Industry
c: */US
objectclass: */top
-*/person
-*/organizationalPerson
cn:1/Fiona
-*/Jensen
-2/Robert
-3-4/Paula
sn:1/Jensen
uid:1/fiona
postaladdress: 4/123
-4/Anystreet
-4/Sunnyvale
-4/CA
-4/94086
END Index-Info

6. Aggregation

TBD

7. Recommendations

TBD

8. Security Considerations

Information Server administrators must decide what portions of their
databases are appropriate for inclusion in the Tagged Index Object.
For distribution of information outside of the enterprise, information
server developers are encouraged to allow for facilities that hide the
organizational structure when generating the Tagged Index Object from
the underlying information database.

In order to allow for the secure transmission of Tagged Index Objects
across the Internet, Index Servers should make use of SSL to carry out
the connection. In order to strongly verify the identity of the peer
index server on the other side of the connection, SSL version 3
certificate exchange should be implemented, and the identity in the
peer's certificate verified against the Public Key Infrastructure. If
electronic mail is used to exchange the Tagged Index Objects, then a
secure messaging facility, such as PGP/MIME or S/MIME, should be used
to sign or encrypt (or both) the information.

9. References

[1] J. Allen, "The Common Indexing Protocol (CIP)," Internet Draft
    (work in progress), 19 November 1996.
[2] C. Weider, J. Fullton, S. Spero, "Architecture of the Whois++
    Index Service," RFC 1913, February 1996.
[3] W. Yeong, T. Howes, S. Kille, "Lightweight Directory Access
    Protocol," RFC 1777, March 1995.
[4] ITU, "X.525 Information Technology - Open Systems Interconnection
    - The Directory: Replication", November 1993.
[5] "FORTEZZA Application Implementors Guide for the FORTEZZA Crypto
    Card (Production Version)", Document #PD4002102-1.01, SPYRUS,
    1995.
[6] The LDAP Data Interchange Format (LDIF). Internet Draft (work in
    progress), 25 November 1996.

10. Authors' Addresses

Roland Hedberg
Umdac
Umea University
901 87 Umea
Sweden
Email: Roland.Hedberg@umdac.umu.se

Bruce Greenblatt
Novell, Inc
2180 Fortune Drive
San Jose, CA 95131
USA
Email: bgg@novell.com
Phone: +1-408-577-7688

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA
Email: jayhawk@ds.internic.net
Phone: +1 402 894-9456

Mark Wahl
Critical Angle, Inc.
4815 W Braker Lane #502-385
Austin, TX 78759
Email: M.Wahl@critical-angle.com

Table of Contents

1. Introduction
2. Background
3. Object
4. The Tagged Index Object
4.1. The Agreement
4.2. Content Type
4.3. Tagged Index BNF
4.3.1. Header Descriptions
4.3.2. Tokenization Types
5. Example
6. Aggregation
7. Recommendations
8. Security Considerations
9. References
10. Authors' Addresses
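
Illustrative sketch (not part of the specification)

The index-info grammar of section 4.3 and the example of section 5
can be read mechanically. The following Python sketch shows one way a
consumer might expand tag lists such as "1,3-4" and map each
(attribute, value) pair to the entries that carry it. The function
names parse_taglist and parse_index_info are hypothetical, and the
reading of "*" as "every tag that appears explicitly in this object"
is an assumption drawn from the example rather than from normative
text. Percent-decoding of UTF8 values and header or schema parsing are
left out.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def parse_taglist(taglist: str) -> Tuple[Set[int], bool]:
    """Expand a taglist such as '1,3-4' or '*' into (tags, saw_wildcard)."""
    tags: Set[int] = set()
    wildcard = False
    for part in taglist.split(","):
        part = part.strip()
        if part == "*":
            wildcard = True
        elif "-" in part:
            # A range "low-high" is inclusive at both ends.
            low, high = part.split("-", 1)
            tags.update(range(int(low), int(high) + 1))
        else:
            tags.add(int(part))
    return tags, wildcard


def parse_index_info(lines: Iterable[str]) -> Dict[Tuple[str, str], Set[int]]:
    """Map (attribute, value) to the set of entry tags that carry it."""
    rows: List[Tuple[str, str, Set[int], bool]] = []
    attr: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith(("BEGIN ", "END ")):
            continue
        if line.startswith("-"):
            # cont-line: reuses the attribute name of the preceding first-line.
            if attr is None:
                raise ValueError("continuation line before any attribute")
            body = line[1:]
        else:
            # first-line: attr-name ":" *SPACE taglist "/" attr-value
            attr, body = line.split(":", 1)
            body = body.lstrip(" ")
        taglist, value = body.split("/", 1)
        tags, wildcard = parse_taglist(taglist)
        rows.append((attr, value, tags, wildcard))

    # Assumption: "*" stands for every tag seen explicitly in this object.
    all_tags: Set[int] = set().union(*(tags for _, _, tags, _ in rows))
    index: Dict[Tuple[str, str], Set[int]] = {}
    for name, value, tags, wildcard in rows:
        index.setdefault((name, value), set()).update(
            all_tags if wildcard else tags
        )
    return index


sample = [
    "BEGIN Index-Info",
    "ou: 1,3-4/Product",
    "-1/Development",
    "-2/Accounting",
    "-3-4/Testing",
    "o: */Ace",
    "-*/Industry",
    "END Index-Info",
]
index = parse_index_info(sample)
print(sorted(index[("ou", "Product")]))  # prints [1, 3, 4]
```

With such a mapping, an index server could intersect the tag sets for
several attribute values (for example ou=Testing and o=Ace) to decide
whether any single entry at the supplier can match a query before
issuing a referral.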