Network Working Group                                    Roland Hedberg
                                                       Bruce Greenblatt
                                                             Ryan Moats
Internet Draft                                        November 26, 1996
Expires in six months

       Using the Common Indexing Protocol in an LDAP Environment

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months.
Internet-Drafts may be updated, replaced, or made obsolete by other
documents at any time. It is not appropriate to use Internet-Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress".

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Distribution of this document is unlimited.

Abstract

This document defines a mechanism by which LDAP servers can exchange
index information by making use of the Common Indexing Protocol (CIP).
It defines both the structure of the index information being exchanged
and a Directory Schema that can be used by the LDAP servers
participating in the LDAP-index mesh. It is assumed that the structures
defined here can be used by X.500 DSAs as well as LDAP servers. Note
that this is a very preliminary draft, and much work is left to do to
finalize the directory schema as well as the index object.

1. Introduction
2. Index Object for LDAP Servers
3. Directory Schema Supporting the LDAP Index Mesh

Hedberg, Greenblatt, Moats                             Expires May 1997
Internet Draft                                         October 13, 1996

1. Introduction

The Common Indexing Protocol (CIP), as defined in [CIP], proposes a
mechanism for distributing searches across several instances of a
single type of search engine with a view to creating a global
directory. CIP provides a scalable, flexible scheme to tie individual
databases into distributed data warehouses that can scale gracefully
with the growth of the Internet. The mechanism is independent of the
access method used to reach the actual data that underlies the indices.

Separate from CIP is the definition of the Index Object that is used to
contain the information that is exchanged among Index Servers. One such
Index Object that has already been defined is the Centroid, which is
derived from the Whois++ protocol [RFC 1913].

The Lightweight Directory Access Protocol (LDAP) is defined in
[RFC 1777]. It defines a mechanism for accessing a collection of
information arranged hierarchically so as to provide a globally
distributed database, which is normally called the Directory
Information Tree (DIT). A distinguishing characteristic of LDAP servers
is that several servers normally cooperate to manage a common subtree
of the DIT. LDAP servers are expected to respond to requests that
pertain to portions of the DIT for which they have data, as well as to
those portions for which they have no information in their database.
For example, the LDAP server for a portion of the DIT in the United
States (c=US) must be able to provide a response to a Search operation
that pertains to a portion of the DIT in Sweden (c=se). Normally, the
response given will be a referral to another LDAP server that is
expected to be more knowledgeable about the appropriate subtree.
However, there is currently no mechanism that enables these LDAP
servers to determine which server is the more knowledgeable one to
which the LDAP client should be referred.
Typically, an LDAP server is configured with the name of exactly one
other LDAP server, to which it refers all LDAP clients whose requests
fall outside the subtree of the DIT for which that LDAP server has
knowledge. This specification defines a mechanism whereby LDAP servers
can exchange index information that will allow referrals to point
towards a clearly accurate destination.

The X.500 series of recommendations defines the Directory Information
Shadowing Protocol (DISP) [X.525], which allows X.500 DSAs to exchange
actual information in the DIT. Shadowing allows information from
various portions of the DIT to be replicated amongst participating
DSAs. While DISP allows an individual directory server to collect
information about large parts of the DIT, it would require a huge
database to collect all of the replicas for a meaningful portion of the
DIT. Furthermore, as X.525 states: "Before shadowing can occur, an
agreement, covering the conditions under which shadowing may occur is
required. Although such agreements may be established in a variety of
ways, such as policy statements covering all DSAs within a given
DMD ...", where a DMD is a Directory Management Domain. This is because
the actual data in the DIT is being exchanged amongst DSAs, rather than
only the information required to maintain an index. In many
environments such an agreement is not appropriate, and in order to
collect information for a meaningful portion of the DIT, a large number
of agreements may need to be arranged.

What is desired is to have an LDAP server (or network of LDAP servers)
that can quickly respond to real world requests, like:

- What is Tim Howes' email address? This is much harder than, "What is
  Tim Howes at Netscape's email address?"

- What is the X.509 certificate for Fred Smith at compuserve.com?
  One certainly doesn't want to search CompuServe's entire directory
  tree to find out this one piece of information. I also don't want to
  have to shadow the entire CompuServe directory subtree onto my
  server. If this request is being made because Fred is trying to log
  into my server, I'd certainly want to be able to respond to the BIND
  in real time.

- Who are all of the people in the o=Novell container that have a title
  of programmer?

All of these requests translate into LDAP in a reasonably
straightforward way. They can also be serviced in a straightforward
manner by the user's home LDAP server if it has the appropriate
reference information into the DIT that contains the source data.
Alternatively, a precise referral could be returned. If the home LDAP
server wants to service the request based on the index information that
it has on hand, this servicing could be done by any number of means:

- issuing LDAP operations to the remote directory server
- issuing DSP operations to the remote directory server
- issuing DAP operations to the remote directory server
- issuing operations in some other Internet protocol, like Whois++

2. Index Object for LDAP Servers

This section defines the Index Object that is to be exchanged by LDAP
servers using CIP. While in many cases it is acceptable for LDAP
servers to make use of the Centroid construct to exchange index
information, the goals in defining a new construct are multi-pronged:

- When the LDAP server receives a search request that warrants that a
  referral be returned, allow the server to return a referral to the
  client that is almost guaranteed to allow the client's next request
  to be sent to the correct LDAP server.

- When the LDAP server receives a search request that is not operating
  against local data, allow the LDAP server itself to "chain" the
  request to the appropriate remote LDAP server. Note that LDAP itself
  does not define how Chaining works, but X.500 does.
  This seems very similar to the first "prong".

- Finally, when a collection of LDAP servers is operating against a
  large distributed directory, allow them to distribute index
  information amongst themselves (a la CIP) so that their own searches
  can be carried out with some degree of efficiency.

One of the fundamental characteristics of LDAP (and X.500 for that
matter) is that every object in the directory has a unique name, i.e.
its distinguished name. One of the objectives is to allow the index
server to have the information on hand to resolve a search operation
into a set of zero or more distinguished names. In Whois++ the
objective is to resolve the Whois++ query (the analog of the LDAP
search) to a particular server in the Whois++ mesh, and it is up to
the individual Whois++ server to organize its data correctly to perform
the individual query.

The information passed around amongst LDAP-enabled CIP index servers is
an LDAP-INDEX. If the LDAP server is to be able to map the various data
values that are contained in the centroid back to individual user
objects, then the USER Template for LDAP-INDEX objects would have to
include the common name attribute, and a means of tagging the other
fields in the LDAP-INDEX with an indication of the root common names to
which they refer. In the LDAP-INDEX, these tags are known as Relative
Handle Identifiers (RHIs). For example, it would need to be known that
the Given Name of John mapped back to the common names JohnS, JohnT and
Jsmith (if these were all common names in some directory tree).
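
The tagging described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function
name build_ldap_index, the whitespace tokenization, the use of record
positions as RHIs, and the three sample records are all hypothetical
and are not defined by this draft.

```python
from collections import defaultdict


def build_ldap_index(records):
    """Map each attribute token to the RHIs of the records containing it.

    Assumption: RHIs are simply 1, 2, 3, ... in record order. They are
    opaque and only meaningful within the index returned by this call.
    """
    index = defaultdict(lambda: defaultdict(list))
    for rhi, record in enumerate(records, start=1):
        for attribute, value in record.items():
            # Assumption: attribute values are tokenized on whitespace.
            for token in value.split():
                if rhi not in index[attribute][token]:
                    index[attribute][token].append(rhi)
    return {attribute: dict(tokens) for attribute, tokens in index.items()}


# Hypothetical records matching the JohnS / JohnT / Jsmith case above.
records = [
    {"Common Name": "JohnS", "Given Name": "John"},
    {"Common Name": "JohnT", "Given Name": "John"},
    {"Common Name": "Jsmith", "Given Name": "John"},
]

index = build_ldap_index(records)

# Resolve the Given Name "John" back to the common names it refers to.
rhis = index["Given Name"]["John"]
common_names = [records[rhi - 1]["Common Name"] for rhi in rhis]
print(common_names)  # ['JohnS', 'JohnT', 'Jsmith']
```

Keeping the RHI list per token, rather than per record, is what lets a
receiving server go from a matched value straight to the common names
without contacting the originating server.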
Assume the following three directory records:

   Record 1
      Template: User
      Common Name: John
      First Name: John
      Last Name: Smith
      Favourite Drink: Labatt Beer

   Record 2
      Template: User
      Common Name: Joe
      First Name: Joe
      Last Name: Smith
      Favourite Drink: Molson Beer

   Record 3
      Template: User
      Common Name: JohnS
      First Name: John
      Last Name: Smith
      Favourite Drink: Root Beer

The LDAP-INDEX for this server would be:

   Template: User
   Common Name: 1/John 2/Joe 3/JohnS
   First Name: 2/Joe 1,3/John
   Last Name: 1,2,3/Smith
   Favourite Drink: 1,2,3/Beer 1/Labatt 2/Molson 3/Root

In this example, the comma character is used as an RHI separator, and
the forward slash is used as the RHI terminator. Placing the RHIs at
the beginning of the attribute information makes the RHI parsing easier
since there are no restrictions on the characters that may be contained
in the attribute value. An RHI is an opaque value that has meaning only
in the context of an individual LDAP-INDEX object. Thus, when this
server sends its LDAP-INDEX again, it is not guaranteed that the RHIs
will be identical to those in the previous instantiation of the
LDAP-INDEX.

In order to participate in a mesh of CIP servers, an LDAP server would
need to appropriately define a set of CIP header fields. The following
describes how to use CIP headers in an LDAP indexing environment. In
the case of the subtype cip-index-object, the CIP headers that are
required are the type, dsi and base-uri parameters.

The type field for an LDAP-INDEX object should be "LDAP-INDEX-1".

The dsi field does not appear to be of much interest in an LDAP
environment. This is because LDAP provides a global naming scheme, and
it appears that the dsi is in place to assist those naming services
that do not provide an ability to distinctly identify a particular
directory object.
Thus, it appears that the LDAP-INDEX-1 index object could make do with
one OID that would describe every index object, since each index object
describes some subset of a global name space.

The base-uri for an LDAP-INDEX-1 object would be whatever LDAP URI
could be used to search the subtree of the DIT that this index object
indexes. Thus, for an index of the whole server at ldap.novell.com, the
base-URI could be ldap.novell.com, or, if we just wanted the c=us,
o=novell subtree, that would be added on to the end of the URI
according to the format defined in RFC 1959.

Todo: how to define CIP headers, or information inside the LDAP-INDEX-1
object type, to allow for partial updates (i.e. deltas). For example,
I'd like to be able to poll ldap.attmail.com once a week to get the
changes that have been made.

3. Directory Schema Supporting the LDAP Index Mesh

This section is based on experimental work that has been done, and
needs to be cleaned up.

This schema requires that each organization that wants to be in the
index be represented as a data set; therefore each organization was
given an OID from the part of the OID tree that I control. This was
then the DSI. I did not and do not have any control over which
directory service the organization would use; it was their internal
choice. Hence I had to be prepared to deal with Whois++, PH and X.500
DSAs as well as SLAPD servers. (Note: this conflicts with the previous
section.)

Attributes and ObjectClasses
----------------------------

As stated before, I wanted the index to be stored in 'normal' X.500
attributes/objectClasses. Therefore I designed two new objectClasses
and a couple of attributes.

Name:            cIPIndex
Description:     Object class that holds one index value and
                 information connected to this value.
OID:             umuObjectClass.9 (1.2.752.17.3.9)
SubclassOf:      top
MustContain:     dSI, idx
MayContain:      field, template

Name:            cIPDataSet
Description:     Object class that holds information concerning one
                 data set.
OID:             umuObjectClass.10 (1.2.752.17.3.10)
SubclassOf:      top
MustContain:     dSI, baseURI, accessPoint
MayContain:      description, protocolVersion, indexOCAT

--------------------------------------------------------------------

Name:            idx
ShortName:
Description:     RDN of an index object
OID:             umuAttributeType.20 (1.2.752.17.1.20)
Syntax:          DirectoryString
SizeRestriction: none
SingleValued:    True

Name:            dSI
ShortName:
Description:     Data set identifier, a unique identifier for one
                 particular set of information.
OID:             umuAttributeType.21 (1.2.752.17.1.21)
Syntax:          OID
SizeRestriction: none
SingleValued:    False

Name:            field
ShortName:
Description:     Field type, describes the type of data that is stored
                 in this entry.
OID:             umuAttributeType.24 (1.2.752.17.1.24)
Syntax:          caseIgnoreString
SizeRestriction: none
SingleValued:    True

Name:            template
ShortName:
Description:     Template type, describes the kind of object we're
                 dealing with.
OID:             umuAttributeType.25 (1.2.752.17.1.25)
Syntax:          caseIgnoreString
SizeRestriction: none
SingleValued:    True

Name:            accessPoint
ShortName:
Description:     Host and port number written in the normal UR* way,
                 that is, host:portnumber
OID:             umuAttributeType.22 (1.2.752.17.1.22)
Syntax:          caseIgnoreString
SizeRestriction: none
SingleValued:    False

Name:            baseURI
ShortName:
Description:     Universal Resource Identifier that is used in further
                 searches.
OID:             umuAttributeType.26 (1.2.752.17.1.26)
Syntax:          caseExactString
SizeRestriction: none
SingleValued:    False

Name:            protocolVersion
ShortName:
Description:     Common Indexing Protocol version
OID:             umuAttributeType.27 (1.2.752.17.1.27)
Syntax:          caseIgnoreString
SizeRestriction: none
SingleValued:    True

Name:            indexOCAT
ShortName:
Description:     A description of what kind of information is indexed,
                 like "person commonName". Some would like to have OIDs
                 instead of readable names, but that's debatable.
                 Anyway, they should be standardized by FIND or some
                 other group (IDS?) within the IETF.
OID:             umuAttributeType.27 (1.2.752.17.1.27)
Syntax:          caseIgnoreString
SizeRestriction: none
SingleValued:    False

--------------------------------------------------------------------

DIT Structure
-------------

The format of the part of the DIT where the index information was
stored would look like this:

                      c=SE
                        |
                        |
                        |
               dSI=1.2.752.17.5.0
                      /.|.\
                     /..|..\
                    /...|...\
   dSI=1.2.752.17.5.1....idx="Roland Hedberg"

That is, there would be a 'root' for each collective index and, below
that, a flat structure with all the dSIs and index values. This makes
it rather easy to handle incremental updates.

There might be several of these trees, each representing a different
type of index: "person commonName" or "organisation organisationalName"
or "organizationalRole commonName", or combinations such as "person
commonName" and "person title". I have chosen to use X.500 names since
there is no standardisation by FIND or anyone else on this for the time
being. This would allow a user to choose which index to use based on
what is indexed and on additional information kept in the description
field. If the indexOCAT specifies "person commonName", the description
might be "Biologists".
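
The flat layout described above can be modelled as two lookup tables,
one for cIPDataSet entries keyed by dSI and one for cIPIndex entries
keyed by idx. The Python sketch below is illustrative only: the
function name referral_uris is hypothetical, and every OID, host name
and index value in the sample data is made up for the example rather
than taken from a deployed index.

```python
# Hypothetical cIPDataSet entries, keyed by their dSI value.
data_sets = {
    "1.2.3.1": {
        "baseURI": "ldap://ldap.example.org/o=Example%20Org,c=SE",
        "indexOCAT": ["person commonName"],
    },
    "1.2.3.2": {
        "baseURI": "ldap://ldap.example.com/o=Example%20Inc,c=US",
        "indexOCAT": ["organizationalRole commonName"],
    },
}

# Hypothetical cIPIndex entries: idx value -> dSI values (dSI is
# multi-valued, so one index value may point at several data sets).
index_entries = {
    "Pat Example": ["1.2.3.1", "1.2.3.2"],
}


def referral_uris(idx_value, wanted_ocat):
    """Return the baseURIs of the data sets that index idx_value
    under the requested indexOCAT."""
    uris = []
    for dsi in index_entries.get(idx_value, []):
        data_set = data_sets.get(dsi)
        if data_set is not None and wanted_ocat in data_set["indexOCAT"]:
            uris.append(data_set["baseURI"])
    return uris


print(referral_uris("Pat Example", "person commonName"))
```

Because the structure below each root is flat, adding or removing a
single idx or dSI entry touches only one key in either table, which is
in line with the remark above about incremental updates.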
In the case that you store both titles and commonNames in one index
server, you might change the idx to be
indexvalue+"attributeName"+"objectClass", or store attributeName and
objectClass in the field and template attributes. If nothing is stored
either way about attributeName and objectClass, then it defaults to
whatever is specified in the dSI entry, which then obviously can
contain only one such specification.

A typical dSI entry would be:

   dSI= 1.2.752.17.5.3
   description= Umea University
   accessPoint= kybele.umdc.umu.se:1389
   indexOCAT= person commonName
   baseURI= ldap://ldap.umu.se/o=Umea%20Universitet,c=SE
   protocolVersion= 1.7

and an index entry would be something like this:

   idx=Roland Hedberg
   dSI= 1.2.752.17.5.3&1.2.752.17.5.32


Roland Hedberg
Umdac
Umea University
901 87 Umea
Sweden
Email: Roland.Hedberg@umdac.umu.se

Bruce Greenblatt
Novell, Inc
2180 Fortune Drive
San Jose, CA 95131
USA
Email: bgg@novell.com
Phone: +1-408-577-7688

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA
Email: jayhawk@ds.internic.net
Phone: +1 402 894-9456