Network Working Group                                     Roland Hedberg
                                                              Bruce Greenblatt
                                                                    Ryan Moats

      Internet Draft
      <draft-ietf-find-cip-ldap-00.txt>                      November 26, 1996
      Expires in six months


             Using the Common Indexing Protocol in an LDAP Environment

      Status of this Memo


         This document is an Internet-Draft.  Internet-Drafts are working
         documents of the Internet Engineering Task Force (IETF), its areas,
         and its working groups. Note that other groups may also distribute
         working documents as Internet-Drafts.

         Internet-Drafts are draft documents valid for a maximum of six
         months.  Internet-Drafts may be updated, replaced, or made obsolete
         by other documents at any time.  It is not appropriate to use
         Internet-Drafts as reference material or to cite them other than as a
         "working draft" or "work in progress".

         To learn the current status of any Internet-Draft, please check the
         1id-abstracts.txt listing contained in the Internet-Drafts Shadow
         Directories on ds.internic.net (US East Coast), nic.nordu.net
         (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
         Rim).

         Distribution of this document is unlimited.

      Abstract
            This document defines a mechanism by which LDAP servers can
         exchange index information by making use of the Common Indexing
         Protocol (CIP).  It defines both the structure of the index
         information being exchanged and a Directory Schema that can be used
         by the LDAP servers that are participating in the LDAP-index mesh.
         It is assumed that the structures defined here can be used by X.500
         DSAs as well as LDAP servers.  Note that this is a very preliminary
         draft, and much work is left to do to finalize the directory schema
         as well as the index object.


      1. Introduction ...........................                            2


      Hedberg, Greenblatt, Moats     1              Expires May 1997


      Internet Draft                                October 13, 1996


      2. Index Object for LDAP Servers  ..................                   4

      3. Directory Schema Supporting the LDAP Index Mesh  .........          6


      1. Introduction

            The Common Indexing Protocol (CIP) as defined in [CIP] proposes a
         mechanism for distributing searches across several instances of a
         single type of search engine with a view to creating a global
         directory.  CIP provides a scalable, flexible scheme to tie
         individual databases into distributed data warehouses that can scale
         gracefully with the growth of the Internet.  The mechanism is
         independent of the access method used to reach the actual data that
         underlies the indices.  Separate from CIP is the definition of the
         Index Object, which contains the information that is exchanged among
         Index Servers.  One such Index Object that has already been defined
         is the Centroid, which is derived from the Whois++ protocol
         [RFC 1913].

            The Lightweight Directory Access Protocol (LDAP), defined in
         [RFC 1777], provides a mechanism for accessing a collection of
         information arranged hierarchically so as to form a globally
         distributed database, normally called the Directory Information
         Tree (DIT).  A distinguishing characteristic of LDAP servers is that
         several servers normally cooperate to manage a common subtree of the
         DIT.  LDAP servers are expected to respond to requests that pertain
         to portions of the DIT for which they have data, as well as to
         requests for portions for which they have no information in their
         database.  For example, the LDAP server for a portion of the DIT in
         the United States (c=US) must be able to provide a response to a
         Search operation that pertains to a portion of the DIT in Sweden
         (c=SE).  Normally, the response given will be a referral to another
         LDAP server that is expected to be more knowledgeable about the
         appropriate subtree.  However, there is currently no mechanism that
         enables these LDAP servers to refer the LDAP client to the
         supposedly more knowledgeable server.  Typically, an LDAP server is
         configured with the name of exactly one other LDAP server to which
         it refers all LDAP clients when their requests fall outside the
         subtree of the DIT for which that LDAP server has knowledge.  This
         specification defines a mechanism whereby LDAP
         servers can exchange index information that will allow referrals to
         point towards an accurate destination.


            The X.500 series of recommendations defines the Directory
         Information Shadowing Protocol (DISP) [X.525], which allows X.500
         DSAs to exchange actual information in the DIT.  Shadowing allows
         information from various portions of the DIT to be replicated
         amongst participating DSAs.  While DISP allows an individual
         directory server to collect information about large parts of the
         DIT, it would require a huge database to collect all of the replicas
         for a meaningful portion of the DIT.  Furthermore, as X.525 states:
         "Before shadowing can occur, an agreement, covering the conditions
         under which shadowing may occur is required. Although such
         agreements may be established in a variety of ways, such as policy
         statements covering all DSAs within a given DMD ...", where a DMD is
         a Directory Management Domain.  Such an agreement is required
         because the actual data in the DIT is being exchanged amongst DSAs,
         rather than only the information required to maintain an index.  In
         many environments such an agreement is not appropriate, and in order
         to collect information for a meaningful portion of the DIT, a large
         number of agreements may need to be arranged.

            What is desired is to have an LDAP server (or network of LDAP
         servers) that can quickly respond to real world requests, like:

            - What is Tim Howes' email address?  This is much harder than,
              "What is Tim Howes at Netscape's email address?"
            - What is the X.509 certificate for Fred Smith at compuserve.com?
              One certainly doesn't want to search CompuServe's entire
              directory tree to find out this one piece of information.  I
              also don't want to have to shadow the entire CompuServe
              directory subtree onto my server.  If this request is being
              made because Fred is trying to log into my server, I'd
              certainly want to be able to respond to the BIND in real time.
            - Who are all of the people in the o=Novell container that have a
              title of programmer?

            All of these requests translate into LDAP in a reasonably
         straightforward way.  They can also be serviced in a straightforward
         manner by the user's home LDAP server if it has the appropriate
         reference information into the DIT that contains the source data.
         Alternatively, a precise referral could be returned.  If the home
         LDAP server wants to service the request based on the index
         information that it has on hand, this servicing could be done by any
         number of means:


            - issuing LDAP operations to the remote directory server
            - issuing DSP operations to the remote directory server
            - issuing DAP operations to the remote directory server
            - issuing operations in some other Internet protocol, like Whois++


      2. Index Object for LDAP Servers

            This section defines the Index Objects that are to be exchanged by
         LDAP Servers using CIP.  While in many cases it is acceptable for
         LDAP Servers to make use of the Centroid construct to exchange index
         information, the goals in defining a new construct are multi-pronged:


            - When the LDAP server receives a search request that warrants
              that a referral be returned, allow the server to return a
              referral to the client that is almost guaranteed to allow the
              client's next request to be sent to the correct LDAP server.
            - When the LDAP server receives a search request that is not
              operating against local data, allow the LDAP server itself to
              "chain" the request to the appropriate remote LDAP server.
              Note that LDAP itself does not define how Chaining works, but
              X.500 does.  This seems very similar to the first "prong".
            - Finally, when a collection of LDAP servers is operating against
              a large distributed directory, allow the servers to distribute
              index information amongst themselves (a la CIP) so that their
              own searches can be carried out with some degree of efficiency.

            One of the fundamental characteristics of LDAP (and X.500 for that
         matter) is that every object in the directory has a unique name, i.e.
         its distinguished name.  One of the objectives is to allow the index
         server to have the information on hand to resolve a search operation
         into a set of zero or more distinguished names.  In Whois++ the
         objective is to resolve the Whois++ query (the analog to the LDAP
         search) into a particular server in the Whois++ mesh, and it is up to
         the individual Whois++ server to organize its data correctly to
         perform the individual query.  The information passed around amongst
         LDAP-enabled CIP index servers is an LDAP-INDEX.  If the LDAP server
         is to be able to map the various data values that are contained in
         the centroid back to individual user objects, then the USER Template
         for LDAP-INDEXes has to include the common name attribute, and a
         means of tagging the other fields in the LDAP-INDEX with an
         indication of the root common names to which they refer.  In the
         LDAP-INDEX, these tags are known as Relative Handle Identifiers
         (RHIs).  For example, it would need to be known that the Given Name
         of John mapped back to the common names JohnS, JohnT and Jsmith (if
         these were all common names in some directory tree).  Assume the
         following three directory records:
                           

                           Record 1
                           Template: User
                           Common Name: John
                           First Name: John
                           Last Name: Smith
                           Favourite Drink: Labatt Beer

                           Record 2
                           Template: User
                           Common Name: Joe
                           First Name: Joe
                           Last Name: Smith
                           Favourite Drink: Molson Beer

                           Record 3
                           Template: User
                           Common Name: JohnS
                           First Name: John
                           Last Name: Smith
                           Favourite Drink: Root Beer

            The LDAP-INDEX for this server would be:

            Template: User
            Common Name: 1/John
                         2/Joe
                         3/JohnS
            First Name: 2/Joe
                        1,3/John
            Last Name: 1,2,3/Smith
            Favourite Drink: 1,2,3/Beer
                             1/Labatt
                             2/Molson
                             3/Root

            In this example, the comma character is used as an RHI separator,
         and the forward slash is used as the RHI terminator.  Placing the
         RHIs at the beginning of the attribute information makes the RHI
         parsing easier since there are no restrictions on the characters that
         may be contained in the attribute value.  An RHI is an opaque value
         that has meaning only in the context of an individual LDAP-INDEX
         object.  Thus, when this server repeats sending its LDAP-INDEX, it is
         not guaranteed that the RHIs will be identical to the previous
         instantiation of the LDAP-INDEX.
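
            The construction described above can be sketched in Python.
         This is a minimal illustration only, not a normative encoding: it
         assumes that attribute values are split into tokens on whitespace,
         that RHIs are the strings "1", "2" and "3", and that tokens are
         emitted in no particular order.  The function names (build_index,
         format_entry, parse_entry, lookup) are hypothetical and do not
         come from CIP or LDAP.

```python
from collections import defaultdict

# The three example records from the text, keyed by an opaque RHI.
RECORDS = {
    "1": {"Common Name": "John", "First Name": "John",
          "Last Name": "Smith", "Favourite Drink": "Labatt Beer"},
    "2": {"Common Name": "Joe", "First Name": "Joe",
          "Last Name": "Smith", "Favourite Drink": "Molson Beer"},
    "3": {"Common Name": "JohnS", "First Name": "John",
          "Last Name": "Smith", "Favourite Drink": "Root Beer"},
}


def build_index(records):
    """Map attribute -> token -> sorted list of RHIs."""
    index = defaultdict(lambda: defaultdict(set))
    for rhi, attributes in records.items():
        for attribute, value in attributes.items():
            # Assumption: values are tokenized on whitespace.
            for token in value.split():
                index[attribute][token].add(rhi)
    return {attribute: {token: sorted(rhis) for token, rhis in tokens.items()}
            for attribute, tokens in index.items()}


def format_entry(rhis, token):
    """Render one index line: RHIs joined by ',', then '/', then the token."""
    return ",".join(rhis) + "/" + token


def parse_entry(entry):
    """Split at the first '/', so the token itself may contain any character."""
    rhi_part, token = entry.split("/", 1)
    return rhi_part.split(","), token


def lookup(index, criteria):
    """Return the RHIs that match every (attribute, token) pair."""
    matches = None
    for attribute, token in criteria.items():
        rhis = set(index.get(attribute, {}).get(token, []))
        matches = rhis if matches is None else matches & rhis
    return sorted(matches or [])
```

            Because the RHIs come first and end at the first forward slash,
         parse_entry never has to inspect the attribute value.  A query such
         as lookup(index, {"First Name": "John", "Favourite Drink": "Root"})
         intersects the RHI sets of both tokens, and the surviving RHI can
         then be mapped back to a common name through the Common Name
         entries of the same LDAP-INDEX.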


            In order to participate in a mesh of CIP servers, an LDAP server
         would need to appropriately define a set of CIP header fields.  The
         following describes how to use CIP headers in an LDAP indexing
         environment.  The CIP headers that are required are the type, dsi
         and base-uri parameters in the case of the subtype cip-index-object.
         The type field for an LDAP-INDEX object should be "LDAP-INDEX-1".
         The dsi field does not appear to be of much interest in an LDAP
         environment.  This is because LDAP provides a global naming scheme,
         and it appears that the dsi is in place to assist those naming
         services that do not provide an ability to distinctly identify a
         particular directory object.  Thus, it appears that the LDAP-INDEX-1
         index object could make do with one OID that would describe every
         index object, since each index object describes some subset of a
         global name space.  The base-uri for an LDAP-INDEX-1 object would be
         whatever LDAP URI could be used to search the subtree of the DIT
         that this index object indexes.  Thus, for an index of the whole
         server at ldap.novell.com, the base-uri could be ldap.novell.com, or
         if we just wanted the c=us, o=novell subtree, that would be added on
         to the end of the URI according to the format defined in RFC 1959.
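
            As an illustration of the header fields discussed above, the
         following Python sketch assembles a base-uri in the
         ldap://host/dn form of RFC 1959 and collects the three required
         fields in a dictionary.  The function names are hypothetical, the
         dictionary is only an in-memory stand-in (the on-the-wire CIP
         header syntax is not shown here), and the OID used for the dsi in
         the usage note is simply an example value.

```python
from urllib.parse import quote


def build_base_uri(host, base_dn="", port=None):
    """Build an LDAP URL of the form ldap://host[:port]/dn."""
    authority = host if port is None else f"{host}:{port}"
    # Percent-encode characters such as spaces; keep "=" and "," readable.
    return f"ldap://{authority}/{quote(base_dn, safe='=,')}"


def build_cip_headers(dsi, base_uri):
    """Collect the header fields named in the text into a plain dict."""
    return {"type": "LDAP-INDEX-1", "dsi": dsi, "base-uri": base_uri}
```

            For example, build_base_uri("ldap.umu.se", "o=Umea
         Universitet,c=SE") yields a URL with the space in the
         distinguished name percent-encoded, and build_cip_headers(
         "1.2.752.17.5.3", uri) pairs that URL with a dsi value.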


            Todo: how to define CIP headers, or information inside the LDAP-
         INDEX-1 object type to allow for partial updates (i.e. deltas).  For
         example, I'd like to be able to poll ldap.attmail.com once a week to
         get the changes that have been made.


      3. Directory Schema Supporting the LDAP Index Mesh

            This section is based on experimental work that has been done, and
         needs to be cleaned up.


            This schema requires that each organization that wants to be in
         the index be represented as a data set; each organization was
         therefore given an OID from the part of the OID tree that I control,
         and this OID was used as the DSI.  I did not, and do not, have any
         control over which directory service an organization uses; that is
         its internal choice.  Hence I had to be prepared to deal with
         Whois++, PH and X.500 DSAs as well as SLAPD servers.  (Note: this
         conflicts with the previous section.)

            Attributes and ObjectClasses
            ----------------------------


            As stated before, I wanted the index to be stored in 'normal'
         X.500 attributes/objectClasses.  Therefore I designed two new
         objectClasses and a couple of attributes.

            Name:              cIPIndex
            Description:       Objectclass that holds one index value and
                                  information connected to this value.
            OID:               umuObjectClass.9 (1.2.752.17.3.9)
            SubclassOf:        top
            MustContain:       dSI, idx
            MayContain:        field, template

            Name:              cIPDataSet
            Description:       Objectclass that holds information concerning
                                  one data set.
            OID:               umuObjectClass.10 (1.2.752.17.3.10)
            SubclassOf:        top
            MustContain:       dSI, baseURI, accessPoint
            MayContain:        description, protocolVersion, indexOCAT

            --------------------------------------------------------------------
             
               Name:              idx
               ShortName:
               Description:       RDN of an index object
               OID:               umuAttributeType.20 (1.2.752.17.1.20)
               Syntax:            DirectoryString
               SizeRestriction:   none
               SingleValued:      True

               Name:              dSI
               ShortName:
               Description:       Data set identifier, a unique identifier for
                                  one particular set of information.
               OID:               umuAttributeType.21 (1.2.752.17.1.21)
               Syntax:            OID
               SizeRestriction:   none
               SingleValued:      False

               Name:              field
               ShortName:
               Description:       Field type, describes the type of data
                                  that is stored in this entry.
               OID:               umuAttributeType.24 (1.2.752.17.1.24)
               Syntax:            caseIgnoreString
               SizeRestriction:   none
               SingleValued:      True

               Name:              template
               ShortName:
               Description:       Template type, describes the kind of
                                  object we're dealing with.
               OID:               umuAttributeType.25 (1.2.752.17.1.25)
               Syntax:            caseIgnoreString
               SizeRestriction:   none
               SingleValued:      True


               Name:              accessPoint
               ShortName:
               Description:       Host and port number written in the normal
                                  UR* way, that is, host:portnumber
               OID:               umuAttributeType.22 (1.2.752.17.1.22)
               Syntax:            caseIgnoreString
               SizeRestriction:   none
               SingleValued:      False

               Name:              baseURI
               ShortName:
               Description:       Universal Resource Identifier that is
                                  used in further searches.
               OID:               umuAttributeType.26 (1.2.752.17.1.26)
               Syntax:            caseExactString
               SizeRestriction:   none
               SingleValued:      False

               Name:              protocolVersion
               ShortName:
               Description:       Common Indexing Protocol version
               OID:               umuAttributeType.27 (1.2.752.17.1.27)
               Syntax:            caseIgnoreString
               SizeRestriction:   none
               SingleValued:      True

               Name:              indexOCAT
               ShortName:
               Description:       A description of the kind of information
                                  that is indexed, like "person commonName".
                                  Some would prefer OIDs instead of readable
                                  names, but that is debatable.  In any case
                                  the values should be standardized by FIND
                                  or some other group (IDS?) within the IETF.
               OID:               umuAttributeType.27 (1.2.752.17.1.27)
               Syntax:            caseIgnoreString
               SizeRestriction:   none
               SingleValued:      False

            ------------------------------------------------------------
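
            The MustContain and MayContain lists and the SingleValued flags
         above lend themselves to a simple conformance check.  The Python
         sketch below is illustrative only: it assumes an entry is held as a
         dictionary mapping attribute names to lists of values, it ignores
         the objectClass attribute itself and the attribute syntaxes, and
         the name validate is hypothetical.

```python
# Transcribed from the objectClass definitions above.
OBJECT_CLASSES = {
    "cIPIndex": {"must": {"dSI", "idx"}, "may": {"field", "template"}},
    "cIPDataSet": {"must": {"dSI", "baseURI", "accessPoint"},
                   "may": {"description", "protocolVersion", "indexOCAT"}},
}

# Attributes whose definitions say "SingleValued: True".
SINGLE_VALUED = {"idx", "field", "template", "protocolVersion"}


def validate(object_class, entry):
    """Return a list of problems; an empty list means the entry conforms."""
    schema = OBJECT_CLASSES[object_class]
    present = set(entry)
    problems = []
    for name in sorted(schema["must"] - present):
        problems.append(f"missing required attribute {name}")
    for name in sorted(present - schema["must"] - schema["may"]):
        problems.append(f"attribute {name} not allowed in {object_class}")
    for name in sorted(present):
        if name in SINGLE_VALUED and len(entry[name]) > 1:
            problems.append(f"attribute {name} is single-valued")
    return problems
```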

            DIT Structure
            -------------

            The format of the part of the DIT where the index information
            was stored would look like this:


                                 c=SE
                                  |
                                  |
                                  |
                                dSI=1.2.752.17.5.0
                                /.|.\
                               /..|..\
                              /...|...\
               dSI=1.2.752.17.5.1....idx="Roland Hedberg"


            That is, there would be a 'root' of each collective index and
            below that a flat structure with all the dSIs and index values.
            This makes it rather easy to handle incremental updates.

            There might be several of these trees, each representing a
            different type of index: "person commonName", "organisation
            organisationalName" or "organizationalRole commonName", or
            combinations such as "person commonName" and "person title".
            I have chosen to use X.500 names since there is no
            standardisation by FIND or anyone else on this for the time
            being.

            This would allow a user to choose which index to use based on
            what is indexed and on additional information kept in the
            description field.  If the indexOCAT specifies "person
            commonName", the description might be "Biologists".

            In the case that you store both titles and commonNames in one
            index server, you might change the idx to be
            indexvalue+"attributeName"+"objectClass"
            or store attributeName and objectClass in the field and template
            attributes.  If nothing is stored either way about attributeName
            and objectClass, then they default to whatever is specified in
            the dSI entry, which then obviously can contain only one such
            specification.
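
            The defaulting rule for attributeName and objectClass can be
            sketched as follows.  This Python fragment is an illustration
            under stated assumptions: the indexOCAT value is taken to be a
            single string of the form "objectClass attributeName" (as in
            "person commonName"), entries are plain dictionaries, and the
            name effective_type is hypothetical.

```python
def effective_type(index_entry, data_set_entry):
    """Return (objectClass, attributeName) describing an index entry."""
    # Assumption: indexOCAT holds "objectClass attributeName" and the
    # dSI entry carries exactly one such value.
    default_class, default_attribute = data_set_entry["indexOCAT"].split()
    # template carries the objectClass and field the attributeName;
    # fall back to the dSI entry when either is absent.
    object_class = index_entry.get("template", default_class)
    attribute = index_entry.get("field", default_attribute)
    return object_class, attribute
```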

            A typical dSI entry would be:


            dSI= 1.2.752.17.5.3
            description= Umea University
            accessPoint= kybele.umdc.umu.se:1389
            indexOCAT= person commonName
            baseURI= ldap://ldap.umu.se/o=Umea%20Universitet,c=SE
            protocolVersion= 1.7


            and an index entry would be something like this:

            idx=Roland Hedberg
            dSI= 1.2.752.17.5.3&1.2.752.17.5.32
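
            To show how the two kinds of entry fit together, the Python
            sketch below resolves an index value to the baseURIs that could
            be returned as referrals.  It is a minimal illustration: the
            in-memory dictionaries stand in for directory entries, the "&"
            is assumed to separate multiple dSI values as in the example
            above, and the name referrals_for is hypothetical.  Data sets
            that the index server has no dSI entry for are skipped.

```python
# dSI entries, keyed by data set identifier (values from the example).
DATA_SETS = {
    "1.2.752.17.5.3": {
        "description": "Umea University",
        "accessPoint": "kybele.umdc.umu.se:1389",
        "indexOCAT": "person commonName",
        "baseURI": "ldap://ldap.umu.se/o=Umea%20Universitet,c=SE",
        "protocolVersion": "1.7",
    },
}

# Index entries, keyed by idx; the value is the raw dSI attribute.
INDEX_ENTRIES = {
    "Roland Hedberg": "1.2.752.17.5.3&1.2.752.17.5.32",
}


def referrals_for(idx, index_entries, data_sets):
    """Return the baseURIs of the known data sets that index this value."""
    raw = index_entries.get(idx, "")
    # Assumption: "&" separates the dSI values of one index entry.
    dsis = [dsi.strip() for dsi in raw.split("&") if dsi.strip()]
    return [data_sets[dsi]["baseURI"] for dsi in dsis if dsi in data_sets]
```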


      Authors' Addresses

            Roland Hedberg
            Umdac
            Umea University
            901 87 Umea
            Sweden
            Email:  Roland.Hedberg@umdac.umu.se

            Bruce Greenblatt
            Novell, Inc
            2180 Fortune Drive
            San Jose, CA 95131
            USA
            Email:  bgg@novell.com
            Phone:  +1-408-577-7688

            Ryan Moats
            AT&T
            15621 Drexel Circle
            Omaha, NE 68135-2358
            USA
            Email:  jayhawk@ds.internic.net
            Phone:  +1 402 894-9456

