Network Working Group                                     Roland Hedberg
Internet Draft                                          Bruce Greenblatt
<draft-ietf-find-cip-tagged-05.txt>                           Ryan Moats
Expires in six months                                          Mark Wahl


     A Tagged Index Object for use in the Common Indexing Protocol


Status of this Memo


     This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.


     Internet-Drafts are draft documents valid for a maximum of six
months.  Internet-Drafts may be updated, replaced, or made obsolete by
other documents at any time.  It is not appropriate to use  Internet-
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress".


     To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow Direc-
tories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).


     Distribution of this document is unlimited.


     Abstract


     This document defines a mechanism by which information servers can
exchange indices of information from their databases by making use of
the Common Indexing Protocol (CIP).  This document defines the structure
of the index information being exchanged, as well as the appropriate
meanings for the headers that are defined in the Common Indexing
Protocol.  It is assumed that the structures defined here can be used by
X.500 DSAs, LDAP servers, Whois++ servers, CCSO servers and many others.


Hedberg, Greenblatt, Moats, Wahl                                [Page 1]


Internet Draft                                              January 1998


1.  Introduction


     The Common Indexing Protocol (CIP) as defined in [1] proposes a
mechanism for distributing searches across several instances of a single
type of search engine with a view to creating a global directory.  CIP
provides a scalable, flexible scheme to tie individual databases into
distributed data warehouses that can scale gracefully with the growth of
the Internet.  CIP provides a mechanism for meeting these goals that is
independent of the access method that is used to access the actual data
that underlies the indices.  Separate from CIP is the definition of the
Index Object that is used to contain the information that is exchanged
among Index Servers.  One such Index Object that has already been
defined is the Centroid that is derived from the Whois++ protocol [2].


     The Centroid does not meet all of the requirements for the exchange
of index information amongst information servers.  For example, it does
not support the notion of incremental updates natively.  For information
servers that contain millions of records in their database, constant
exchange of complete dredges of the database is bandwidth intensive.
The Tagged Index Object is specifically designed to support the exchange
of index update information.  This design comes at the cost of an
increase in the size of the index object being exchanged.  The Centroid
is also not tailored to always be able to give boolean answers to
queries.  In the Centroid Model, "an index server will take a query in
standard Whois++ format, search its collections of centroids and other
forward information, determine which servers hold records which may fill
that query, and then notifies the user's client of the next servers to
contact to submit the query." [2] Thus, the exchange of Centroids
amongst index servers allows hints to be given as to which information
server actually contains the information.  The Tagged Index Object
labels the various pieces of information with identifiers that tie the
individual object attributes back to an object as a whole.  This "tag-
ging" of information allows an index server to be more capable of
directing a specific query to the appropriate information server.
Again, this feature is added to the Tagged Index Object at the expense
of an increase in the size of the index object.


2.  Background


     The Lightweight Directory Access Protocol (LDAP) is defined in [3],
and it defines a mechanism for accessing a collection of information
arranged hierarchically in such a manner as to provide a globally
distributed database which is normally called the Directory Information
Tree (DIT).  A distinguishing characteristic of LDAP servers is that
several servers normally cooperate to manage a common subtree of the
DIT.  LDAP servers are expected to respond to
requests that pertain to portions of the DIT for which they have data,
as well as for those portions for which they have no information in
their database. For example, the LDAP server for a portion of the DIT in
the United States (c=US) must be able to provide a response to a Search
operation that pertains to a portion of the DIT in Sweden (c=se).  Nor-
mally, the response given will be a referral to another LDAP server that
is expected to be more knowledgeable about the appropriate subtree.
However, there is currently no mechanism that tells these LDAP servers
which server is the more knowledgeable one to refer the LDAP client to.
Typically, an LDAP (v3) server is configured with the name of exactly
one other LDAP server to which all LDAP clients are referred when their
requests fall outside the subtree of the DIT for which that LDAP server
has knowledge.  This specification defines a mechanism whereby LDAP
servers can exchange index information that will allow referrals to
point towards an accurate destination.


     The X.500 series of recommendations defines the Directory
Information Shadowing Protocol (DISP) [4], which allows X.500 DSAs to
exchange actual information in the DIT.  Shadowing allows information
from various portions of the DIT to be replicated amongst participating
DSAs.  DISP is optimized for the exchange of entire portions of the DIT,
whereas CIP and the Tagged Index Object are optimized for the exchange
of structural index information about the DIT and for improving the
performance of tree navigation amongst various information servers.
The Tagged Index Object is more appropriate for the exchange of index
information than is DISP.
DISP is more targeted at DIT distribution and fault tolerance.  DISP is
thus more appropriate for the exchange of the actual data in order to
spread the load amongst several information servers.  DISP is tailored
specifically to X.500 (and other hierarchical directory systems), while
the Tagged Index Object and CIP can be used in a wide variety of infor-
mation server environments.


     While DISP allows an individual directory server to collect infor-
mation about large parts of the DIT, it would require a huge database to
collect all of the replicas for a meaningful portion of the DIT.  Fur-
thermore, as X.525 states: "Before shadowing can occur, an agreement,
covering the conditions under which shadowing may occur is required.
Although such agreements may be established in a variety of ways, such
as policy statements covering all DSAs within a given DMD ...", where a
DMD is a Directory Management Domain.  Such an agreement is needed
because the actual data in the DIT is exchanged amongst DSAs, not just
the information required to maintain an Index.  In many environments
such an agreement is not appropriate, and in order to collect informa-
tion for a meaningful portion of the DIT, a large number of agreements
may need to be arranged.


3.  Object


     What is desired is to have an information server (or network of
information servers) that can quickly respond to real world requests,
like:


-    What is Tim Howes' email address?  This is much harder than, "What
     is the email address of Tim Howes at Netscape?"

-    What is the X.509 certificate for Fred Smith at compuserve.com?
     One certainly doesn't want to search CompuServe's entire directory
     tree to find out this one piece of information.  I also don't want
     to have to shadow the entire CompuServe directory subtree onto my
     server.  If this request is being made because Fred is trying to
     log into my server, I'd certainly want to be able to respond to the
     BIND in real time.


-    Who are all of the people at Novell that have a title of program-
     mer?


     All of these requests can reasonably be translated into LDAP,
Whois++, or other directory access protocol queries.  They can also be
serviced in a straightforward manner by the user's home information
server if it has the appropriate reference information into the database
that contains the source data.  In this situation, the first server
would be able to "chain" the request on behalf of the user.
Alternatively, a precise referral could be returned.  If the home
information server wants to service (i.e., chain) the request based on
the index information that it has on hand, this servicing could be done
by any number of means:


-    issuing LDAP operations to the remote directory server

-    issuing DSP operations to the remote directory server

-    issuing DAP operations to the remote directory server


-    issuing Whois++ operations to the remote Whois++ server

-     ...


4.  The Tagged Index Object

     This section defines a Tagged Index Object that can be exchanged by
Information Servers using CIP.  While in many cases it is acceptable for
Information Servers to make use of the Centroid construct (as defined in
[2]) to exchange index information, the goals in defining a new con-
struct are multi-pronged:

-    When the Information Server receives a search request that warrants
     that a referral be returned, allow the server to return a referral
     that will point the client to a server that is most likely able to
     answer the request correctly.  False positive referrals (the search
     turns up hits in the index object that generate referrals to
     servers that don't hold the desired information) can be reduced,
     depending on the choice of attribute tokenization types that are
     used.

-    When the Information Server receives a search request that is not
     operating against local data, allow the Information Server itself
     to "chain" the request to the appropriate remote Information
     Server.  Note that LDAP itself does not define how Chaining works,
     but X.500 does.  This seems very similar to the first "prong".

-    Finally, when a collection of Information Servers are operating
     against a large distributed directory, allow them to distribute
     index information amongst themselves (a la CIP) so that their own
     searches can be carried out with some degree of efficiency.


4.1.  The Agreement


     Before a Tagged Index Object can be exchanged, the organization
which administers the object supplier and the organization which admin-
isters the object consumer must reach an agreement on how the servers
will communicate. This agreement contains the following:

-    "version": The version of the agreement and the index type.  This
     specification describes the index type "x-tagged-index-1".

-    "dsi": An OID which uniquely identifies the subtree and scope.
     This field is not explicitly necessary, as it may not provide
     information beyond that which is contained in the "base-uri" below.


-    "base-uri": One or more URIs which will form the base of any
     referrals created based upon the index object that is governed by
     this agreement.  For example, in the LDAP URL format [8] the
     base-uri would specify (among other items): the LDAP host, the base
     object to which this index object refers (e.g. c=SE), and the scope
     of the index object (e.g. single container).

-    "supplier": The hostname and listening port number of the supplier
     server, as well as any alternative servers holding the same naming
     contexts, in case the supplier is unavailable.

-    "consumeraddr": This is a URI of the "mailto:" form, with the RFC
     822 email address of the consumer server.  Subsequent versions of
     this draft may allow other forms of URI, so that the consumer may
     retrieve the update via the WWW, FTP or CIP.

-    "updateinterval": The maximum duration in seconds between
     occurrences of the supplier server generating an update.  If the
     consumer server has not received an update from the supplier server
     after waiting this long since the previous update, it is likely
     that the index information is now out of date.  A typical value for
     a server with frequent updates would be 604800 seconds (one week).
     Servers whose DITs are only modified annually could have a much
     longer update interval.

-    "securityoption": Whether and how the supplier server should sign
     and encrypt the update before sending it to the consumer server.
     Options for this version of the specification are:

          "none": the update is sent in plaintext

          "PGP/MIME": the update is digitally signed and encrypted using
          PGP [9]

          "S/MIME": the update is digitally signed and encrypted using
          S/MIME [10]

          "SSLv3": the update is digitally signed and encrypted using an
          SSLv3 connection [11]

          "Fortezza": the update is digitally signed and encrypted using
          Fortezza [5]

     It is recommended that the "PGP/MIME" option be used when
exchanging sensitive information across public networks and both the
supplier and consumer have PGP keys. The "Fortezza" option is intended
for use in environments where security protocols are based on
Fortezza-compatible devices. The "S/MIME" option can be used when both
the supplier and
consumer have RSA keys and can make use of the PKCS protocols defined in
the S/MIME specification. The "SSLv3" option can be used when both the
supplier and consumer have access to SSL services, have server
certificates, and can mutually authenticate each other.  (Open issue:
should these option names be registered with IANA?)

-    Security Credentials: The long-term cryptographic credentials used
     for key exchange and authentication of the consumer and supplier
     servers, if a security option was selected.  For "PGP/MIME", this
     will be the trusted public keys of both servers.  For "Fortezza",
     this will be the certificate paths of both servers to a common
     point of trust. For "S/MIME" and "SSLv3" these will be the certifi-
     cates of the supplier and consumer.

     Note that if the index server maintains the information that would
appear in the agreement in a directory according to the definitions in
[7], then no real formal agreement between the two parties needs to be
put in place, and the information that is required for communication
between the two index servers is derived automatically from the direc-
tory.

4.2.  Content Type


     The update consists of a MIME object of type
application/cip-index-object.  The parameters are:

     "type": this has value "application/index.obj.tagged".

     "dsi": the DSI (if any) from the agreement.

     "base-uri": a set of URIs, separated by spaces. In each URI, the
     hostname/portno must be distinct, and based on the "supplier" part
     of the agreement.


     The payload is mostly textual data but may include bytes with the
high bit set.  The originating information server should set the
content-transfer-encoding as appropriate for the information included in
the payload.

     This object may be encapsulated in a wrapper content (such as
multipart/signed) or be encrypted as part of the security procedures.
The resulting content can then be distributed, for example via
electronic mail.  For example:

From: supplier@sup.com
Date: Thu, 16 Jan 1997 13:50:37 -0500
Message-Id: <199701161850.NAA29295@sup.com>
To: consumer@consumer.com       <<-- from consumer server address
Reply-to: supplier-admin@sup.com
MIME-Version: 1.0
Content-Type: application/index.obj.tagged;
    dsi=1.3.6.1.4.1.1466.85.85.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16;
    base-uri="ldap://sup.com/dc=sup,dc=com ldap://alt.com/dc=sup,dc=com"


     The payload is a series of CRLF-terminated lines.  The payload only
includes characters from a subset of the printable US-ASCII subset of
UTF-8.  Attribute values that occur outside of this subset are encoded
as defined below.  As more experience is gained with index objects and
UTF-8 data, a future version of this specification may allow for the
native transfer of UTF-8 data without requiring this special encoding.
No other character sets are permitted by this version of the
specification.  Some supplier servers may only be able to generate the
printable US-ASCII subset, but all consumer servers must be able to
handle the full range of Unicode characters when decoding the attribute
values (in the "attr-value" field in the BNF below).


4.3.  Tagged Index BNF


     The Tagged Index object has the following grammar, expressed in
modified BNF format:

index-object = 0*(io-part SEP) io-part
io-part      = header SEP schema-spec SEP index-info
header       = version-spec SEP update-type SEP this-update SEP
                last-update SEP context-size
version-spec = "version:" *SPACE "x-tagged-index-1"
update-type  = "updatetype:" *SPACE ( "total" | "incremental")
this-update  = "thisupdate:" *SPACE TIMESTAMP
last-update  = [ "lastupdate:" *SPACE TIMESTAMP ]
context-size = [ "contextsize:" *SPACE 1*DIGIT ]
schema-spec  = "BEGIN IO-Schema" SEP 1*(schema-line SEP)
               "END IO-Schema"
schema-line  = attribute-name ":" token-type
token-type   = "FULL" | "TOKEN" | "RFC822" | "UUCP" | "DNS"
index-info   = full-index | incremental-index
full-index   = "BEGIN Index-Info" SEP 1*(index-block SEP)
               "END Index-Info"
incremental-index = 1*(add-block | delete-block | update-block)
add-block    = "BEGIN Add Block" SEP 1*(index-block SEP)
               "END Add Block"
delete-block = "BEGIN Delete Block" SEP 1*(index-block SEP)
               "END Delete Block"
update-block = "BEGIN Update Block" SEP 1*(index-block SEP)
               "END Update Block"
index-block  = first-line 0*(SEP cont-line)
first-line   = attr-name ":" *SPACE taglist "/" attr-value
cont-line    = "-" taglist "/" attr-value
taglist      = tag 0*("," tag)
tag          = 1*DIGIT ["-" 1*DIGIT]
attr-value   = 0*(UTF8)
attr-name    = 1*(NAMECHAR)
UTF8         = ASCII | "%" HEX HEX
TIMESTAMP    = 1*DIGIT
ASCII        = DIGIT | UPPER | LOWER | OTHER
NAMECHAR     = DIGIT | UPPER | LOWER | "-" | ";" | "."
SPACE        = <ASCII space, hex 20>
SEP          = (CR LF) | LF
CR           = <ASCII CR, carriage return, hex 0D>
LF           = <ASCII LF, line feed, hex 0A>
HEX          = "a" | "b" | "c" | "d" | "e" | "f" | DIGIT
DIGIT        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
               "8" | "9"
UPPER        = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
               "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
               "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
               "Y" | "Z"
LOWER        = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
               "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
               "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
               "y" | "z"
OTHER        = "(" | ")" | "+" | "," | "-" | "." | "/" | ":" |
               "=" | "?" | "@" | ";" | "$" | "_" | "!" | "~" |
               "*" | "'" | "\" | """ | "#" | "&" | "<" | ">" |
               "[" | "]" | "^" | "`" | "{" | "|" | "}"

     Characters that are allowed to appear unescaped in attr-values are
the printable subset of (low) ASCII minus the "%" character, i.e. hex
21 through hex 7e inclusive with the exception of hex 25 (which is the
"%" character).  Any other octet of the UTF-8 encoding of a character
that appears in an attr-value must be escaped by using the "%" character
and two hex digits that encode the octet.  For example, the UCS-2
sequence "A<NOT IDENTICAL TO><ALPHA>." (0041, 2262, 0391, 002E) may be
encoded in UTF-8 as follows:
   41 E2 89 A2 CE 91 2E

     If this character sequence appears in an attribute that is in a
Tagged Index Object attr-value, then it is encoded as:
   41 25 65 32 25 38 39 25 61 32 25 63 65 25 39 31 2E

     When viewed as a character string the encoding appears as:
   "A%e2%89%a2%ce%91."

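     The escaping rule above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function
names are hypothetical, and the sketch follows the prose literally, so
an ASCII space (hex 20) is escaped as "%20" because it lies outside the
range hex 21 through hex 7e.  Lower-case hex digits are produced because
the HEX rule in the BNF lists only "a" through "f".

```python
def escape_attr_value(value: str) -> str:
    """Escape a Unicode string for use as a Tagged Index attr-value."""
    out = []
    for byte in value.encode("utf-8"):
        # Unescaped range per the text: hex 21..7e, except hex 25 ("%").
        if 0x21 <= byte <= 0x7E and byte != 0x25:
            out.append(chr(byte))
        else:
            # Every other octet becomes "%" plus two lower-case hex digits.
            out.append("%{:02x}".format(byte))
    return "".join(out)


def unescape_attr_value(text: str) -> str:
    """Reverse escape_attr_value, returning the decoded Unicode string."""
    raw = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            # Two hex digits follow the "%" and encode one octet.
            raw.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(text[i]))
            i += 1
    return raw.decode("utf-8")


print(escape_attr_value("A\u2262\u0391."))  # A%e2%89%a2%ce%91.
```

     The printed string matches the worked example given above for the
sequence "A<NOT IDENTICAL TO><ALPHA>.".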

     The set of characters allowed to appear in the attr-name field is
limited to the set of characters used in LDAP and WHOIS++ attribute
names.  For other services that have attribute name character sets that
are larger than these, it is suggested that those services create a pro-
file that maps the names onto object identifiers, and the sequence of
digits and periods is used by those services in creating the attr-name
fields for their Tagged Index Objects.

     Note that the attribute value may only be empty in the case of an
incremental update that contains an "Update Block" in which the index
object indicates that certain attributes of objects are being removed.
This specification only supports the replacement of entire attributes,
so that in the case of a multi-valued attribute, all of the values must
be specified in the Update Block, not just the newly added values.  The
intention of the Tagged Index Object is to supply a snapshot of the
current index of the directory.
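
     As an illustration of the first-line and cont-line rules in the
grammar of this section, the following Python sketch splits index-block
lines into (attr-name, taglist, attr-value) triples.  It is a minimal
reading of the grammar, not a reference parser: the function name is
hypothetical, the taglist is returned as an unparsed string, and the
sketch assumes that no attr-name begins with "-", since such a name
could not be told apart from a continuation line.

```python
def parse_index_lines(lines):
    """Return (attr_name, taglist, attr_value) triples for index lines."""
    entries = []
    current_attr = None
    for line in lines:
        if line.startswith("-"):
            # cont-line: "-" taglist "/" attr-value, same attribute as
            # the most recent first-line.
            if current_attr is None:
                raise ValueError("continuation line before any first-line")
            rest = line[1:]
        else:
            # first-line: attr-name ":" *SPACE taglist "/" attr-value
            current_attr, sep, rest = line.partition(":")
            if not sep:
                raise ValueError("missing ':' in line: " + line)
            rest = rest.lstrip(" ")
        # A taglist holds only digits, "," and "-", so the first "/"
        # always ends it; later "/" characters belong to the value.
        taglist, sep, value = rest.partition("/")
        if not sep:
            raise ValueError("missing '/' in line: " + line)
        entries.append((current_attr, taglist, value))
    return entries


print(parse_index_lines(["ou: 1,3-4/Product", "-3-4/Testing", "uid: 1/"]))
```

     The last input line shows an empty attr-value, which per the text
above is only meaningful inside an Update Block.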

4.3.1.  Header Descriptions

     The header section consists of one or more "header lines".  The
following header lines are defined:

     "version": This line must always be present, and have the value
     "x-tagged-index-1" for this version of the specification.

     "updatetype": This line must always be present.  It takes as the
     value either "total" or "incremental".  The first update sent by a
     supplier server to a consumer server for a DSI must be a "total"
     update (why?).

     "thisupdate": This line must always be present. The value is the
     number of seconds from 00:00:00 UTC January 1, 1970 at which the
     supplier constructed this update.

     "lastupdate": This line must be present if the "updatetype" line
     has the value "incremental".  The value is the number of seconds
     from 00:00:00 UTC January 1, 1970 at which the supplier constructed
     the previous update sent to the consumer.  This field allows the
     consumer to determine if a previous update was missed.

     "contextsize": This line may be present at the supplier's option.
     The value is a number, which is the approximate total number of
     entries in the subtree.  This information is provided for statisti-
     cal purposes only.
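
     The presence rules for the header lines can be checked mechanically.
The following Python sketch is one possible reading of them; the
function name and the error messages are hypothetical, and the input is
assumed to be the header lines with their line terminators already
removed.

```python
def parse_header(lines):
    """Parse "name: value" header lines and check the required ones."""
    header = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a header line: " + line)
        header[name] = value.strip(" ")

    if header.get("version") != "x-tagged-index-1":
        raise ValueError("unsupported or missing version")
    if header.get("updatetype") not in ("total", "incremental"):
        raise ValueError("updatetype must be 'total' or 'incremental'")
    if "thisupdate" not in header:
        raise ValueError("thisupdate is required")
    if header["updatetype"] == "incremental" and "lastupdate" not in header:
        raise ValueError("lastupdate is required for incremental updates")

    # Timestamps and contextsize are strings of decimal digits.
    for name in ("thisupdate", "lastupdate", "contextsize"):
        if name in header:
            text = header[name]
            if not (text.isascii() and text.isdigit()):
                raise ValueError(name + " must be a decimal number")
            header[name] = int(text)
    return header


print(parse_header(["version: x-tagged-index-1",
                    "updatetype: total",
                    "thisupdate: 855938804"]))
```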


4.3.2.  Tokenization Types

     The Tagged Index Object inherits the "TOKEN" scheme for tokeniza-
tion as specified in [2].  In addition, there are several other tok-
enization schemes defined for the Tagged Index Object.  The following
table presents these schemes and what character(s) are used to delimit
tokens.


        Token Type      Tokenization Characters
        FULL            none
        TOKEN           white space, "@"
        RFC822          white space, ".", "@"
        UUCP            white space, "!"
        DNS             any character not a number, letter, or "-"
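
     The table above can be expressed as a small Python sketch.  The
function name and the regular expressions are assumptions made for
illustration; in particular, "letter" and "number" in the DNS row are
taken to mean the ASCII letters and digits, and empty tokens produced
by adjacent delimiters are dropped.

```python
import re

# One delimiter pattern per token type, following the table above.
_DELIMITERS = {
    "TOKEN": r"[\s@]+",
    "RFC822": r"[\s.@]+",
    "UUCP": r"[\s!]+",
    "DNS": r"[^A-Za-z0-9-]+",
}


def tokenize(value, token_type):
    """Split an attribute value into tokens for the given token type."""
    if token_type == "FULL":
        # FULL has no tokenization characters: the value is one token.
        return [value]
    pattern = _DELIMITERS[token_type]
    return [token for token in re.split(pattern, value) if token]


print(tokenize("Product Development", "TOKEN"))
print(tokenize("bjensen@ace.example", "RFC822"))
```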


4.3.3.  Tag Conventions

     In the tag list, multiple consecutive tags may be shortened by
using "#-#".  For example, the list "3,4,5,6,7,8,9,10" may be shortened
to "3-10".  Tags are to be applied to the data on a per entry level.
Thus, if two index lines in the same index object contain the same tag,
then it is always the case that those two lines refer back to the same
"record" in the directory.  In LDAP terminology, the two lines would
refer back to the same directory object.  Additionally if two index
lines in the same index object contain different tags, then it is always
the case that those two lines refer back to different records in the
directory.
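
     The "#-#" shorthand can be illustrated with the following Python
sketch, which expands a tag list into individual tags and shortens
consecutive tags again.  The function names are hypothetical, and the
sketch handles only the numeric forms allowed by the "tag" rule of the
BNF.

```python
def expand_taglist(taglist):
    """Expand a tag list such as "1,3-4" into [1, 3, 4]."""
    tags = []
    for part in taglist.split(","):
        first, sep, last = part.partition("-")
        if sep:
            # "first-last" stands for every tag in the inclusive range.
            tags.extend(range(int(first), int(last) + 1))
        else:
            tags.append(int(first))
    return tags


def compress_tags(tags):
    """Render tags as a tag list, shortening consecutive runs."""
    ordered = sorted(set(tags))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the run of consecutive tags.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if j > i:
            parts.append("{}-{}".format(ordered[i], ordered[j]))
        else:
            parts.append(str(ordered[i]))
        i = j + 1
    return ",".join(parts)


print(compress_tags([3, 4, 5, 6, 7, 8, 9, 10]))  # 3-10
print(expand_taglist("1,3-4"))                   # [1, 3, 4]
```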

     The tags in the index object are meaningful only in the context of
that transmission.  The tag applied to the same underlying record in two
separate transmissions of a full-update may be different.  Thus, receiv-
ing index servers should make no assumptions about the values of the
tags across index object boundaries.  If the receiving index server is
implemented in such a way that it maintains a structure similar to the
one that exists in the tagged index object with numbered tags attached
to various records, then these "internal" tags are distinct from the
tags that appear in the index object as created by the transmitting
index server.

4.4.  Incremental Indexing

     The tagged index object format supports the ability of information
servers to distribute only delta index data, rather than distributing
total index information each time.  This scenario, known as incremental
indexing, supports three basic types of operations: add, delete and
replace.  If the incremental updatetype is specified in the tagged index
object, then the index object contains a snapshot of only the changes
that have been made since the index object specified in the lastupdate
header was distributed.  If the receiving index server did not receive
that index object, it should request a total index object.  If the CIP
protocol supports it, the index server may request the specific index
object that it missed.
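
     The check described above, in which the consumer compares the
lastupdate header with the last index object it actually processed, can
be sketched in Python as follows.  The function name and the returned
strings are hypothetical labels chosen for this illustration; they are
not protocol elements.

```python
def next_action(header, previous_thisupdate):
    """Decide what a consumer does with a newly received index object.

    header is a dict holding "updatetype", "thisupdate" and, for
    incremental updates, "lastupdate".  previous_thisupdate is the
    "thisupdate" value of the last object processed from this supplier,
    or None if nothing has been received yet.
    """
    if header["updatetype"] == "total":
        # A total update stands on its own and replaces the old index.
        return "replace-index"
    if previous_thisupdate is None:
        # No baseline exists, so the changes cannot be applied.
        return "request-total"
    if header["lastupdate"] != previous_thisupdate:
        # The update this one builds on was never received.
        return "request-total"
    return "apply-changes"


print(next_action({"updatetype": "incremental",
                   "thisupdate": 855940000,
                   "lastupdate": 855938804}, 855938804))
```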

     If the tagged index object contains an Add Block, then the lines in
the Add Block refer to new records that were added to the information
base of the transmitting index server.  It can be guaranteed that those
records did not exist in any previously received tagged index object,
and the receiving index server can insert this index information in the
index that it already maintains for the transmitting index server.  If
the receiving index server is maintaining internal tags, then a new
internal tag should be created for each tag in the Add Block.

     If the tagged index object contains a Delete Block, then the Delete
Block contains lines each of which refers to the "key" field (in the
attr-name area of the index line) from a record in the information
server that has been deleted since the last update (specified in the
lastupdate header field).  This key field is assumed to be the unique
identifier on the transmitting information server for the record that
has been deleted.  In the case of LDAP servers, this field would have an
attr-name of "dn".  Other forms of information servers would use the
appropriate unique identifier.  Thus, the unique identifier must have
previously been sent by the transmitting index server.  If the receiving
index server has never received information for the record referred to by
a line in the Delete Block, then it should be ignored, with the proviso
that the receiving index server has more than likely "lost" some infor-
mation previously distributed by the transmitting index server.  If the
receiving index server is maintaining internal tags, then after process-
ing the Delete Block, the internal tag numbers may be reordered so as to
not have "holes" in the sequence.

     If the tagged index object contains an Update Block, then the lines
in the Update Block refer to records that were changed in the informa-
tion base of the transmitting index server.  As was mentioned in clause
4.3, if any portion of an attribute in the information server has been
changed, then the entire attribute must be specified, and all index
information from all values of a multi-valued attribute must be speci-
fied.  If the attribute was removed from the record in the information
server, the attribute value specified in the attr-value field should be
empty.  Attributes which have not been changed in the record are not
specified.  The Update Block also supports the idea of indexing new
attributes which were not previously included in the tagged index
object.  For example, if the transmitting index server began including
index information on postal addresses, then it could include an Update
Block in the index object that included all of the index information on
postal addresses for all records in its information base, and indicate
that nothing else has changed.  If the receiving index server is main-
taining internal tags, then after processing the Update Block, the
internal tag numbers should remain the same.
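
     The whole-attribute replacement semantics of the Update Block can
be illustrated with a short Python sketch.  It assumes that the
receiving index server has already matched the lines of the Update Block
to one of its own records, a step this section does not spell out, and
that a record is held as a mapping from attribute name to a set of index
values.  Both the data layout and the function name are hypothetical.

```python
def apply_update(record, changed):
    """Apply the Update Block lines that refer to one record.

    record maps attr-name -> set of index values held by the consumer.
    changed maps attr-name -> list of attr-values from the Update Block.
    """
    for attr, values in changed.items():
        kept = {value for value in values if value != ""}
        if kept:
            # The entire attribute is replaced, never merged; this also
            # covers attributes that were not indexed before.
            record[attr] = kept
        else:
            # An empty attr-value means the attribute was removed.
            record.pop(attr, None)
    # Attributes that are not mentioned are left untouched.
    return record


record = {"cn": {"Barbara", "Jensen"}, "uid": {"bjensen"}}
print(apply_update(record, {"cn": ["Barbara", "Jensen-Smith"], "uid": [""]}))
```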


5.  Example

     As an example, the following LDIF [6] entries and the resulting
Tagged Index Object are presented.

           dn: cn=Barbara Jensen, ou=Product Development, o=Ace Industry, c=US
           objectclass: top
           objectclass: person
           objectclass: organizationalPerson
           cn: Barbara Jensen
           cn: Barbara J Jensen
           cn: Babs Jensen
           sn: Jensen
           uid: bjensen
           telephonenumber: +1 408 555 1212
           description: A big sailing fan.

           dn: cn=Bjorn Jensen, ou=Accounting, o=Ace Industry, c=US
           objectclass: top
           objectclass: person
           objectclass: organizationalPerson
           cn: Bjorn Jensen
           sn: Jensen
           telephonenumber: +1 408 555 1212

           dn: cn=Gern Jensen, ou=Product Testing, o=Ace Industry, c=US
           objectclass: top
           objectclass: person
           objectclass: organizationalPerson
           cn: Gern Jensen
           cn: Gern O Jensen
           sn: Jensen
           uid: gernj
           telephonenumber: +1 408 555 1212

           dn: cn=Horatio Jensen, ou=Product Testing, o=Ace Industry, c=US
           objectclass: top
           objectclass: person
           objectclass: organizationalPerson
           cn: Horatio Jensen
           cn: Horatio N Jensen
           sn: Jensen
           uid: hjensen
           telephonenumber: +1 408 555 1212


     The Tagged Index Object for this example would be:

                      version: x-tagged-index-1
                      updatetype: total
                      thisupdate: 855938804
                      BEGIN IO-Schema
                      dn: FULL
                      ou: TOKEN
                      o: TOKEN
                      c: TOKEN
                      objectclass: FULL
                      cn: TOKEN
                      sn: FULL
                      uid: FULL
                      title: TOKEN
                      END IO-Schema
                      BEGIN Index-Info
                      dn: 1/cn=Barbara Jensen,ou=Product Development,o=Ace Industry,c=US
                      -2/cn=Bjorn Jensen,ou=Accounting,o=Ace Industry,c=US
                      -3/cn=Gern Jensen,ou=Product Testing,o=Ace Industry,c=US
                      -4/cn=Horatio Jensen,ou=Product Testing,o=Ace Industry,c=US
                      ou: 1,3-4/Product
                      -1/Development
                      -2/Accounting
                      -3-4/Testing
                      o: */Ace
                      -*/Industry
                      c: */US
                      objectclass: */top
                      -*/person
                      -*/organizationalPerson
                      cn: 1/Barbara
                      -1/J
                      -1/Babs
                      -*/Jensen
                      -2/Bjorn
                      -3/Gern
                      -3/O
                      -4/Horatio
                      -4/N
                      sn: */Jensen
                      uid: 1/bjensen
                      -3/gernj
                      -4/hjensen
                      title: 1/product
                      -1/manager
                      -1/rod
                      -1/and
                      -1/reel
                      -1/division
                      END Index-Info
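
     The mapping from entries to index lines can be illustrated with a
short Python sketch.  It is not part of this specification: the function
names are hypothetical, TOKEN is assumed to split values on whitespace,
and the ou values are assumed to have been extracted from each dn
beforehand.  Entries are tagged 1..n in the order given.

```python
def tokenize(value, mode):
    """FULL keeps the value whole; TOKEN splits it on whitespace (assumed)."""
    return [value] if mode == "FULL" else value.split()


def compress_tags(tags, total):
    """Render tags as a tag list such as '1,3-4', or '*' when all entries match."""
    tags = sorted(tags)
    if len(tags) == total:
        return "*"
    runs, start, prev = [], tags[0], tags[0]
    for tag in tags[1:]:
        if tag != prev + 1:          # gap found: close the current run
            runs.append((start, prev))
            start = tag
        prev = tag
    runs.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def build_index(entries, schema):
    """entries: list of {attribute: [values]}; entry n receives tag n (1-based)."""
    lines = []
    for attr, mode in schema.items():
        seen = {}                    # token -> set of tags, in first-seen order
        for tag, entry in enumerate(entries, start=1):
            for value in entry.get(attr, []):
                for token in tokenize(value, mode):
                    seen.setdefault(token, set()).add(tag)
        for i, (token, tags) in enumerate(seen.items()):
            prefix = f"{attr}: " if i == 0 else "-"
            lines.append(f"{prefix}{compress_tags(tags, len(entries))}/{token}")
    return lines


# The ou and sn values of the four example entries, in tag order.
entries = [
    {"ou": ["Product Development"], "sn": ["Jensen"]},
    {"ou": ["Accounting"], "sn": ["Jensen"]},
    {"ou": ["Product Testing"], "sn": ["Jensen"]},
    {"ou": ["Product Testing"], "sn": ["Jensen"]},
]
index_lines = build_index(entries, {"ou": "TOKEN", "sn": "FULL"})
print("\n".join(index_lines))
```

     For these inputs the sketch reproduces the ou and sn lines of the
Index-Info block above.  For attributes with more tokens, such as cn,
it emits tokens in first-seen order, which may differ from the order
used in the listing.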

     As an example of the Incremental Index Object, consider an update
that occurs when Barbara Jensen's entry above changes to:

           dn: cn=Barbara Jensen-Smith, ou=Product Development, o=Ace Industry, c=US
           objectclass: top
           objectclass: person
           objectclass: organizationalPerson
           cn: Barbara Jensen-Smith
           cn: Barbara J Jensen-Smith
           cn: Babs Jensen-Smith
           sn: Jensen-Smith
           uid: bjensen
           telephonenumber: +1 408 555 1212
           description: A big sailing fan.

     The Tagged Index Object for this example would be:

                      version: x-tagged-index-1
                      updatetype: incremental
                      lastupdate: 855938804
                      thisupdate: 855940000
                      BEGIN IO-Schema
                      dn: FULL
                      cn: TOKEN
                      sn: FULL
                      END IO-Schema
                      BEGIN Delete Block
                      dn: 1/cn=Barbara Jensen,ou=Product Development,o=Ace Industry,c=US
                      cn: 1/Jensen
                      sn: 1/Jensen
                      END Delete Block
                      BEGIN Add Block
                      dn: 1/cn=Barbara Jensen-Smith,ou=Product Development,o=Ace Industry,c=US
                      cn: 1/Jensen-Smith
                      sn: 1/Jensen-Smith
                      END Add Block
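
     The Delete and Add blocks above amount to a set difference between
the tokens the entry contributed before and after the change.  The
following Python sketch illustrates this under stated assumptions: the
helper names are hypothetical, TOKEN is assumed to split on whitespace,
and the entry is assumed to keep tag 1.

```python
def index_pairs(entry, schema):
    """Return the set of (attribute, token) pairs that an entry contributes."""
    pairs = set()
    for attr, mode in schema.items():
        for value in entry.get(attr, []):
            tokens = [value] if mode == "FULL" else value.split()
            pairs.update((attr, token) for token in tokens)
    return pairs


def render_block(name, pairs, schema, tag):
    """Format one Add or Delete block; an empty block is omitted."""
    body = []
    for attr in schema:
        tokens = sorted(token for a, token in pairs if a == attr)
        for i, token in enumerate(tokens):
            prefix = f"{attr}: " if i == 0 else "-"
            body.append(f"{prefix}{tag}/{token}")
    return [f"BEGIN {name} Block", *body, f"END {name} Block"] if body else []


def incremental_blocks(old, new, schema, tag=1):
    """Tokens only in the old entry are deleted; tokens only in the new are added."""
    before, after = index_pairs(old, schema), index_pairs(new, schema)
    return (render_block("Delete", before - after, schema, tag)
            + render_block("Add", after - before, schema, tag))


schema = {"dn": "FULL", "cn": "TOKEN", "sn": "FULL"}
suffix = ",ou=Product Development,o=Ace Industry,c=US"
old = {"dn": ["cn=Barbara Jensen" + suffix],
       "cn": ["Barbara Jensen", "Barbara J Jensen", "Babs Jensen"],
       "sn": ["Jensen"]}
new = {"dn": ["cn=Barbara Jensen-Smith" + suffix],
       "cn": ["Barbara Jensen-Smith", "Barbara J Jensen-Smith",
              "Babs Jensen-Smith"],
       "sn": ["Jensen-Smith"]}
blocks = incremental_blocks(old, new, schema)
print("\n".join(blocks))
```

     Tokens that are unchanged (Barbara, J and Babs in the cn attribute)
fall out of both differences, which is why they appear in neither block.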

     In this next example, consider an LDIF file containing a series of
change records and comments.

   # Add a new entry
   dn: cn=Fiona Jensen, ou=Marketing, o=Ace Industry, c=US
   changetype: add
   objectclass: top
   objectclass: person
   objectclass: organizationalPerson
   cn: Fiona Jensen
   sn: Jensen
   uid: fiona
   telephonenumber: +1 408 555 1212
   jpegphoto:< /usr/local/directory/photos/fiona.jpg

   # Delete an existing entry
   dn: cn=Robert Jensen, ou=Marketing, o=Ace Industry, c=US
   changetype: delete

   # Modify an entry's relative distinguished name
   dn: cn=Paul Jensen, ou=Product Development, o=Ace Industry, c=US
   changetype: modrdn
   newrdn: cn=Paula Jensen
   deleteoldrdn: 1

   # Rename an entry and move all of its children to a new location in
   # the directory tree (only implemented by LDAPv3 servers).
   dn: ou=PD Accountants, ou=Product Development, o=Ace Industry, c=US
   changetype: modrdn
   newrdn: ou=Product Development Accountants
   deleteoldrdn: 0
   newsuperior: ou=Accounting, o=Ace Industry, c=US

   # Modify an entry: add an additional value to the postaladdress attribute,
   # completely delete the description attribute, replace the telephonenumber
   # attribute with two values, and delete a specific value from the
   # facsimiletelephonenumber attribute
   dn: cn=Paula Jensen, ou=Product Development, o=Ace Industry, c=US
   changetype: modify
   add: postaladdress
   postaladdress: 123 Anystreet $ Sunnyvale, CA $ 94086
   -
   delete: description
   -
   replace: telephonenumber
   telephonenumber: +1 408 555 1234
   telephonenumber: +1 408 555 5678
   -
   delete: facsimiletelephonenumber
   facsimiletelephonenumber: +1 408 555 9876
   -

     The Tagged Index Object for this example would be:

version: x-tagged-index-1
updatetype: incremental
thisupdate: 855938804
lastupdate: 855912345
BEGIN IO-Schema
dn: FULL
ou: TOKEN
o: TOKEN
c: TOKEN
objectclass: FULL
cn: TOKEN
sn: FULL
uid: FULL
title: TOKEN
END IO-Schema
BEGIN Add Block
objectclass: 1/top
-1/person
-1/organizationalPerson
c: 1/US
o: 1/Ace
-1/Industry
ou: 1/Marketing
cn: 1/Fiona
-1/Jensen
sn: 1/Jensen
uid: 1/fiona
END Add Block

BEGIN Delete Block
dn: 1/cn=Robert Jensen, ou=Marketing, o=Ace Industry, c=US
END Delete Block

BEGIN Update Block
dn: 1/ou=PD Accountants, ou=Product Development, o=Ace Industry, c=US
-2/cn=Paula Jensen, ou=Product Development, o=Ace Industry, c=US
rdn: 1/Product Development Accountants
description: 2/
telephonenumber: 2/+1 408 555 5678
facsimiletelephonenumber: 2/
postaladdress: 2/123
-2/Anystreet
-2/Sunnyvale
-2/CA
-2/94086
END Update Block
END Index-Info
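
     A receiving server has to read such blocks back into tag lists and
values.  The following Python sketch shows one way to do so.  It is only
an illustration: the function names are hypothetical, and the syntax it
accepts is inferred from the examples in this section (a leading "-"
continues the previous attribute, "/" separates the tag list from the
value, and "*" stands for all tags).

```python
def expand_taglist(taglist):
    """Turn '1,3-4' into [1, 3, 4]; '*' (all tags) is returned unchanged."""
    if taglist == "*":
        return "*"
    tags = []
    for part in taglist.split(","):
        low, _, high = part.partition("-")
        tags.extend(range(int(low), int(high or low) + 1))
    return tags


def parse_block(lines):
    """Parse the lines between a BEGIN/END pair into {attr: [(tags, value)]}."""
    index, attr = {}, None
    for line in lines:
        if line.startswith("-"):
            rest = line[1:]                      # continuation of previous attr
        else:
            attr, _, rest = line.partition(": ")
        taglist, _, value = rest.partition("/")  # value may be empty
        index.setdefault(attr, []).append((expand_taglist(taglist), value))
    return index


update_block = [
    "dn: 1/ou=PD Accountants, ou=Product Development, o=Ace Industry, c=US",
    "-2/cn=Paula Jensen, ou=Product Development, o=Ace Industry, c=US",
    "rdn: 1/Product Development Accountants",
    "description: 2/",
    "postaladdress: 2/123",
    "-2/Sunnyvale",
]
parsed = parse_block(update_block)
```

     An empty value, as in the description line, is kept as an empty
string so that the caller can tell a removed attribute from a missing
one.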

6.  Aggregation


6.1.  Aggregation of Tagged Index Objects


     Aggregation of two tagged index objects is done by merging the two
lists of values and rewriting each tag list.  The tag list rewriting
process is done so that the resulting index object appears as if it came
from a single source.  Tags from one of the two tagged index objects are
"mapped" to the number space above that used by the other tagged index
object.  An index server that aggregates tagged index objects for export
MUST ensure that the export URL (i.e., the base-uri of the CIP object)
for the aggregate index object will route all queries that have "hits"
on the index object to that server; otherwise, query routing will not
succeed.
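
     The tag rewriting step can be sketched in Python as follows.  This
is an illustration under assumptions, not a normative algorithm: each
index is represented as a hypothetical mapping of attribute to token to
a set of explicit tags (any "*" tag list is assumed to have been
expanded beforehand), and the tags of the second object are shifted by
the highest tag used in the first.

```python
def max_tag(index):
    """Highest tag number used anywhere in an index (0 if it is empty)."""
    return max((tag
                for tokens in index.values()
                for tags in tokens.values()
                for tag in tags), default=0)


def aggregate(first, second):
    """Merge two indexes, mapping the second's tags above the first's."""
    offset = max_tag(first)
    merged = {attr: {token: set(tags) for token, tags in tokens.items()}
              for attr, tokens in first.items()}
    for attr, tokens in second.items():
        for token, tags in tokens.items():
            target = merged.setdefault(attr, {}).setdefault(token, set())
            target.update(tag + offset for tag in tags)
    return merged


first = {"sn": {"Jensen": {1, 2}},
         "cn": {"Barbara": {1}, "Bjorn": {2}}}
second = {"sn": {"Jensen": {1}, "Smith": {2}},
          "cn": {"Gern": {1}, "Ann": {2}}}
merged = aggregate(first, second)
```

     In this example the second object's tags 1 and 2 become 3 and 4, so
the token Jensen ends up with the tag set {1, 2, 3} and no two source
entries share a tag in the aggregate.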


7.  Recommendations


     TBD

8.  Security Considerations

     This specification provides a protocol for transferring information
between two servers.  The actual information transferred may be
protected by laws in many countries, so care must be taken in the
methods used to tokenize the data in order to ensure that protected data
cannot be reconstructed in full by the receiving server.  This protocol
does not have any inherent protection against spoofing or
eavesdropping.  However, since this protocol is transported in MIME
messages (as are all CIP index objects), it inherits all of the security
capabilities and liabilities of other MIME messages.  Specifically,
those wanting to prevent eavesdropping or spoofing may use some of the
various techniques for signing and encrypting MIME messages.

     Information Server administrators must decide what portions of
their databases are appropriate for inclusion in the Tagged Index
Object.  For distribution of information outside of the enterprise,
information server developers are encouraged to allow for facilities
that hide the organizational structure when generating the Tagged Index
Object from the underlying information database.  In order to allow for
the secure transmission of Tagged Index Objects across the Internet,
Index Servers should make use of SSL to carry out the connection.  In
order to strongly verify the identity of the peer index server on the
other side of the connection, SSL version 3 certificate exchange should
be implemented, and the identity in the peer's certificate should be
verified against the Public Key Infrastructure.  If electronic mail is
used to exchange the Tagged Index Objects, then a secure messaging
facility, such as PGP/MIME [9] or S/MIME [10], should be used to sign
or encrypt (or both) the information.
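
     As an illustration of peer verification, the following Python
sketch configures a client connection that requires and checks the
peer's certificate.  It is a sketch under assumptions rather than a
recommended implementation: the function names are hypothetical, and
because current libraries no longer offer SSL version 3, it substitutes
TLS [11], the successor to SSL.

```python
import socket
import ssl


def make_client_context(ca_file=None):
    """Build a context that requires and verifies the peer's certificate."""
    # create_default_context enables certificate and host name checking;
    # ca_file optionally names a file of trusted CA certificates.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                         cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older versions
    return context


def open_verified_connection(host, port, context):
    """Connect to a peer index server; raises ssl.SSLError if checks fail."""
    sock = socket.create_connection((host, port))
    try:
        return context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise


context = make_client_context()
```

     With no ca_file argument the system's default trust store is used;
an index server operator would more likely pass the CA certificates
agreed upon with its peers.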


9.  References


[1]  J. Allen, M. Mealling, "The Architecture of the Common Indexing
     Protocol (CIP)", Internet Draft (work in progress), June 1997.

[2]  C. Weider, J. Fullton, S. Spero, "Architecture of the Whois++ Index
     Service", RFC 1913, February 1996.

[3]  M. Wahl, T. Howes, S. Kille, "Lightweight Directory Access Protocol
     (v3)", Internet Draft (work in progress), June 1997.

[4]  ITU, "X.525 Information Technology - Open Systems Interconnection -
     The Directory: Replication", November 1993.

[5]  "FORTEZZA Application Implementors Guide for the FORTEZZA Crypto
     Card (Production Version)", Document #PD4002102-1.01, SPYRUS, 1995.

[6]  "The LDAP Data Interchange Format (LDIF)", Internet Draft (work in
     progress), 25 November 1996.

[7]  R. Hedberg, "LDAPv2 client Vs the Index Mesh", Internet Draft (work
     in progress), November 1997.

[8]  T. Howes, M. Smith, "The LDAP URL Format", Internet Draft (work in
     progress), June 1997.

[9]  M. Elkins, "MIME Security with Pretty Good Privacy (PGP)",
     RFC 2015, October 1996.

[10] B. Ramsdell, "S/MIME Version 3 Message Specification", Internet
     Draft (work in progress), May 1997.

[11] C. Allen, T. Dierks, "The TLS Protocol Version 1.0", Internet
     Draft (work in progress), November 1997.


10. Author's Addresses


     Roland Hedberg
     Umdac
     Umea University
     901 87 Umea
     Sweden
     Email: Roland.Hedberg@umdac.umu.se


     Bruce Greenblatt
     RSA Data Security
     100 Marine Parkway
     Suite 500
     Redwood City, CA 94065
     USA
     Email: bgreenblatt@rsa.com
     Phone: +1-650-595-8782


     Ryan Moats
     AT&T
     15621 Drexel Circle
     Omaha, NE 68135-2358
     USA
     Email: jayhawk@ds.internic.net
     Phone: +1 402 894-9456


     Mark Wahl
     Critical Angle, Inc.
     4815 W Braker Lane #502-385
     Austin, TX 78759
     Email: M.Wahl@critical-angle.com


                           Table of Contents


1. Introduction
2. Background
3. Object
4. The Tagged Index Object
4.1. The Agreement
4.2. Content Type
4.3. Tagged Index BNF
4.3.1. Header Descriptions
4.3.2. Tokenization types
4.3.3. Tag Conventions
4.4. Incremental Indexing
5. Example
6. Aggregation
6.1. Aggregation of Tagged Index Objects
7. Recommendations
8. Security Considerations
9. References
10. Author's Addresses

