Internet DRAFT - draft-weider-cip-hierarchy


Network Working Group					Chris Weider
Internet Draft						Paul Leach
<draft-weider-cip-hierarchy-00.txt>			Microsoft Corp.
							April, 1997

   Hierarchical Extensions to the Common Indexing Protocol

Status of this Memo

This is a personal submission to the FIND Working Group. It does not 
represent working group consensus.

This document is an Internet-Draft. Internet-Drafts are working 
documents of the Internet Engineering Task Force (IETF), its areas, 
and its working groups. Note that other groups may also distribute 
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months 
and may be updated, replaced, or obsoleted by other documents at any 
time. It is inappropriate to use Internet-Drafts as reference 
material or to cite them other than as "work in progress".

WARNING: The specification in this document is subject to change, and 
will certainly change.  It is inappropriate AND STUPID to implement to 
the proposed specification in this document.  In particular, anyone 
who implements to this specification and then complains when it 
changes will be properly viewed as an idiot, and any such complaints 
shall be ignored. YOU HAVE BEEN WARNED.

To learn the current status of any Internet-Draft, please check the 
1id-abstracts.txt listing contained in the Internet-Drafts Shadow 
Directories on (Africa), (Europe), (Pacific Rim), (US East Coast), or (US West Coast).

Distribution of this document is unlimited.  Please send comments to 
the FIND working group at <>.  Discussions of the 
working group are archived at 

1. Introduction

This work explores what, in the parlance of the current CIP draft, is 
called an index type -- specifically, a new kind of index that merges indexing 
of hierarchically named attribute-value entities (such as in LDAP and 
RWHOIS) and ones without distinguished names (such as in WHOIS++). It 
is based on a previous version of the CIP specification, but that was 
just a convenient syntactical jumping-off point. It is intended to be 
orthogonal to the FIND working group task of defining a framing 
syntax and functionality for a common indexing data wrapping protocol; 
the concepts and protocol elements in this draft should be 
expressible in a manner consistent with the new CIP framework 
at the appropriate time.

2. Protocol Functionality and components of the Index Service

2.1 Base data servers 

Most directory services today specify only the query language, the 
information model, and the server responses for their servers. Most 
also use a basic 'template-based' information model, in which each 
entry consists of a set of attribute-value pairs. Thus the basic 
service can be provided by a wide variety of databases and directory 
services. However, to participate in the Index Service, that 
underlying database must also be able to generate a 'centroid', or 
some other type of forward knowledge, for the data it serves.

Connections out from the indexing service to the base data servers 
will be accomplished using URIs for the various end protocols. This 
will avoid the need to rewrite the data from its native formats.

2.2 Centroids as forward knowledge

The centroid of a server consists of a list of the templates and 
attributes used by that server, and a word list for each attribute. 
The word list for a given attribute contains one occurrence of every 
word which appears at least once in that attribute in some record in 
that server's data, and nothing else.

For example, if a server contains exactly three records, as follows:

Record 1                        Record 2
Template: User                  Template: User
First Name: John                First Name: Joe
Last Name: Smith                Last Name: Smith
Favourite Drink: Labatt Beer    Favourite Drink: Molson Beer

Record 3
Template: Domain
Domain Name:
Contact Name: Mike Foobar

the centroid for this server would be

Template: User
First Name: Joe
            John
Last Name: Smith
Favourite Drink: Beer
                 Labatt
                 Molson

Template: Domain
Domain Name:
Contact Name: Mike
              Foobar

It is this information which is handed up the tree to provide forward 
knowledge.  As we mention above, this may not turn out to be the ideal 
solution for forward knowledge, and we suspect that there may be a 
number of different sets of forward knowledge used in the Index 
Service. However, the indexing architecture is in a very real sense 
independent of what types of forward knowledge are handed around, and 
it is entirely possible to build a unified directory which uses many 
types of forward knowledge.
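
As an illustration of the centroid definition in this section, the 
following Python sketch builds the word lists for the sample records. 
The function name build_centroid, the dictionary layout, and the use of 
whitespace tokenization are assumptions made for this sketch and are 
not part of the protocol. The Domain Name attribute is left out because 
its value is not shown in the example records.

```python
from collections import defaultdict


def build_centroid(records):
    """Return {template: {attribute: sorted list of unique words}}."""
    centroid = defaultdict(lambda: defaultdict(set))
    for record in records:
        template = record["Template"]
        for attribute, value in record.items():
            if attribute == "Template":
                continue
            # Assumption: words are separated by whitespace.
            centroid[template][attribute].update(value.split())
    return {
        template: {attr: sorted(words) for attr, words in attrs.items()}
        for template, attrs in centroid.items()
    }


records = [
    {"Template": "User", "First Name": "John", "Last Name": "Smith",
     "Favourite Drink": "Labatt Beer"},
    {"Template": "User", "First Name": "Joe", "Last Name": "Smith",
     "Favourite Drink": "Molson Beer"},
    {"Template": "Domain", "Contact Name": "Mike Foobar"},
]
centroid = build_centroid(records)
```

Each word appears once per attribute no matter how many records contain 
it, which is why "Smith" and "Beer" are listed only once.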

2.3 Other types of forward information

There are several other types of forward information that might be 
useful in an indexing service. The first is untokenized values for the 
given  attributes, as opposed to the tokenized values given in the 
centroid. A second type is forward information generated by a typical 
query; this can be used for replication of databases or of specific 
records in  a database. A third type is forward information which 
specifies from which server a given value was obtained. All of these 
are given in the protocol. A fourth type is aggregated hierarchical 
values: for example, let's assume that server A holds many email 
addresses with domain names such as,, and so forth. It would enhance compression if 
server A could simply specify that the email attribute was hierarchical 
and that any query which matched contained as the 
leftmost string would be a hit for purposes of referral. 

2.4 Index servers and Index server Architecture

An index server collects and collates the centroids (or other forward 
knowledge) of either a number of base servers or of a number of other 
index servers. An index server must be able to generate a centroid (or 
other forward knowledge) for the information it contains. In addition, 
an index server can index any other server it wishes, which allows one 
base level server (or index server) to participate in many hierarchies 
in the directory mesh.

2.4.1 Queries to index servers

An index server receives a query, searches its collections of 
centroids and other forward information, determines which servers hold 
records which may fill that query, and then notifies the user's client 
of the next servers to contact to submit the query. An index server can 
also contain primary data of its own, and thus act as both an index 
server and a base level server. In this case, the index server's 
response to a query may be a mix of records and referral pointers.
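
The referral step described above can be sketched in Python as follows. 
This is only an illustration: the function name find_referrals, the 
in-memory layout of the collected centroids, and exact (case-sensitive) 
word matching are assumptions of the sketch. Query terms are combined 
with 'and', which is the behaviour this draft attributes to the CIP 
query syntax in the discussion of Any-field.

```python
def find_referrals(index, query):
    """Return the handles of servers that may hold matching records.

    index: {server_handle: {(template, field): set of words}}
    query: {"template": str, "terms": {field: word}}
    """
    hits = []
    for handle, centroid in index.items():
        # Every term must appear in the server's word list for that field.
        if all(word in centroid.get((query["template"], field), set())
               for field, word in query["terms"].items()):
            hits.append(handle)
    return sorted(hits)


index = {
    "A": {("User", "First Name"): {"John", "Joe"},
          ("User", "Last Name"): {"Smith"}},
    "B": {("User", "First Name"): {"John"},
          ("User", "Last Name"): {"Jones"}},
}
query = {"template": "User",
         "terms": {"First Name": "John", "Last Name": "Smith"}}
referrals = find_referrals(index, query)
```

A hit only means that the server may hold a matching record; the client 
still has to submit the query to each referred server.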

Each index server is required to support the following query protocols 
and to generate referrals in the proper format for those protocols: 
RWhois, WHOIS++, and LDAP. Index servers which directly index a base 
level server may in the future return referrals to those servers in 
their native protocols.

2.4.2 Index server distribution model and forward knowledge propagation

The diagram below illustrates how a mesh of index servers 
might be created for a set of base servers. Although it looks like a 
hierarchy, the protocols allow (for example) server A to be indexed 
by both server D and by server H.

    whois++               index                   index
     servers               servers                 servers
                           for                     for
                           whois++                 lower-level
     _______               servers                 index servers
    |       |
    |   A   |__
    |_______|  \            _______
                \----------|       |
     _______               |   D   |__             ______
    |       |   /----------|_______|  \           |      |
    |   B   |__/                       \----------|      |
    |_______|                                     |  F   |
                                       /----------|______|
                                      /
     _______                _______  /
    |       |              |       |-
    |   C   |--------------|   E   |
    |_______|              |_______|-
                                     \
                                      \
     _______                           \            ______
    |       |                           \----------|      |
    |   G   |--------------------------------------|  H   |
    |_______|                                      |______|

             Figure 1: Sample layout of the Index Service mesh

In the portion of the index tree shown above, base servers A and B hand 
their centroids up to index server D, base server C hands its centroid 
up to index server E, and index servers D and E hand their centroids 
up to index server F. Servers E and G also hand their centroids up to H.

The number of levels of index servers, and the number of index servers 
at each level, will depend on the number of base servers deployed, and 
the response time of individual layers of the server tree. These 
numbers will have to be determined in the field.

2.4.3 Forward knowledge propagation and changes to forward knowledge

Forward knowledge propagation is initiated by an authenticated POLL 
command (sec. 4.4.1).  The format of the POLL command allows the 
poller to request the forward knowledge of any or all templates and 
attributes held by the polled server. After the polled server has 
authenticated the poller, it determines which of the requested forward 
knowledge the poller is allowed to request, and then issues a 
CENTROID-CHANGES report (sec. 4.4.2) to transmit the data. When the 
poller receives the CENTROID-CHANGES report, it can authenticate the 
pollee to determine whether to add the new changes to its data. 
Additionally, if a given pollee knows which pollers hold forward 
knowledge from the pollee, it can signal to those pollers the fact 
that its information has changed by issuing a DATA-CHANGED command. 
The poller can then determine if and when to issue a new POLL request 
to get the updated information. The DATA-CHANGED command is included 
in this protocol to allow 'interactive' updating of critical information.

2.4.4 Forward knowledge propagation and mesh traversal

When an index server issues a POLL request, it may indicate to the 
polled server what relationship it has to the polled. This information 
can be used to help traverse the directory mesh. Two fields are 
specified in the current proposal to transmit the relationship 
information, although it is expected that richer relationship 
information will be shared in future revisions of this protocol.

One field used for this information is the Hierarchy field, and can 
take on three values. The first is 'topology', which indicates that 
the indexing  server is at a higher level in the network topology 
(e.g. indexes the whole regional ISP). The second is 'geographical', 
which indicates that the polling  server covers a geographical area 
subsuming the pollee. The third is 'administrative', which indicates 
that the indexing server covers an administrative domain subsuming the 
pollee.

The second field used for this information is the Description field, 
which contains the DESCRIBE record of the polling server. This allows 
users to obtain richer metainformation for the directory mesh, 
enabling them to expand queries more effectively.

2.4.5 Loop control

Since there are no a priori restrictions on which servers may poll 
which other servers, and since a given server may participate in many 
sub-meshes, mechanisms must be installed to allow the detection of 
cycles in the polling relationships. This is accomplished in the 
current protocol by including a hop-count on polling relationships. 
Each time a polled server generates forward information, it informs 
the polling server about its current hopcount, which is the maximum of 
the hopcounts of all the servers it polls, plus 1. A base level server 
(one which polls no other servers) will have a hopcount of 0. When a 
server decides to poll a new server, if its hopcount goes up, then it 
must inform all the other servers which poll it about its new 
hopcount. A maximum hopcount (8 in the current version) will help the 
servers detect polling loops. 
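
The hopcount rule above can be sketched in Python as follows. The 
function names, the dictionary describing who polls whom, and the 
decision to flag a server once its hopcount passes the maximum are 
assumptions of this sketch; the draft only states that the maximum 
helps servers detect loops.

```python
MAX_HOPCOUNT = 8  # maximum hopcount given for the current version


def own_hopcount(reported):
    """Hopcount of a server, given the hopcounts of the servers it polls."""
    return max(reported) + 1 if reported else 0


def propagate(polls):
    """polls: {server: [servers it polls]}; every server must be a key.

    Recomputes hopcounts until they stop changing. Returns the hopcounts
    and the set of servers whose hopcount passed MAX_HOPCOUNT, which can
    only happen for servers that are in, or poll into, a polling loop.
    """
    hops = {server: 0 for server in polls}
    over_limit = set()
    changed = True
    while changed:
        changed = False
        for server, polled in polls.items():
            if server in over_limit:
                continue  # stop updating once the limit is passed
            new = own_hopcount([hops[p] for p in polled])
            if new != hops[server]:
                hops[server] = new
                changed = True
                if new > MAX_HOPCOUNT:
                    over_limit.add(server)
    return hops, over_limit


# The mesh of Figure 1: D polls A and B, E polls C, F polls D and E,
# H polls E and G.
mesh = {"A": [], "B": [], "C": [], "G": [],
        "D": ["A", "B"], "E": ["C"], "F": ["D", "E"], "H": ["E", "G"]}
hops, over_limit = propagate(mesh)

# Two servers polling each other form a loop.
_, looping = propagate({"X": ["Y"], "Y": ["X"]})
```

In the loop-free mesh the hopcounts settle quickly; in the two-server 
loop they keep rising until both servers pass the maximum.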

A second approach to loop detection is to do all the work in the 
client, which would determine which new referrals have already 
appeared in the referral list, and then simply iterate the referral 
process until there are no new servers to ask.  An algorithm to 
accomplish this in WHOIS++ is detailed in [Faltstrom 95].
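
A minimal Python sketch of this client-side approach follows. It does 
not reproduce the algorithm of [Faltstrom 95]; it only shows the idea of 
remembering which servers have already been referred to. The function 
name resolve and the ask callback, which stands in for sending the 
query to a server, are hypothetical.

```python
from collections import deque


def resolve(start_servers, ask):
    """Follow referrals until no new servers are named.

    ask(server) is a caller-supplied function returning a pair
    (records, referrals) for the query sent to that server.
    """
    queue = deque(dict.fromkeys(start_servers))
    seen = set(queue)
    records = []
    while queue:
        server = queue.popleft()
        found, referrals = ask(server)
        records.extend(found)
        for referral in referrals:
            if referral not in seen:  # skip servers already in the list
                seen.add(referral)
                queue.append(referral)
    return records, seen


# A small mesh with a cycle: D refers back to F, and E refers back to F.
fake_mesh = {
    "F": ([], ["D", "E"]),
    "D": ([], ["A", "F"]),
    "E": (["record-3"], ["F"]),
    "A": (["record-1"], []),
}
records, contacted = resolve(["F"], lambda server: fake_mesh[server])
```

Because each server is queued at most once, the loop ends even when the 
referrals form a cycle.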

2.4.6 Query handling and passing algorithms

When an index server receives a query, it searches its collection of 
forward knowledge and determines which servers hold records which may 
fill that query.  As this service becomes widely deployed, it is 
expected that some index servers may specialize in indexing certain 
template types or perhaps even certain fields within those templates. 
If an index server obtains a match with the query _for those template 
fields and attributes the server indexes_, it is to be considered a 
match for the purpose of forwarding the query.

2.4.7 Query referral

Query referral is the process of informing a client which servers to 
contact next to resolve a query.  The syntax for notifying a client is 
outlined in section 4.5. A  query can specify the 'trace' option, 
which causes each server  which receives the query to send its server 
handle and an identification string to the client.

2.5 Security considerations

In the opinion of this author, until a generally accepted Internet 
wide security service is available (or until a web of such services 
reaches into most of the Internet) administrators should not assume 
that servers outside their control, or with which they have not 
established a trust relationship, will secure their data. Propagating 
security information through the common index mesh will run 
immediately into the problems of common authentication, access 
control, and incommensurable security features. Thus any index 
information propagated to an untrusted (i.e. public) server should be 
considered unsecured.

3. Integrating disparate services

3.1 The service model

The basic service model uses a common data model and allows the use of 
different access protocols to access a CIP server. CIP schema will not 
be standardized in this version of the protocol.

3.2 Integration of data models

The basic data models for most of the existing directory services are 
essentially the same: a set of templates or object classes which are 
composed of attribute-value pairs. Therefore integration of the data 
models should not prove too difficult.

3.3 Integration of schema

The various protocols use different attribute names for attributes 
which typically contain the same data. In this version of the 
protocol, the attributes will not be changed for inclusion into the 
CIP mesh. However, it is our intent at some point to require the 
translation of the base schema into a standard CIP schema set. This 
implies that, in meshes based on this version of the protocol, 
the schema may be different for each mesh.

3.4 Using different query protocols to access the CIP service

As this document is presently constituted, one can use many protocols 
to access a CIP server. If the attributes used by the client and 
server are the same, the query may be answered by the CIP service.

4. Protocol Specification for the Index Service

The syntax for each protocol component is listed below. In addition, 
each section contains a listing of which of these attributes are 
required and which are optional for each of the components. All 
timestamps must be in the format YYYYMMDDHHMM in GMT.
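
As a small illustration of the timestamp format, the following Python 
sketch converts between datetime values and YYYYMMDDHHMM strings in 
GMT. The helper names are assumptions of this sketch.

```python
from datetime import datetime, timedelta, timezone

CIP_TIME_FORMAT = "%Y%m%d%H%M"  # YYYYMMDDHHMM


def format_timestamp(moment):
    """Render a timezone-aware datetime as YYYYMMDDHHMM in GMT."""
    return moment.astimezone(timezone.utc).strftime(CIP_TIME_FORMAT)


def parse_timestamp(text):
    """Parse a YYYYMMDDHHMM string, interpreting it as GMT."""
    parsed = datetime.strptime(text, CIP_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


# A local time one hour ahead of GMT is shifted back before formatting.
local = datetime(1995, 3, 2, 0, 36, tzinfo=timezone(timedelta(hours=1)))
stamp = format_timestamp(local)
```

Converting to GMT before formatting keeps timestamps from servers in 
different time zones comparable as plain strings.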

4.1 Request-Response model

There are two basic transactions in the Common Indexing Protocol: a 
Change Notification, with which a polled server indicates that the 
data it holds has changed and that the polling server should repoll 
the polled server, and a Poll, in which a polling server indicates 
which data it would like an index for and the polled server sends that 
index. A polling server may issue a poll at any time, even if a prior 
change notification has not been received from the polled server.

4.2 Syntax Conventions

All lines in the protocol end in <CR><LF>. Line breaks are not to be 
included in the values extracted from a line. Special characters are 
escaped by a backslash, "\". An escaped line break indicates that the 
line following the line ending in an escaped line break is to be 
concatenated with the previous line to form a single value. A 
line break which is part of a value (in a postal address, for example) 
is indicated by the special token <b>. Component specifications and 
grouping operators are expressed using an HTML-like format: <token> 
to open a block and <\token> to close the block. 
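
The line conventions above can be sketched in Python as follows. The 
function names are assumptions of this sketch, as is the rule that a 
backslash always makes the following character literal. decode_value is 
meant for attribute values only, not for block tags such as <\token>.

```python
def logical_lines(raw):
    """Split raw protocol text on <CR><LF>, joining escaped line breaks."""
    lines = []
    pending = ""
    for line in raw.split("\r\n"):
        # An odd number of trailing backslashes escapes the line break.
        trailing = len(line) - len(line.rstrip("\\"))
        if trailing % 2 == 1:
            pending += line[:-1]
            continue
        lines.append(pending + line)
        pending = ""
    if pending:
        lines.append(pending)
    return [line for line in lines if line]


def decode_value(text):
    """Resolve backslash escapes and turn the <b> token into a line break."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # escaped character, taken literally
            i += 2
        elif text.startswith("<b>", i):
            out.append("\n")
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


raw = ("Address: 1 Main St<b>Springfield\r\n"
       "Note: first half \\\r\n"
       "second half\r\n")
lines = logical_lines(raw)
address = decode_value(lines[0].split(": ", 1)[1])
```

Joining continuation lines before decoding values keeps the two rules 
separate, so an escaped <b> inside a value is not mistaken for a line 
break.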

4.3 Change Notification

A polled server opens a TCP connection to a polling server, and 
issues a Data-Changed report, as detailed in 4.3.1. When the polling 
server receives the <\Data-Changed> line, it generates a 
Data-Changed-Ack, as detailed in 4.3.2. When the polled server receives 
the <\Data-Changed-Ack> line of the Data-Changed-Ack, it closes the 
connection. If the transaction is interrupted at any point, the polled 
server should assume that the report was not received, and should 
resend as appropriate.

4.3.1 Data-changed report syntax

The data changed report looks like this:

 Version-number: // version number of index service software, used to ensure
                 // compatibility. Current value is 2.3
 Time-of-latest-centroid-change: // time stamp of latest forward information 
                                // change,GMT
 Time-of-message-generation: // time when this message was generated, GMT
 DSI: // Data set identifier. This uniquely identifies a given data set in case the
	// server manages multiple logical data sets
 Server-handle: // IANA unique identifier for this server
		// or OID for this server
		// Or  Distinguished Name of the root of the subtree this server
		// is responsible for.
 Host-Name: // Host name of this server (current name)
 Host-Port: // Port number of this server (current port)
 Protocol: // Access protocol to use when speaking to this server
 Best-time-to-poll: // For heavily used servers, this will identify when
                    // the server is likely to be lightly loaded
                    // so that response to the poll will be speedy, GMT
<\Data-Changed> // This line must be used to terminate the data changed message

Required/optional table


4.3.2 DATA-CHANGED-ACK report

The DATA-CHANGED-ACK report has the following syntax:


4.4 Centroid Change Report

A polling server opens a TCP connection to a polled server, and issues 
a POLL command, as detailed in 4.4.1. When the polled server receives 
the <\Poll> line, it generates a CENTROID-CHANGES report, as 
detailed in 4.4.2. When the polling server receives the <\Centroid> 
line of the CENTROID-CHANGES report, it commits the 
data to its database and closes the connection. If the transaction is 
interrupted at any point, the polling server should assume that the 
entire centroid was not received, and should repoll the polled server.

4.4.1 Poll syntax

 Version: // version number of poller's index software, used to
                 // ensure compatibility. Current is 2.2
 Charset: // specifies character set in which the centroid changes are to be
	// transmitted. Must be one of ISO-8859-1 or UNICODE-1-1-UTF-8
 DSI: // Data set identifier. Indicates which data set of multiple data sets
	// should be indexed. Must be an OID.
 Type-of-poll: //optional. If not present, indicates centroid poll
 Start-time: // give me all the centroid changes starting at this time, GMT
 End-time: // ending at this time, GMT
<Request> // This block may occur multiple times
 Template: // a standard template or object class name, or the keyword ALL, for a
           // full update.
 Field:    // used to limit centroid update information to specific fields,
           // is either a specific field name, a list of field names separated by
           // spaces, or the keyword ALL. May occur multiple times per template.
 Starting-point: // location in the DIT or other hierarchical structure
	// to start the index. If used, it implies that the entire subtree is
	// indexed as well. If this attribute is missing, then the index request is 
      // assumed to cover the entire data store of the polled server.
 Server-handle: // IANA unique identifier for the polling server.
                // this handle may optionally be cached by the polled
                // server to announce future changes
 Host-Name: // Host name of the polling server.
 Host-Port: // Port number of the polling server.
  Description: // This field contains the DESCRIBE record of the
                // polling server
 Tokenization: // The tokenization algorithm used
       // Can be one of: "TOKENS", or "FIELDS".
       // Default is "FIELDS", which means the entire value in each field.
 Options: // Can be used to request the WEIGHT, HANDLE, and/or HOST information
        // for the returned values
 <\Poll> // This line must be used to terminate the poll message

When the poll type is CENTROID, the poll scope is FULL if the 
Start-time attribute is missing and incremental otherwise. If 
Start-time is present, it must be the same value as the End-time from 
a previous CENTROID-CHANGES report from this server. 

The allowable values for Options are WEIGHT, HANDLE, and HOST. Support 
for the HANDLE and HOST values is required. HANDLE indicates that 
each attribute value must be listed with the server handle of the 
server from which this value was obtained by the polled server; HOST 
indicates that each attribute value must be listed with the host name 
and port number of the server from which this value was obtained. 
WEIGHT is optional, and allows each value to be assigned a relative 
weight according to a defined and specified weighting scheme. This 
value is included for future clarification. Since a weighting scheme 
will need to be identified, WEIGHT will take additional scheme 
identifiers in a syntax to be determined. 


Required/Optional Table

Attribute      Status
Version        REQUIRED, value is 2.0
Charset        Support for values ISO-8859-1 and UNICODE-1-1-UTF-8
               is required
Template       If not present, report all templates
Field          If not present, report all fields
Tokenization   Support for values TOKENS and FIELDS is required


Example of a POLL command:
 Version: 2.0
 Charset: UNICODE-1-1-UTF-8
 Server-handle: BUNYIP01
 Host-Port: 7070
 Tokenization: TOKENS
<\Poll>
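
A POLL message of this shape could be assembled as in the following 
Python sketch. It uses the attribute names from the syntax listing in 
4.4.1 and ends the message with the <\Poll> line and <CR><LF> line 
endings. The function name build_poll, the choice of attributes, and 
the leading space on each attribute line are assumptions of this 
sketch.

```python
ALLOWED_CHARSETS = {"ISO-8859-1", "UNICODE-1-1-UTF-8"}
ALLOWED_TOKENIZATION = {"TOKENS", "FIELDS"}


def build_poll(version, charset, server_handle, host_port,
               tokenization="FIELDS"):
    """Return the text of a POLL message as a single string."""
    if charset not in ALLOWED_CHARSETS:
        raise ValueError("unsupported charset: " + charset)
    if tokenization not in ALLOWED_TOKENIZATION:
        raise ValueError("unsupported tokenization: " + tokenization)
    lines = [
        f" Version: {version}",
        f" Charset: {charset}",
        f" Server-handle: {server_handle}",
        f" Host-Port: {host_port}",
        f" Tokenization: {tokenization}",
        "<\\Poll>",  # terminates the poll message
    ]
    # Every line in the protocol ends in <CR><LF>.
    return "".join(line + "\r\n" for line in lines)


message = build_poll("2.0", "UNICODE-1-1-UTF-8", "BUNYIP01", 7070, "TOKENS")
```

Checking the charset and tokenization values before sending avoids 
issuing a poll that the polled server is not required to support.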

4.4.2 Centroid-changes report syntax

The centroid change report contains nested multiply occurring blocks. 
These blocks are delimited by opening and closing tags, as described 
in 4.2, and have comments indicating that they may be used multiple times.

The syntax of a Data: item is either a list of values (words or other 
phrases, depending on the tokenization value), one value per line, each 
optionally followed by a weight line, with the syntax:

 value
 <weight>weight<\weight>

or the keyword:

 *

The weight is not required, but is expected to be used by advanced 
servers. The weight is the relative weight of the value under the 
weighting scheme in use. 
The keyword * as the only item of a Data: list means that any value 
for this field should be treated as a hit by the indexing server.

The field Any-field: needs more explanation than can be given in the 
body of the syntax description below. It can take two values, True or 
False. If the value is True, the pollee is indicating that there are 
fields in this template which are not being exported to the polling 
server, but which it wishes to have treated as hits. Thus, when the 
polling server gets a query which has a term requesting a field not in 
this list for this template, the polling server will treat that term 
as a 'hit'.  If 
the value is False, the pollee is indicating that there are no other 
fields for this template which should be treated as a hit. This field 
is required because the basic model for the CIP query syntax requires 
that the results of each search term be 'and'ed together. This field 
allows polled servers to export data only for non-sensitive fields, yet 
still get referrals of queries which contain sensitive terms.
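
The effect of Any-field on matching can be sketched in Python as 
follows. The function name template_matches and the data layout are 
assumptions of this sketch; the handling of * follows the description 
of the Data: keyword above, and query terms are 'and'ed together.

```python
def template_matches(exported, any_field, query_terms):
    """Decide whether a query is a hit for one template of one server.

    exported:    {field: set of values}; the set {"*"} means any value
    any_field:   True if unexported fields are to be treated as hits
    query_terms: {field: value}
    """
    for field, value in query_terms.items():
        if field in exported:
            values = exported[field]
            if values == {"*"}:
                continue  # any value for this field is a hit
            if value not in values:
                return False
        elif not any_field:
            return False  # field not exported and not covered by Any-field
    return True


exported = {"Name": {"Chris", "Weider", "Paul", "Leach"}, "Email": {"*"}}
query = {"Name": "Weider", "Phone": "555-0100"}
hit_when_true = template_matches(exported, True, query)
hit_when_false = template_matches(exported, False, query)
```

With Any-field set to True the unexported Phone term no longer blocks 
the referral, while the exported Name term must still match.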

<Centroid>
 Version:	// version number of pollee's index software, used to
// ensure compatibility. Current value is 2.3
 Character-set:	// Specifies which character set the data is in. Allowable values
// are ISO-8859-1 and UNICODE-1-1-UTF-8
 Start-time:	// change list starting time, GMT
 End-time:	// change list ending time, GMT
 Server-handle:	// IANA unique identifier of the responding server
 Hop-Count:	// One more than the largest value the polled server has received 
// when polling other servers. If the polled server is a leaf
// server, hop-count should be zero. The current maximum value 
// (Oct 96) is 8.
 Options:	// Which options the polled server was able to satisfy. Values are
 Status-Codes: // transmit error codes which indicate errors in the fulfillment of
     // the request. See section 6.
 Compression-type:	// Type of compression used on the data, or NONE
 Size-of-compressed-data:	// size of compressed data if compression is used
 Protocol:	// Query protocol spoken by the polled server. Used to construct the URLs 
// for referrals. One of WHOIS++, LDAP, CCSO, RWHOIS
 Operation:	// One of 3 keywords: ADD, DELETE, FULL
// ADD - add these entries to the centroid for this server
 	// DELETE - delete these entries from the centroid of this server
// FULL - the full centroid as of end-time follows
 Tokenization:	// The tokenization algorithm used
 	// Can be one of: "TOKENS", or "FIELDS".
// Default is "FIELDS".
Token:	// Character(s) used in the tokenization algorithm
<Server> // may occur multiple times
 Host: // Host name of server to which the following centroid data belongs. Must
      // be present and have a correct value even if the only server presenting
     // data is the polled server.
 Port: // Port number of server to which the following centroid data belongs. Must 
     // be present and have a correct value even if the only server presenting 
     // data is the polled server.
 Server-Handle: // server handle of server to which the following centroid data 
     // belongs. Must be present and have a correct value even if the only server
     // presenting data is the polled server
<Schema>	// may occur multiple times
Template:	// A standard template name
Field:	// an attribute (field) name inside the template
<Template>	// may occur multiple times
Template:	// a standard template name

Any-field:	// TRUE or FALSE. See beginning of 4.4.2 for explanation.
		// if this is TRUE, there will be no field blocks.
 	// the template contains multiple field blocks
Field:	// a field name within that template
Hierarchy:  // LEFT, RIGHT, or NONE
Tokenization: // TOKENS or FIELDS
Data:	// Either the keyword *, or
 	// the value list itself, one per line, cr/lf terminated,
 	// Each value may be optionally followed by another line containing
// weight  information, this other line begins
// with the weight tag, <weight>, the weight, and the close weight tag <\weight> 
<\Field>	// the field ends with \Field
<\Template>	// the template block ends with \Template
<\Server> // The server block ends with \Server
<\Centroid>	// This line must be used to terminate the centroid
// change report

For each template, all fields must be listed, or queries will not be referred correctly.
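
How a polling server might apply the Operation keyword to the centroid 
it already holds for a server can be sketched in Python as follows. The 
function name apply_changes and the in-memory layout are assumptions of 
this sketch; the draft specifies only the three keywords and their 
meaning.

```python
def apply_changes(stored, operation, changes):
    """Return the updated centroid held for one polled server.

    stored, changes: {(template, field): set of values}
    operation:       "ADD", "DELETE", or "FULL"
    """
    if operation == "FULL":
        # The report carries the full centroid; replace what is stored.
        return {key: set(values) for key, values in changes.items()}
    result = {key: set(values) for key, values in stored.items()}
    if operation == "ADD":
        for key, values in changes.items():
            result.setdefault(key, set()).update(values)
    elif operation == "DELETE":
        for key, values in changes.items():
            remaining = result.get(key, set()) - set(values)
            if remaining:
                result[key] = remaining
            else:
                result.pop(key, None)  # nothing left for this field
    else:
        raise ValueError("unknown operation: " + operation)
    return result


stored = {("USER", "Name"): {"Chris", "Weider"}}
after_add = apply_changes(stored, "ADD", {("USER", "Name"): {"Paul", "Leach"}})
after_delete = apply_changes(after_add, "DELETE", {("USER", "Name"): {"Chris"}})
after_full = apply_changes(after_delete, "FULL", {("USER", "Email"): {"*"}})
```

The function returns a new dictionary instead of changing the stored 
one, which fits the rule that data is committed only after the whole 
report has been received.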

Required/Optional table

REQUIRED, value is 2.0
REQUIRED, values of ISO-8859-1 and UNICODE-1-1-UTF-8 must be supported
REQUIRED (even if the centroid type is FULL)
REQUIRED (even if the centroid type is FULL)
OPTIONAL. If the polling server has requested options a polled server 
is unable to satisfy, an error message will be generated

OPTIONAL (even if compression is used)
REQUIRED, Support for all three values is required


REQUIRED (if Any-field is FALSE)
REQUIRED (if Any-field is FALSE)

REQUIRED (if Any-field is FALSE)
REQUIRED (if Any-field is FALSE)


Example of a CENTROID-CHANGES report:
 Version: 2.0
 Character-set: UNICODE-1-1-UTF-8
 Start-time: 197001010000+0100
 End-time: 199503012336+0100
 Server-handle: BUNYIP01
 Hop-Count: 3
 Tokenization: FIELDS
 Port: 7070
 Server-Handle: NADA01
 Template: USER
 Any-field: TRUE
 Field: Name
 Chris Weider
 Paul Leach
 Field: Email 

5. Client-Server Interaction

Access can be made to a CIP server using RWhois, WHOIS++, and LDAP. 
Referrals will be made in the protocol which was used to contact the 
CIP server, with the exception that the referral given by an index 
server which is a direct poller of a base level server may indicate 
that a different protocol must be used to contact the base server.

6. Reply Codes

The following reply codes are used by the Common Indexing Protocol. 
These are placed into the Status-Codes field of the CENTROID-CHANGES report.

113 Requested method not available      Unable to provide a requested tokenization,
                                        compression, or transfer encoding method.
                                        Contacted server will send requested data
                                        in different format.

114 Requested option not available      Unable to provide a requested option in
                                        CENTROID-CHANGES. No options have been used
                                        but raw data will be transmitted.

430 Authentication needed               Authentication is required for this 

503 Required attribute missing          A REQUIRED attribute is missing in an

530 Authentication failed               The authentication failed.

7. References

 [Faltstrom 95] Faltstrom, Patrik, Rickard Schoultz, and Chris Weider,
"How to interact with a WHOIS++ mesh", RFC 1914, Proposed Standard, November

8.  Authors' Addresses

Chris Weider,
Paul Leach,
1 Microsoft Way, 
Redmond, WA 98052
