DAV Searching and Locating                                 March 1998


     INTERNET-DRAFT                            S. Reddy
     draft-reddy-dasl-requirements-02.txt      Microsoft Corporation
     March, 1998                               J. Slein
     Expires July, 1998                        Xerox Corporation


             Requirements for DAV Searching and Locating

     Status of this Memo

     This document is an Internet draft. Internet drafts are working
     documents of the Internet Engineering Task Force (IETF), its areas
     and its working groups. Note that other groups may also distribute
     working information as Internet drafts.

     Internet drafts are draft documents valid for a maximum of six
     months and can be updated, replaced or obsoleted by other documents
     at any time. It is inappropriate to use Internet drafts as
     reference material or to cite them other than as "work in
     progress".

     To learn the current status of any Internet draft please check the
     "1id-abstracts.txt" listing contained in the Internet drafts shadow
     directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
     munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
     ftp.isi.edu (US West coast). Further information about the IETF can
     be found at URL: http://www.ietf.org/

     Distribution of this document is unlimited. Editorial comments
     should be sent to the author (saveenr@microsoft.com).

     Abstract

     The Distributed Authoring and Versioning protocol [WEBDAV] defines
     simple mechanisms to assign and retrieve values for properties.
     This document presents requirements for a WEBDAV extension to
     support efficient searching for resources based on WEBDAV
     properties and content. These requirements are intended to be the
     basis for the DAV Searching and Locating (DASL) protocol.


     1 Introduction


     1.1 Existing DAV searching mechanisms


     WEBDAV and HTTP provide support for client-side search, but not
     server-side search.  The GET method defined in [HTTP] allows
     clients to retrieve a resource's content; the PROPFIND method
     defined in [WEBDAV] allows clients to retrieve a resource's
     properties.  Having retrieved a resource's properties and/or
     content, the client can compare them to its search criteria to
     determine whether the resource is of interest.


     1.2 Limitations of Client-side Searching


     Client-side searching requires no modifications to the server.
     However, simplicity for the server comes at a cost:

     (1)  It makes inefficient use of network resources. Clients must
          retrieve properties and content for each resource under
          consideration.

     (2)  It does not take advantage of server intelligence. Servers
          capable of searching can use sophisticated mechanisms to
          generate results: internal caching of intermediate search
          results, content-indexing, etc.

     Even simple, common queries may expose these limitations. Consider
     the query "find all text files modified during the last week." When
     many clients issue such queries against a single server, the
     limitations become more apparent: client-side searching scales
     poorly in these cases.
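
     The cost of the client-side approach can be sketched as follows.
     This is a minimal illustration, not part of any protocol: the
     in-memory SERVER table stands in for per-resource PROPFIND round
     trips, and the property name "days_since_modified" is a
     hypothetical simplification of a modification date.

```python
# Hypothetical in-memory stand-in for a server: URI -> properties.
# A real client would issue one PROPFIND (and possibly one GET) per resource.
SERVER = {
    "/docs/a.txt": {"getcontenttype": "text/plain", "days_since_modified": 2},
    "/docs/b.txt": {"getcontenttype": "text/plain", "days_since_modified": 30},
    "/docs/c.gif": {"getcontenttype": "image/gif", "days_since_modified": 1},
}


def client_side_search(uris, predicate):
    """Fetch every candidate's properties, then filter locally."""
    hits = []
    fetches = 0
    for uri in uris:
        props = SERVER[uri]  # stands in for one network round trip
        fetches += 1
        if predicate(props):
            hits.append(uri)
    return hits, fetches


# "Find all text files modified during the last week."
hits, fetches = client_side_search(
    sorted(SERVER),
    lambda p: p["getcontenttype"] == "text/plain"
    and p["days_since_modified"] <= 7,
)
```

     Every resource under consideration is fetched, although only one
     of the three matches; the number of fetches grows with the size of
     the scope rather than with the size of the result.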


     1.3 Server-side Searching


     DASL allows for server-side searching. Server-side searching allows
     the client to formulate a query and have the server perform the
     task of selecting the resources that fit the criteria. This
     overcomes both of the limitations of client-side searching
     described above. The benefit is a searching solution that scales;
     the cost is that the server software becomes more complex.


     2 Terminology


     2.1 DASL Terms


     2.1.1 Search Criteria


     Search criteria are an expression against which each resource in
     the search scope is evaluated. Those resources for which the
     expression evaluates to True are included in the result set.

     2.1.2 Search Expression

     An Expression is a Term or the negation of an Expression (using the
     Boolean NOT operator) or two expressions joined by one of the
     Boolean operators (AND or OR). An expression evaluates to one of
     True, False, or Unknown.

     2.1.3 Search Term

     A Search Term is an assertion about a resource. The term may assert
     that: (1) a property has a relationship to some value, (2) a
     property exists, or (3) the content of a resource has a
     relationship to some value.
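
     One possible reading of the Expression and Term definitions is
     sketched below. The class names are hypothetical, None stands in
     for Unknown, and the combination rules follow the three-valued
     logic familiar from SQL. That choice, and the choice to treat a
     missing property as Unknown, are assumptions; this document only
     requires that the behavior be specified.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Term:
    """An assertion that a property has some relationship to a value."""
    prop: str
    test: Callable[[object], bool]

    def evaluate(self, props):
        if self.prop not in props:
            return None  # Unknown: the property is not defined (assumption)
        return self.test(props[self.prop])


@dataclass
class Not:
    operand: object

    def evaluate(self, props):
        value = self.operand.evaluate(props)
        return None if value is None else not value


@dataclass
class And:
    left: object
    right: object

    def evaluate(self, props):
        l, r = self.left.evaluate(props), self.right.evaluate(props)
        if l is False or r is False:
            return False  # one False operand decides the result
        if l is None or r is None:
            return None
        return True


@dataclass
class Or:
    left: object
    right: object

    def evaluate(self, props):
        l, r = self.left.evaluate(props), self.right.evaluate(props)
        if l is True or r is True:
            return True  # one True operand decides the result
        if l is None or r is None:
            return None
        return False


# "Under 10K in size AND NOT authored by Saveen"
expr = And(
    Term("size", lambda v: v < 10 * 1024),
    Not(Term("author", lambda v: v == "Saveen")),
)
```

     Under these assumptions a resource is included in the result set
     only when the expression evaluates to True; Unknown and False both
     exclude it.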

     2.1.4 Result Set

     The Result Set is a response to a search request. This is a set of
     result records, one record for each resource that matches the
     search criteria.

     2.1.5 Result Record Definition

     The Result Record Definition is the set of properties specified by
     the client that it requests the server to transmit for each
     resource that matches the criteria.

     2.1.6 Result Record

     A Result Record is a unit of information appearing in the result
     set that corresponds to a resource that matches the search
     criteria. The record consists of those properties listed in the
     Result Record Definition.

     2.1.7 Search Scope

     The Search Scope is the set of resources to be searched.

     Comparison Operator

     A comparison operator is a function used in a search term that
     evaluates the relationship between two values. Examples of
     comparison operators are <, <=, >=, >, ==, and != .


     2.1.8 Sort Specification

     A sort specification tells the server how to sort the result set.

     2.1.9 Search Attribute

     A Search Attribute is an instruction that governs the execution of
     the query but is not part of the search scope, result record
     definition, the search criteria, or the sort specification. An
     example of a search attribute is one that controls how much time
     the server can spend on the query before giving a response.

     2.1.10 Query

     The Query is the combination of search criteria, search scope,
     result record definition, sort specification, and search
     attributes.
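
     The five parts of a Query could be grouped in a single structure,
     as in the sketch below. The field names, the (URI, depth) scope
     pairs, and the "max_seconds" attribute are all hypothetical; no
     wire format is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Properties = Dict[str, object]


@dataclass
class Query:
    # Search criteria: a predicate evaluated against each resource.
    criteria: Callable[[Properties], bool]
    # Search scope: (URI, depth) pairs; a depth of None means unbounded.
    scope: List[Tuple[str, Optional[int]]]
    # Result record definition: the properties to return per match.
    record_definition: List[str]
    # Sort specification: (property name, ascending?) pairs.
    sort: List[Tuple[str, bool]] = field(default_factory=list)
    # Search attributes: instructions that govern execution only.
    attributes: Dict[str, object] = field(default_factory=dict)


query = Query(
    criteria=lambda props: props.get("size", 0) < 10 * 1024,
    scope=[("http://foo/bar", 1)],
    record_definition=["author"],
    sort=[("author", True)],
    attributes={"max_seconds": 5},
)
```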


     2.2 Additional Terms


     In addition to the terms defined above, this document uses
     terminology consistent with [HTTP] and [WEBDAV].


     3 Query Semantics


     3.1 General Requirements


     3.1.1 Simple Searches on Content

     It must be possible to perform simple searches on content of any
     media type.

     Searching for specific content inside a resource is a common
     operation. DASL must provide a mechanism for searching the content
     of a resource to support this scenario.

     3.1.2 Variants

     It must be possible for searches to occur across multiple variants
     of a resource and to target specific variants.

     The WEBDAV working group is addressing the standardization of
     mechanisms for authors to use when submitting variants to the
     server. DASL must provide mechanisms that can intelligently query
     on those variants.


     3.1.3 Versioning

     It must be possible for searches to occur across multiple versions
     of a resource and to target specific versions.

     The WEBDAV working group is addressing the standardization of
     mechanisms for authors to use when submitting versions to the
     server. DASL must provide mechanisms that can intelligently query
     on those versions.


     3.2 Result Record Definition


     The client must be able to identify the properties or content to be
     returned in the result records.

     Search criteria and search result records are not required to
     overlap. For example, a query might ask for "the authors of those
     documents under 10K in size". In this case, the criterion relates
     only to the size, but the desired result record contains only the
     author.
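
     The separation between criteria and result records can be sketched
     as a filter followed by a projection. The RESOURCES table and the
     search() helper are hypothetical and serve only to illustrate the
     example query above.

```python
# Hypothetical resources: URI -> properties.
RESOURCES = {
    "/a.doc": {"author": "Saveen", "size": 4096},
    "/b.doc": {"author": "Judith", "size": 20480},
    "/c.doc": {"author": "Judith", "size": 1024},
}


def search(resources, criteria, record_definition):
    """Filter on the criteria, then return only the requested properties."""
    results = []
    for uri, props in resources.items():
        if criteria(props):
            results.append({name: props.get(name) for name in record_definition})
    return results


# "The authors of those documents under 10K in size"
records = search(RESOURCES, lambda p: p["size"] < 10 * 1024, ["author"])
```

     The criterion reads only "size", while each result record carries
     only "author".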


     3.3 Scope


     3.3.1 Scope Identification & Multiple Scopes

     It must be possible for the client to specify a number of
     different, unrelated URIs over which the search is to range.

     3.3.2 Resource-Based Scopes

     It should be possible to perform scoping within a resource. For
     example, one may wish to limit a search to a single chapter within
     a document.

     3.3.3 Depth

     It must be possible for the client to specify the "depth" of a
     search for a search scope URI.

     Users often intend either to limit their searches to the immediate
     children of a container or to extend the search recursively to all
     of the container's descendants. Furthermore, depth control is
     needed to prevent servers from performing unnecessary work.
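
     A depth-limited scope test might look like the sketch below. It
     assumes that depth counts path segments below the scope URI, in
     the spirit of the WEBDAV Depth header, and that None means an
     unbounded search; both conventions are assumptions made for
     illustration.

```python
def levels_below(scope, candidate):
    """Return how many path segments candidate lies below scope, or None."""
    scope_parts = [p for p in scope.split("/") if p]
    cand_parts = [p for p in candidate.split("/") if p]
    if cand_parts[:len(scope_parts)] != scope_parts:
        return None  # candidate is not under the scope at all
    return len(cand_parts) - len(scope_parts)


def in_scope(scope, depth, candidate):
    """depth 0 = the scope itself, 1 = plus immediate children, None = unbounded."""
    levels = levels_below(scope, candidate)
    if levels is None:
        return False
    return depth is None or levels <= depth
```

     With a scope of "/docs/" and a depth of 1, "/docs/a.txt" is in
     scope and "/docs/sub/b.txt" is not; with an unbounded depth both
     are.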


     3.4 Search Criteria


     3.4.1 Simple Terms

     3.4.1.1 Exact Matching

     A query term must be able to compare the entire value of a property
     to some constant value.

     3.4.1.2 Regular Expression Matching

     A query term must be able to compare a property to an expression
     with the expressive power of regular expressions.

     The power and frequent use of the UNIX utility GREP highlights the
     value of regular expressions for searching large bodies of content.
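
     As an illustration, a regular expression term over a string
     property could be built as shown below. The regex_term() helper
     and the "title" property are hypothetical, and treating a missing
     or non-string property as a non-match is an assumption.

```python
import re


def regex_term(prop, pattern):
    """Build a predicate that matches a property against a regular expression."""
    compiled = re.compile(pattern)

    def test(props):
        value = props.get(prop)
        # Assumption: a missing or non-string value simply does not match.
        return isinstance(value, str) and compiled.search(value) is not None

    return test


is_dasl_draft = regex_term("title", r"^draft-[a-z]+-dasl-.*\.txt$")
```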

     3.4.1.3 Property Comparisons

     It must be possible to specify "equal to" and "not equal to"
     criteria for all property values that can be compared. It must be
     possible to support relative comparison operators ( >, >=, <=, and
     < ) on those properties that can be ordered (for example, those
     having numerical values).

     Many common searches involve such comparisons. For example, a
     stereotypical query might ask for "those documents under 10K in
     size" or "those text files authored by Saveen".

     DASL must support the ability to compare property values against
     literal values, other property values, and expressions.
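
     Comparison against a literal and against another property can be
     sketched with a small operator table. The compare() helper and the
     property names "size" and "quota" are hypothetical.

```python
import operator

# Comparison operators named in this document, mapped to Python functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def compare(props, prop, op, literal=None, other_prop=None):
    """Compare a property to a literal or, if other_prop is given, to another property."""
    right = props[other_prop] if other_prop is not None else literal
    return OPS[op](props[prop], right)


props = {"size": 4096, "quota": 10240, "author": "Saveen"}
under_10k = compare(props, "size", "<", literal=10 * 1024)
by_saveen = compare(props, "author", "==", literal="Saveen")
over_quota = compare(props, "size", ">=", other_prop="quota")
```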

     3.4.1.4 Content Comparisons

     It must be possible to specify searches using content-based
     operators such as NEAR, IN, CONTAINS, and LIKE.

     It must be possible to specify how linguistic stemming, phonetic
     searching, truncation, keyword expansion, and case-sensitivity will
     play a role in the search.

     It must be possible to specify the relevance and ranking criteria
     for content-based searches.


     3.4.1.5 Existence Assertions

     It must be possible to test for the existence or non-existence of a
     property.

     3.4.2 Complex Expressions

     3.4.2.1 Logical Boolean Operators

     It must be possible to use the logical Boolean operators (AND, OR,
     NOT) in the search criteria to combine search expressions.

     Often criteria involve the evaluation of several conditions
     simultaneously. For example, a stereotypical query might ask for
     "those documents modified by user X within some period of time Y."
     Boolean operations are necessary to support these criteria.

     3.4.2.2 Undefined properties and values

     The behavior of a query when properties or their values are
     undefined must be specified.

     Undefined properties are those that do not exist. Their role in
     query evaluation needs to be specified. Undefined values can occur
     when properties are calculated from expressions like "x/y" where
     y=0.

     3.4.2.3 Sort Order

     DASL must define a mechanism to allow clients to specify a sort
     order for the result set.

     3.4.3 Other Query Attributes

     3.4.3.1 Maximum Result Set

     It must be possible to indicate that the search result must not
     exceed some fixed number of records.

     3.4.3.2 Paged Results

     It must be possible to request paged results.


     3.5 Query Syntax


     3.5.1 Standard Query Grammar

     The DASL extensions must define a query grammar that provides
     simple searching functionality.

     For the sake of interoperability, DASL servers are expected to
     offer a basic set of searching capabilities. Likewise, clients need
     a standard, simple syntax by which to access those capabilities.

     3.5.2 Support for Other Query Grammars

     DASL extensions must allow servers to support other grammars.

     A particular query grammar may not expose useful searching
     functionality of a server. Clients should be allowed to query a
     server using any grammar that takes advantage of those special
     server capabilities.

     3.5.3 Natural Language Queries

     It must be possible to support natural language queries.


     3.6 Results


     3.6.1 Standard format

     DASL must define a standard format for search results.

     For the sake of interoperability, it is desirable that server
     result formats be standardized so that regardless of the type of
     query syntax used, clients are guaranteed to successfully
     understand the results of a query.

     3.6.2 Paged Results

     DASL search results must be conducive to paged retrieval.

     Paged retrieval is necessary if result sets are very large and if
     clients must also present a responsive interface to a user. In this
     scenario clients need to access portions of the search result at
     specific times. DASL search results must be defined so that paged
     search results are possible.
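
     One simple way to make a result set conducive to paged retrieval
     is to address it by start offset and page size, as sketched below.
     The get_page() helper and its parameters are hypothetical; this
     document does not prescribe a paging mechanism.

```python
def get_page(result_set, start, page_size):
    """Return one page of records and the start offset of the next page."""
    records = result_set[start:start + page_size]
    has_more = start + page_size < len(result_set)
    next_start = start + page_size if has_more else None
    return records, next_start


# Walk a seven-record result set three records at a time.
result_set = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
pages = []
start = 0
while start is not None:
    records, start = get_page(result_set, start, 3)
    pages.append(records)
```

     The client can present the first page as soon as it arrives and
     request later pages only when the user asks for them.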


     3.7 Discovery Mechanisms


     3.7.1 Grammar Discovery

     It must be possible for clients to discover which query grammars a
     server supports.

     If a server is capable of supporting several search grammars, the
     client needs to determine which grammars are supported.

     3.7.2 Operator Discovery

     It must be possible for a client to discover which operators are
     available for a given query grammar.

     3.7.3 Scope Information Discovery

     It should be possible for a client to determine searching
     information about a scope, if that information is available.
     Examples of such information include which properties can be
     searched in a scope, indexing statistics for the scope, etc.


     3.8 Redirecting a Query


     It must be possible for the server to refer the client to other
     resources in order to continue a search.

     For example, a client may ask the resource http://ren/stimpy to
     perform a search over http://foo/bar and http://blah/mumble.
     However http://ren/stimpy may not be able to perform the search
     itself and so will need to be able to inform the client that it
     should submit its search request directly to http://foo/bar and
     http://blah/mumble.
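
     From the client's point of view, following such referrals might
     look like the sketch below. The SERVERS table simulates responses
     in memory, and the "refer" and "results" keys, the result URIs,
     and the hop limit are hypothetical.

```python
# Simulated responses: each URI either answers the query or refers elsewhere.
SERVERS = {
    "http://ren/stimpy": {"refer": ["http://foo/bar", "http://blah/mumble"]},
    "http://foo/bar": {"results": ["http://foo/bar/a"]},
    "http://blah/mumble": {"results": ["http://blah/mumble/b"]},
}


def search_with_referrals(start, max_hops=5):
    """Submit the query, resubmitting it to any resources the server refers to."""
    pending = [(start, 0)]
    seen = set()
    results = []
    while pending:
        uri, hops = pending.pop(0)
        if uri in seen or hops > max_hops:
            continue  # guard against referral loops and long chains
        seen.add(uri)
        response = SERVERS[uri]
        if "refer" in response:
            pending.extend((target, hops + 1) for target in response["refer"])
        else:
            results.extend(response["results"])
    return results


found = search_with_referrals("http://ren/stimpy")
```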


     3.9 Hit Highlighting


     DASL must define a mechanism to allow clients to request and
     receive "hit highlighting". Hit highlighting allows clients to
     provide visual cues to a user to identify the segments of a text
     resource that cause it to match a content-based query.
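
     A minimal sketch of hit highlighting for a plain-text resource
     follows. It assumes that hits are reported as character offsets
     and that matching is case-insensitive; the helper names and the
     bracket markers are hypothetical.

```python
import re


def hit_offsets(text, term):
    """Return (start, end) character offsets of each case-insensitive hit."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def highlight(text, term, open_mark="[", close_mark="]"):
    """Wrap each hit in markers so that a client can render a visual cue."""
    out = []
    pos = 0
    for start, end in hit_offsets(text, term):
        out.append(text[pos:start])
        out.append(open_mark + text[start:end] + close_mark)
        pos = end
    out.append(text[pos:])
    return "".join(out)


marked = highlight("DAV search and Dav locate", "dav")
```

     Returning offsets rather than marked-up text would let the client
     choose its own presentation.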


     4 Authentication


     The DASL specification should state how the DASL extensions to
     WEBDAV interoperate with existing authentication schemes, and
     should make recommendations for using those schemes.


     5 Access Control


     The DASL specification should state how the DASL extensions to
     WEBDAV interoperate with the ACL mechanisms supported by WEBDAV,
     and should make recommendations for using those mechanisms.


     6 Internationalization


     DASL extensions must describe how to perform searches on
     internationalized content and properties. Information intended for
     user comprehension must conform to the IETF Character Set Policy
     [CHAR].


     7 Related Work


     Z39.50: "Information Retrieval (Z39.50): Application Service
     Definition and Protocol Specification".
     http://lcweb.loc.gov/z3950/agency/

     Z39.50 Profile for Simple Distributed Search and Ranked Retrieval
     http://lcweb.loc.gov/z3950/agency/profiles/zdsr.html

     The STARTS Protocol
     http://www-db.stanford.edu/~gravano/starts.html

     The Harvest Information Discovery and Access System
     http://mordor.transarc.com/afs/transarc.com/public/trg/Harvest/


     8 References


     [CHAR] H.T. Alvestrand, "IETF Policy on Character Sets and
     Languages", June 1997, internet-draft, work-in-progress, draft-
     alvestrand-charset-policy-02.txt.

     [HTTP] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, and T.
     Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068,
     U.C. Irvine, DEC, MIT/LCS, January 1997.

     [WEBDAV] Y. Y. Goland, E. J. Whitehead, Jr., A. Faizi, S. R.
     Carter, D. Jensen, "Extensions for Distributed Authoring and
     Versioning on the World Wide Web", October, 1997, internet-draft,
     work-in-progress, draft-ietf-webdav-protocol-04.txt.


     9 Authors' Addresses


     Saveen Reddy
     Microsoft Corporation
     One Microsoft Way
     Redmond, WA 98052-6399
     EMail: saveenr@microsoft.com

     Judith Slein
     Xerox Corporation
     800 Phillips Road 105-50C
     Webster, NY 14580
     EMail: slein@wrc.xerox.com

     Expires July 1998

