INTERNET-DRAFT                                             B. Hernacki
Expires: April 4, 1997                                         B. Polk
<draft-hernacki-nntpsrch-00.txt>         Netscape Communications, Inc.
                                                       October 4, 1996


                   NNTP Full-text Search Enhancements



1.  Status of this Memo

This document is an Internet-Draft.  Internet-Drafts are  working  docu-
ments  of the Internet Engineering Task Force (IETF), its areas, and its
working groups.  Note that other  groups  may  also  distribute  working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

2.  Abstract

This document describes a set of enhancements to the Network News
Transport Protocol [NNTP-977] that allows full-text searching of news
articles across multiple newsgroups.

This new search mechanism also allows search criteria to be  saved  into
search  profiles.   Articles  arriving on the server are checked against
the profiles, and the articles that match are collected together for the
client.

The availability of the extensions described here will be advertised by
the server using the extension negotiation mechanism described in the
new NNTP protocol specification currently being developed [NNTP-NEW].










Hernacki & Polk                                                 [Page 1]





INTERNET-DRAFT                                           October 4, 1996


3.  Introduction

The new SEARCH NNTP command is sent from the client to specify and  ini-
tiate  a  full-text search.  The server constructs a "virtual newsgroup"
consisting of articles that matched the search  criteria.   The  virtual
newsgroup  acts  in  most  ways like a normal newsgroup, allowing access
through the standard NNTP commands.

The new PROFILE command makes a virtual newsgroup permanent,  and  saves
the  search criteria that generated the newsgroup.  The server will show
newly arrived articles that match the search criteria as new articles in
the  virtual newsgroup.  This can be implemented on the server by reexe-
cuting the search periodically or by  using  a  profile  mechanism  that
checks each article as it arrives.

Because the virtual newsgroup usually consists  of  articles  from  many
other  newsgroups,  clients  might want to display it differently than a
non-virtual newsgroup.  For example, clients may  want  to  display  the
source  newsgroup  of each article.  To make this easier, and to resolve
some of the longstanding problems with XOVER, the OVER command is intro-
duced.

To control the headers returned by the OVER command, and  to  allow  the
client  and  server to communicate information that does not fit through
other channels, the SET and GET commands have been  added.   SET  allows
the  client  to  send an attribute/value pair to the server.  GET allows
the client to retrieve an attribute/value pair by attribute name.

In addition, the XPAT command is extended so that  it  can  be  used  to
full-text  search  articles within a single newsgroup.  Both the headers
and the body of the articles are searched.

3.1.  New and Enhanced NNTP Commands

There are five new NNTP commands, three new options to the existing LIST
command, and enhancements to one existing command.

*    GET

*    SET

*    OVER

*    SEARCH

*    PROFILE

*    LIST SRCHHEADERS

*    LIST SEARCHES

*    LIST XACTIVE

*    XPAT

The GET and SET commands communicate per-session information between the
client and server.

The OVER command returns specific headers requested by the client.  This
command functions much like the widely implemented XOVER command.

The SEARCH command runs a one-time search.

The PROFILE command converts search  results  into  saved  profiles  and
manipulates them.

The LIST SRCHHEADERS command returns the headers that the server  allows
in full-text searches.

The LIST XACTIVE command functions in most ways like the LIST ACTIVE
command.  It differs in that it can be made to return information about
a single newsgroup, and it supports new newsgroup flags for the virtual
newsgroups.  It can also return multiple newsgroup flags per newsgroup.

The LIST SEARCHES command allows the client  to  determine  which  news-
groups  are  full-text  indexed.   Only  these  newsgroups are full-text
searchable.

The XPAT command has a simple extension  to  allow  the  header  "TEXT".
This  specifies a full-text (headers and body) search of the articles in
a single newsgroup.

4.  Use of NNTP Extension Mechanism

The NNTP extension mechanism allows a server to describe  its  capabili-
ties.   The  following  extensions are used to describe the capabilities
described in this document.

4.1.  SETGET Extension

The SETGET extension means that the server supports the SET and GET com-
mands.

4.2.  OVER Extension

The OVER extension means that the server supports the OVER command.  In
addition, any server that supports the OVER extension must also support
the SETGET extension, and must explicitly include SETGET in the list of
extensions it supports.

4.3.  SEARCH Extension

The SEARCH extension means that the server supports the  following  com-
mands:  SEARCH, LIST SEARCHES, LIST SRCHHEADERS, LIST XACTIVE.  In addi-
tion, any server that supports the SEARCH extension  must  also  support
the  OVER  and  SETGET  extensions, and must explicitly include OVER and
SETGET in the list of extensions it supports.

4.4.  PROFILE Extension

The PROFILE extension means that the server supports  the  PROFILE  com-
mand.   In addition, any server that supports the PROFILE extension must
also support the SEARCH, OVER, and SETGET extensions,  and  must  expli-
citly include these extensions in the list of extensions it supports.
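
A client can check an advertised extension list against the dependency
rules of sections 4.2 through 4.4 before relying on it.  The Python
sketch below is illustrative only: it assumes the list has already been
obtained through the [NNTP-NEW] negotiation mechanism, and the names
REQUIRED and missing_dependencies are hypothetical.

```python
# Dependency rules from sections 4.2-4.4: an advertised extension
# requires every extension listed for it to be advertised as well.
REQUIRED = {
    "OVER": {"SETGET"},
    "SEARCH": {"OVER", "SETGET"},
    "PROFILE": {"SEARCH", "OVER", "SETGET"},
}


def missing_dependencies(advertised):
    """Map each advertised extension to the required extensions it lacks.

    Names are compared case-insensitively (an assumption; the draft does
    not say).  An empty result means the list is consistent.
    """
    names = {name.upper() for name in advertised}
    problems = {}
    for ext, needs in REQUIRED.items():
        if ext in names and needs - names:
            problems[ext] = needs - names
    return problems


if __name__ == "__main__":
    print(missing_dependencies(["SETGET", "OVER", "SEARCH", "XPATTEXT"]))  # {}
    print(missing_dependencies(["PROFILE", "SEARCH"]))  # both lack OVER, SETGET
```

XPATTEXT has no dependencies, so it never appears in the result.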

4.5.  XPATTEXT Extension

The XPATTEXT extension means that the server supports the TEXT header in
the XPAT command, as described by this document.

5.  Command Descriptions

5.1.  GET command

GET [ATTRIBUTE [ATTRIBUTE]...]

GET allows the client to  retrieve  session-specific  state  information
from the server.

The only characters allowed in attributes or values  are  uppercase  and
lowercase letters, numbers, and the characters "-_:". Case is not signi-
ficant in the attribute names.  This information must not  be  preserved
by the client across server sessions.

If no ATTRIBUTE is specified, all of the attributes are returned by  the
server.
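
The body of a 209 reply can be turned into a lookup table.  This Python
sketch assumes, based on the example in section 5.3, that each line
carries one attribute and its value separated by a single space;
parse_get_response is a hypothetical helper, not part of the protocol.

```python
import re

# Characters permitted in attributes and values: letters, digits, "-_:".
_TOKEN = re.compile(r"[A-Za-z0-9_:-]+")


def parse_get_response(lines):
    """Parse the lines of a 209 GET reply into a dict.

    `lines` holds the lines after the "209" status line, up to but not
    including the terminating ".".  Attribute names are case-insensitive,
    so they are normalised to upper case.
    """
    values = {}
    for line in lines:
        attr, _, value = line.partition(" ")
        value = value.strip()
        if not _TOKEN.fullmatch(attr) or not _TOKEN.fullmatch(value):
            raise ValueError(f"malformed GET line: {line!r}")
        values[attr.upper()] = value
    return values


if __name__ == "__main__":
    print(parse_get_response(["overfields Subject:From:Lines:"]))
    # {'OVERFIELDS': 'Subject:From:Lines:'}
```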

5.2.  Responses

The server will either return the values (209), indicate a syntax  error
(501), or indicate that the attribute was not recognized (409).

209 values follow
501 command syntax error
409 unknown attribute

5.3.  Example

C: GET
S: 209 values follow
S: OVERFIELDS Subject:Newsgroups:From:References:Lines:Bytes:
S: .

5.4.  OVER command

OVER [range]

The optional range argument may be any of the following:
                an article number
                an article number followed by a dash to indicate
                   all following
                an article number followed by a dash followed by
                   another article number
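
The three range forms listed above map naturally onto a (first, last)
pair.  A minimal Python sketch, in which the open-ended form returns None
as the upper bound and parse_over_range is a hypothetical name:

```python
import re

# "N", "N-" or "N-M", where N and M are decimal article numbers.
_RANGE = re.compile(r"([0-9]+)(?:(-)([0-9]*))?")


def parse_over_range(arg):
    """Return (first, last) for an OVER range argument.

    last is None for the open-ended "N-" form.  A missing argument (None)
    means "the current article", which only the session knows, so None is
    returned for the caller to resolve.
    """
    if arg is None:
        return None
    match = _RANGE.fullmatch(arg)
    if match is None:
        raise ValueError(f"bad OVER range: {arg!r}")  # server would send 501
    first = int(match.group(1))
    if match.group(2) is None:
        return (first, first)              # a single article number
    if match.group(3) == "":
        return (first, None)               # the article and all following
    return (first, int(match.group(3)))    # an explicit first-last range
```

Whether an empty range (for example, first greater than last) yields 420
is left to the server in this sketch.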

If no argument is specified, the information for the current article is
returned.  Successful responses start with a 224 response, followed by a
line listing the headers, followed by the overview information for all
matched messages.  Once the output is complete, a period is sent on a
line by itself.  If a newsgroup has not been selected, a 412 error
response is returned.  If no articles are in the range specified, a 420
error response is returned.  If the client only has permission to
transfer articles, a 502 response is returned.

By default, the headers returned are as specified  in  the  OVERVIEW.FMT
file,  and  will therefore be the same as the server would return for an
XOVER command.

The SET command may be used to specify what headers are returned and  in
what order. The SET attribute OVERFIELDS is used to specify the names of
the headers to return, with the headers concatenated together, including
the terminating ":".

This use of SET for the OVERFIELDS attribute  must  be  supported.   The
server  must honor this request and return only the headers specified in
subsequent OVER commands in that session.
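
On the client side, the OVERFIELDS value and the body of an OVER reply
can be handled as follows.  This Python sketch assumes that the fields
in each data line are separated by TAB characters, as in XOVER output
and as the tab separators in the example of section 5.6 suggest, and
that each line carries exactly the requested fields in order, as that
example shows; a server that also prepended the article number would
fail the field-count check.  Both function names are hypothetical.

```python
def overfields_value(headers):
    """Build the OVERFIELDS value for SET: each name followed by ':'."""
    return "".join(f"{name}:" for name in headers)


def parse_over_lines(header_line, data_lines):
    """Turn the body of a 224 OVER reply into one dict per article.

    header_line is the line after the 224 status (e.g. "Subject:From:");
    data_lines are the following lines up to, not including, the ".".
    """
    if not header_line.endswith(":"):
        raise ValueError(f"bad header line: {header_line!r}")
    names = header_line[:-1].split(":")
    records = []
    for line in data_lines:
        fields = line.split("\t")  # assumption: TAB-separated, as in XOVER
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} fields: {line!r}")
        records.append(dict(zip(names, fields)))
    return records


if __name__ == "__main__":
    print("SET OVERFIELDS " + overfields_value(["Subject", "From", "Lines"]))
    # SET OVERFIELDS Subject:From:Lines:
```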

The number of lines in the article is available  in  the  Lines:  field.
The number of bytes in the article is available in the Bytes: field.

5.5.  Responses

224 data follows
412 not in group
420 no articles in range
501 command syntax error
502 no permission

5.6.  Example

C: SET OVERFIELDS Subject:From:Lines:
S: 209 OK
C: OVER
S: 224 data follows
S: Subject:From:Lines:
S: Re: Long running subjects\tfrequent-poster@somewhere.com\t593
S: .

5.7.  SEARCH command

SEARCH <query>

The specified query is executed, and the name of the  resulting  virtual
newsgroup is returned.

Search result virtual newsgroups are not permanent.  The server must
keep them for at least ten minutes after the last client access to the
newsgroup, but after that time the server is free to remove them.  This
ten-minute period must be observed even if the client terminates its
session with the server.  "Access to the newsgroup" means any command
executed while the virtual newsgroup was the current newsgroup.
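
One way a server might honour the ten-minute rule is to record the time
of the last access to each search result group and treat a group as
removable only once that much time has passed.  In this Python sketch
the class and method names are hypothetical, the injectable clock exists
only to make the logic testable, and starting the timer when the group
is created (before any access) is an assumption the draft does not
address.  Because the bookkeeping is per group rather than per session,
ending a session does not shorten the period.

```python
import time

RETENTION_SECONDS = 10 * 60  # the ten-minute minimum from section 5.7


class VirtualGroupTracker:
    """Hypothetical server bookkeeping for search result virtual groups."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_access = {}

    def touch(self, group):
        """Call on creation and on every command run while `group` is current."""
        self._last_access[group] = self._clock()

    def removable(self):
        """Groups that have gone at least ten minutes without access."""
        now = self._clock()
        return [group for group, last in self._last_access.items()
                if now - last >= RETENTION_SECONDS]

    def forget(self, group):
        """Drop bookkeeping once the server has removed the group."""
        self._last_access.pop(group, None)
```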

The query is the full-text  search  criteria  expressed  in  the  syntax
described below.

5.7.1.  Search Syntax

The search query syntax is derived from the search  syntax  defined  for
the  IMAP4 protocol.  It is somewhat different because of the way inter-
national character sets need to be encoded.  See RFC  1730  [IMAP4]  for
the IMAP4 search syntax.

As an exception to the 7-bit character set restriction for commands in
[NNTP-977], this document allows the 8-bit ISO-8859-1 character set in
unencoded form in search strings.  This simplifies handling of this
widely used character set without requiring support for arbitrary binary
data.
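
A client can decide per search string whether to send it unencoded or as
a MIME-2 encoded word.  The Python sketch below uses the standard
library's email.header.Header for the encoding; encode_search_string is
a hypothetical helper, and choosing a charset that LIST SEARCHES reports
for the target groups is left to the caller.

```python
from email.header import Header


def encode_search_string(text, charset):
    """Return text ready to be placed between the quotes of a search_string.

    Text that fits in ISO-8859-1 (which includes US-ASCII) is sent
    unencoded, as allowed above.  Anything else becomes a MIME-2 encoded
    word in `charset`; Header raises UnicodeEncodeError if the text does
    not fit that charset either.
    """
    try:
        text.encode("iso-8859-1")
        return text
    except UnicodeEncodeError:
        # maxlinelen=0 disables folding, so the result stays on one line.
        return Header(text, charset).encode(maxlinelen=0)
```

Python may emit the charset name in lower case; MIME charset names are
case-insensitive, so this should be equivalent to the upper-case form
used in the query examples.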



Here is a semi-formal definition of the search query syntax.

query = HEADER Newsgroups <group_pat> <search_term> [<search_term>...]

group_pat       = "<group_specifier>[,group_specifier...]"

group_specifier = Either a single * for all searchable groups,
                  a full newsgroup name, or a part of the news
                  hierarchy, suffixed with .*.

search_term     = TEXT <search_string> |
                  BODY <search_string> |
                  HEADER <header_line> <search_string> |
                  SENTBEFORE date |
                  SENTON date |
                  SENTAFTER date |
                  NOT <search_term> |
                  OR <search_term> <search_term> |
                  ( <search_term> )

search_string   = "<simple_string>" |
                  "<MIME-2String>"

date            = Date in DD-MMM-YYYY form.

simple_string   = US-ASCII or ISO-8859-1 text.
MIME-2String    = A MIME-2 encoded string.

The double quotes are always required around the group pattern  and  the
search strings.

BODY requests a search through the body of the  article,  excluding  the
headers.

TEXT requests a search through all indexed parts of the article, includ-
ing the body and all indexed headers.

If multiple search_terms are listed without being  prefixed  by  the  OR
operator, they are ANDed together.

SENTBEFORE, SENTON, and SENTAFTER may only be used if the  Date:  header
is indexed, as specified by the LIST SRCHHEADERS command.

Searches should be case-insensitive.
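
The grammar above can be exercised with a small query builder.  This
Python sketch is hypothetical throughout: it includes BODY because the
text and the query examples use it, formats dates in DD-MMM-YYYY form
with English month abbreviations, and rejects search strings containing
a double quote because no escaping rule is defined.  Terms passed side
by side are ANDed by the server; OR is written in prefix form and
parenthesised as in the examples.

```python
import datetime

_MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def quoted(value):
    # The draft defines no escape for '"', so this sketch rejects it.
    if '"' in value:
        raise ValueError("embedded double quotes are not supported")
    return f'"{value}"'


def date_arg(day):
    """datetime.date -> DD-MMM-YYYY, e.g. 25-DEC-1995."""
    return f"{day.day:02d}-{_MONTHS[day.month - 1]}-{day.year:04d}"


def text(s):
    return f"TEXT {quoted(s)}"


def body(s):
    return f"BODY {quoted(s)}"


def header(name, s):
    return f"HEADER {name} {quoted(s)}"


def sent(op, day):
    if op not in ("SENTBEFORE", "SENTON", "SENTAFTER"):
        raise ValueError(op)
    return f"{op} {date_arg(day)}"


def not_(term):
    return f"NOT {term}"


def or_(left, right):
    # Parenthesised for readability, matching the query examples.
    return f"( OR {left} {right} )"


def search_command(groups, *terms):
    """Build a SEARCH command; adjacent terms are ANDed by the server."""
    if not terms:
        raise ValueError("at least one search_term is required")
    return " ".join(["SEARCH HEADER Newsgroups", quoted(",".join(groups)), *terms])


if __name__ == "__main__":
    print(search_command(["comp.*"], body("nntp"),
                         sent("SENTAFTER", datetime.date(1995, 12, 25))))
    # SEARCH HEADER Newsgroups "comp.*" BODY "nntp" SENTAFTER 25-DEC-1995
```

Remember that the date keys are only usable when LIST SRCHHEADERS lists
Date:.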

5.7.2.  Query Examples

SEARCH HEADER Newsgroups "comp.*, alt.*" BODY "nntp" SENTAFTER 25-DEC-1995
SEARCH HEADER Newsgroups "comp.*" HEADER From "Salz" NOT HEADER From "Bob"
SEARCH HEADER Newsgroups "*" BODY "Election" ( OR TEXT "Bob" TEXT "Bill" )
SEARCH HEADER Newsgroups "comp.lang.c++" TEXT "=?ISO-8859-1?Q?QPtext?="

5.8.  Responses

A successful search returns the name of a newsgroup in which the  server
has  placed  the  results.   This newsgroup can then be treated like any
other non-postable newsgroup. If no articles  matched  the  search  cri-
teria, an error (460) is returned.

260 groupname
460 no matches found
462 error performing search
501 command syntax error
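
A client can map the reply codes above onto a return value or an
exception.  In this Python sketch SearchError and parse_search_response
are hypothetical names; making 460 distinguishable as a "no matches"
case rather than a hard failure is a design choice, not something the
draft requires.

```python
class SearchError(Exception):
    """A non-260 reply to SEARCH (for example 460, 462 or 501)."""

    def __init__(self, code, text):
        super().__init__(f"{code} {text}".strip())
        self.code = code

    @property
    def no_matches(self):
        return self.code == 460


def parse_search_response(status_line):
    """Return the virtual newsgroup named in a 260 reply, else raise."""
    code_text, _, rest = status_line.strip().partition(" ")
    code = int(code_text)
    words = rest.split()
    if code == 260 and words:
        return words[0]
    raise SearchError(code, rest)
```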

5.9.  Example

C: SEARCH header newsgroups "*" TEXT "internet"
S: 260 virtual.group.temp5423

5.10.  PROFILE command

PROFILE NEW [profilenamehint] | RET | DEL

The PROFILE subcommands specify what operation to perform:

NEW creates a new profile from the current search result.
RET returns the search criteria of a profile.
DEL deletes a profile.

5.10.1.  NEW Subcommand

NEW converts a SEARCH result group into a profile.

The profilenamehint is used by the server as part of the name of the
newsgroup.  The client must not assume that any part of the name hint
will be used.  The name hint must be 32 characters or fewer and consist
of valid newsgroup name characters, except that no "." characters are
allowed in the profilenamehint.
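
The name hint rules can be checked before the command is sent.  The
draft does not enumerate the valid newsgroup name characters, so this
Python sketch assumes letters, digits, "+", "-" and "_";
profile_new_command is a hypothetical helper.

```python
import re

# Assumption: apart from ".", newsgroup-name characters are letters,
# digits, "+", "-" and "_".  Adjust to the server's actual rules.
_HINT = re.compile(r"[A-Za-z0-9+_-]{1,32}")


def profile_new_command(hint=None):
    """Build a PROFILE NEW command, checking the name hint rules."""
    if hint is None:
        return "PROFILE NEW"
    if not _HINT.fullmatch(hint):
        raise ValueError(f"invalid profile name hint: {hint!r}")
    return f"PROFILE NEW {hint}"
```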

5.10.2.  RET Subcommand

RET retrieves the QUERY field stored on the server for the current  pro-
file newsgroup.

5.10.3.  DEL Subcommand

DEL deletes the current profile newsgroup.  The server does not have to
remove the group immediately, but it must clear the current group
context, so that no commands that require a group context can be
executed.

5.11.  NEW Subcommand Responses

If the profile newsgroup is  created,  the  260  response  is  returned,
including  the  name  of the new newsgroup.  If there's no current news-
group, the error response 412 is returned.   If  the  current  newsgroup
isn't  a  search  result  virtual  newsgroup,  the 461 error response is
returned.

5.12.  RET Subcommand Responses

If the PROFILE RET is successful, the 261 response is returned,  includ-
ing  the  criteria.  If there's no current newsgroup, the error response
412 is returned.  If the current newsgroup isn't a profile virtual news-
group, the 461 error response is returned.

5.13.  DEL Subcommand Responses

If the PROFILE DEL is successful, the 260 response is returned,  includ-
ing  the  name  of the deleted virtual newsgroup.  If there's no current
newsgroup, the error response 412 is returned.  If the current newsgroup
isn't a profile virtual newsgroup, the 461 error response is returned.

5.14.  Responses

260 groupname
261 returned search criteria
412 not in group
461 current group is not a correct virtual newsgroup
462 profile error
501 command syntax error

5.15.  Example 1 - Create New Profile

C: SEARCH header newsgroups "comp.*" TEXT "fortran"
S: 260 virtual.search.temp3254
C: GROUP virtual.search.temp3254
S: 211 103 402 504 virtual.search.temp3254
C: PROFILE NEW myprofile
S: 260 virtual.profile.myprofile

5.16.  Example 2 - Return Profile

C: GROUP virtual.profile.myprofile
S: 211 103 402 504 virtual.profile.myprofile
C: PROFILE RET
S: 261 TEXT searchstring

5.17.  Example 3 - Delete Profile

C: GROUP virtual.profile.myprofile
S: 211 103 402 504 virtual.profile.myprofile
C: PROFILE DEL
S: 260 virtual.profile.myprofile deleted

5.18.  SET command

SET ATTRIBUTE <value> [ATTRIBUTE <value> ...]

SET allows the client to set session-specific state information.  This
might include things like the language it wants to use, the version of
the protocol it wants, the type of authentication it will be using, or
optional article compression.  The only characters allowed in
attributes or values are uppercase and lowercase letters, numbers, and
the characters "-_:".  Case is not significant in the attribute names.
This information must not be preserved by the server across client
sessions.

If multiple attributes are specified and the server does  not  recognize
one or more of them, it must return an error and not set any of them.
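
The all-or-nothing rule for unknown attributes is easiest to honour by
validating every pair before storing any of them.  This Python sketch
models one session's state on the server; SessionState, the set of known
attributes, and the exact status-line texts are illustrative
assumptions.  Creating a new instance per connection satisfies the rule
that state is not carried across sessions.

```python
import re

# Characters permitted in attributes and values: letters, digits, "-_:".
_TOKEN = re.compile(r"[A-Za-z0-9_:-]+")


class SessionState:
    """Hypothetical per-session store behind SET and GET."""

    def __init__(self, known):
        self._known = {name.upper() for name in known}
        self._values = {}

    def set(self, args):
        """Apply `SET attr value [attr value ...]`; return the status line."""
        if not args or len(args) % 2:
            return "501 command syntax error"
        pairs = list(zip(args[0::2], args[1::2]))
        for attr, value in pairs:
            if not (_TOKEN.fullmatch(attr) and _TOKEN.fullmatch(value)):
                return "501 command syntax error"
        if any(attr.upper() not in self._known for attr, _ in pairs):
            return "409 unknown attribute"  # nothing has been stored
        for attr, value in pairs:
            self._values[attr.upper()] = value  # names are case-insensitive
        return "209 OK"

    def get(self, attr):
        return self._values.get(attr.upper())
```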

5.19.  Responses

The server will either return that it set the value (209), return a syn-
tax  error (501), or indicate that one or more of the attributes was not
recognized (409).

209 OK
501 command syntax error
409 unknown attribute

5.20.  Example

C: SET LANG USEnglish
S: 209 OK

5.21.  LIST SRCHHEADERS

LIST SRCHHEADERS

Returns a list of which headers can be  specified  in  full-text  search
queries on the server.

5.22.  Responses

Returns a 215 response followed by a list of headers, one per line.  A
"." on its own line terminates the list.
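
LIST SRCHHEADERS, like the other multi-line responses in this document,
ends with a lone ".".  This Python sketch collects such a response and
then checks whether the date-based search keys are usable;
read_multiline and date_searchable are hypothetical names, and the input
is assumed to be already-decoded lines without CRLF.

```python
def read_multiline(lines):
    """Collect a dot-terminated NNTP text response.

    `lines` yields lines starting after the status line.  A lone "." ends
    the response; a doubled leading "." is reduced to one, as RFC 977
    requires for text responses.
    """
    result = []
    for line in lines:
        if line == ".":
            return result
        if line.startswith(".."):
            line = line[1:]
        result.append(line)
    raise EOFError("response ended without a terminating '.'")


def date_searchable(srchheaders):
    """SENTBEFORE, SENTON and SENTAFTER need Date: to be indexed."""
    return any(h.lower() == "date:" for h in srchheaders)
```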


5.23.  Example

C: LIST SRCHHEADERS
S: 215 Data follows.
S: From:
S: Date:
S: Subject:
S: .

5.24.  LIST SEARCHES

LIST SEARCHES

Returns a list of strings that define which newsgroups are being indexed
by the news server and are thus available for searching.  In addition,
the character sets allowed for each group are returned.


5.25.  Responses

If any newsgroups are indexed, the server returns 215, followed by each
portion of the tree that is indexed.  If all groups are indexed, a line
with "*" is returned.  If only some parts of the newsgroup hierarchy are
indexed, they are identified in the form <indexed-hierarchy>.*.  Clients
should not assume that these will always be top-level hierarchies.  A
"." on its own line terminates the list.

The character sets allowed in full-text searches for each entry are also
returned.  The character sets are identified by their names as defined
in [MIME-1].
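
A client can use the LIST SEARCHES reply to decide whether a group is
searchable and which character sets it accepts.  This Python sketch
assumes one entry per line in the form shown in the example, pattern
first and charsets after it; the function names, and first-match
precedence for overlapping entries, are assumptions.

```python
def parse_list_searches(lines):
    """Turn LIST SEARCHES body lines into (pattern, [charsets]) pairs."""
    entries = []
    for line in lines:
        if not line.strip():
            continue
        pattern, *charsets = line.split()
        # MIME charset names are case-insensitive; normalise for comparison.
        entries.append((pattern, [c.upper() for c in charsets]))
    return entries


def searchable_charsets(entries, group):
    """Charsets usable when searching `group`, or None if not indexed.

    "*" covers every group; "hier.*" covers the groups below hier.
    """
    for pattern, charsets in entries:
        if pattern == "*":
            return charsets
        if pattern.endswith(".*") and group.startswith(pattern[:-1]):
            return charsets
    return None
```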


5.26.  Example

C: LIST SEARCHES
S: 215 Data follows.
S: alt.* US-ASCII
S: comp.lang.* US-ASCII ISO-8859-1 ISO-8859-2
S: mcom.* ISO-8859-1
S: .

5.27.  LIST XACTIVE

LIST XACTIVE [newsgroup]

The LIST XACTIVE command functions in most respects like the LIST ACTIVE
command.  It differs in the following ways:

First, multiple flags may  be  returned.   The  flags  are  concatenated
together.

Second, LIST XACTIVE allows two new flags to be returned, "s" or "p",
indicating a search result virtual newsgroup or a profile virtual
newsgroup, respectively.  In both cases the "n" or "y" flag is also set,
indicating whether the virtual group can be posted to.  So the flag
field in the response line for a search result virtual group that
cannot be posted to will be "ns".

Third, other flags may be added in  the  future.   Clients  must  ignore
flags they do not recognize.
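
The three rules above suggest testing flags by membership rather than
comparing the whole flag field.  In this Python sketch XActiveEntry and
parse_xactive_line are hypothetical, and the field order follows the
"group high low flags" line in the example of section 5.29.

```python
from dataclasses import dataclass


@dataclass
class XActiveEntry:
    group: str
    high: int
    low: int
    flags: str  # raw flag field, including letters not known here

    @property
    def postable(self):
        return "y" in self.flags

    @property
    def search_result(self):
        return "s" in self.flags

    @property
    def profile(self):
        return "p" in self.flags


def parse_xactive_line(line):
    """Parse one "group high low flags" line; unknown flags are ignored."""
    group, high, low, flags = line.split()
    return XActiveEntry(group, int(high), int(low), flags)
```

Because each flag is tested on its own, a flag letter added later does
not break the parser.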


5.28.  Responses

The responses are exactly the same as the LIST  ACTIVE  command,  except
for the new flags.


5.29.  Example

C: LIST XACTIVE virtual.guest.temp3453
S: 215 Newsgroups in form "group high low flags".
S: virtual.guest.temp3453 0000000000 0000000001 ns
S: .

5.30.  XPAT command enhancement

XPAT header range|<message-id> pat [pat...]

The XPAT command is enhanced in a simple way: the new value TEXT is
supported as a header when invoking the command.  The TEXT header
requests a full-text search of the body and all headers of the
specified articles.

When TEXT is specified for the header, only a single "pat"  is  allowed,
and  it must be a full word to search for, rather than a wildmat pattern
as allowed otherwise.

5.31.  Responses

If TEXT isn't specified as the header, the response is the  same  as  it
always  has  been for XPAT, with each result line containing the article
number and the value of the header that matched the pattern.

If the TEXT header is specified, the constant string "TEXT" is  returned
in place of the value of the header that matched the pattern.
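
A client issuing the TEXT form needs only the article numbers from the
reply.  In this Python sketch the function names are hypothetical, and
rejecting wildmat metacharacters and whitespace in the search word is an
assumption drawn from the requirement that TEXT take a full word rather
than a pattern.

```python
# Assumption: characters special in wildmat patterns are rejected.
_WILDMAT_CHARS = set("*?[]\\")


def xpat_text_command(article_range, word):
    """Build XPAT TEXT for a range or <message-id> and a single word."""
    if not word or any(c.isspace() or c in _WILDMAT_CHARS for c in word):
        raise ValueError(f"XPAT TEXT needs a single plain word: {word!r}")
    return f"XPAT TEXT {article_range} {word}"


def parse_xpat_text(lines):
    """Article numbers from the body of a 221 reply to XPAT TEXT."""
    numbers = []
    for line in lines:
        number, _, value = line.partition(" ")
        if value != "TEXT":
            raise ValueError(f"unexpected XPAT TEXT line: {line!r}")
        numbers.append(int(number))
    return numbers
```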


5.32.  Example

C: XPAT TEXT 1000-2000 searchtext
S: 221 Header follows
S: 1021 TEXT
S: 1024 TEXT
S: .

6.  Security Considerations

The search and profile commands must be implemented in a way  that  does
not  allow  access  to articles in newsgroups that a client is otherwise
restricted from reading due to access control rules.

Clients will in some cases want to control access to virtual  newsgroups
or  profiles.  No means to support this kind of protection is defined in
this document, as it requires access control infrastructure that is  not
currently defined for NNTP.

The OVER command should be treated the same as  the  XOVER  command  for
access control and security purposes.

The other commands do not introduce any new security issues.

7.  Bibliography

[NNTP-977]
     Kantor, B., and P. Lapsley, Network News Transfer Protocol, RFC
     977, February 1986.

[NNTP-NEW]
     Barber, S., Network News Transfer Protocol, Internet-Draft,
     September 1996.

[IMAP4]
     Crispin, M., Internet Message Access Protocol - Version 4, RFC
     1730, December 1994.

[MIME-1]
     Borenstein, N., and N. Freed, MIME (Multipurpose Internet Mail
     Extensions) Part One: Mechanisms for Specifying and Describing the
     Format of Internet Message Bodies, RFC 1521, Bellcore, Innosoft,
     September 1993.

[MIME-2]
     Moore, K., MIME (Multipurpose Internet Mail Extensions) Part Two:
     Message Header Extensions for Non-ASCII Text, RFC 1522, University
     of Tennessee, September 1993.

8.  Authors' Addresses

   Brian Hernacki
   Netscape Communications, Inc.
   685 W. Middlefield Road
   Mountain View, CA  94043
   USA

   Phone: +1 415-937-6738
   Email: bhern@netscape.com

   Ben Polk
   Netscape Communications, Inc.
   685 W. Middlefield Road
   Mountain View, CA  94043
   USA

   Phone: +1 415-937-3686
   Email: bpolk@netscape.com

                  This Internet Draft expires April 4, 1997.