INTERNET DRAFT                        Clive Best, Dirk-Willem van Gulik
draft-vangulik-http-search-00.txt                 ISIS/STA/CEO - TP 270
Expires: 23/04/1997                         Joint Research Centre Ispra
                                                       21020 Ispra (Va)
                                                                 Italy.
			                   

              HTTP based Spatial and Temporal Searching.

Status of this Memo
===================

    This document is an Internet-Draft.  Internet-Drafts are working
    documents of the Internet Engineering Task Force (IETF), its
    areas, and its working groups.  Note that other groups may also
    distribute working documents as Internet-Drafts.
  
    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other
    documents at any time.  It is inappropriate to use Internet-
    Drafts as reference material or to cite them other than as
    ``work in progress.''
  
    To learn the current status of any Internet-Draft, please check
    the ``1id-abstracts.txt'' listing contained in the Internet-
    Drafts Shadow Directories on ftp.is.co.za (Africa),
    nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
    ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

    This draft expires in six months.
  
Abstract:
=========

This draft specifies the first three levels of operation of an http 
based distributed search protocol. It is designed for parallel 
client-side searching  of geospatial  catalogues. An important design 
objective is to minimize the impact and extra resources for catalogue 
sites which already have existing WWW gateways and search interfaces. 
        
                         
Introduction
============

WWW interfaces to databases referenced by geospatial and temporal
attributes are a growing public resource for Environment, GIS and
Earth Observation.  Each provider has developed their own specialised
Web interface that best represents their particular database schema.
These databases are then searched by a Web Browser typically via a map
and time interface. The results are returned by generated html.

Experience with these interfaces suggests that the most sought-after
requirements of end-users for searching these databases are few in
number and relatively standard in form.  End users need to search
for resources which relate to a spatial area, a certain temporal
coverage, and perhaps one or more keywords. This search-request is
typically repeated on different data collections at several,
operationally disjoint sites. Only if sufficient information is
found, is the user willing to explore the collection-specific 
retrieval facility.

The HTTP based Spatial and Temporal Searching (HGS) protocol
defines a mechanism whereby disparate remote databases can be searched
through a single standard HTTP interface.  Several standard queries are
defined, using either HTTP GET or POST operations. A database with an
HGS standard interface becomes remotely searchable via any HTTP client,
robot, or special user-agent. Easy deployment, with minimal impact on
existing web-based query-and-retrieval infrastructure, has been one of
the main design considerations.

The core of the protocol consists of three types of stateless
client-server interactions over HTTP: 'Advertise', 'Search' and
'Explain'. The 'Advertise' request yields information on the dataset
and services provided by an information provider; the 'Search' request
allows for actual searching; whereas 'Explain' provides a mechanism to
learn more about the database schema in use by a specific provider.
This information can be used to enhance the 'Search' beyond
the simple, and mandatory, spatial, temporal and keyword search
criteria. A 'Retrieve' request of some kind is NOT present. This is
intentional, as it is expected that an existing web interface is to be
pointed to by means of a URI. This simplifies the protocol, since
complicated matters such as ordering, data formats, post-processing,
access restrictions and costing do not have to be taken into account.

The aim of HGS is to allow the construction of general purpose
(Java) clients and user-agents.  Ultimately a user could specify the
search criteria, and then expect the user-agent to contact the
various collections, and present a single collated result set.
Once one or more promising resources are identified via URIs,
conventional technology is used to further interface to  the site. 
HGS does not aim to intervene in this part of the process, since 
it will be site dependent.

Service discovery will be supported by maintaining a register of
compliant servers, possibly distributed at each site. These registries
allow clients to configure themselves dynamically. This service layer
is defined as the 'Advertise' layer.

All servers must support the 'Advertise' layer even if they contain
just one reference to the self same service. However it is expected
that user and research communities will maintain lists of related
services. In addition one can envisage that a specialist client will
support a local user defined register of search sites customised
according  to given interests.

The above notion of service discovery borrows from the popularity of
'hotlists' and lists-of-lists.  The idea is that this level 0
information might be presented by the user-agent in a partially nested
or linked way. The user is then able to select certain services or
groups of services, and move these to a list of preferred servers.

The 'Search' function will be supported by all servers and allows an
interoperable geographic/temporal search across all systems. An
additional, optional free text field will be supported. It is
envisioned that the returned information will allow the construction
of graphic coverage maps and footprints.

The 'Explain' function allows a server to publish its particular
database schema for more targeted searching. All servers compliant
with the same model can be searched at level 2. The level 2
configuration is downloaded from the server via an 'Explain'
request.

Architecture
============

Although the protocol definition allows for a very flexible
infrastructure, and the dogma that 'a client can do what it
wants' holds true, the design assumes that:

1. The client will do most of the work. It will contact the
   advertise and search services directly, and will be responsible
   for any caching. The client is also responsible for result
   collation and possible state handling.

2. The Advertise services do not necessarily share the same
   machine as the Search services. Furthermore it is expected that
   user-communities might operate shared directory services or
   interconnected directories for specific datasets and resources.

3. The search service does not necessarily share the same machine
   as the normal web-based retrieval interface; a third party may
   also reverse engineer a retrieval URL on an arbitrary web
   catalogue.

4. Although the search request can be 'fuzzy', the returned records
   do have accurate spatial and temporal coverage information, thus
   reducing, where possible, the need for repeated, refining queries.
 

'Advertise'
===========

A directory of servers is accessed via a url. Servers should
publish and maintain the directory on a standard url construction,
such as http://fully.qualified.host.name/hgs.txt. Agencies can 
maintain master copies of directories and each server can 
maintain a local version for updating and access. Service discovery 
can therefore be distributed. A given physical http server can 
support more than one service. Each service should be specified 
by a subject keyword descriptor.

The information below is provided over the normal HTTP service
layer, including the MIME encapsulation of specific languages,
character sets and encoding. A server must be able to serve a
request for a version of this document in the latin-1 character
set, 8 bit encoding and the English language. 

The cache/mirror support of HTTP can be used by providers to
give an indication of the time-to-live, date of last modification
and/or expiry date, thus enabling a sufficiently capable user agent
to refresh the directory data when needed.

The content of the reply to this 'Advertise' request consists of an
entry describing the discovery service, followed by a series of
directory entries, separated by a blank line. The entry for the
discovery service is identical to the directory entry, except for
the omission of the URL entry.

Server Specific Information
---------------------------
First, the general collection/directory information itself is
presented by the server.
The content of the collection node for directory discovery is as
follows; each item is encoded using colon separated RFC822 field
value pairs.

Version	      The HGS protocol version understood by this
              directory service; major and minor version number.
              Changes in minor version number are expected to be
              backward and forward compatible (at a possible loss
              of functionality).

              Example: 1.00
              Mandatory, non-repeatable

Name          The name of this directory service. Clients must be
              able to cope with names of at least 70 symbols in
              length. However a client should be able to handle
              strings of arbitrary length.

              Example: The Wetland HGS Directory service
              Mandatory, non-repeatable


Description   A short description of this directory service and
              the community it addresses. Clients must be able to
              cope with a description of at least 4 kbyte in
              length.

              Example: A collection of services, with datasets
                    relating to wetlands, marshes, estuaries and
                    coastal zones, mostly in an ecology and
                    biodiversity context.
              Optional, non-repeatable

SiteURI       A full URL to a more elaborate description of this
              directory service, the organization which operates it
              and possible contact addresses. Repeatable entries
              are each to refer to pages which convey the same
              information.

              Example: http://www.ceo.org/wetland-hgs-service.html
              Mandatory, repeatable

Keyword       From a controlled keyword list, for example as maintained
              by the Global Change Master Directory (GCMD,
              http://gcmd.gsfc.nasa.gov). These keywords should describe
              the collective set of services and directories in the
              list following.

              Example: GEOLOGY > BIODIVERSITY
              Optional, repeatable

Currently under consideration is the addition of Expiry and Time
to live information; and its position relative to similar information
already passed in the HTTP request.


Directory Information
---------------------------

A server then presents  the entries in the directory held at that
site, separated by a blank line.

The content of each entry in the directory  for the given
collection node is as follows; each item is encoded using colon
separated RFC822 field value pairs.



Version       The HGS protocol version, major and minor
              version number. Changes in minor version number are
              expected to be backward and forward compatible (at a
              possible loss of functionality).

              Example: 1.00
              Mandatory, non-repeatable

Name          The name of the service or the collection. Clients
              must be able to cope with names of at least 70
              symbols in length. However a client should be able
              to handle strings of arbitrary length.

              Example: BRSC Bird sightings, by location, time and
              stock migration
              Mandatory, non-repeatable

Description   A short description of the service or data
              collection. Clients must be able to cope with a
              description of at least 4 kbyte in length.

              Example: Processed Field data collected by the BRSC
                    service for the mainland of Germany, the Ems
                    Estuary, the Waddensea and the Skylle. With
                    confirmed sightings and ring identifier
                    numbers.
              Optional, non-repeatable

SetURI        A full URL to a more elaborate description of the
              resource, collection and organization responsible for
              the service pointed at by the 'URL'. Repeatable
              entries are each to refer to pages which convey the
              same information.

              Example: http://www.rbrc.org/field/desc.html
              Mandatory, repeatable

Keyword       From a controlled keyword list (from the GCMD spec),
              best describing the service or dataset published at
              the URL above.

              Example: GEOGRAPHY > BIODIVERSITY > BIRDS
              Optional, repeatable, controlled list

SearchURI     The full URL of the search interface(s). Repeatable
              entries are each to refer to pages which operate on
              the same information space.

              Example:
              http://fully.qualified.domain.name/cgi-bin/hgs-search.pl
              Mandatory, repeatable

Currently under consideration is the addition of administrative
information, such as a technical contact email address and date
of last modification, expiry and time to live.


Furthermore, to allow the user interface to present more meaningful
displays, an additional 'Type' field could be made mandatory:

Type          The type of the resource described; such as a 'Collection',
              'Dataset', 'Inventory' or 'List-of-List'.

              Example: DataSet
              Mandatory, repeatable, controlled list


Example: Collection Information
-------------------------------

Version: 1.00
Type: Collection
Name: Marine Environment Resources
Description: The Marine Environment Unit aims to develop,
  demonstrate and validate methodologies for the use of data from
  space and airborne Platforms in both operational applications and
  scientific investigations related to the marine environment.
SiteURI: http://me-www.jrc.it

Example: Service Directory
---------------------------

Version: 1.00
Name: CORSA / Ocean Colour European Archive Network
Type: Dataset
Description: The Ocean Colour European Archive Network (OCEAN)
  Project, was established in 1990 as a co-operation between the
  Joint Research Centre (JRC) of the European Commission (EC), with
  the support of the EC Directorate General XI, and the European
SetURI: http://me-www.jrc.it/OCEAN/ocean.html
Keyword: CZCS
Keyword: Colour Scanner
Keyword: Algae
SearchURI: http://elect6.jrc.it/hgs/dbi.pl/corsa

Version: 1.00
Name: CORSA / Cloud and Ocean Remote Sensing around Africa
Description: The Cloud and Ocean Remote Sensing around Africa
  (CORSA) project aims to provide a quality controlled data set of
  surface, atmospheric and cloud parameters over a time period, and
  at a resolution, not available from any other source. The proj
SetURI: http://me-www.jrc.it/CORSA/index.html
Type: Dataset
Keyword: CORSA
Keyword: SST
Keyword: NOAA14
SearchURI: http://elect6.jrc.it/hgs/dbi.pl/ocean
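A client could read such a reply along the following lines. This is
only a minimal sketch in Python, not part of the protocol; the function
name parse_hgs_blocks is hypothetical, and treating lines that start
with white space as continuations is an assumption taken from the
RFC822 header conventions and from the indented Description lines in
the examples above.

```python
def parse_hgs_blocks(text):
    """Split an RFC822-style reply into a list of entries.

    Each entry is a list of (field, value) pairs, so repeatable fields
    such as Keyword keep every occurrence and their order.
    """
    entries, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current entry.
            if current:
                entries.append(current)
                current = []
        elif line[0] in " \t" and current:
            # Leading white space marks a continuation of the previous value.
            name, value = current[-1]
            current[-1] = (name, value + " " + line.strip())
        else:
            # Split on the first colon only, so URLs in values survive.
            name, _, value = line.partition(":")
            current.append((name.strip(), value.strip()))
    if current:
        entries.append(current)
    return entries


sample = """Version: 1.00
Type: Collection
Name: Marine Environment Resources
SiteURI: http://me-www.jrc.it

Version: 1.00
Name: CORSA / Cloud and Ocean Remote Sensing around Africa
Description: The Cloud and Ocean Remote Sensing around Africa
  (CORSA) project aims to provide a quality controlled data set
Keyword: CORSA
Keyword: SST
SearchURI: http://elect6.jrc.it/hgs/dbi.pl/ocean
"""

# The first entry describes the discovery service, the rest the directory.
service, *directory = parse_hgs_blocks(sample)
keywords = [v for k, v in directory[0] if k.lower() == "keyword"]
```

Keeping the pairs in a list rather than a dictionary is a deliberate
choice here, since several fields are declared repeatable.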


Higher Levels
=============

'Search' and 'Explain' queries are to use standard HTTP GET and
POST requests; conveying values using the CGI/1.0 standard.

A number of field/value pairs in the request are defined for
all requests. The standard HTTP Accept-type, encoding and
language specifications are to be followed.

Request		Request type; denoting the level of the request.
		Must be one of 'Search' or 'Explain'.

		Mandatory, non-repeatable, controlled value

UserAgent	The requesting user agent; string followed by a version
		number.

		Example: GeoSpava 1.00 (Sol 2.4/X11)
		Mandatory, non-repeatable

Version		Version of the request protocol used.

		Example: 1.00
		Mandatory, non-repeatable

Upgrade		Client side request to upgrade to a different
		protocol version.

		Example: 1.02
		Optional, non-repeatable

The reply of the server is governed by the normal HTTP protocol
and status codes. It is stressed that HTTP/1.1 already allows 
for caching, proxying and access authorization. If the Content-type
of the reply is set to text/x-hgs; the reply is to be in rfc822
colon separated field/value pair format.

The following field/value pairs have a meaning across all levels:

Version       Version of the protocol used by the server when
              sending out the reply. Changes in minor version
              number are to be both backward and forward
              compatible, whereas major version numbers are used to
              denote an incompatible change. A server should upgrade
              if the client has sent a supported 'Upgrade' version.

              Example: 1.00
              Mandatory, non-repeatable
 
Engine        Software used to carry out the request; name,
              followed by a version number (the format is to be
              taken from the HTTP specification). A server should
              support this.

              Example: HGSGeoTem 0.01beta
              Optional, non-repeatable

Upgrade       Version number of a higher level protocol, for which
              the server is capable of handling requests. A client
              can, if desired, upgrade to this protocol level.

              Example: 1.09
              Optional, Non-repeatable

Comment       A message, optionally displayed to the user

              Example: Your search on the blabla returned 5 hits.
              Optional, repeatable
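
The version rules above could be checked by a client roughly as
follows. This is a hedged sketch in Python; the function names are
hypothetical, and accepting a leading 'hgs/' is an assumption based on
the form used in the header example later in this document.

```python
def parse_version(text):
    """Return (major, minor) from a value such as '1.00' or 'hgs/1.00'."""
    # Drop an optional product prefix such as 'hgs/'.
    number = text.strip().rpartition("/")[2]
    major, _, minor = number.partition(".")
    return int(major), int(minor or 0)


def compatible(client_version, server_version):
    """Minor versions interoperate; a different major version does not."""
    return parse_version(client_version)[0] == parse_version(server_version)[0]


def should_upgrade(current, offered):
    """True if the offered 'Upgrade' version is newer but still compatible."""
    return compatible(current, offered) and (
        parse_version(offered) > parse_version(current)
    )
```

A client receiving 'Upgrade: 1.09' while speaking 1.00 would, under
these assumptions, be free to switch; an offer of 2.00 would be ignored
because the major version differs.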


Search Request
==============

Up to three search criteria can be imposed upon a search during
the request. In the reply the server indicates which of these
conditions were applied. A server, or the properties of the dataset
searched, might not support one or more of the limitations, or any
of them. In this case the search is to continue as if that
limitation was not applied. The server MUST be able to cope with a
client breaking the connection when the number of records returned
exceeds the client's resources.

In the most extreme case, when the user agent does not specify any
criteria, or when the server cannot apply any of the criteria,
all records are to be returned.

As the search request is a one-off stateless interaction;
discrepancies and inaccurate matching, conversions and comparisons
are to be expected. For this reason the three search criteria are
intentionally inexact. This allows the server to return possible
false positives, and puts some of the burden for detecting these
upon the user-agent and the final user.  Unlike the more
machine-oriented exchange of the 'Explain' request, human pattern
recognition and iterative refinement are relied on. The user-agent
application is to be designed with such interaction in mind.

Type definition of criteria
---------------------------

Three datatypes are in use for the level-1 queries: for geospatial
coordinates, for time specifications and for partial substrings.

Servers and Clients must be able to handle floating point numbers
which have the fractional and integral part separated by a period
as well as a comma, regardless of the locale and/or language/charset
and encoding triple specified by HTTP.

Servers and Clients should use the following format for all floating
point numbers.

	digit 	= < 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 >
	digits 	= < digit [ digits ] >
	E	= < E >
	sep	= < . >
	sign 	= < + | - >
	float 	= < [ sign ] digits [ sep [ digits ] ] [ E [ sign ] digits ] >

In particular, no separator between powers of a thousand is
allowed; a value such as 10,000.00 is not valid.
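
The float format can be checked with a regular expression. The Python
sketch below is illustrative only; the name parse_hgs_float is
hypothetical. It accepts a comma as well as a period on input, as
required above, and rejects thousands separators.

```python
import re

# A period is what senders should use; a comma separator is also
# accepted on input. Only an upper case 'E' is in the grammar.
_FLOAT = re.compile(r"[+-]?\d+(?:[.,]\d*)?(?:E[+-]?\d+)?")


def parse_hgs_float(text):
    """Parse a float in the HGS format; raise ValueError otherwise."""
    token = text.strip()
    if not _FLOAT.fullmatch(token):
        raise ValueError("not an HGS float: %r" % text)
    # Normalise the separator before handing over to float().
    return float(token.replace(",", "."))
```

With this sketch '12,5' and '12.5' both yield 12.5, while '10,000.00'
is refused because a second separator follows the fractional part.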

Geospatial Coordinates (GC)
---------------------------

The format for the Geospatial coordinate is as defined in the FGDC 1994
standard Content Standards for Digital Geospatial Metadata, with the 
exception of the length of the integral part of the latitude or longitude
( two or three digits).

Values for latitude and longitude shall be expressed as decimal
fractions of degrees.  Whole degrees of latitude shall be represented
by a two-digit decimal number ranging from 0 through 90.  Whole degrees
of longitude shall be represented by a decimal number
ranging from 0 through 180.  When a decimal fraction of a degree is
specified, it shall be separated from the whole number of degrees by a
decimal point.  Decimal fractions of a degree may be expressed to the
precision desired.

Latitudes north of the equator shall be specified by a plus sign (+),
or by the absence of a minus sign (-), preceding the two digits
designating degrees.  Latitudes south of the Equator shall be
designated by a minus sign (-) preceding the two digits designating
degrees.  A point on the Equator shall be assigned to the Northern
Hemisphere.

Longitudes east of the prime meridian shall be specified by a plus sign
(+), or by the absence of a minus sign (-), preceding the digits
designating degrees.  Longitudes west of the meridian shall be
designated by a minus sign (-) preceding the digits designating
degrees.  A point on the prime meridian shall be assigned to the
Eastern Hemisphere.  A
point on the 180th meridian shall be assigned to the Western
Hemisphere.  One exception to this last convention is permitted.  For
the special condition of describing a band of latitude around the
earth, the East Bounding Coordinate data element shall be assigned the
value +180 (180) degrees.

Any spatial address with a latitude of +90 (90) or -90 degrees will
specify the position at the North or South Pole, respectively.  The
component for longitude may have any legal value.

With the exception of the special condition described above, this form
is specified in Department of Commerce, 1986, Representation of
geographic point locations for information interchange (Federal
Information Processing Standard 70-1):  Washington,  Department of
Commerce, National Institute of Standards and Technology.

Servers and Clients must be able to handle floating point numbers
which have the fractional and integral part separated by a period
as well as a comma, regardless of the locale and/or language/charset
and encoding triple specified by HTTP.

Servers and Clients must use the float format specified above for
all latitude and longitude values.
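
The range and hemisphere rules above can be sketched as follows. This
Python fragment is an illustration under assumptions: the function
names are hypothetical, and Python's float() is used for brevity even
though it is more lenient than the float format defined earlier.

```python
def parse_coordinate(text, limit):
    """Return decimal degrees from a GC string, checking the range."""
    # Accept a comma as well as a period as the decimal separator.
    value = float(text.strip().replace(",", "."))
    if not -limit <= value <= limit:
        raise ValueError("coordinate out of range: %r" % text)
    return value


def parse_latitude(text):
    return parse_coordinate(text, 90.0)


def parse_longitude(text):
    return parse_coordinate(text, 180.0)


def hemispheres(lat, lon):
    """Assign a point to its hemispheres following the rules above."""
    # The Equator belongs to the Northern Hemisphere.
    north_south = "N" if lat >= 0 else "S"
    if abs(lon) == 180:
        # The 180th meridian belongs to the Western Hemisphere.
        east_west = "W"
    else:
        # The prime meridian belongs to the Eastern Hemisphere.
        east_west = "E" if lon >= 0 else "W"
    return north_south, east_west
```

The special case of an East Bounding Coordinate of +180 for a band
around the earth is not covered by hemispheres(); it applies to
bounding boxes rather than to single points.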

Temporal Dimension (JF)
---------------------------

The temporal dimension is expressed either as defined by rfc1123, as a
Julian date, or relative in days. A relative day and a Julian date are
expressed as a floating point number of arbitrary accuracy denoting the
number of days before (negative) or after (positive) the 14th of
September 1752 (for Julian days), or the number of days before or after
the current day, i.e. the day the query was dispatched (for relative
days). Relative and Julian days have an 'R' and a 'J' prefix
respectively. This prefix is not case sensitive.

Examples:

	J 0
	R -5.62E+8 (1.5 Million years ago)
	R -1 (Yesterday)
	Wed, 02 Apr 1997 17:06:40 GMT

When the rfc1123 format is used, the zone should be UT or GMT, and 
the date-name is optional. Please note that rfc1123 specifies a four 
digit year (unlike rfc822).
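
A client or server might convert the three forms to a common time value
as sketched below. This is a Python illustration only; parse_jf and
HGS_EPOCH are hypothetical names, the epoch is the one stated in the
text above, and the optional 'now' argument stands in for the moment
the query was dispatched. Values far outside the range of the datetime
type, such as the 1.5 million years example, raise OverflowError in
this sketch and would need a plain day count instead.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Epoch given in the text above for 'J' values.
HGS_EPOCH = datetime(1752, 9, 14, tzinfo=timezone.utc)


def parse_jf(text, now=None):
    """Convert a JF value to a timezone-aware UTC datetime."""
    token = text.strip()
    prefix = token[:1].upper()  # the prefix is not case sensitive
    if prefix in ("J", "R"):
        days = float(token[1:].strip().replace(",", "."))
        if prefix == "J":
            base = HGS_EPOCH
        else:
            base = now or datetime.now(timezone.utc)
        return base + timedelta(days=days)
    # Anything else is taken to be an rfc1123 date with a UT/GMT zone.
    return parsedate_to_datetime(token).astimezone(timezone.utc)
```

For example, parse_jf('R -1') yields the moment exactly one day before
the current time, and parse_jf('J 0') yields the epoch itself.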

Search Sub String (SS)
---------------------------

A partial search string, in the appropriate language and charset
as specified on the HTTP transport level.

The criteria names are not case sensitive. 

Spatial Limit
-------------

A spatial limit can be imposed on the records returned. In this
case each of the returned records must be partially within the
specified bounding box. The server may only apply this limitation
to records with which a spatial domain can be associated. For each
of the records to which spatial limitation was imposed, the
spatial coverage associated with the record should be returned;
thus allowing the user-agent to do subsequent processing.  It is
proposed that rectangles are defined in simple lat/lon
coordinates, with up to a tenth of a degree accuracy.

    latmin   GC  the resource(s) returned are to cover a
                 latitude equal or larger than
                 the latmin specified.
    latmax   GC  the resource(s) returned are to cover a
                 latitude equal or smaller than
                 the latmax specified
    lonmin   GC  the resource(s) returned are to cover a
                 longitude equal or larger than
                 the lonmin specified
    lonmax   GC  the resource(s) returned are to cover a
                 longitude equal or smaller than
                 the lonmax specified

    Each of the above is optional and non-repeatable.

Absent values, for any of the above fields are to be treated as
not-limiting in any way. Consequently if all values are absent, no
spatial limit is to be applied at all.
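
The 'partially within' test with optional limits could be implemented
along these lines. The Python sketch below is an assumption-laden
illustration: the function name, the (south, west, north, east) order
of the record box and the lack of any handling for boxes crossing the
180th meridian are all choices made for the example, not part of this
specification.

```python
def overlaps(record_box, latmin=None, latmax=None, lonmin=None, lonmax=None):
    """True if the record's box intersects the requested limits.

    record_box is (south, west, north, east) in decimal degrees.
    A limit left as None is absent and therefore not limiting.
    """
    south, west, north, east = record_box
    if latmin is not None and north < latmin:
        return False  # record lies entirely south of the request
    if latmax is not None and south > latmax:
        return False  # record lies entirely north of the request
    if lonmin is not None and east < lonmin:
        return False  # record lies entirely west of the request
    if lonmax is not None and west > lonmax:
        return False  # record lies entirely east of the request
    return True
```

Calling overlaps() with no limits at all returns True for every record,
which matches the rule that absent values impose no spatial limit.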


Temporal Limit
--------------

Time intervals are a pair of dates or Julian day numbers which define 
a temporal search interval. The server must be able to handle these
numbers up to a tenth of day accuracy. The client must be able to
cope with a search applied with less accuracy than specified in
the request. The implementation on the server must be designed
with this in-accuracy in mind; possibly at the expense of
returning false positives.

There are three criteria specifying date searching; please note
that, insofar as the service is concerned, the timespan associated
with a resource can effectively be a single point in time.

    date_after    JF  (part of) the timespan of the returned
                      records is after the date specified
    date_before   JF  (part of) the timespan of the returned
                      records is before the date specified
    date_on       JF  The date_on is within the timespan of the
                      returned records

This allows searches of the types on-a-date, before-a-date and
after-a-date, and any combination of these, thus making ranges and
partial ranges possible. In particular the server must make no
assumptions on which of these three specifiers, or what combination
of them, is requested.
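
The three temporal criteria can be combined in a single test, as the
following Python sketch suggests. The function name is hypothetical,
times are assumed to have been converted to plain day numbers already,
and the comparisons are made inclusive on purpose, so that borderline
records are returned as possible false positives rather than dropped.

```python
def matches_dates(start, end, date_after=None, date_before=None, date_on=None):
    """True if the timespan [start, end] satisfies every given criterion.

    start and end are day numbers; start == end describes a single
    point in time. A criterion left as None is not applied.
    """
    if date_after is not None and end < date_after:
        return False  # no part of the timespan is after the date
    if date_before is not None and start > date_before:
        return False  # no part of the timespan is before the date
    if date_on is not None and not (start <= date_on <= end):
        return False  # the date falls outside the timespan
    return True
```

Giving both date_after and date_before selects records overlapping a
range; giving only one of them selects a partial, open-ended range.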

Free text limit
---------------------------

Additionally a search string can specify one or more partial
substrings to be matched upon. This option is repeatable and
non-mandatory. Repeatable entries are to be used in parallel; i.e. a
record has to relate to, or contain, one or more of the substrings
specified by the user agent.

	free	SS	Partial string.
			optional, repeatable

Standard HTTP GET and POST requests will be supported.
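
The 'one or more of the substrings' rule amounts to a logical OR over
the repeated entries. A minimal Python sketch follows; the function
name is hypothetical and the case-insensitive comparison is an
assumption, as this document does not say whether matching is case
sensitive.

```python
def matches_free_text(record_text, substrings):
    """True if the record contains at least one of the substrings."""
    if not substrings:
        # No free text criterion given, so nothing is excluded.
        return True
    haystack = record_text.casefold()
    return any(s.casefold() in haystack for s in substrings)
```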


Request procedure
---------------------------

For each request; the 'Request' field must be set to 'Search'.

A)  HTTP GET.  Example

    http://hgs.ceo.org/cgi/search.pl?latmin=-40&lonmin=30&
		latmax=-30&lonmax=40&date_on=27585.1203&Text=Geology&
		Request=Search&UserAgent=DraftEx+1.00&Version=1.00

B)  HTTP POST. Example

    <form method='post' action='http://hgs.ceo.org/cgi/search.pl'>
    <input name='Request' value='Search'>
    <input name='UserAgent' value='DraftEx 1.00'>
    <input name='Version' value='1.00'>
    <input name='Text' value='Geology'>
    <input name='latmin' value='-39'>
    <input name='latmax' value='30'>
    <input name='lonmin' value='-40'>
    <input name='lonmax' value='40'>
    <input name='date_on' value='27585.1203'>
    <input type='submit'>
    </form>

C)  HTTP Reply

Status of the reply is either 200, when results follow, or 404,
when nothing was found, depending on the success of the search. All
other status codes and headers, as described in rfc2068, have their
normal meaning. In particular a 401 reply might cause the user agent
to prompt for a username and password.

Replies such as 500 indicate a failure. The normal MIME rules
for labeling the reply apply. The returned content type is
either text/html, text/plain, or text/x-hgs. Only the latter is
intended for machine parsing. Replies in HTML or plain text should
be forwarded to the user directly.

The content of the reply consists of a header and a set of entries;
each separated by a blank line. Each line contains a field value
pair, in a RFC822 colon separated encoding. Field names are not
case sensitive.
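
The request and reply handling described in this section might look as
follows on the client side. This Python sketch only builds the GET URL
and decides what to do with a reply; it performs no network access. The
function names and the returned labels are hypothetical, and the
criteria are passed as a list of pairs so that repeatable fields can
occur more than once.

```python
from urllib.parse import urlencode


def build_search_url(base, user_agent, criteria, version="1.00"):
    """Return a GET URL for a 'Search' request.

    criteria is a list of (name, value) pairs, for example
    [("latmin", -40), ("Text", "Geology")].
    """
    params = [("Request", "Search"), ("UserAgent", user_agent),
              ("Version", version)]
    params.extend(criteria)
    return base + "?" + urlencode(params)


def classify_reply(status, content_type):
    """Decide how a client should treat a reply."""
    if status == 404:
        return "nothing-found"
    if status == 401:
        return "authenticate"
    if status != 200:
        return "error"
    # Ignore parameters such as '; charset=...' after the media type.
    media_type = content_type.split(";")[0].strip().lower()
    # Only text/x-hgs is meant for machine parsing.
    return "parse" if media_type == "text/x-hgs" else "display"
```

A reply classified as 'display' would simply be shown to the user,
whereas a 'parse' reply is split into a header and record entries.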


Header Fields
---------------------------

A number of header fields are mandatory; a few are optional;
primarily for user interface purposes. In addition to the normal 
reply headers; the following field/value pairs are 'Search'
request type specific.

Applied       List of search criteria which were applied
              successfully. Space separated, case insensitive.

              Example: latmin latmax date_on
              Mandatory, non repeatable

EntriesExpected

              A possibly inaccurate number of entries
              likely to be returned by the server. A server should
              try to ensure that this number is accurate, but the
              client must not depend on this number being correct.
              It must not be used as an upper limit.

              Example: 5
              Optional, non-repeatable


Example of a full header:

    Version:     	hgs/1.00
    Engine:          	GeoLite 0.01a
    Applied:     	text latmin latmax lonmin lonmax date_on
    EntriesExpected:   	5
    Comment:     	Your search on the RCS database yielded 5 entries
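
Since a server may silently skip criteria it cannot apply, a client can
compare the 'Applied' field with what it asked for and filter the
remaining criteria itself. A minimal Python sketch, with a hypothetical
function name:

```python
def unapplied_criteria(requested, applied_header):
    """Return the requested criteria the server did not apply.

    requested is a list of criteria names sent in the request;
    applied_header is the value of the 'Applied' field in the reply.
    """
    # The list is space separated and case insensitive.
    applied = {name.lower() for name in applied_header.split()}
    return [name for name in requested if name.lower() not in applied]
```

Any name returned here marks a criterion that the user-agent would have
to check against the 'Coverage' or 'Period' of each record.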


Record Entries
--------------

Record entries again follow the rfc822 colon separated field value
format; and are separated by a blank line. A server which is able
to apply a spatial or temporal limit should (or must?) confirm the
coverage of the records returned for at least those criteria
specified in the original search; with as much accuracy as
possible.

URI     	Universal resource identifier; such as a URN or a URL.

        	Example: http://server.company.org/cgibin/show.pl?1e441aef
        	Mandatory, Non-repeatable

Name    	Short descriptive name

        	Example: Wetlands Survey 1996, Alabama
        	Optional, non-repeatable

Description   	Short description of the record, clients must be
              	able to cope with up to 4k and ..

        	Example: Someblurp on etc, from the gcmd
        	Optional, non-repeatable

Coverage 	Spatial area related to the resource, 2 or 4 space
         	separated GC, with as much accuracy as possible. A server
         	should supply this; especially when it was able to
         	effectuate one or more spatial criteria. If the entry is
         	repeated, each of the sets should fit the criteria
         	applied. An absent value is denoted by a '*'
         	(asterisk).

        	Example: 12 33 33 44
        	Optional, repeatable

OtherCoverage 	Any spatial coverage related to the resource, not
		relayed in the Coverage field

        	Example: 12,33 33,44
        	Optional, repeatable

Period   	Temporal range or point related to the resource, 1 or 2
         	space separated JF, with as much accuracy as possible. A
         	server should supply this; especially when it was able to
         	effectuate one or more temporal criteria. If the field is
         	repeated, each of the repeated entries should fulfill the
         	criteria applied.

         	Example:  1112.33 1198.11
        	Optional, repeatable

OtherPeriod   	Any temporal ranges or points not relayed in the
		Period field.

       		Example: 12,33 33,44
        	Optional, repeatable

Example:
    
    Name:        Wetlands in eastern Alabama
    URI:         http://ala.www.edu/sand.html
    Coverage:     12.33 13.44 44.12 34.23
    Period:      123.34 125.12


'Explain' Request
=================

The 'Explain' is an optional object description level for a given
server. It allows a server to define an ordered list of locally
defined searchable object types and their associated attributes.

Thus Level 2 allows customisable local attributes to be defined.
Using this configuration information the  client software should
therefore configure the search interface accordingly.

The actual attribute ID numbers used, should be standardised up
to a certain extend; especially in user communities with similar
database schema's. More work will be done in this area.

The format is best illustrated by the following example:

Attribute ID   Parent ID        Object Name      Guide
100            0 (means root)   Type             http://hgs.ceo.org/type.html
101            100              User             http://hgs.ceo.org/usr.html
102            100              Organisation     http://hgs.ceo.org/org.html
103            100              Product          http://hgs.ceo.org/ps.html
104            103              Software         http://hgs.ceo.org/ps.html
105            103              Course           http://hgs.ceo.org/educ.html
106            103              Dataset          http://hgs.ceo.org/ps.html


Client request
--------------

A client requests a level 2 configuration using an http request as
defined for level 1, but with the 'Request' field set to 'Explain'.

Server Response
---------------

The server responds with a normal header, followed by entries separated
by a blank line. The following fields are defined:

AID		Attribute ID, unique sequence of digits
		Mandatory, non-repeatable

PID		Parent ID, unique sequence of digits, or a '0'
		Optional, repeatable

Object		Object name
		Mandatory, non-repeatable

Guide		URL
		Optional, non-repeatable.

Example:

Reply Content
Version: hgs/1.00
Applied: text explain

AID:100
PID:0
Object: Type
Guide: http://hgs.ceo.org/type.html

AID:101
PID:100
Object: User
Guide: http://hgs.ceo.org/user.html

AID:102
PID:100
Object: Organisation
Guide: http://hgs.ceo.org/type.html

AID:103
PID:100
Object: Product
Guide: http://hgs.ceo.org/product.html

AID:104
PID:103
Object: Software
Guide: http://hgs.ceo.org/product.html

AID:105
PID:103
Object: Educational Course
Guide: http://hgs.ceo.org/course.html

AID:106
PID:103
Object: Dataset
Guide: http://hgs.ceo.org/product.html
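
A client could turn such a reply into a parent/child table as
sketched below in Python. This is an illustration under stated
assumptions, not a reference implementation: the function names are
hypothetical, the header lines are assumed to have been removed
already, and an entry without a PID is assumed to hang off the root
'0'.

```python
def parse_entries(body):
    """Split a reply body into entries separated by blank lines.

    Each entry maps a field name to the list of its values, so that
    repeatable fields such as PID keep every occurrence.
    """
    entries, current = [], {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            if current:
                entries.append(current)
                current = {}
            continue
        # Split on the first ':' only; Guide URLs contain further colons.
        name, _, value = line.partition(":")
        current.setdefault(name.strip(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries


def children_by_parent(entries):
    """Map each parent ID to the attribute IDs listed beneath it."""
    tree = {}
    for entry in entries:
        aid = entry["AID"][0]
        # Assumption: a missing PID means the entry sits under root '0'.
        for pid in entry.get("PID", ["0"]):
            tree.setdefault(pid, []).append(aid)
    return tree
```

Applied to the entries above, children_by_parent() lists 100 under
'0', the IDs 101, 102 and 103 under '100', and 104, 105 and 106
under '103'.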


Customised searching
====================

Upon receipt of 'Explain' configuration data, the client interface
should be configured dynamically, for example by the use of pull
down menus. The interface should allow users to select one or more
object types. These will then be used for subsequent level 1
searches to that server.

The selected object types will be appended to a level 1 search.

GET
    http://hgs.ceo.org/cgi/search.pl?latmin=-30&lonmin=30&latmax=-
    40&lonmax=40&Ton=27585.1203&Text=Geology&Object=101,104,106&
    Request=Search&Version=1.00&UserAgent=JavaGot+1.00

POST

<Object=101>
<Object=104>
<Object=106>

   Selected objects include all child objects.
   Servers not supporting level 2 ignore all Object definitions.
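
The GET form above and the rule that selected objects include their
children can be sketched in Python as follows. The function names
are hypothetical, the 'children' table (parent ID mapped to a list
of child IDs) is an assumed data shape, and the UserAgent value is
simply the one used in the example.

```python
from urllib.parse import urlencode


def build_search_url(base, criteria, objects):
    """Append the selected object IDs to a level 1 search URL."""
    query = list(criteria)
    if objects:
        query.append(("Object", ",".join(objects)))
    query += [("Request", "Search"), ("Version", "1.00"),
              ("UserAgent", "JavaGot 1.00")]
    # safe="," keeps the comma separated Object list readable.
    return base + "?" + urlencode(query, safe=",")


def expand_selection(selected, children):
    """Return the selected IDs followed by all of their descendants."""
    result, seen = [], set()
    stack = list(reversed(selected))
    while stack:
        aid = stack.pop()
        if aid in seen:
            continue
        seen.add(aid)
        result.append(aid)
        stack.extend(reversed(children.get(aid, [])))
    return result
```

A server supporting level 2 might apply expand_selection() to the
received Object list before matching, so that a request for 103
(Product) also matches 104, 105 and 106.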


Implementations
===============

An HGS interface to a database will be implemented using CGI
scripts. These can be expected to be similar and developed using a
standard stub. Existing Web gateways to databases are unaffected.
All that is required is an add-on CGI gateway which supports HGS.

Multiple server searching at level 1 will be possible. Thus a
distributed search can be made by the client across several
servers contained within a given server directory. In this case
result collation at the client side is necessary.
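
Client side collation could look like the Python sketch below. It
is only an illustration: 'search' stands for a hypothetical function
that queries one server and returns its hits as dictionaries with a
'URI' key, and dropping hits whose URI was already seen is an
assumed collation policy, not something this draft prescribes.

```python
from concurrent.futures import ThreadPoolExecutor


def collate(servers, search):
    """Query all servers in parallel and merge their hits.

    Hits are kept in server order; a hit whose URI was already
    returned by an earlier server is dropped (assumed policy).
    """
    if not servers:
        return []
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # pool.map returns the results in the order of 'servers'.
        per_server = list(pool.map(search, servers))
    merged, seen = [], set()
    for hits in per_server:
        for hit in hits:
            if hit["URI"] not in seen:
                seen.add(hit["URI"])
                merged.append(hit)
    return merged
```

An exception raised by 'search' for one server propagates out of
collate(); a real client would probably catch it and continue with
the remaining servers.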

The level 0 reply typically consists of a simple text file in
a directory 'hgs' under the server root and/or AliasMapping 
directives in the server setup.

Scalability
===========

Issues of scale are not addressed. In particular, broad searches
yielding thousands of hits are possible and will pose a serious
challenge for User Agent implementors. More work will be done in
this area.

However, it is stressed that, because of the mandatory use of complete
URLs on all levels, query interfaces can be distributed, even within
one collection. Furthermore, support for mirroring, caching and
duplication of services is potentially available; but since 'a client
can do what it wants', it is as yet unclear how to achieve this.
   
Security Implications
=====================
    
Security implications are not addressed, nor are they well understood.
More work is to be done in this area.


Acknowledgements
================

The development of HGS has benefitted from the ideas of and discussions 
with Zac Bjelirigc, WebBridges Srl, from the work done by CEONet, Canada, 
and from presentations at the 1996 workshop organised by the CEOS 
(Committee on Earth Observation Satellites) WGISS (Working Group on 
Information Systems and Services) WWW task team. Michael Kleih 
implemented and tested some early client applications written in Java. 
Ladson Hayes provided remote sensing specific information and proof-read 
this document. 

The work has been carried out in part for the Centre for Earth Observation
of the Space Applications Institute by the Software Technologies and
Automation unit of the Institute for Systems, Informatics and Safety; both
at the Joint Research Centre Ispra of the European Communities.


Contacts
========

	URL: 		http://www.ceo.org/hgs/index.html
	Mailinglist: 	hgs@harp.gsfc.nasa.gov (Majordomo)

	Clive Best	           Clive.Best@jrc.it
	Dirk-Willem van Gulik      Dirk.vanGulik@jrc.it

	ISIS/STA/CEO - TP 270
	Joint Research Centre Ispra
	21020 Ispra (Va)
	Italy.

	Phone: +39 332 78 9549 or 5044	
	Fax: +39 332 78 9185



draft-vangulik-http-search-00.txt                       Expires: 23/04/1997