IRTF-ASRG-IAR                                              A. C. Howe
Internet-Draft                                            A. Lorenzen
Expires: 11 February 2005                                    P. Paler
                                                        D. J. Balling
                                                       11 August 2004


                  Server Index Query (SIQ) Protocol
                 draft-irtf-asrg-iar-howe-siq-00.txt


Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   or will be disclosed, and any of which I become aware will be
   disclosed, in accordance with RFC 3668.

   By submitting this Internet-Draft, I accept the provisions of Section
   4 of RFC 3667.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as 
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html
   
   
Abstract

   The Server Index Query (SIQ) protocol is intended to provide a
   standard means by which a mail exchange (MX) server can query one or
   more external services for scoring based on facts or reputation of an
   IP/domain pair. This document specifies the communication protocol
   used to transmit the IP/domain query and return the query response.
   The implementation, correctness of results, and/or management of SIQ
   servers is beyond the scope of this document.


Table of Contents

   1. Introduction
      1.2 Terminology
   2. Overview
   3. Query & Response UDP
      3.1 UDP Query Format
      3.2 UDP Response Format
   4. Query & Response HTTP
      4.1 HTTP Query Format
      4.2 HTTP Response Format
   5. Rationale
      5.1 About Query Type
      5.2 Use of IPv4-compatible IPv6 address format
      5.3 HTTP 1.0 and 1.1
   6. Security Considerations
   7. IANA Considerations
   8. Normative References
   9. Informational References
   10. Authors' Addresses
   Copyright Statement
   Disclaimer
   Acknowledgment
   Expires


1. Introduction

   The proposed SIQ protocol is intended to provide a light, reliable
   way for an inbound email server to query a "reputation" service and
   receive a useful response. Because an IP/domain pair is included in
   SIQ protocol queries, the response may score the IP network, domain
   ownership, and the quality of the relationship (denied, affirmed,
   inferred, undetectable) between the IP and domain.

   A variety of anti-forgery techniques have been proposed in recent
   years. However, many proposals require the domain owner to announce
   which outbound servers he authorizes, without third-party
   verification. This leaves open the possibility for abusive senders to
   achieve the same status as non-abusive senders, by making use of
   their own domains. Most of these proposals foresee the need for
   external reputation systems to close the abuse loophole. The SIQ
   protocol is put forth as a protocol for inbound servers to use in
   communicating with such reputation services or systems.

   One possible niche for SIQ protocol servers is to do the heavy
   lifting - discovering, caching and collating all that can be divined
   from knowing an IP, a domain name, and the relationship of the IP to
   the domain name for the purposes of sending emails. The result for
   any particular IP/domain pair query may then be efficiently returned
   in composite form to the inbound server (SIQ protocol query client).

   The basis for the "reputation" scoring may be objective factors such
   as longevity, stability, and identifiability. Some reputation
   services may choose subjective factors such as judgements about
   content, morality, historical business practices, etc. The
   distinction between objective and subjective reputation scoring is
   beyond the scope of this document; the authors do want to point out
   that services in the class of "reputation" MAY be objectively based
   on measurable and observable facts, rather than based on opinion or
   payment.

   The SIQ protocol supports differentiated pre-DATA [RFC2821] and post-
   DATA queries. Pre-DATA queries have a limited scope of information
   they can provide; they refer to the connecting SMTP client IP and the
   MAIL FROM (aka envelope-from) domain. Post-DATA queries may ask
   about domains in URLs or email domains found in the body of the
   message, or domains in particular headers such as Errors-To,
   Reply-To, From, Resent-From, etc. Thus any SIQ protocol reputation
   server may respond appropriately, according to the specific query
   type, rather than treating a post-DATA query with the same scoring
   or evaluation criteria as a pre-DATA query.

   As a specific example, query clients may be designed for 
   anti-phishing functions with post-DATA queries, such as via marking
   the suspicious emails with a warning. The criteria for evaluating
   these queries would be very different from the criteria for
   evaluating the pre-DATA, MAIL FROM domain and sending server IP
   address pairs.


1.2 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


2. Overview

   The SIQ protocol uses a query & response model over UDP with an
   alternative method over TCP/HTTP. Implementation and specification
   borrow ideas from the Domain Name System (DNS) defined by [RFC1035]
   and the Hypertext Transfer Protocol (HTTP/1.1) defined by [RFC2616].

   Upon receiving a new inbound SMTP connection, an MX sends one or more
   queries via UDP or HTTP. Each query consists of the connecting client
   IP address, a query domain, and other housekeeping bits which
   will be described in detail later in this document. The query domain
   is either the MAIL FROM domain or a domain found in the DATA content.
   See section 5.1.

   Depending on the type of query made, the SIQ score returned can be
   used to reject or accept, with optional tagging for sorting or
   further processing. Support for per RCPT domain processing is
   anticipated by the protocol design and may optionally be provided for
   multi-RCPT messages, dependent on query client implementation.


3. Query & Response UDP

   A client sends a query packet to one or more SIQ servers on port
   UDP/6262 (see section 7) and waits for a response, retrying
   according to an exponential backoff algorithm.

   The exponential backoff algorithm starts with an initial timeout
   value and a set number of rounds. Each round, the query packet is
   sent to each server in turn, waiting the current timeout period for
   a response before trying the next server. If no response is
   received after trying all servers, a new round is initiated: the
   round's timeout value is double that of the previous round, and the
   wait per server is that value divided by the number of servers
   (truncated to whole seconds in the examples below). This process is
   repeated until a response is received or the set number of rounds
   is reached.

   For example, starting with an initial timeout value of 5 seconds and
   a maximum of 4 rounds:

      1 server:   5+     10+    20+    40       = 75 seconds
      2 servers:  5+5+   5+5+   10+10+ 20+20    = 80 seconds
      3 servers:  5+5+5+ 3+3+3+ 6+6+6+ 13+13+13 = 81 seconds

   For example, starting with an initial timeout value of 3 seconds and
   a maximum of 4 rounds:

      1 server:   3+     6+     12+    24       = 45 seconds
      2 servers:  3+3+   3+3+   6+6+   12+12    = 48 seconds
      3 servers:  3+3+3+ 2+2+2+ 4+4+4+ 8+8+8    = 51 seconds

   In the event that no answer is found after the last round, the client
   MUST assume an `UNKNOWN' result and continue to handle the message
   subject to local policy.
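
   The retry schedule can be sketched in Python as follows. This is an
   illustration only: the function names are hypothetical, and the
   rule it encodes (round one waits the full initial timeout per
   server, later rounds double a per-round budget and split it across
   the servers, rounding down to whole seconds) is inferred from the
   worked examples above rather than stated normatively. Each total is
   the sum of the per-server waits in a row, for instance
   15 + 9 + 18 + 39 = 81 for three servers starting at 5 seconds.

```python
def backoff_schedule(initial_timeout, rounds, servers):
    """Return one list of per-server waits (whole seconds) per round.

    Assumption taken from the worked examples: round 1 waits the full
    initial timeout for every server; each later round doubles the
    round budget and divides it among the servers, rounding down.
    """
    schedule = []
    budget = initial_timeout
    for round_no in range(rounds):
        if round_no == 0:
            wait = initial_timeout
        else:
            budget *= 2
            wait = budget // servers
        schedule.append([wait] * servers)
    return schedule


def total_wait(schedule):
    """Worst-case seconds spent before falling back to UNKNOWN."""
    return sum(sum(round_waits) for round_waits in schedule)
```

   A client would walk the schedule round by round, sending the query
   to each server and waiting the listed number of seconds, and would
   assume UNKNOWN once the schedule is exhausted.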


3.1 UDP Query Format

   Query & Response UDP packets MUST NOT be longer than 512 octets. If a
   query packet would be longer than 512 octets, an HTTP request MUST be
   performed instead.

   The Query packet has the following format:


                                          1  1  1  1  1  1
            0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +0   |        VERSION        |      RESERVED      |QT|
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +2   |                      ID                       |
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +4   |                                               |
          /                     IPv6                      /
          /                                               /
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    +20   |       QD-LENGTH       |      RD-LENGTH        |
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    +22   |                                               |
          /                      QD                       /
          /                                               /
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
          |                                               |
          /                      RD                       /
          /                                               /
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   +512 max.


   where:

   VERSION        The packet format described by this document is
                  version one (1).

   QT             Type of query (see section 5.1):

                     0 = MAIL FROM
                     1 = DATA

   ID             A 16 bit identifier assigned by the program that
                  generates any kind of query. This identifier is copied
                  into the corresponding reply and can be used by the
                  requester to match up replies to outstanding queries.

   IP             The octets of the IPv6 address for the connecting mail
                  client in network order. IPv4 addresses are encoded
                  according to [RFC2373] section 2.5.4 "IPv4-compatible
                  IPv6 address". See section 5.2.
                  
   QD-LENGTH      An 8-bit unsigned value that gives the number of
                  octets in the QD section.
                  
   RD-LENGTH      An 8-bit unsigned value that gives the number of
                  octets in the RD section.

   QD             The US-ASCII octets of the MAIL domain argument or a
                  domain from the DATA content.

   RD             The US-ASCII octets of the RCPT domain argument or
                  substitute. The RCPT domain MAY be given here,
                  otherwise this field MAY be empty to indicate that
                  the server should use default processing (possibly a
                  general SIQ server default or possibly a default
                  according to SIQ client IP), or other characters may
                  be sent here to specify other than default
                  processing, if privacy of the MAIL and RCPT
                  combination is a concern.
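
   As an illustration of the layout above, the following Python sketch
   packs a query. It is not a reference implementation: the function
   and constant names are hypothetical, the RESERVED bits are assumed
   to be sent as zero, and bit 0 in the diagram is assumed to be the
   most significant bit (as in [RFC1035]), which places QT in the
   least significant bit of the second octet.

```python
import ipaddress
import struct

SIQ_VERSION = 1
QT_MAIL_FROM = 0
QT_DATA = 1
MAX_PACKET = 512


def encode_ip(address):
    """Return 16 octets; IPv4 becomes an IPv4-compatible IPv6 address."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        return b"\x00" * 12 + ip.packed
    return ip.packed


def build_query(query_id, client_ip, qt, qd, rd=""):
    """Pack one SIQ UDP query following the diagram in this section."""
    qd_octets = qd.encode("ascii")
    rd_octets = rd.encode("ascii")
    if len(qd_octets) > 255 or len(rd_octets) > 255:
        raise ValueError("QD and RD lengths must each fit in 8 bits")
    # Octet 0 is VERSION; octet 1 carries 7 reserved bits (zero here,
    # an assumption) followed by the single QT bit.
    header = struct.pack("!BBH", SIQ_VERSION, qt & 0x01, query_id & 0xFFFF)
    lengths = struct.pack("!BB", len(qd_octets), len(rd_octets))
    packet = header + encode_ip(client_ip) + lengths + qd_octets + rd_octets
    if len(packet) > MAX_PACKET:
        raise ValueError("query exceeds 512 octets; use HTTP instead")
    return packet
```

   For example, build_query(0x1234, "192.0.2.1", QT_DATA,
   "example.com") produces a 33-octet packet: 22 octets of fixed
   fields followed by the 11 octets of the query domain and an empty
   RD section.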


3.2 UDP Response Format

   A response packet has the following format:

                                          1  1  1  1  1  1
            0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +0   |        VERSION        |         SCORE         |
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +2   |                      ID                       |
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +4   |       IP-SCORE        |     DOMAIN-SCORE      |
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +6   |       REL-SCORE       |      TEXT LENGTH      |
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +8   |                                               |
          /                     TEXT                      /
          /                                               /
          +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|

   where:

   VERSION        The packet format described by this document is
                  version one (1).

   ID             A 16 bit identifier assigned by the client that
                  generates any kind of query. This identifier is copied
                  into the corresponding reply and can be used by the
                  client to match up replies to outstanding queries.

   SCORE          An 8-bit signed value between -127 and 127, where
                  values other than -3 through 100 are reserved.

                   -3   REDIRECT the request to the SIQ server IP
                        address given in the TEXT section.

                   -2   A TEMPFAIL result. The mail server is
                        recommended to reject the current message with a
                        4xx response.

                   -1   An UNKNOWN result, including all manner of error
                        conditions. TEXT section may contain human
                        readable log information.

                    0..100    
                        Composite Score: 0 SHOULD be interpreted as
                        REJECT or tagged suspect, 100 SHOULD be
                        interpreted as ACCEPT. The TEXT section MAY
                        contain a report summary suitable for logging
                        purposes and/or inclusion in headers for
                        sorting.

                  Any permanent or temporary errors, for example a
                  lookup error or service restarting, MUST be
                  represented as an UNKNOWN result.

                  In the event of an UNKNOWN response, the mail server
                  SHOULD continue to handle the message subject to local
                  policy.

                  The composite score value is the one intended for
                  primary use and is generated by some server specific
                  function based on the other three scores. How this
                  value is computed is beyond the scope of this document
                  and intentionally left unspecified.

   IP-SCORE       Each is an 8-bit signed value between -1 and 100,
   DOMAIN-SCORE   expressed as a standardized percentage based on SIQ
   REL-SCORE      server data pertaining to an individual IP, an
                  individual domain, and the relationship between IP
                  and domain respectively.

                   -1   An UNKNOWN result, possibly due to insufficient
                        data.

                    0..100 
                        Percentage score, 0 is unfavourable and 100 is
                        favourable.

                  These three scores MAY be provided as supplemental
                  information by the server. They may be ignored,
                  logged, and/or used for supplemental tests by the
                  query client.

   TEXT LENGTH    An 8-bit unsigned value that gives the number of
                  octets in the TEXT section.

   TEXT           US-ASCII octets that provide a brief commentary about
                  the response.
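
   A client might decode such a response along the following lines.
   This Python sketch is illustrative: the names SiqResponse,
   parse_response and classify are hypothetical, and the error
   handling for truncated packets is an assumption, since the text
   above does not say how a malformed response should be treated.

```python
import struct
from collections import namedtuple

SiqResponse = namedtuple(
    "SiqResponse",
    "version score query_id ip_score domain_score rel_score text",
)

SPECIAL_SCORES = {-3: "REDIRECT", -2: "TEMPFAIL", -1: "UNKNOWN"}


def parse_response(data):
    """Unpack the fixed 8-octet header, then TEXT LENGTH octets of TEXT."""
    if len(data) < 8:
        raise ValueError("response shorter than the 8-octet header")
    # B = unsigned octet, b = signed octet, H = 16-bit, network order.
    version, score, query_id, ip_s, dom_s, rel_s, text_len = struct.unpack(
        "!BbHbbbB", data[:8]
    )
    text = data[8:8 + text_len]
    if len(text) != text_len:
        raise ValueError("TEXT shorter than TEXT LENGTH")
    return SiqResponse(version, score, query_id, ip_s, dom_s, rel_s,
                       text.decode("ascii"))


def classify(score):
    """Name the composite SCORE; values outside -3..100 are reserved."""
    if score in SPECIAL_SCORES:
        return SPECIAL_SCORES[score]
    if 0 <= score <= 100:
        return "SCORE"
    return "RESERVED"
```

   The ID field of the parsed response can then be compared with the
   ID of the outstanding query before the score is acted upon.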


4. Query & Response HTTP
  
   The HTTP/1.1 protocol [RFC2616] (see section 5.3) is used to
   provide a reliable means of validating an IP and domain name
   combination. It MAY be used instead of the UDP method to obtain a
   more detailed answer.

   A client query consists of a GET, HEAD, or POST request to an HTTP
   server which SHOULD run on TCP/6262 (see section 7). The server MAY
   run on TCP/80 or TCP/443 depending on the needs of the server, such
   as to make efficient use of caching web proxies, to enable it to
   also serve a dual function allowing standard web browsers to query
   it, or provide privacy through an encrypted connection.

   The HEAD method SHOULD be used by unattended clients to minimize
   network traffic. A POST request is used in the event that the
   GET/HEAD request results in a response code 414 (URI too long) or to
   obtain an uncached result.


4.1 HTTP Query Format

   The following is the general format for the HTTP request and adheres
   to [RFC2616]:

      HEAD /siq/protocol-VERSION?QUERY HTTP/1.1
      Host: SIQ-SERVER:PORT

   Where:

   VERSION        The URI format described by this document is version
                  one (1).

   SIQ-SERVER     The fully qualified domain name of the SIQ server
                  being consulted. This header is required by
                  [RFC2616].

   PORT           When the HTTP port is something other than port 80,
                  then :PORT is that port number. The :PORT is
                  optional when the port is 80 (see [RFC2616]).

   QUERY          The query portion of a URL that contains the following
                  fields in no particular order:

                     ip=IP&qt=QT&qd=QD&rd=RD

   IP             The IPv6 address of the connecting mail client using
                  colon notation, in the US-ASCII charset. IPv4
                  addresses are encoded according to [RFC2373] section
                  2.5.4 as an "IPv4-compatible IPv6 address". See
                  section 5.2.

   QT             See QT defined in section 3.1.

   QD             See QD defined in section 3.1.

   RD             See RD defined in section 3.1.


   In the case of a POST request, the following format would be used:

      POST /siq/protocol-VERSION HTTP/1.1
      Host: SIQ-SERVER:PORT
      Content-Length: LENGTH

      ip=IP&qt=QT&qd=QD&rd=RD
   

   where in addition to the fields explained above:

   LENGTH         The length of the URL-encoded POST data.


   Note that the URI query-string or POST data is URL-encoded and must
   be decoded before being interpreted. 
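
   The request formats above could be generated as in the following
   Python sketch. The helper names and the example host
   siq.example.org are hypothetical. The sketch emits only the headers
   shown in this section; a deployed client may need others (for
   instance a Content-Type for the POST body), which this document
   does not specify.

```python
import ipaddress
from urllib.parse import parse_qs, urlencode

SIQ_VERSION = 1


def siq_fields(client_ip, qt, qd, rd=""):
    """Return the URL-encoded ip/qt/qd/rd string for GET, HEAD or POST."""
    ip = ipaddress.ip_address(client_ip)
    if ip.version == 4:
        # IPv4-compatible form: the IPv4 address in the low 32 bits.
        ip = ipaddress.IPv6Address(int(ip))
    return urlencode({"ip": str(ip), "qt": qt, "qd": qd, "rd": rd})


def host_header(server, port):
    """The :PORT suffix is optional when the port is 80."""
    return server if port == 80 else "%s:%d" % (server, port)


def head_request(server, port, client_ip, qt, qd, rd=""):
    """Build the text of a HEAD query."""
    query = siq_fields(client_ip, qt, qd, rd)
    return (
        "HEAD /siq/protocol-%d?%s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "\r\n" % (SIQ_VERSION, query, host_header(server, port))
    )


def post_request(server, port, client_ip, qt, qd, rd=""):
    """Build the text of the equivalent POST query."""
    body = siq_fields(client_ip, qt, qd, rd)
    return (
        "POST /siq/protocol-%d HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Length: %d\r\n"
        "\r\n"
        "%s" % (SIQ_VERSION, host_header(server, port), len(body), body)
    )
```

   On the server side, parse_qs from the same module reverses the
   URL-encoding, so the colons in the IPv6 address survive the round
   trip even though they are percent-encoded on the wire.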
   

4.2 HTTP Response Format

   The HTTP server may be implemented in any manner (see section 5.3),
   such as a custom, dedicated server or a generic server using a
   script or CGI to process the URI. The choice and manner of the
   server side implementation is beyond the scope of this document.

   The response format adheres to [RFC2616]:

      HTTP/1.1 STATUS REASON
      X-SIQ-Score: 23
      X-SIQ-Comment: suspect

   where:

   STATUS         The standard HTTP response code. A successful response
                  can be either "200 OK" or "204 No Content", with
                  supplemental information placed in X-SIQ-* extension
                  headers. In the case of a "200 OK" response, the body
                  is optional and unspecified.

                  A 301, 302, or 303 status code with a Location:
                  header indicates that the client should redirect the
                  request to another server.

                  A "404 Not Found" MUST be treated as an UNKNOWN
                  result.

                  A "414 Request-URI Too Long" indicates that the client
                  MUST either repeat the request using POST or treat
                  the result as UNKNOWN.

                  A "500 Internal Server Error" MUST be treated as an
                  UNKNOWN result.

   REASON         Standard HTTP textual description of the STATUS.

   X-SIQ-Score:               See SCORE defined in section 3.2.
   
   X-SIQ-IP-Score:            See IP-SCORE defined in section 3.2.

   X-SIQ-Domain-Score:        See DOMAIN-SCORE defined in section 3.2.

   X-SIQ-Relationship-Score:  See REL-SCORE defined in section 3.2.

   X-SIQ-Comment:             See TEXT defined in section 3.2.

   Additional X-SIQ-* headers MAY be provided, which may contain
   supplemental information. However, such headers would be specific
   to a SIQ server implementation.
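
   The status handling described in this section can be summarised by
   a small Python sketch. The function name and the (kind, value)
   return convention are hypothetical. Two behaviours are assumptions
   not stated above: a 200 or 204 response without a usable
   X-SIQ-Score header is treated as UNKNOWN, and so is any status code
   this section does not mention.

```python
UNKNOWN = -1


def interpret_http(status, headers):
    """Map an HTTP status and headers to a (kind, value) pair.

    `headers` is assumed to be a dict keyed by lower-cased header names.
    """
    if status in (200, 204):
        try:
            return ("score", int(headers.get("x-siq-score")))
        except (TypeError, ValueError):
            # Assumption: success without a parsable score is UNKNOWN.
            return ("score", UNKNOWN)
    if status in (301, 302, 303) and "location" in headers:
        return ("redirect", headers["location"])
    if status == 414:
        # The client may retry with POST or fall back to UNKNOWN.
        return ("retry-post", None)
    # 404 and 500 are UNKNOWN per the text; treating every other
    # status the same way is this sketch's assumption.
    return ("score", UNKNOWN)
```

   A caller that receives ("retry-post", None) and chooses not to
   retry would simply treat the result as UNKNOWN.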


5. Rationale


5.1 About Query Type

   There are two types of requests: MAIL FROM or DATA.

   MAIL FROM requests use the MAIL FROM domain along with the connecting
   server IP address. The RCPT domain may be included in the query to
   select different account processing profiles within the SIQ server.
   This pre-DATA type of request is intended for the purpose of
   accepting or rejecting mail addressed to individual recipients prior
   to accepting the message content.

   DATA requests use a domain found in an email address, URL, or some
   other network reference extracted from the content of a message
   combined with the connecting server IP and optionally a RCPT domain.
   The RCPT domain may optionally be used by a SIQ server to select
   different account profiles as above. The post-DATA type of request
   allows for reputation content filtering, anti-phishing, et al.

   When DATA requests are made for a single RCPT domain (note there
   may be multiple RCPTs in a given domain), the message MAY be
   rejected if the returned SIQ score favours such an action with
   respect to local policy. If there is more than one RCPT domain and
   not all of the DATA queries return unfavourable SIQ scores, then
   the SIQ client MAY accept and tag the message in some manner, such
   as modified subject and/or extra headers.
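
   One possible local policy for the multi-RCPT case is sketched below
   in Python. Everything in it is an assumption made for illustration:
   the function name, the reject_below threshold of 50, the choice to
   treat UNKNOWN (-1) as not unfavourable, and the choice to reject
   when every RCPT domain scores unfavourably. The protocol leaves all
   of these decisions to the query client.

```python
def data_verdict(scores_by_rcpt_domain, reject_below=50):
    """Return 'reject', 'tag' or 'accept' for one message.

    `scores_by_rcpt_domain` maps each RCPT domain to the composite SIQ
    score returned for its DATA query. `reject_below` is a hypothetical
    local-policy threshold, not a value defined by the protocol.
    """
    unfavourable = [
        domain for domain, score in scores_by_rcpt_domain.items()
        if 0 <= score < reject_below
    ]
    if not unfavourable:
        return "accept"
    if len(unfavourable) == len(scores_by_rcpt_domain):
        return "reject"
    # Mixed results: accept, but mark the message for the recipients.
    return "tag"
```

   With this policy a message for a single RCPT domain scoring 10 is
   rejected, while a message for two RCPT domains scoring 10 and 90 is
   accepted and tagged.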

   This protocol intentionally never passes the user portion of email
   addresses to the third party SIQ server. Only the domain part of
   MAIL FROM, RCPT, or DATA element arguments is passed, for privacy
   reasons and to prevent SIQ servers from being turned into
   surveillance vectors. RCPT domain may be omitted from queries or
   characters whose meaning is known only to the query client may be
   substituted for RCPT domain. Therefore SIQ server data becomes less
   useful for third party surveillance purposes.


5.2 Use of IPv4-compatible IPv6 address format

   [RFC2373] section 2.5.4 specifies two ways to encode IPv4
   addresses: "IPv4-compatible IPv6 address" and "IPv4-mapped IPv6
   address". The former is essentially the IPv4 address with the high
   order bits zero padded to IPv6 length; the latter precedes the IPv4
   address with 16 one bits and fills the remaining high order bits
   with zeros. The former was chosen for ease of coding and because
   the "IPv4-mapped IPv6 address" is defined for IPv4-only nodes, a
   property the SIQ client cannot determine: a client connecting from
   IPv4 space may or may not support IPv6.

5.3 HTTP 1.0 and 1.1

   Preliminary versions of this document used the HTTP/1.0 version
   number [RFC1945] for HTTP requests to signal to the server that the
   request is not a persistent connection, without the need of extra
   headers. It also allowed the SIQ protocol to be implemented on
   older HTTP servers.
   
   However, for consistency and clarity of interpretation, HTTP/1.1
   [RFC2616] is assumed for complete and conforming client and server
   implementations. HTTP/1.0 semantics may be used for minimalist
   implementations, though not recommended for production systems.
   
   Also HTTP/1.1 allows for persistent connections, which are better
   suited to multiple queries (pre-DATA and post-DATA) reducing the
   overhead of building up and tearing down individual TCP connections.

   
6. Security Considerations

   UDP queries benefit from their compactness and speed, but they are
   sent in the clear and lack any real form of authentication. The
   concept of a query ID being returned in a response packet is similar
   to that used in DNS [RFC1035]. Its purpose is for tracking requests.
   A response containing a matching query ID should not be relied upon
   as proof that the response came from a known and reliable SIQ server,
   even when verified against the source IP, any more than one would
   rely on a response from a UDP DNS server as guaranteed to be
   accurate. IP address spoofing and man in the middle attacks are an
   issue, since they could be used to falsify queries or responses.
   Where security is a higher priority than performance, HTTP queries
   SHOULD be used instead, preferably over a TLS/SSL connection.

   The information passed using either the UDP packet queries or HTTP
   queries, such as the combination of sender's IP, MAIL FROM, and the
   RCPT TO domain may pose some privacy issues. Similar information
   already appears in message trace headers and those headers may have
   already been viewed and logged by intermediate MX servers during
   transit. Taking this perspective, the queries make use of
   information that may have already been revealed elsewhere.

   However, with today's Internet privacy paranoia, a SIQ client MAY
   choose to make HTTP queries over a TLS/SSL connection at the
   expense of the speed and convenience offered by UDP queries or
   unencrypted HTTP queries.


7. IANA Considerations

   Application is to be made for a User (Registered) Port Number using
   TCP and UDP. This port number would replace the proposed value of
   6262 outlined above.
   

8. Normative References

   [RFC2119]      Key words for use in RFCs to Indicate Requirement
                  Levels. S. Bradner. March 1997.
   
   [RFC2373]      IP Version 6 Addressing Architecture. R. Hinden, S.
                  Deering. July 1998.
                  
   [RFC2616]      Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding,
                  J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P.
                  Leach, T. Berners-Lee. June 1999.

   [RFC2821]      Simple Mail Transfer Protocol. J. Klensin, Ed. April
                  2001.
               

9. Informational References

   [RFC1035]      Domain names - implementation and specification. P.V.
                  Mockapetris. November 1987.

   [RFC1945]      Hypertext Transfer Protocol -- HTTP/1.0. 
                  T. Berners-Lee, R. Fielding, H. Frystyk. May 1996.
     

10. Authors' Addresses

   Anthony C. Howe
   42 av. Isola Bella
   06400 Cannes, France
   <achowe@snert.com>

   April Lorenzen
   PO Box 293, Jamestown 
   RI 02835, USA
   <ietf.siq@codelock.com>

   Petru Paler
   Brasov, Romania
   <petru@paler.net>

   Derek J. Balling
   557 Broadway Apt 37A
   Port Ewen, NY 12466
   <dredd@megacity.org>


Comments

   Comments on this draft are welcome.  In the interests of openness,
   rather than contacting the authors directly, please post to:
   
      http://asrg.sp.am/wiki?Reputation_And_Accreditation_Systems


Copyright Statement

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Disclaimer

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
   

Expires

   This Internet Draft expires 11 February 2005.