IRTF-ASRG-IAR                                                 A. C. Howe
Internet-Draft                                               A. Lorenzen
Expires: 11 February 2005                                       P. Paler
                                                           D. J. Balling
                                                          11 August 2004


                   Server Index Query (SIQ) Protocol
                  draft-irtf-asrg-iar-howe-siq-00.txt

Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   or will be disclosed, and any of which I become aware will be
   disclosed, in accordance with RFC 3668.

   By submitting this Internet-Draft, I accept the provisions of
   Section 4 of RFC 3667.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   The Server Index Query (SIQ) protocol is intended to provide a
   standard means by which a mail exchange (MX) server can query one or
   more external services for scoring based on facts or reputation of
   an IP/domain pair. This document specifies the communication
   protocol used to transmit the IP/domain query and return the query
   response. The implementation, correctness of results, and/or
   management of SIQ servers are beyond the scope of this document.

Table of Contents

   1.  Introduction
       1.2  Terminology
   2.  Overview
   3.  Query & Response UDP
       3.1  UDP Query Format
       3.2  UDP Response Format
   4.  Query & Response HTTP
       4.1  HTTP Query Format
       4.2  HTTP Response Format
   5.  Rationale
       5.1  About Query Type
       5.2  Use of IPv4-compatible IPv6 address format
       5.3  HTTP 1.0 and 1.1
   6.  Security Considerations
   7.  IANA Considerations
   8.  Normative References
   9.  Informational References
   10. Authors' Addresses
       Copyright Statement
       Disclaimer
       Acknowledgment
       Expires

1. Introduction

   The proposed SIQ protocol is intended to provide a light, reliable
   way for an inbound email server to query a "reputation" service and
   receive a useful response. Because an IP/domain pair is included in
   SIQ protocol queries, the response may score the IP network, domain
   ownership, and the quality of the relationship (denied, affirmed,
   inferred, undetectable) between the IP and domain.

   A variety of anti-forgery techniques have been proposed in recent
   years. However, many proposals require the domain owner to announce
   which outbound servers are authorized, without third-party
   verification. This leaves open the possibility for abusive senders
   to achieve the same status as non-abusive senders, by making use of
   their own domains. Most of these proposals foresee the need for
   external reputation systems to close the abuse loophole. The SIQ
   protocol is put forth as a protocol for inbound servers to use in
   communicating with such reputation services or systems.

   One possible niche for SIQ protocol servers is to do the heavy
   lifting - discovering, caching and collating all that can be divined
   from knowing an IP, a domain name, and the relationship of the IP to
   the domain name for the purposes of sending emails. The result for
   any particular IP/domain pair query may then be efficiently returned
   in composite form to the inbound server (SIQ protocol query client).

   The basis for the "reputation" scoring may be objective factors such
   as longevity, stability, and identifiability. Some reputation
   services may choose subjective factors such as judgements about
   content, morality, historical business practices, etc.
   The distinction between objective and subjective reputation scoring
   is beyond the scope of this document; the authors do want to point
   out that services in the class of "reputation" MAY be objectively
   based on measurable and observable facts, rather than based on
   opinion or payment.

   The SIQ protocol supports differentiated pre-DATA [RFC2821] and
   post-DATA queries. Pre-DATA queries have a limited scope of
   information they can provide; they refer to the connecting SMTP
   client IP and the MAIL FROM (aka envelope-from) domain. Post-DATA
   queries may ask about domains in URLs or email domains found in the
   body of the message, or domains in particular headers such as
   Errors-To, Reply-To, From, Resent-From, etc. Thus any SIQ protocol
   reputation server may respond appropriately, according to the
   specific query type, without treating a post-DATA query with the
   same scoring or evaluation criteria as a pre-DATA query.

   As a specific example, query clients may be designed for
   anti-phishing functions with post-DATA queries, such as marking the
   suspicious emails with a warning. The criteria for evaluating these
   queries would be very different from the criteria for evaluating the
   pre-DATA, MAIL FROM domain and sending server IP address pairs.

1.2 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Overview

   The SIQ protocol uses a query & response model over UDP with an
   alternative method over TCP/HTTP. Implementation and specification
   borrow ideas from the Domain Name System (DNS) defined by [RFC1035]
   and the Hypertext Transfer Protocol (HTTP/1.1) defined by [RFC2616].

   Upon receiving a new inbound SMTP connection, an MX sends one or
   more queries via UDP or HTTP.
   The queries consist of the connecting client IP address and a query
   domain, along with other housekeeping bits which will be described
   in detail later in this document. The query domain is either the
   MAIL FROM domain or a domain found in the DATA content. See section
   5.1.

   Depending on the type of query made, the SIQ score returned can be
   used to reject or accept, with optional tagging for sorting or
   further processing. Support for per-RCPT-domain processing is
   anticipated by the protocol design and may optionally be provided
   for multi-RCPT messages, dependent on query client implementation.

3. Query & Response UDP

   A client attempts to contact one or more SIQ servers with a query
   packet on port UDP/6262 (see section 7) and waits for a response
   using an exponential backoff algorithm.

   The exponential backoff algorithm starts with an initial timeout
   value and a set number of rounds. Each round, the query packet is
   sent to one or more servers in turn, waiting the current timeout
   period for a response before trying the next server. If no response
   is received after trying all servers, the timeout value is doubled,
   the per-server wait for the new round is set to the doubled value
   divided by the number of servers (rounded down to whole seconds in
   the examples below), and a new round is initiated. This process is
   repeated until a response is received or the set number of rounds
   is reached.

   For example, starting with an initial timeout value of 5 seconds and
   a maximum of 4 rounds:

      1 server:   5     + 10    + 20    + 40       = 75 seconds
      2 servers:  5+5   + 5+5   + 10+10 + 20+20    = 80 seconds
      3 servers:  5+5+5 + 3+3+3 + 6+6+6 + 13+13+13 = 81 seconds

   For example, starting with an initial timeout value of 3 seconds and
   a maximum of 4 rounds:

      1 server:   3     + 6     + 12    + 24       = 45 seconds
      2 servers:  3+3   + 3+3   + 6+6   + 12+12    = 48 seconds
      3 servers:  3+3+3 + 2+2+2 + 4+4+4 + 8+8+8    = 51 seconds

   In the event that no answer is found after the last round, the
   client MUST assume an `UNKNOWN' result and continue to handle the
   message subject to local policy.

3.1 UDP Query Format

   Query & Response UDP packets MUST NOT be longer than 512 octets.
   If a query packet would be longer than 512 octets, an HTTP request
   MUST be performed instead.

   The Query packet has the following format:

                                        1  1  1  1  1  1
          0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +0 |        VERSION        |      RESERVED      |QT|
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +2 |                      ID                       |
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +4 |                                               |
        /                     IPv6                      /
        /                                               /
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    +20 |       QD-LENGTH       |       RD-LENGTH       |
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    +22 |                                               |
        /                      QD                       /
        /                                               /
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
        |                                               |
        /                      RD                       /
        /                                               /
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   +512 max.

   where:

   VERSION     The packet format described by this document is version
               one (1).

   QT          Type of query (see section 5.1):

                  0 = MAIL FROM
                  1 = DATA

   ID          A 16 bit identifier assigned by the program that
               generates any kind of query. This identifier is copied
               into the corresponding reply and can be used by the
               requester to match up replies to outstanding queries.

   IP          The octets of the IPv6 address for the connecting mail
               client in network order. IPv4 addresses are encoded
               according to [RFC2373] section 2.5.4 "IPv4-compatible
               IPv6 address". See section 5.2.

   QD-LENGTH   An 8-bit unsigned value that gives the number of octets
               in the QD section.

   RD-LENGTH   An 8-bit unsigned value that gives the number of octets
               in the RD section.

   QD          The US-ASCII octets of the MAIL domain argument or a
               domain from the DATA content.

   RD          The US-ASCII octets of the RCPT domain argument or
               substitute. The RCPT domain MAY be given here.
               Otherwise this field MAY be empty to indicate that the
               server should use default processing (possibly a general
               SIQ server default or possibly a default according to
               SIQ client IP), or, if privacy of the MAIL and RCPT
               combination is a concern, other characters may be sent
               here to select other than default processing.
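
   As an illustration only, the query layout above and the retry
   schedule of section 3 might be sketched in Python as follows. The
   function names (round_timeouts, encode_ip, build_query) are
   hypothetical and not part of this specification. The sketch assumes
   that bit 0 in the diagram is the most significant bit, as in
   [RFC1035], so that QT occupies the low-order bit of the second
   octet and the RESERVED bits are sent as zero. It also assumes the
   rounding behaviour that the worked examples in section 3 appear to
   use; the one-second floor is an addition of the sketch.

```python
import ipaddress
import struct

SIQ_VERSION = 1          # VERSION field defined in section 3.1
QT_MAIL_FROM = 0         # QT values defined in section 3.1
QT_DATA = 1
MAX_PACKET_OCTETS = 512  # size limit from section 3.1


def round_timeouts(initial, rounds, servers):
    """Per-server wait, in whole seconds, for each round (section 3).

    Assumption read off the worked examples: the first round waits the
    initial timeout for every server; each later round doubles the
    running timeout and splits it between the servers, rounding down.
    The floor of one second is an addition of this sketch.
    """
    waits = [initial]
    running = initial
    for _ in range(rounds - 1):
        running *= 2
        waits.append(max(1, running // servers))
    return waits


def encode_ip(address):
    """Return the 16 octets of the IP field in network order."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        # IPv4-compatible form: 96 zero bits, then the IPv4 address.
        return b"\x00" * 12 + ip.packed
    return ip.packed


def build_query(query_id, client_ip, qd, rd="", qt=QT_MAIL_FROM):
    """Assemble one query packet laid out as in the diagram above."""
    qd_octets = qd.encode("ascii")
    rd_octets = rd.encode("ascii")
    if len(qd_octets) > 255 or len(rd_octets) > 255:
        raise ValueError("QD and RD lengths must each fit in 8 bits")
    # Assumption: bit 0 is the most significant bit, so QT (bit 15) is
    # the low bit of the second octet and RESERVED is sent as zero.
    header = struct.pack("!BBH", SIQ_VERSION, qt & 0x01, query_id & 0xFFFF)
    lengths = struct.pack("!BB", len(qd_octets), len(rd_octets))
    packet = header + encode_ip(client_ip) + lengths + qd_octets + rd_octets
    if len(packet) > MAX_PACKET_OCTETS:
        raise ValueError("query exceeds 512 octets; use HTTP instead")
    return packet


packet = build_query(0x1234, "192.0.2.1", "example.com", "example.net")
print(len(packet), packet[:4].hex())                   # 44 01001234
print(round_timeouts(initial=3, rounds=4, servers=3))  # [3, 2, 4, 8]
```

   A query with two maximum-length domains would need 22 + 255 + 255 =
   532 octets, which is why the sketch checks the 512 octet limit
   after assembly rather than relying on the 8-bit length fields alone.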

3.2 UDP Response Format

   A response packet has the following format:

                                        1  1  1  1  1  1
          0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +0 |        VERSION        |         SCORE         |
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +2 |                      ID                       |
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +4 |       IP-SCORE        |     DOMAIN-SCORE      |
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +6 |       REL-SCORE       |      TEXT LENGTH      |
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     +8 |                                               |
        /                     TEXT                      /
        /                                               /
        +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   where:

   VERSION     The packet format described by this document is version
               one (1).

   ID          A 16 bit identifier assigned by the client that
               generates any kind of query. This identifier is copied
               into the corresponding reply and can be used by the
               client to match up replies to outstanding queries.

   SCORE       An 8-bit signed value between -127 and 127, where values
               other than -3 through 100 are reserved.

               -3      REDIRECT the request to the SIQ server IP
                       address given in the TEXT section.

               -2      A TEMPFAIL result. It is recommended that the
                       mail server reject the current message with a
                       4xx response.

               -1      An UNKNOWN result, including all manner of error
                       conditions. The TEXT section may contain human
                       readable log information.

               0..100  Composite Score: 0 SHOULD be interpreted as
                       REJECT or tagged suspect, 100 SHOULD be
                       interpreted as ACCEPT. The TEXT section MAY
                       contain a report summary suitable for logging
                       purposes and/or inclusion in headers for
                       sorting.

               Any permanent or temporary errors, for example a lookup
               error or service restarting, MUST be represented as an
               UNKNOWN result. In the event of an UNKNOWN response, the
               mail server SHOULD continue to handle the message
               subject to local policy.

               The composite score value is the one intended for
               primary use and is generated by some server specific
               function based on the other three scores. How this value
               is computed is beyond the scope of this document and
               intentionally left unspecified.

   IP-SCORE
   DOMAIN-SCORE
   REL-SCORE   Each is an 8-bit value between -1 and 100. Each is a
               standardized percentage based on SIQ server data
               pertaining to an individual IP, individual domain, and
               the relationship between IP and domain respectively.

               -1      An UNKNOWN result, possibly due to insufficient
                       data.

               0..100  Percentage score, 0 is unfavourable and 100 is
                       favourable.

               These three scores MAY be provided as supplemental
               information by the server. They may be ignored, logged,
               and/or used for supplemental tests by the query client.

   TEXT LENGTH An 8-bit unsigned value that gives the number of octets
               in the TEXT section.

   TEXT        US-ASCII octets that provide a brief commentary about
               the response.

4. Query & Response HTTP

   The HTTP/1.1 protocol [RFC2616] (see section 5.3) is used to provide
   a reliable means of validating an IP and domain name combination. It
   MAY be used instead of the UDP method to obtain a more detailed
   answer.

   A client query consists of a GET, HEAD, or POST request to an HTTP
   server which SHOULD run on TCP/6262 (see section 7). The server MAY
   run on TCP/80 or TCP/443 depending on the needs of the server, such
   as to make efficient use of caching web proxies, to enable it to
   also serve a dual function allowing standard web browsers to query
   it, or to provide privacy through an encrypted connection.

   The HEAD method SHOULD be used by unattended clients to minimize
   network traffic. A POST request is used in the event that the
   GET/HEAD request results in a response code 414 (URI too long) or to
   obtain an uncached result.

4.1 HTTP Query Format

   The following is the general format for the HTTP request and adheres
   to [RFC2616]:

      HEAD /siq/protocol-VERSION?QUERY HTTP/1.1
      Host: SIQ-SERVER:PORT

   Where:

   VERSION     The URI format described by this document is version
               one (1).

   SIQ-SERVER  The fully qualified domain name of the SIQ server being
               consulted. This header is required by [RFC2616].

   PORT        When the HTTP port is something other than port 80, then
               :PORT is that port number. The :PORT is optional when
               the port is 80 (see [RFC2616]).

   QUERY       The query portion of a URL that contains the following
               fields in no particular order:

                  ip=IP&qt=QT&qd=QD&rd=RD

               IP   The IPv6 address of the connecting mail client
                    using colon notation, in the US-ASCII charset. IPv4
                    addresses are encoded according to [RFC2373]
                    section 2.5.4 using "IPv4-compatible IPv6 address".
                    See section 5.2.

               QT   See QT defined in section 3.1.

               QD   See QD defined in section 3.1.

               RD   See RD defined in section 3.1.

   In the case of a POST request, the following format would be used:

      POST /siq/protocol-VERSION HTTP/1.1
      Host: SIQ-SERVER:PORT
      Content-Length: LENGTH

      ip=IP&qt=QT&qd=QD&rd=RD

   where in addition to the fields explained above:

   LENGTH      The length of the URL-encoded POST data.

   Note that the URI query-string or POST data is URL-encoded and must
   be decoded before being interpreted.

4.2 HTTP Response Format

   The HTTP server may be implemented in any manner (see section 5.3),
   such as a custom and dedicated server or as a generic server using a
   script or CGI to process the URI. The choice and manner of the
   server side implementation is beyond the scope of this document.

   The response format adheres to [RFC2616]:

      HTTP/1.1 STATUS REASON
      X-SIQ-Score: 23
      X-SIQ-Comment: suspect

   where:

   STATUS      The standard HTTP response code.

               A successful response can be either "200 OK" or "204 No
               Content", with supplemental information placed in
               X-SIQ-* extension headers. In the case of a "200 OK"
               response, the body is optional and unspecified.

               A 301, 302, or 303 status code and a Location: header
               indicate that the client should redirect the request to
               another server.

               A "404 Not Found" MUST be treated as an UNKNOWN result.

               A "414 URI Too Long" indicates the client MUST either
               request again using POST, or treat the result as
               UNKNOWN.

               A "500 Internal Server Error" MUST be treated as an
               UNKNOWN result.

   REASON      Standard HTTP textual description of the STATUS.

   X-SIQ-Score:               See SCORE defined in section 3.2.

   X-SIQ-IP-Score:            See IP-SCORE defined in section 3.2.

   X-SIQ-Domain-Score:        See DOMAIN-SCORE defined in section 3.2.

   X-SIQ-Relationship-Score:  See REL-SCORE defined in section 3.2.

   X-SIQ-Comment:             See TEXT defined in section 3.2.

   Additional X-SIQ-* headers MAY be provided, which may contain
   supplemental information. However, such headers would be specific to
   a SIQ server implementation.

5. Rationale

5.1 About Query Type

   There are two types of requests: MAIL FROM or DATA.

   MAIL FROM requests use the MAIL FROM domain along with the
   connecting server IP address. The RCPT domain may be included in the
   query to select different account processing profiles within the SIQ
   server. This pre-DATA type of request is intended for the purpose of
   accepting or rejecting mail addressed to individual recipients prior
   to accepting the message content.

   DATA requests use a domain found in an email address, URL, or some
   other network reference extracted from the content of a message,
   combined with the connecting server IP and optionally a RCPT domain.
   The RCPT domain may optionally be used by a SIQ server to select
   different account profiles as above. The post-DATA type of request
   allows for reputation content filtering, anti-phishing, and similar
   uses.

   When DATA requests are made for a single RCPT domain (note there may
   be multiple RCPTs in a given domain), the message MAY be rejected if
   the returned SIQ score favours such an action with respect to local
   policy. If there is more than one RCPT domain and not all of the
   DATA queries return unfavourable SIQ scores, then the SIQ client MAY
   accept and tag the message in some manner, such as a modified
   subject and/or extra headers.

   This protocol intentionally never passes the user portion of email
   addresses to the third party SIQ server. Only the domain part of
   MAIL FROM, RCPT, or DATA element arguments is passed, for privacy
   reasons and to prevent SIQ servers from being turned into
   surveillance vectors.
   The RCPT domain may be omitted from queries, or characters whose
   meaning is known only to the query client may be substituted for the
   RCPT domain. SIQ server data therefore becomes less useful for third
   party surveillance purposes.

5.2 Use of IPv4-compatible IPv6 address format

   [RFC2373] section 2.5.4 specifies two ways to encode IPv4 addresses:
   "IPv4-compatible IPv6 address" and "IPv4-mapped IPv6 address". The
   former is essentially the IPv4 address with the high order bits
   zero padded to IPv6 length; the latter places 16 one bits
   immediately before the IPv4 address, with the remaining high order
   bits zero.

   The former was chosen for ease of coding and because [RFC2373]
   reserves the "IPv4-mapped IPv6 address" for IPv4-only nodes, a
   property the SIQ client cannot determine: a connection from IPv4
   space does not reveal whether the connecting client supports IPv6.

5.3 HTTP 1.0 and 1.1

   Preliminary versions of this document used the HTTP/1.0 version
   number [RFC1945] for HTTP requests to signal to the server that the
   request is not a persistent connection, without the need of extra
   headers. It also allowed the SIQ protocol to be implemented on older
   HTTP servers.

   However, for consistency and clarity of interpretation, HTTP/1.1
   [RFC2616] is assumed for complete and conforming client and server
   implementations. HTTP/1.0 semantics may be used for minimalist
   implementations, though this is not recommended for production
   systems. HTTP/1.1 also allows for persistent connections, which are
   better suited to multiple queries (pre-DATA and post-DATA), reducing
   the overhead of building up and tearing down individual TCP
   connections.

6. Security Considerations

   UDP queries benefit from their compactness and speed, but they are
   sent in the clear and lack any real form of authentication.

   The concept of a query ID being returned in a response packet is
   similar to that used in DNS [RFC1035]. Its purpose is to track
   requests.
   A response containing a matching query ID should not be relied upon
   as proof that the response came from a known and reliable SIQ
   server, even when verified against the source IP, any more than one
   would rely on a response from a UDP DNS server as guaranteed to be
   accurate.

   IP address spoofing and man-in-the-middle attacks are an issue,
   since they could be used to falsify queries or responses. Where
   security is a higher priority than performance, HTTP queries SHOULD
   be used instead, preferably over a TLS/SSL connection.

   The information passed using either the UDP packet queries or HTTP
   queries, such as the combination of sender's IP, MAIL FROM, and the
   RCPT TO domain, may pose some privacy issues. Similar information
   already appears in message trace headers and those headers may have
   already been viewed and logged by intermediate MX servers during
   transit. Taking this perspective, the queries make use of
   information that may have already been revealed elsewhere. However,
   with today's Internet privacy paranoia, a SIQ client MAY choose to
   make HTTP queries over a TLS/SSL connection at the expense of the
   speed and convenience offered by UDP queries or unencrypted HTTP
   queries.

7. IANA Considerations

   Application is to be made for a User (Registered) Port Number using
   TCP and UDP. This port number would replace the proposed value of
   6262 outlined above.

8. Normative References

   [RFC2119]  Key words for use in RFCs to Indicate Requirement Levels.
              S. Bradner. March 1997.

   [RFC2373]  IP Version 6 Addressing Architecture. R. Hinden,
              S. Deering. July 1998.

   [RFC2616]  Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding,
              J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach,
              T. Berners-Lee. June 1999.

   [RFC2821]  Simple Mail Transfer Protocol. J. Klensin, Ed. April
              2001.

9. Informational References

   [RFC1035]  Domain names - implementation and specification.
              P.V. Mockapetris. November 1987.

   [RFC1945]  Hypertext Transfer Protocol -- HTTP/1.0. T. Berners-Lee,
              R. Fielding, H. Frystyk. May 1996.

10. Authors' Addresses

   Anthony C. Howe
   42 av. Isola Bella
   06400 Cannes, France

   April Lorenzen
   PO Box 293, Jamestown
   RI 02835, USA

   Petru Paler
   Brasov, Romania

   Derek J. Balling
   557 Broadway Apt 37A
   Port Ewen, NY 12466

Comments

   Comments on this draft are welcome. In the interests of openness,
   rather than contacting the authors directly, please post to:

      http://asrg.sp.am/wiki?Reputation_And_Accreditation_Systems

Copyright Statement

   Copyright (C) The Internet Society (2004). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Disclaimer

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

Expires

   This Internet Draft expires 11 February 2005.