Network Working Group                                        P. Rzewski
Internet Draft                                                  Inktomi
Expires May 17, 2001                                            B. Cain
                                                           Mirror Image
                                                           N. Robertson
                                                                 Exodus
                                                          November 2000


                   Cross-Network Accounting for HTTP
                      draft-rzewski-cnacct-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 14, 2001.

Copyright Notice

   Copyright (C) The Internet Society (2000). All Rights Reserved.

Abstract

   One of the goals of "content peering" is to allow content providers
   to receive detailed information on the utilization of their content
   in remote networks. In order to provide this, a simple, commonly
   accepted log format must be used on systems that deliver content.
   Also, a commonly accepted mechanism is required to pass these logs
   between networks. This draft proposes a log format and log passing
   mechanism specifically for HTTP content.

Rzewski, et al            Expires May 17, 2001                 [Page 1]

Internet-Draft                CNA for HTTP                November 2000

Table of Contents

   1.  Introduction
   1.1 Purpose
   1.2 Requirements
   1.3 Terminology
   2.  Log format for Surrogates and Content Relay Surrogates
   3.  Cross-Network Accounting implementation
   3.1 Role of a Surrogate
   3.2 Specific functions of a Surrogate
   3.3 Role of an Accounting Relay
   4.  Security
   5.  Acknowledgements
       References
       Authors' Addresses
       Full Copyright Statement

1. Introduction

1.1 Purpose

   Content providers want detailed information on the utilization of
   their content in the caches of remote networks. To provide them with
   this data, two problems must be solved. First, because remote caches
   ("surrogates") produce a large amount of log data, summary logs must
   be created on these systems. These logs must be of a commonly
   accepted format so that a content provider can collate logs received
   from multiple sources. Second, an interface must be defined by which
   these logs can be passed between networks.

   This draft proposes a simple summary log format, similar to SQUID
   format [2], for storing accounting information locally on
   surrogates. This draft also provides requirements for systems that
   use a simple, HTTP-based implementation to pass log data between
   networks.

   See [4] for a description of a full implementation of "content
   peering" using the systems described in this document.

1.2 Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

   An implementation is not compliant if it fails to satisfy one or
   more of the MUST or REQUIRED level requirements for the protocols it
   implements.
   An implementation that satisfies all the MUST or REQUIRED level and
   all the SHOULD level requirements for its protocols is said to be
   "unconditionally compliant"; one that satisfies all the MUST level
   requirements but not all the SHOULD level requirements for its
   protocols is said to be "conditionally compliant."

1.3 Terminology

   The following terms are used as defined in [3]: SURROGATE,
   ACCOUNTING

   ACCOUNTING POLICY
      A set of rules defined on a SURROGATE or CONTENT RELAY SURROGATE
      that defines a set of TARGET URLS for which logs will be
      generated.

   CONTENT RELAY SURROGATE
      Described in detail in [5].

   TARGET URL
      A URL attached to a set of rules defined in an ACCOUNTING POLICY.
      ACCOUNTING systems described in this document MUST be able to
      indicate a set of TARGET URLS by Domain Name granularity, and MAY
      provide additional granularity (e.g. regular expressions).

2. Log format for Surrogates and Content Relay Surrogates

   In order to allow for cross-network log propagation, SURROGATES and
   CONTENT RELAY SURROGATES MUST be able to keep logs for all HTTP GET
   operations on TARGET URLS in Cross-Network Accounting (CNA) format.

   CNA format logs summarize statistics on cache hits aggregated by URL
   and time interval. The data is presented in the following fields:

   start time
      Start of the time period in SQUID [2] format (seconds since
      January 1, 1970)

   URL
      Uniform Resource Identifier (URI) of the request from the HTTP
      client to the SURROGATE, with blanks and other non-parsing
      characters replaced by escape sequences. The escape sequence is
      the ASCII code number.

   number of hits
      Number of cache hits (TCP_HIT) for GET requests for the URL in
      this period.

   average hit time
      Average [Ed note: mean?] transfer time for hits in this period in
      milliseconds.

   total hit length
      Total SURROGATE response content length of cache hits in this
      period in SQUID [2] format (includes header and content length)

   number of non-hits
      Number of SURROGATE non-hits (misses, refreshes, etc.: anything
      other than TCP_HIT) for GET requests in this period.

   average non-hit time
      Average transfer time for non-hits in this period in
      milliseconds.

   total non-hit length
      Total SURROGATE response content length of cache non-hits in this
      period in SQUID [2] format (includes header and content length).

   CNA logs MAY include additional, custom/optional fields at the end
   of any line.

3. Cross-Network Accounting implementation

   For the purposes of Section 3, all references to SURROGATES also
   apply to CONTENT RELAY SURROGATES.

3.1 Role of a Surrogate

   SURROGATES act as a "source" for logging information. They must keep
   local logs in CNA format (Section 2) and be able to pass them to an
   ACCOUNTING RELAY system.

3.2 Specific functions of a Surrogate

   A SURROGATE stores log data such that data associated with a given
   TARGET URL can be accessed for cross-network posting or retrieval.
   For example, the SURROGATE could store all logs for hits on URLs in
   Domain Name "foo.com" together in a file called
   "foo.com.summary-log".

   A SURROGATE MUST store log data such that it can be accessed at a
   regular, designated time interval. The designated time interval MUST
   allow for a minimum granularity of 1 minute.

   A SURROGATE MUST be able to pass all log data for a given Domain
   Name to an ACCOUNTING RELAY at a regular, designated time interval.
   The designated time interval MUST allow for a minimum granularity of
   1 minute. The passing of log data is performed via an HTTP POST
   operation [Ed note: POST URL?] to a Hostname and Port of an
   ACCOUNTING RELAY.

   A SURROGATE MUST re-send log posts in the event an HTTP POST to an
   ACCOUNTING RELAY is not successful.
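
   As an illustration only, the following Python sketch shows one way a
   SURROGATE might serialize a CNA record from Section 2 and re-send a
   failed POST. Several details are assumptions, since this draft does
   not fix them: fields are written space-separated in the order listed
   in Section 2, URL escaping is done as percent-encoding, the body is
   sent as text/plain, and the names CnaRecord, http_post, and
   post_with_resend, the relay URL, and the retry count and delay are
   all hypothetical.

```python
import http.client
import time
import urllib.request
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class CnaRecord:
    """One CNA summary entry; field order follows Section 2."""
    start_time: int       # seconds since January 1, 1970
    url: str
    hits: int
    avg_hit_ms: int
    hit_bytes: int
    non_hits: int
    avg_non_hit_ms: int
    non_hit_bytes: int

    def to_line(self) -> str:
        # Assumption: space-separated fields, percent-encoded URL.
        # '%' is kept safe so already-escaped URLs are not re-escaped.
        escaped = quote(self.url, safe=":/?&=%#@+,;~")
        return " ".join(str(field) for field in (
            self.start_time, escaped, self.hits, self.avg_hit_ms,
            self.hit_bytes, self.non_hits, self.avg_non_hit_ms,
            self.non_hit_bytes))


def http_post(relay_url: str, body: bytes, timeout: float = 10.0) -> bool:
    """POST a log body to the relay; True only on a 2xx response."""
    request = urllib.request.Request(
        relay_url, data=body, method="POST",
        headers={"Content-Type": "text/plain"})  # assumed media type
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError.
        return False


def post_with_resend(relay_url, records, send=http_post,
                     max_attempts=5, delay_s=1.0, sleep=time.sleep) -> int:
    """Send all records in one POST, re-sending on failure.

    Returns the number of attempts used. The attempt limit is an
    assumption; the draft only requires that failed posts be re-sent.
    """
    body = "".join(r.to_line() + "\n" for r in records).encode("ascii")
    for attempt in range(1, max_attempts + 1):
        if send(relay_url, body):
            return attempt
        if attempt < max_attempts:
            sleep(delay_s * attempt)  # simple linear back-off
    raise RuntimeError("log post failed after %d attempts" % max_attempts)
```

   A caller would invoke post_with_resend once per designated time
   interval, for example with a hypothetical relay address such as
   "http://relay.example:8080/cna". A production system would more
   likely keep the unsent log queued for a later interval than raise
   an error after a fixed number of attempts.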

   An ACCOUNTING RELAY MAY send alarms (e-mail, SNMP traps, etc.) in
   the event that a re-send of a log post is necessary.

3.3 Role of an Accounting Relay

   An ACCOUNTING RELAY MUST be able to receive a CNA format log
   (Section 2) from a SURROGATE or another ACCOUNTING RELAY in the form
   of an HTTP POST operation [Ed note: POST URL?].

   Once received, an ACCOUNTING RELAY MUST store, pass, and be able to
   re-send log posts to an ACCOUNTING RELAY in the same manner as
   SURROGATES do for locally-generated logs (Section 3.2).

4. Security

   Because accounting data is transferred over HTTP sessions, an
   ACCOUNTING RELAY MUST have the ability to restrict incoming HTTP
   connections to only a specified set of ACCOUNTING RELAYS,
   SURROGATES, or CONTENT RELAY SURROGATES.

   An ACCOUNTING RELAY MAY support the ability to perform SSL
   encryption over these HTTP sessions for added security. [Ed note:
   Other signed tamper prevention/detection mechanisms may be
   preferable due to the weight of the SSL negotiation on each connect
   as well as the hassles of key management.]

5. Acknowledgements

   The authors acknowledge the contributions and comments of Joe Bai
   (Adero), Arthur Huston (Inktomi), and Paul Francis (Inktomi).

References

   [1] S. Bradner, "Key words for use in RFCs to Indicate Requirement
       Levels", RFC 2119, March 1997.

   [2] D. Wessels, "SQUID Frequently Asked Questions", 2000.

   [3] M. Day, B. Cain, G. Tomlinson, "A Model for CDN Peering",
       draft-day-cdnp-model-03.txt (work in progress), November 2000.

   [4] P. Rzewski, N. Robertson, J. Bai, "Origin/Access Content
       Peering", draft-rzewski-oacp-00.txt (work in progress),
       November 2000.

   [5] P. Rzewski, B. Cain, N. Robertson, "Cross-Network Distribution
       of Content Signals for HTTP", draft-rzewski-cndistcs-00.txt
       (work in progress), November 2000.

Authors' Addresses

   Phillip A. Rzewski
   Inktomi Corporation
   4100 East Third Avenue
   MS FC1-4
   Foster City, CA 94404
   USA

   Phone: +1 650 653 2487
   Email: philr@inktomi.com

   Brad Cain
   Mirror Image Internet
   49 Dragon Court
   Woburn, MA 01801
   US

   Phone: +1 781 276 1904
   Email: brad.cain@mirror-image.com

   Niel Robertson
   Exodus
   2900 Nautilus Court
   Boulder, CO 80301
   US

   Phone: +1 303 381 2302
   Email: nielr@servicemetrics.com

Full Copyright Statement

   Copyright (C) The Internet Society (2000). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC editor function is currently provided by the
   Internet Society.