Internet Engineering Task Force                  Amit Gupta, Geoff Baehr
draft-gupta-efficient-ad-00.txt                   Sun Microsystems, Inc.
Date: Nov 18th, 1998
Expires: May 18th, 1999

Efficiently transporting ad-carrying web pages

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Abstract

This draft proposes a simple extension to HTTP, using a new set of headers, which can significantly reduce the network traffic used for sending pages containing advertising banners and other related content. We begin by describing the problems caused by the "cache-busting" techniques in use today and why the hit-metering scheme [RFC2227], by itself, may not be adequate. We then describe a new scheme which relies on collaboration between the content providers and the Internet Service Providers (ISPs).
draft-gupta-efficient-ad-00.txt                                 [Page 1]

INTERNET-DRAFT       Efficiently transporting ads           October 1998

1 Introduction

The current Internet explosion is fueled in part by the availability of cheap (mostly free) services on the Internet. Online advertising revenues pay for most of these services and are therefore critical to the continued growth of the Internet. To increase these revenues, web sites resort to "cache-busting" [RFC2227]: techniques used by origin servers to prevent proxies from caching the transmitted data, such as setting a very short page expiration time or explicitly requesting that the page not be cached. Sites do this so that they know when one of their pages, with the accompanying advertisement(s), is viewed, and so that the content provider can vary the advertising that appears on a page each time the page is requested.

Cache-busting causes several problems. It increases network traffic, because the same content is sent over and over again. It also increases the browsing latency observed by clients, because the entire page must be downloaded from the origin server across the Internet instead of being served from a cache. The increased traffic leads to more congestion in the network, and it requires all parties (the content provider, the client, and the client's ISP) to pay for the additional bandwidth needed to download these pages and the accompanying advertisements again and again.

ISPs pay more for bandwidth when they honor requests not to cache data, even though they do not share in the advertising revenues generated. ISPs therefore have no economic incentive to honor the "no-cache" directives; indeed, in many cases, ISPs actively flout these directives to reduce their bandwidth costs.
This proposal supports a model similar to that of television and radio networks: the content providers (the broadcast network) include, in their transmission, slots in which other entities (the local station) insert advertisements. The advertisement revenue is then shared between the content provider and the local station (the ISP in our case).

1.1 Goals and non-goals

As mentioned in [RFC2227], with HTTP/1.1 the origin servers can allow or prevent caching of responses, and at least some of the time caching is disabled in order to obtain access counts; these access counts are very useful for online advertisers. This proposal provides an optional performance optimization for the HTTP protocol.

This specification is:

   o Optional: No server or proxy is required to implement it.

   o Proxy-centered: Communication is primarily between the servers and the proxies; clients are provided minimal information, for possible debugging and troubleshooting purposes only.

   o Performance optimisation: Use of this proposal should significantly lower bandwidth use.

The goals of this specification do not include:

   o Solving the related content-distribution and mirroring problems.

   o Avoiding all forms of "cache-busting".

   o Increasing or reducing the use of online advertising per se. The increased efficiency due to the use of this system may, as a side-effect, lead to an increase in online advertising.

This design also suffers from all the limitations described in Section 1.1 of [RFC2227], namely:

   o If it is not deployed widely in both proxies and servers, it will provide little benefit.

   o It may, by partially solving the local advertising problem, reduce the pressure to adopt more complete solutions, if any become available.

   o Even if widely deployed, it might not be widely used, and so might not significantly improve performance.
1.2 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

2 Overview

Charging for online advertising can be based on one or a combination of the following criteria:

   - ad impressions: the user views the advertisement.

   - ad click-throughs: the user clicks on the advertisement banner.

   - ad sales: the user buys the advertiser's product(s).

For simplicity, the rest of this description assumes that charging is based on ad impressions; the underlying ideas and techniques are equally applicable to the other charging criteria.

Just like their off-line counterparts, online advertisers prefer that their advertisement be seen by a member of their "target audience"; for example, new-car buyers are part of the target audience for a car dealership. Within this set, the more affluent viewers would be the target audience for luxury cars, while the less affluent viewers would be the target audience for non-luxury cars, and so on. The advertisers would prefer to (and do, subject to the limitations of current technology) pass some of this "demographic" information about the viewers for their advertising campaigns to the content providers.

The hit-metering protocol [RFC2227] describes a simple and elegant scheme by which the proxies can work with the origin servers to keep track of the number of users that viewed a particular advertisement. Unfortunately, the protocol does not distinguish between different users: each view counts exactly the same, and if a page is shared among multiple users, all users will see the same advertisement. This uniformity is clearly undesirable.
Also, the hit-metering system requires the proxies to do significant additional work for which they are not compensated. Their only economic benefit is the saving in bandwidth costs, a saving that they can obtain anyway by countering the cache-busting techniques in the first place.

We believe that advertising systems should take into account the viewers' individual preferences and demographics. The advertisements should not be tied to particular web pages; instead, the advertisements should be linked to the individual users.

A much better solution would actively involve proxies, enlist their help in ensuring that the ads are seen by members of the target audience, and compensate them for the use of their resources. Such a system would use the proxies to insert the advertisements in the pages served from the origin servers; the choice of which advertisement to insert depends on the particular viewer's demographics. With this collaboration, the origin servers no longer need to use cache-busting. In fact, cache-busting would only increase their bandwidth bills, so pure self-interest would motivate the origin servers to avoid it.

Such a scheme would provide the following benefits:

   - reduction in network traffic

   - precisely targeted advertising

   - collaboration and revenue sharing between the ISPs and the origin servers.

The design described in this document introduces several new features to HTTP:

   - explicit interactions between the origin servers and the proxy caches.

   - Ad-interest: Used by the proxy cache (at the ISP) to indicate its willingness to accept pages with empty advertisement slots.

   - Ad-slot: Used by the origin servers to inform the proxy caches of the availability of an empty ad slot, as well as its properties.
Used together, these allow the origin servers and the proxy caches to collaborate in inserting advertisements in the transmitted data stream in a mutually beneficial arrangement.

3 Design

For local advertisement replacement/insertion to work correctly, the proxy needs to modify, on-the-fly, the HTML data stream on its way from the origin server to the client browser. A greedy proxy may decide to do this modification on its own, without any prior arrangement with the origin server, by guessing which HTML fragments in the data stream carry advertising, or by guessing an appropriate place for insertion (for example, "the start of the HTML stream"). However, three key problems render such substitution infeasible in practice:

   o Such guessing is inherently hard. The proxy can try certain combinations of patterns, e.g., an <IMG> tag within an <A> block. Not only will this miss many advertisements, but it may also mis-identify other useful content as pure advertisement. Even if the proxy only adds the advertisement at the beginning of the HTML stream, it may end up corrupting the HTML layout or writing on top of some other data.

   o Such searches significantly increase the computational load at the proxies, as well as the browsing latencies observed by clients.

   o More importantly, such modification is almost guaranteed to be a violation of the content provider's copyright (the advertisement replacement creates an unauthorised derivative work).

The better approach is for the proxies to collaborate with the origin servers to provide local advertising; this is the approach that we will now describe in further detail. In this collaboration, the server indicates its willingness to let the proxy insert an advertisement locally by providing the proxy with additional information that makes it significantly easier (and more efficient) for the proxy to insert the advertisements locally.
One option would be for the servers to return this extra information to each and every proxy; proxies that do not understand this protocol will simply ignore the additional information. Another option is for the proxy to inform the server of its interest in inserting a local advertisement when it sends the initial HTTP request message. A similar approach would be for the servers to maintain lists of the proxies to which they should return this extra information (these are the proxies that they have a business arrangement with); while this approach also works, it adds to the initial server configuration information. Also, in a world in which multiple proxies may share information (for example, through ICP), the server may find it useful to provide the additional information to proxies that do not support local advertising, in case the page is shared with other proxies that do.

This additional information provides the following types of information to the proxy:

   - Ad Position: This field specifies the portion of the data stream which provides the HTML code for the advertisement. The tuple <ad-start, ad-length> provides this information; for example, the value <240,74> would inform the proxy that the advertisement-related replaceable HTML code starts at character position 240 and is 74 characters long.

   - Ad Dimensions: This field provides the height and the width of the advertisement slot; for example, the tuple <46,230> specifies an advertisement slot that is 46 pixels tall and 230 pixels wide.

   - Ad Price: This field provides the amount of money that the proxy has to pay the origin server for the privilege of inserting the local advertisement.

A single page may contain multiple advertisement slots, and the origin server may let the proxy replace the advertisements in one or more of these slots. If a page contains multiple replaceable ad slots, the origin server must provide the advertisement position, dimensions, and price information for every such slot.
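As an illustration of how a proxy might use the Ad Position tuple, the following minimal Python sketch replaces the character span described by each slot with a locally chosen advertisement. The names AdSlot and insert_local_ads are hypothetical, not part of this proposal, and the sketch assumes that character positions are counted from zero, which this document does not specify. Slots are processed from the end of the page toward the beginning so that the positions of earlier slots remain valid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AdSlot:
    # Hypothetical container for the <ad-start, ad-length> tuple.
    start: int   # position of the first replaceable character (assumed 0-based)
    length: int  # number of replaceable characters


def insert_local_ads(html: str, replacements: List[Tuple[AdSlot, str]]) -> str:
    """Replace each slot's span with the paired local advertisement HTML."""
    # Work from the last slot to the first: a replacement may change the
    # length of the page, but only for text after the slot being replaced.
    ordered = sorted(replacements, key=lambda pair: pair[0].start, reverse=True)
    limit = len(html)  # spans must end at or before this position
    for slot, ad_html in ordered:
        end = slot.start + slot.length
        if slot.start < 0 or slot.length < 0 or end > limit:
            raise ValueError("ad slot is out of range or overlaps another slot")
        html = html[:slot.start] + ad_html + html[end:]
        limit = slot.start
    return html
```

The sketch does not check the Ad Dimensions or the Ad Price; a real proxy would also have to confirm that the inserted advertisement fits the slot.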
4 Specification

4.1 Proxy to Origin Server

If the proxy is interested in local advertisement insertion, it SHOULD indicate its interest by adding the following header to the HTTP request:

   Local-ad: ad-interest=yes

4.2 Origin Server to Proxy

If the origin server is not interested in local advertisement insertion, it SHOULD NOT provide any local-advertising related headers in the reply.

4.2.1 Marking advertisements as replaceable

If the server is willing to accept local advertisement substitution, it needs to indicate the presence of ad slots. For example, the following header:

   Local-ad: replace-ad=yes

says that the proxies are allowed to replace the advertisement slots.

4.2.2 Ad slot

As discussed in the previous section, the origin server needs to provide information regarding the advertisement position, dimensions and price. The following set of headers describes a single ad slot:

   Local-ad: ad-slot
   Local-ad: ad-start=540
   Local-ad: ad-length=72
   Local-ad: ad-price=7 cents
   Local-ad: ad-height=46, ad-width=230

The origin server MUST provide the ad-slot, ad-start, and ad-length information, with the ad-slot header appearing first in the series of headers that describe a particular advertisement slot. In addition, it MUST also provide one or more ad-height and ad-width headers. When the advertisement dimension information is provided, the proxy MUST ensure that the inserted advertisement fits within the provided dimensions.

It is RECOMMENDED that the server also provide the ad-price information. The currency units (cents in this example) are OPTIONAL; if no units are provided, it is assumed that the server and the proxy have an a priori agreement regarding the currency unit.
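The header series above could be read by a proxy roughly as follows. This is a minimal Python sketch, not a normative parser: the names ParsedSlot and parse_local_ad are hypothetical, the input is assumed to be the list of Local-ad header values in the order received, and the price is kept as raw text because this document does not define a grammar for it. Each ad-height/ad-width pair is associated with the last preceding ad-price header of the same slot.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ParsedSlot:
    # Hypothetical record for one advertisement slot.
    start: Optional[int] = None
    length: Optional[int] = None
    # Each option is (height, width, price); price is raw text, or None
    # when the server relies on an a priori price agreement.
    options: List[Tuple[int, int, Optional[str]]] = field(default_factory=list)


def parse_local_ad(values: List[str]) -> List[ParsedSlot]:
    """Group Local-ad header values (in received order) into ad slots."""
    slots: List[ParsedSlot] = []
    price: Optional[str] = None
    for value in values:
        value = value.strip()
        if value == "ad-slot":
            slots.append(ParsedSlot())  # a new slot begins here
            price = None                # prices do not carry across slots
            continue
        if not slots:
            continue  # e.g. "replace-ad=yes", which precedes any slot
        fields: Dict[str, str] = {}
        for part in value.split(","):
            name, sep, text = part.strip().partition("=")
            if sep:
                fields[name.strip()] = text.strip()
        slot = slots[-1]
        if "ad-start" in fields:
            slot.start = int(fields["ad-start"])
        if "ad-length" in fields:
            slot.length = int(fields["ad-length"])
        if "ad-price" in fields:
            price = fields["ad-price"]
        if "ad-height" in fields and "ad-width" in fields:
            slot.options.append(
                (int(fields["ad-height"]), int(fields["ad-width"]), price)
            )
    for slot in slots:
        # ad-start, ad-length and at least one dimension pair are mandatory.
        if slot.start is None or slot.length is None or not slot.options:
            raise ValueError("ad slot is missing a required field")
    return slots
```

Applied to the five example headers of this section, the sketch yields one slot with start 540, length 72, and a single 46x230 option priced at "7 cents".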
If the ad-price information is provided, it MUST precede the ad-height and ad-width headers. A single advertisement slot can include only one value for the ad-start and ad-length fields; however, it can include one or more price and height-width fields. The price for a particular choice of advertisement dimensions (ad-height and ad-width) is determined by the last preceding ad-price header for the current advertisement slot.

If the ad-price information is not provided, it is assumed that the server and the proxy have a priori agreements regarding the advertisement prices.

4.3 Multiple intermediate proxies

Any intermediate proxy MAY add the "ad-interest" header to the HTTP request; however, an intermediate proxy MUST NOT remove this header from an upstream HTTP request.

An intermediate proxy can try to fill one or more advertisement slots with its own advertisements. When it does so, it MUST correctly adjust all other advertising-replacement related headers before forwarding the HTTP reply to the client; a trivial adjustment would be for the proxy to remove all other advertising-replacement related headers. This adjustment carries the risk that a software flaw may completely corrupt the HTML stream; this risk is similar to the risks posed by flaws in the server code, as well as by flaws in other proxy code.

To aid debugging and troubleshooting, it is RECOMMENDED that when proxies replace advertisements, they also add the following header:

   Local-ad: ad-replacement: <host> [, identifier]

where <host> identifies the proxy on the Internet, and the OPTIONAL identifier provides the code that should be provided to this host when reporting problems, if any.

5 Discussion

5.1 HTML and XML tags

One problem with the aforementioned scheme is that it requires the use of new HTTP headers; one can argue, with good cause, for the use of XML to provide the additional information in this case.
There are two considerations that argue against the use of XML here:

   o XML is geared towards providing the additional tags needed for information interchange among a specialised community; these local advertising tags would apply to all content providers and all proxies in the Internet.

   o The proxies will process most, if not all, HTTP replies for possible local advertisement insertion. If we do not use HTTP headers, the proxies will need to parse the HTML/XML text itself; this additional computational load will be very expensive.

Also, content providers should not need to upgrade their server software in order to benefit from this local advertising scheme. For this backward compatibility, we can provide additional tags in HTML with which the advertisements are appropriately marked and their properties (position, dimension, price) provided. It will be advantageous to use XML in this case.

5.2 Page Sharing

Definitions: A "fast" hit occurs when the proxy immediately answers the client's request (without referring back to the origin server); a "slow" hit occurs when the proxy issues a conditional GET to the origin server and gets a "Code 304: Not modified" response [RFC2068].

A key motivation for this work is to enable better caching in the web proxies. The problem with "fast" hits is that the origin server does not even know that the page was viewed. Also, the origin server may wish to set a different price, or to send a different advertisement, depending on the particular viewer. How do we accomplish that?

A simple solution would rely on "slow" hits: when the origin server sends the data initially, it marks the page with the cache-control directive "must-revalidate". When the proxy checks back with the origin server, the origin server MAY return new advertisement headers with the reply, even if the reply is "Code 304: Not modified; use local copy".
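The "slow" hit handling can be sketched as follows. This is an illustrative Python fragment only: CacheEntry and apply_revalidation are hypothetical names, the network exchange itself is left out, and the behaviour when a 304 reply carries no Local-ad headers (keep the headers already cached) is an assumption that this document does not settle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheEntry:
    # Hypothetical cache record kept by the proxy for one page.
    body: str             # page body as last received from the origin server
    local_ad: List[str]   # Local-ad header values that apply to this body


def apply_revalidation(
    entry: CacheEntry,
    status: int,
    local_ad: List[str],
    body: Optional[str] = None,
) -> CacheEntry:
    """Update a cache entry from the reply to a conditional GET."""
    if status == 304:
        # Body is unchanged. New advertisement headers, if any, replace the
        # cached ones (assumption: an empty list means "keep the old ones").
        return CacheEntry(entry.body, local_ad if local_ad else entry.local_ad)
    if status == 200:
        if body is None:
            raise ValueError("a 200 reply must carry a body")
        # Page changed: the old slot positions no longer apply.
        return CacheEntry(body, local_ad)
    raise ValueError(f"unexpected status {status}")
```

Because every view triggers a conditional GET, the origin server learns of each view and can vary the price per viewer, while the page body itself is still served from the proxy's cache.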
6 References

[RFC2068] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., and Berners-Lee, T., "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119.

[RFC2227] Mogul, J. and Leach, P., "Simple Hit-Metering and Usage-Limiting for HTTP", RFC 2227.

7 Address of the Authors

Amit Gupta
Sun Microsystems
901 San Antonio Road
Mail-Stop UMTV29-118
Palo Alto CA 94303
Voice: +1 650-336-4899
Fax: +1 650-969-7269
E-mail: amit.gupta@eng.sun.com

Geoff Baehr
Sun Microsystems
901 San Antonio Road
Mail-Stop UMTV29-118
Palo Alto CA 94303
Voice: +1 650-336-2735
Fax: +1 650-969-7269
E-mail: geoffrey.baehr@eng.sun.com