Internet Engineering Task Force Amit Gupta, Geoff Baehr
draft-gupta-efficient-ad-00.txt Sun Microsystems, Inc.
Date: Nov 18th, 1998
Expires: May 18th, 1999
Efficiently transporting ad-carrying web pages
Status of this Memo
This document is an Internet Draft. Internet Drafts are working documents
of the Internet Engineering Task Force (IETF), its Areas, and its Working
Groups. Note that other groups may also distribute working documents as
Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six months.
Internet Drafts may be updated, replaced, or obsoleted by other documents
at any time. It is not appropriate to use Internet Drafts as reference
material or to cite them other than as a ``working draft'' or ``work in
progress.''
To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).
Distribution of this document is unlimited.
Abstract
This draft proposes a simple extension to HTTP, using a new
set of headers, that can significantly reduce the network traffic
used for sending pages containing advertising banners and other
related content. We begin by describing the problems caused by the
'cache-busting' techniques in use today and why the hit-metering
scheme [RFC2227], by itself, may not be adequate. We then
describe a new scheme that relies on collaboration between
content providers and Internet Service Providers (ISPs).
draft-gupta-efficient-ad-00.txt [Page 1]
INTERNET-DRAFT Efficiently transporting ads October 1998
1 Introduction
The current Internet explosion is fueled in part by the availability
of cheap (mostly free) services on the Internet; online advertising
revenues are critical for continued growth of the Internet, as these
revenues pay for most of these services.
To increase these advertising revenues, web sites resort to
"cache-busting" [RFC2227]: techniques that origin servers use to
prevent proxies from caching the transmitted data, such as setting a
very short expiration time for the page or explicitly requesting that
the page not be cached. Cache-busting lets a site know each time one
of its pages, with the accompanying advertisement(s), is viewed; it
also lets the content provider vary the advertising that appears on a
page every time that page is requested.
Cache-busting causes several problems. It increases network traffic,
as the same content is needlessly sent over and over again, and it
increases the browsing latencies observed by clients, because the
entire page must be downloaded from the origin server across the
Internet rather than from a nearby cache. The increased traffic leads
to more congestion in the network, and it also requires all parties
(the content provider, the client, and the client's ISP) to pay for
the additional bandwidth needed to download these pages and the
accompanying advertisements again and again. Moreover, the ISPs pay
more (for bandwidth) when they honor requests not to cache data, even
though they do not share in the advertising revenues generated. The
ISPs, therefore, have no economic incentive to honor "no-cache"
directives; indeed, in many cases, various ISPs actively flout these
directives to reduce their bandwidth costs.
This proposal supports a model similar to that of television and radio
networks: the content providers (the broadcast network) also include,
in their transmission, slots where other entities (the local station)
insert the advertisements. The advertisement revenue is then shared
between the content provider and the local station (the ISP in our
case).
1.1 Goals and non-goals
As noted in [RFC2227], HTTP/1.1 origin servers can allow or prevent
caching of responses, and at least some of the time caching is
disabled in order to obtain access counts; these access counts are
very useful to online advertisers. This proposal provides an optional
performance optimization for the HTTP protocol.
This specification is:
o Optional: No server or proxy is required to implement it.
o Proxy-centered: Communication is primarily between the servers and
the proxies; clients are provided minimal information for possible
debugging and troubleshooting purposes only.
o Performance optimisation: Use of this proposal should significantly
lower bandwidth use.
The goals of this specification do not include:
o Solving the related content-distribution and mirroring problems.
o Avoiding all forms of "cache-busting".
o Increasing or reducing the use of online advertising per se. The
increased efficiencies due to the use of this system may, as a
side-effect, lead to an increase in online advertising.
This design also suffers from all the limitations described in Section
1.1 of [RFC2227], namely:
o If it is not deployed widely in both proxies and servers, it will
provide little benefit.
o It may, by partially solving the local advertising problem, reduce
the pressure to adopt more complete solutions, if any become
available.
o Even if widely deployed, it might not be widely used, and so might
not significantly improve performance.
1.2 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119 [RFC2119].
2 Overview
Charging for online advertising can be based on one or a combination
of the following criteria:
- ad impressions: the user views the advertisement.
- ad click-throughs: the user clicks on the advertisement banner.
- ad sales: the user buys the advertiser's product(s).
For simplicity, the rest of this description assumes that charging
is based on ad impressions; the underlying ideas and techniques apply
equally to the other charging criteria.
Just like their off-line counterparts, online advertisers prefer that
their advertisement be seen by a member of their "target audience";
for example, new-car buyers are part of the target audience for a
car dealership. Within this set, the more affluent viewers would be
the target audience for luxury cars, while the less affluent viewers
would be the target audience for non-luxury cars. Advertisers would
prefer to pass some of this "demographic" information about the
viewers to the content providers for their advertising campaigns (and
do so, subject to the limitations of current technology).
The hit-metering protocol [RFC2227] describes a simple and elegant
scheme by which the proxies can work with the origin servers to keep
track of the number of users that viewed a particular
advertisement. Unfortunately, the protocol does not distinguish
between different users: each view counts exactly the same, and if a
page is shared among multiple users, all of them see the same
advertisement. This uniformity is clearly undesirable. Also, the
hit-metering system requires the proxies to do significant additional
work without compensation; their only economic benefit is the saving
in bandwidth costs, a saving that they can obtain anyway by simply
countering the cache-busting techniques in the first place.
We believe that the advertising systems should take into account the
viewers' individual preferences and demographics. The advertisements
should not be tied to particular web pages; instead, the
advertisements should be linked to the individual users. A much better
solution would actively involve proxies and enlist their help in
ensuring that the ads are seen by the target audience members, and
also compensate them for the use of their resources.
Such a system would use the proxies to insert the advertisements in
the pages served from the origin servers; the choice (of which
advertisement to insert) depends on the particular viewer's
demographics. With this collaboration, the origin servers need not
use cache-busting anymore; in fact, cache-busting would only increase
their bandwidth bills, so pure self-interest would motivate the
origin servers to avoid it. Such a scheme would provide the following
benefits:
- reduction in network traffic
- precisely targeted advertising
- collaboration and revenue sharing between the ISPs and the origin
servers.
The design described in this document introduces several new features
to HTTP:
- Explicit interactions between the origin servers and the proxy
caches.
- ad-interest: A Local-ad header directive used by the proxy cache
(at the ISP) to indicate its willingness to accept pages with empty
advertisement slots.
- ad-slot: A Local-ad header directive used by the origin server to
inform the proxy caches of the availability of an empty ad slot, as
well as its properties.
Used together, these allow the origin servers and the proxy caches to
collaborate in inserting advertisements in the transmitted data stream
in a mutually beneficial arrangement.
3 Design
For local advertisement replacement/insertion to work correctly, the
proxy needs to modify, on-the-fly, the HTML data stream on its way
from the origin server to the client browser. A greedy proxy may
decide to do this modification on its own, without any prior
arrangement with the origin server (by guessing which HTML fragments
in the data stream are advertising, or by guessing an appropriate
insertion point such as the start of the HTML stream). However, three
key problems make such substitution infeasible in practice:
o Such guessing is inherently hard: the proxy can try certain
combinations of patterns, e.g., an
tag within an block. Not only will this fail to capture
many advertisements, but it may also misidentify other useful content
as advertising. Even if the proxy only adds the advertisement at the
beginning of the HTML stream, it may corrupt the HTML layout or
overwrite other data.
o Such searches significantly increase the computational load at the
proxies, as well as the browsing latencies observed by clients.
o More importantly, such modification is almost guaranteed to be a
violation of the content provider's copyright (the advertisement
replacement creates an unauthorised derivative work).
The better approach is for the proxies to collaborate with the origin
servers to provide local advertising; this is the approach that we
will now describe in further detail.
In this collaboration, the server indicates its willingness to let the
proxy insert an advertisement locally by providing the proxy with
additional information that makes local insertion significantly
easier and more efficient. One option is for servers to return this
extra information to every proxy; proxies that do not understand this
protocol simply ignore it. Another option is for the proxy, when it
sends the initial HTTP request, to also inform the server of its
interest in inserting a local advertisement. A third option is for
servers to maintain lists of proxies to which they should return this
extra information (the proxies with which they have a business
arrangement); this approach also works, but it adds to the server's
initial configuration. Also, where multiple proxies share information
(for example, through ICP), the server may find it useful to provide
the additional information even to proxies that do not support local
advertising, in case the page is shared with other proxies that do.
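As an illustration only, the three options above could be combined into a single server-side policy. The Python sketch below assumes that request headers are held as a list of (name, value) pairs, so that repeated Local-ad headers are preserved; the function names, the partner_proxies allowlist, and the always_send flag are hypothetical and not part of this specification.

```python
def has_ad_interest(request_headers):
    """Return True if any Local-ad request header carries ad-interest=yes."""
    for name, value in request_headers:
        if name.lower() != "local-ad":
            continue
        # One Local-ad header may carry several comma-separated directives.
        if "ad-interest=yes" in (d.strip().lower() for d in value.split(",")):
            return True
    return False


def should_send_ad_info(request_headers, proxy_host,
                        partner_proxies=frozenset(), always_send=False):
    """Decide whether to attach Local-ad slot headers to a reply.

    always_send      -- option 1: send to every proxy (others ignore it)
    ad-interest=yes  -- option 2: the proxy asked for it in its request
    partner_proxies  -- option 3: configured list of proxies with which
                        the server has a business arrangement
    """
    if always_send:
        return True
    if has_ad_interest(request_headers):
        return True
    return proxy_host in partner_proxies
```

Checking the request first keeps the allowlist optional: a server with no configured partners still cooperates with any proxy that asks.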
This additional information tells the proxy the following:
- Ad Position: This field specifies the portion of the data stream
that holds the HTML code for the advertisement, as a tuple of its
starting character position and its length; for example, the value
<240,74> would inform the proxy that the replaceable
advertisement-related HTML code starts at character position 240 and
is 74 characters long.
- Ad Dimensions: This field provides the height and the width of the
advertisement slot - for example, the tuple <46,230> specifies an
advertisement slot that is 46 pixels tall and 230 pixels wide.
- Ad Price: This field provides the amount of money that the proxy has
to pay the origin server for the privilege of inserting the local
advertisement.
A single page may contain multiple advertisement slots; the origin
server may let the proxy replace one or more of the advertisements in
these slots. If a page contains multiple replaceable ad slots, the
origin server must provide the advertisement position, dimensions, and
price information for every one of them.
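Because every ad position refers to offsets in the original data stream, a proxy that fills several slots must not let one substitution invalidate the offsets of the others. A minimal Python sketch, assuming that positions are zero-based character offsets (this draft does not say whether counting starts at 0 or 1), splices from the highest start position to the lowest; the AdSlot type and replace_slots function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdSlot:
    start: int   # assumed zero-based character offset of the replaceable HTML
    length: int  # number of characters in the replaceable HTML


def replace_slots(html, replacements):
    """Replace each slot's HTML with new markup.

    replacements maps AdSlot -> replacement HTML string. Slots are
    processed from the highest start offset to the lowest, so splicing
    one slot never shifts the offsets of slots still to be processed.
    """
    slots = sorted(replacements, key=lambda s: s.start, reverse=True)
    for earlier, later in zip(slots[1:], slots):
        # Overlapping slots would make the offsets ambiguous.
        if earlier.start + earlier.length > later.start:
            raise ValueError("ad slots overlap")
    for slot in slots:
        end = slot.start + slot.length
        if end > len(html):
            raise ValueError("ad slot extends past end of document")
        html = html[:slot.start] + replacements[slot] + html[end:]
    return html
```

Processing in descending order avoids any offset bookkeeping; the alternative, adjusting each later offset by the change in length, is more error-prone.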
4 Specification
4.1 Proxy to Origin Server
If the proxy is interested in local advertisement insertion, it SHOULD
indicate its interest by adding the following header in the HTTP
request:
Local-ad: ad-interest=yes
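A proxy might add this header as sketched below in Python, assuming headers are a list of (name, value) pairs. The function name is hypothetical. The sketch leaves any existing Local-ad headers in place, in line with the rule in Section 4.3 that intermediate proxies MUST NOT remove this header, and it avoids adding a duplicate directive.

```python
def add_ad_interest(request_headers):
    """Return a copy of request_headers that advertises ad-interest=yes.

    request_headers is a list of (name, value) pairs. Existing headers,
    including Local-ad headers added by other proxies, are kept as-is;
    the directive is appended only if no header carries it already.
    """
    for name, value in request_headers:
        if name.lower() == "local-ad" and "ad-interest=yes" in (
            d.strip().lower() for d in value.split(",")
        ):
            return list(request_headers)
    return list(request_headers) + [("Local-ad", "ad-interest=yes")]
```

For example, a request carrying only ("Host", "www.example.com") leaves the proxy with that header followed by ("Local-ad", "ad-interest=yes"); calling the function again changes nothing.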
4.2 Origin Server to Proxy
If the origin server is not interested in local advertisement
insertion, it SHOULD NOT provide any local-advertising related headers
in the reply.
4.2.1 Marking advertisements as replaceable
If the server is willing to accept local advertisement substitution,
it needs to indicate the presence of ad slots. For example, the
following header:
Local-ad: replace-ad=yes
says that proxies are allowed to replace the advertisement slots.
4.2.2 Ad slot
As discussed in the previous section, the origin server needs to
provide information regarding the advertisement position, dimensions
and price. The following set of headers describe a single ad slot:
Local-ad: ad-slot
Local-ad: ad-start=540
Local-ad: ad-length=72
Local-ad: ad-price=7 cents
Local-ad: ad-height=46, ad-width=230
The origin server MUST provide the ad-slot, ad-start, and ad-length
information, with the ad-slot header appearing first in the series
of headers that describe a particular advertisement slot. In
addition, it MUST also provide one or more ad-height and ad-width
headers. When the advertisement dimension information is provided, the
proxy MUST ensure that the inserted advertisement fits within the
provided dimensions.
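One reading of "fits within the provided dimensions" is that the inserted advertisement is no taller than ad-height and no wider than ad-width for one of the offered sizes. The Python sketch below encodes that assumption; the function names and the list of local creatives are hypothetical.

```python
def fits(ad_height, ad_width, slot_height, slot_width):
    """True if an ad of the given size fits inside the advertised slot."""
    return ad_height <= slot_height and ad_width <= slot_width


def choose_creative(slot_sizes, creatives):
    """Pick the first local creative that fits one of the slot sizes.

    slot_sizes -- list of (ad-height, ad-width) pairs offered by the server
    creatives  -- list of (name, height, width) tuples held by the proxy
    Returns (name, (slot_height, slot_width)), or None if nothing fits.
    """
    for name, height, width in creatives:
        for slot_height, slot_width in slot_sizes:
            if fits(height, width, slot_height, slot_width):
                return name, (slot_height, slot_width)
    return None
```

Returning None rather than forcing a choice lets the proxy fall back to forwarding the page unchanged.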
It is RECOMMENDED that the server also provide the ad-price
information; the currency units (cents in this example) are OPTIONAL.
If no units are provided, it is assumed that the server and the proxy
have an a priori agreement regarding the currency unit. If the
ad-price information is provided, it MUST precede the ad-height and
ad-width headers.
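The ordering rules above (ad-slot first, and ad-price before the dimensions it applies to) can be captured in a small serializer. This is a Python sketch, not a reference implementation; the function name and the (price, sizes) grouping are assumptions, and it covers only the case where every size has an explicit price.

```python
def slot_headers(start, length, priced_sizes):
    """Serialize one ad slot as an ordered list of Local-ad headers.

    priced_sizes is a list of (price, [(height, width), ...]) groups,
    e.g. [("7 cents", [(46, 230)])]; every group carries a price.
    """
    headers = [
        ("Local-ad", "ad-slot"),  # MUST be the first header of the slot
        ("Local-ad", f"ad-start={start}"),
        ("Local-ad", f"ad-length={length}"),
    ]
    for price, sizes in priced_sizes:
        # The price applies to every size listed after it, until the
        # next ad-price header.
        headers.append(("Local-ad", f"ad-price={price}"))
        for height, width in sizes:
            headers.append(("Local-ad", f"ad-height={height}, ad-width={width}"))
    return headers
```

Called as slot_headers(540, 72, [("7 cents", [(46, 230)])]), it produces the same five headers as the example earlier in this section.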
A single advertisement slot can only include one value for the
ad-start and ad-length fields; however, it can include one or more
price and height-width fields. The price for a particular choice of
advertisement dimensions (ad-height and ad-width) is determined by the
last preceding ad-price header for the current advertisement slot.
If the ad-price information is not provided, it is assumed that the
server and the proxy have a priori agreements regarding the
advertisement prices.
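On the proxy side, the "last preceding ad-price" rule amounts to a single pass over the Local-ad directives in the order received. The following Python sketch assumes that directives within one header are separated by commas (as in "ad-height=46, ad-width=230") and that each ad-height is paired with the next ad-width; parse_ad_slots is a hypothetical name.

```python
def parse_ad_slots(reply_headers):
    """Group Local-ad directives from a reply into ad slots.

    Returns a list of dicts with keys start, length and options, where
    options is a list of (height, width, price) tuples. The price of each
    size is the last ad-price seen earlier in the same slot, or None if
    no ad-price preceded it (price agreed a priori).
    """
    slots = []
    current = None
    price = height = width = None
    for name, value in reply_headers:
        if name.lower() != "local-ad":
            continue
        for directive in value.split(","):
            key, _, val = directive.strip().partition("=")
            key, val = key.strip().lower(), val.strip()
            if key == "ad-slot":
                # Start a new slot; prices do not carry over between slots.
                current = {"start": None, "length": None, "options": []}
                slots.append(current)
                price = height = width = None
            elif current is None:
                continue  # e.g. replace-ad=yes, which is not slot-specific
            elif key == "ad-start":
                current["start"] = int(val)
            elif key == "ad-length":
                current["length"] = int(val)
            elif key == "ad-price":
                price = val
            elif key == "ad-height":
                height = int(val)
            elif key == "ad-width":
                width = int(val)
            if height is not None and width is not None:
                current["options"].append((height, width, price))
                height = width = None
    for slot in slots:
        if slot["start"] is None or slot["length"] is None:
            raise ValueError("ad-slot without ad-start/ad-length")
    return slots
```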
4.3 Multiple intermediate proxies
Any intermediate proxy MAY add the "ad-interest" header to the HTTP
request; however, an intermediate proxy MUST NOT remove this header
from an upstream HTTP request.
An intermediate proxy can try to fill in one or more advertisement
slots with its own advertisement; when it does so, it MUST correctly
adjust all other advertising-replacement related headers before
forwarding the HTTP reply to the client; a trivial adjustment would be
for the proxy to remove all other advertising-replacement related
headers.
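Rather than removing the remaining headers, a proxy could recompute them. The Python sketch below shows one possible adjustment under stated assumptions: slots are (start, length) pairs measured in characters, slots after the filled one shift by the change in length, earlier slots keep their offsets, and the filled slot is withdrawn from downstream proxies. The function name is hypothetical.

```python
def adjust_slots(slots, replaced, new_length):
    """Recompute the remaining (start, length) slots after one replacement.

    slots      -- list of (start, length) pairs from the upstream reply
    replaced   -- the (start, length) pair this proxy filled
    new_length -- length of the HTML the proxy inserted in its place
    """
    r_start, r_length = replaced
    delta = new_length - r_length
    adjusted = []
    for start, length in slots:
        if (start, length) == replaced:
            continue  # no longer replaceable by downstream proxies
        if start >= r_start + r_length:
            adjusted.append((start + delta, length))  # after: shift
        elif start + length <= r_start:
            adjusted.append((start, length))          # before: unchanged
        else:
            raise ValueError("slot overlaps the replaced slot")
    return adjusted
```

The adjusted pairs would then be re-serialized as ad-start and ad-length headers before the reply is forwarded.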
This adjustment carries the risk that a software flaw may completely
corrupt the HTML stream - this risk is similar to the risks posed by
flaws in the server code, as well as by the flaws in other proxy
code. To aid debugging and trouble-shooting, it is RECOMMENDED that
when proxies replace advertisements, they also add the following
header:
Local-ad: ad-replacement: [, identifier]
where identifies the proxy on the Internet, and the OPTIONAL
identifier provides the code that should be provided to this host for
reporting problems, if any.
5 Discussion
5.1 HTML and XML tags
One problem with the aforementioned scheme is that it requires the
use of new HTTP headers; one can argue, with good cause, for using XML
to provide the additional information instead. There are two
considerations that argue against the use of XML here:
o XML is geared towards providing additional tags needed for
information interchange among a specialised community; these local
advertising tags would apply to all content providers and all proxies
in the Internet.
o The proxies will process most, if not all, HTTP replies for
possible local advertisement insertion. If we do not use HTTP headers,
the proxies will need to parse the HTML/XML text itself - this
additional computational load will be very expensive.
Also, content providers should not need to upgrade their server
software to benefit from this local advertising scheme. For this
backward compatibility, additional tags can be provided in the HTML
itself, marking the advertisements and giving their properties
(position, dimensions, price). It will be advantageous to use XML in
this case.
5.2 Page Sharing
Definitions: A "fast" hit occurs when the proxy immediately answers
the client's request without referring back to the origin server; a
"slow" hit occurs when the proxy issues a conditional GET to the
origin server and gets a "Code 304: Not modified" response [RFC2068].
A key motivation for this work is to enable better caching in web
proxies. The problem with "fast" hits is that the origin server does
not even know that the page was viewed. Also, the origin server may
wish to set a different price, or to send a different advertisement,
depending on the particular viewer. How can this be accomplished?
A simple solution relies on "slow" hits: when the origin server
sends the data initially, it marks the page with the Cache-Control
directive "must-revalidate". When the proxy checks back with the
origin server, the server MAY return new advertisement headers with
the reply, even if the reply is "Code 304: Not modified; use local
copy".
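A sketch of the origin server's side of this exchange follows, in Python. It assumes a single entity tag in If-None-Match, and it pairs must-revalidate with max-age=0: in HTTP/1.1, must-revalidate takes effect only once a cached entry is stale, so a zero max-age is one way to make every view a "slow" hit. The function name and parameters are hypothetical.

```python
def revalidate(if_none_match, current_etag, ad_headers):
    """Answer a conditional GET from a proxy holding a cached page.

    if_none_match -- entity tag the proxy sent in If-None-Match (or None)
    current_etag  -- entity tag of the page currently on the origin server
    ad_headers    -- fresh Local-ad headers chosen for this request
    Returns (status, headers); a 200 reply would also carry the body.
    """
    headers = [
        ("ETag", current_etag),
        # must-revalidate only applies once the copy is stale; max-age=0
        # makes it stale immediately, so every view becomes a "slow" hit.
        ("Cache-Control", "max-age=0, must-revalidate"),
    ]
    headers.extend(ad_headers)  # new ad slot and price info, even on a 304
    status = 304 if if_none_match == current_etag else 200
    return status, headers
```

Because the Local-ad headers are recomputed on every revalidation, the server can vary the price or slot properties per request while the page body stays cached at the proxy.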
6 References
[RFC2068] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., and
Berners-Lee, T., "Hypertext Transfer Protocol -- HTTP/1.1",
RFC 2068, January 1997.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2227] Mogul, J. and Leach, P., "Simple Hit-Metering and
Usage-Limiting for HTTP", RFC 2227, October 1997.
7 Address of the Authors
Amit Gupta
Sun Microsystems
901 San Antonio Road Mail-Stop UMTV29-118
Palo Alto CA 94303
Voice: +1 650-336-4899
Fax: +1 650-969-7269
E-mail: amit.gupta@eng.sun.com
Geoff Baehr
Sun Microsystems
901 San Antonio Road Mail-Stop UMTV29-118
Palo Alto CA 94303
Voice: +1 650-336-2735
Fax: +1 650-969-7269
E-mail: geoffrey.baehr@eng.sun.com