D. Li
Cisco
P. Cao
Cisco
M. Dahlin
Univ of Texas

Internet Draft
Document: draft-danli-wrec-wcip-00.txt
November 2000
Category: Experimental

WCIP: Web Cache Invalidation Protocol

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Cache consistency is a major impediment to scalable content delivery. This document describes the Web Cache Invalidation Protocol (WCIP), which uses invalidations and updates to keep changing objects up to date in web caches. Moreover, it allows automatic one-to-many relay and many-to-one aggregation in a CDN (content delivery network) environment.

WCIP runs between the invalidation server, the participating web caches, and channel relay points (if any). An invalidation server may maintain one or more invalidation channels, each of which covers a class of related objects. E.g., the CNNfn channel may contain web articles of the day's top financial news and stock quotes. Web caches subscribe to the channel(s) they are interested in, while the invalidation server(s) send out invalidations and/or up-to-date objects to the channel(s).

WCIP employs heartbeats to guarantee the freshness of the cached objects even under network or server failure. Moreover, WCIP can set up channel relay points via a cache hierarchy or a CDN.
A channel relay point performs application-layer multicast, i.e., channel relay (one-to-many) or channel aggregation (many-to-one).

Li & Cao & Dahlin          Experimental - May 2001
Draft-danli-wrec-wcip-00.txt          November 2000

Table of Contents

1. Introduction
2. Terminology
3. Design Issues
   3.1 Freshness Guarantee
   3.2 Delivery Modes
   3.3 Targeted Service
   3.4 Channel Security
4. Deployment Issues
   4.1 Channel Relay and Aggregation
   4.2 Detect Changes
   4.3 Form Channels
   4.4 Join Channels
5. Protocol Overview and Message Format
   5.1 Channel Information
   5.2 Channel Setup
      5.2.1 Registration Request
      5.2.2 Channel Header
      5.2.3 Body of the Registration Request
      5.2.4 Registration Response
      5.2.5 Body of Registration Response
   5.3 Channel Messages
      5.3.1 Heartbeat Messages
      5.3.2 Invalidation Messages
      5.3.3 Object Update Messages
6. Security Concerns
7. References
8. Acknowledgments
9. Authors' Addresses

1. Introduction

In web proxy caching, a document is downloaded once from the web server to the caching proxy, which then serves the document to end-users repeatedly out of the cache. This offsets the load on the web server, improves the response time to the users, and reduces the bandwidth consumption. When the document seldom changes, this works well. The hard case is a document that is not only popular but also frequently changing, i.e., the so-called "dynamic content". Dynamic content is quickly becoming a significant percentage of Web traffic, e.g., news and stock quotes, shopping catalogs and prices, product inventory and orders, etc.
Because the content is changing, the caching proxy has to frequently poll the web server for a fresh copy and still tends to return stale data to end-users. Specifically, a proxy using "adaptive TTL" is unable to ensure strong cache consistency, and yet "poll every time" is costly. So the content provider usually sets a very short expiration time or marks frequently changing documents as non-cacheable altogether, defeating the benefit of caching, even though those documents could be cached if the proxy knew when they become obsolete.

Addressing this problem, WCIP (Web Cache Invalidation Protocol) aims at providing freshness guarantees to the content provider and keeping the content up to date at the caching proxies.

Using WCIP, a web server can advertise to caching proxies an invalidation channel that carries cache invalidations and/or object updates. A proxy may choose to subscribe to the channel; once subscribed, it can start caching those otherwise non-cacheable objects and rely on the channel messages for cache consistency and object updates.

An invalidation server feeds cache invalidations and/or object updates (e.g., in delta encoding [1, 2]) to the invalidation channel. It also generates heartbeats to indicate its liveness and the channel's connectivity so that the freshness guarantee can be met even in the worst case, e.g., upon network partition or server crash.

The invalidation channel can be either a persistent TCP connection from the caching proxy to the invalidation server, or a persistent connection to a channel relay point, or a single-source IP multicast channel. In the first two cases, information sent to the caching proxy can be tailored for that particular proxy. A channel registration message allows the proxy to inform the invalidation server or relay point of the list of objects and the types of messages it is interested in receiving.
To ensure the integrity of the invalidation channel and to avoid denial-of-service attacks, an invalidation channel supports IP-based weak authentication and/or public-key-based strong authentication. It's up to the owner of the channel to dictate the level of security offered in a channel and up to the caching proxies to decide whether to join a channel of weak or strong security.

Section 3 discusses the above designs and their rationales in more detail. Section 4 discusses issues that are out of the scope of WCIP but important end to end, including issues on channel relay points, how to detect changes, how to form a channel, and how to decide to join a channel. Section 5 describes the protocol in terms of sequence of events and lays out the formal syntax of the protocol.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [3].

Since WCIP makes some extensions to HTTP, please refer to RFC-2616 [4] for HTTP related terminology. Following are WCIP related terms.

Cache Invalidation

   An HTTP message of PURGE method, causing a cached object of a certain URL and/or Etag (or Last-Modified) to be marked as stale. Also called "invalidation". A batch-mode invalidation message is described in Section 5.3.1, which can invalidate multiple objects in one message.

Object Update

   An HTTP message of PUT method, causing a cached object of a certain URL and/or Etag (or Last-Modified) to be replaced by the new content. If the message uses delta encoding, the cached object is updated by the delta. The old content is discarded. Also called "update".

Invalidation Channel

   A transport abstraction that moves cache invalidations and/or object updates from the invalidation server to the channel subscribers.
   An invalidation channel carries information on multiple web objects. Also called "channel".

Channel Information

   Information on the name of the channel, the address (domain name) of the invalidation server, and the channel properties such as the heartbeat interval, the security mode, and whether it supports targeted invalidation, etc.

Invalidation Server

   An application program that sends cache invalidations and/or object updates, as well as heartbeats, to the invalidation channel. (The invalidation server logically differs from the origin server because a cache may fill a request from a CDN content server or a replica origin server. The cache may not be able to tell these various sources from the origin server. Besides, the WCIP service may not reside on each or any of them. "Invalidation server" uniquely identifies the source of the WCIP service.)

Invalidation Client

   A web cache (usually a proxy) that subscribes to an invalidation channel and follows the semantics of the channel. Also referred to as the "proxy".

Channel Relay Point

   An intermediary program that subscribes to one or multiple invalidation channels on behalf of its clients (e.g., downstream proxies) and relays the channel messages to its clients. It MUST implement both the invalidation server and the invalidation client.

Cache Consistency

   A property that the replica data item reflects its master copy in a certain fashion. There are at least 3 fashions.
   (1) Strong consistency -- the replica must always be the same as the master.
   (2) Delta consistency -- the replica must become the same as the master at most "delta" seconds after the master is updated.
   (3) Eventual consistency -- the replica must become the same as the master at some unknown point in the future.
Freshness Guarantee

   A promise that a proxy subscribed to an invalidation channel will not serve content (belonging to that channel) from the cache more than X seconds after a known or presumed update at the origin server, where X is specified by the content provider. In other words, a proxy subscribed to an invalidation channel never delivers cached content (belonging to that channel) that is more than X seconds stale, regardless of network partition, proxy failure, or server failure. A freshness guarantee provides "delta consistency" and also allows "eventual consistency".

Invalidation Latency

   The time from when an object is updated at the origin server to when it is invalidated at all the participating proxies. The goal of a freshness guarantee of X seconds is to guarantee that the invalidation latency is within X seconds at all times.

Channel Heartbeat

   A periodic channel message to keep the channel from being silent for too long. It allows the invalidation client to verify the channel connectivity and source liveness. Also called "heartbeat".

Heartbeat Interval

   A property of the invalidation channel. The invalidation server sends a heartbeat to the invalidation channel if the channel has been silent for the last heartbeat interval.

Last Channel Active Time

   The time when the invalidation client receives the last message from the invalidation channel.

Content Delivery Network (CDN)

   A self-organizing network of geographically distributed content delivery nodes (reverse proxies) for contracted content providers, capable of directing requests to the best delivery node for global load balancing and best client response time.

3. Design Issues

Before we talk about the specifics, let's lay out some design principles this protocol tries to follow:

(1) Simple and effective: we try to design a lightweight client and leave complexity to the server, then use multicast and/or targeted service to address the server scalability. We also try to leverage off-the-shelf components as much as we can. Examples may include HTTP, SSL, PGP, XML, etc.

(2) Logical separation of the invalidation server and the origin server: this is because WCIP needs to work with CDNs and distributed data centers. There may be multiple authoritative sources of an object. "Invalidation server" uniquely identifies where the invalidation source is, not where the content initially is fetched from. It also allows for delegation of invalidation service to a 3rd party, possibly a CDN provider.

(3) Clear separation of the notification transport and the notification semantics: WCIP includes a transport abstraction and then the cache consistency semantics. Keeping the transport related items in the message header and leaving the notification semantics in the message body makes the protocol clearly layered, more understandable, and extensible. Moreover, the message body is specified in XML, making the protocol extensible to other types of notifications.

3.1 Freshness Guarantee

WCIP cannot simply provide best-effort invalidation. That equates to weak cache consistency, and as a result the content providers would still mark their dynamic content as non-cacheable altogether. It's important that WCIP guarantees that, in the worst case, a proxy subscribed to an invalidation channel will not serve stale content X seconds after the content is updated at the origin server, regardless of network partition or server failure. The content provider can specify the value of X, e.g., 5 minutes.

In the normal case, this is not hard. Using WCIP, the proxy stops delivering a stale object as soon as an invalidation arrives.
The invalidation latency only depends on network propagation and queuing delay, which are typically within a second.

In other cases, however, when the network or the invalidation server is down, invalidations cannot reach the proxy in a timely manner. To ensure an upper bound on the invalidation latency, the proxy must invalidate content automatically if the invalidation channel has been silent for a certain period, assuming the worst case. At the same time, the invalidation server periodically sends heartbeat messages to prevent the channel from being silent for too long, as long as there is no network partition or server failure.

Specifically, to control the freshness, the content provider specifies a "heartbeat interval" for the invalidation channel and a "freshness guarantee" for each object covered by that channel. Whenever some message arrives from the invalidation channel (invalidations, updates, or heartbeats), the proxy records the arrival time as the "last channel active time" (see Section 5.3 for clock skew adjustment). Upon serving a client HTTP request, the proxy can use the cached object only if the time elapsed since the last channel active time is smaller than the object's freshness guarantee. Otherwise, the cached object has expired and MUST NOT be served from cache without revalidation. The proxy is RECOMMENDED not to remove the object right away, as revalidation may turn out to be "Not Modified".

Therefore the effective staleness bound for an object is the freshness guarantee specified by the content provider, plus the normal network propagation and queuing delay (which cannot be escaped even with polling every time).

Note that it's advisable to have the channel heartbeat interval smaller than any of the freshness guarantees of the objects covered by that channel, so that in the normal case (which is the common case), no cached objects expire unnecessarily.
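
The serve-or-revalidate rule above can be illustrated with a minimal Python sketch. It is not part of the protocol specification: the names ChannelState, on_message, heartbeat_is_adequate, and may_serve_from_cache are hypothetical, times are plain seconds, and the clock skew adjustment mentioned for Section 5.3 is ignored.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ChannelState:
    """Hypothetical per-channel bookkeeping kept by an invalidation client."""
    heartbeat_interval: float  # seconds, a property of the channel
    last_active: float = field(default_factory=time.monotonic)

    def on_message(self, now=None):
        # Any channel message (invalidation, update, or heartbeat)
        # refreshes the "last channel active time".
        self.last_active = time.monotonic() if now is None else now

    def heartbeat_is_adequate(self, freshness_guarantee):
        # Advisory check: the heartbeat interval should be smaller than
        # the object's freshness guarantee.
        return self.heartbeat_interval < freshness_guarantee


def may_serve_from_cache(channel, freshness_guarantee, now=None):
    """True if the cached object may be served without revalidation."""
    now = time.monotonic() if now is None else now
    # Serve only while the channel silence is shorter than the guarantee.
    return (now - channel.last_active) < freshness_guarantee
```

With a freshness guarantee of 300 seconds, an object stays servable while the channel has been heard from within the last 300 seconds; once the silence reaches 300 seconds the proxy has to revalidate, and a later heartbeat makes cached objects servable again (provided no invalidation for them arrived).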
WCIP uses the freshness guarantee to provide "delta consistency". Nonetheless, within the framework of WCIP, a more relaxed form of consistency -- "eventual consistency" -- can also be supported, e.g., when the freshness guarantee is set to be much larger than the typical object modification interval or even set to infinite. Then WCIP is similar to best-effort invalidation delivery and is subject to network and server failures. But with reliable, if possibly delayed, delivery of invalidations, the caches achieve eventual consistency. The content provider can control the level of consistency simply by controlling the value of "freshness guarantee" for each object.

3.2 Delivery Modes

The invalidation channel is a transport abstraction. A channel MUST maintain the invariant that the invalidation client never receives a heartbeat without first receiving all preceding invalidations sent to it. The actual delivery of WCIP messages can be any one of three modes: unicast, application-layer multicast, or IP multicast.

WCIP is built on top of a reliable transport layer. For unicast, the recommended protocol is TCP because it's reliable, well understood and widely used in HTTP [5]. For IP multicast, there are experimental reliable protocols such as MFTP, RAMP, SCE, SRM, RMTP, etc., but we have no recommendation at this time.

3.2.1 unicast

The first mode is for the invalidation client to maintain a persistent TCP connection with the invalidation server. Using a persistent connection avoids the TCP set-up latency and cost for every channel message. It also ensures message sequencing and reliability, which makes it preferable to UDP-based invalidation as well. Once a TCP connection is set up, all channel messages are sent on this connection in a first-come-first-serve order.
Whenever the TCP connection is broken, the invalidation client MUST re-synchronize the cache consistency state by re-registering with the invalidation server (see section 5.2).

Using unicast, the invalidation server should be aware of its scalability limitations. E.g., suppose a machine is able to support at most 20000 concurrent persistent connections. Then that machine, acting as an invalidation server, can support at most 20000 invalidation clients. Moreover, if it takes 1ms to send out one invalidation message, then reaching all 20000 clients takes 20000 x 1ms = 20 seconds, so the invalidation latency is at least 20 seconds, even under the best network condition. Application-layer multicast alleviates this problem.

3.2.2 application-layer multicast

In application-layer multicast, the invalidation client maintains a persistent TCP connection with a channel relay point, which relays messages from the origin invalidation server. The relay point may have multiple clients subscribed to the same invalidation channel. It will in turn only subscribe once to the original invalidation server. By relaying each channel message to many clients, it reduces the load on the origin invalidation server and helps scale the invalidation channel end-to-end.

For the invalidation client, this mode is no different from the unicast mode. When it looks up the address of the invalidation server, the DNS server of a CDN returns the IP address of a local channel relay point. Or, when it connects to the invalidation server, the invalidation server replies with a redirect message, causing it to connect to a relay point.

Above is the one-to-many scenario. There is also the many-to-one scenario, where the relay point aggregates multiple channels into one channel for its clients, reducing the number of TCP connections at the relay point. Section 4.1 has more detail.

3.2.3 IP multicast

In this mode, an IP multicast group is allocated for the invalidation channel and its address is advertised as part of the channel information.
The invalidation client subscribes to this multicast group to receive cache invalidations and/or object updates. Ideally, it's a single-source multicast group, meaning that the invalidation client subscribes to the sender and group address pair <S, G>, where S is the invalidation server address and G is the multicast group address. To ensure sequencing and reliability, WCIP may need to run on top of a reliable multicast protocol.

IP multicast removes the scalability concern at the invalidation server in that the invalidation server now only needs to send one copy of any message. Plus, it doesn't maintain per-client state. A multicast invalidation channel is much more efficient than unicast-based cache consistency schemes.

It is worth noting, however, that anything the invalidation server sends to the invalidation channel goes to every subscriber. Therefore, objects covered by a multicast invalidation channel need to be correlated, in that if an invalidation client is interested in or has cached some objects of the channel, it's highly likely that it will cache the others. For example, CNNfn top stories should belong to one channel while ESPN top stories belong to another.

Absent an off-the-shelf real-time reliable multicast protocol, a possible IP-multicast deployment scenario would be:

(1) The invalidation server marks invalidation channel messages (except heartbeats) with incrementing sequence numbers.

(2) Whenever the invalidation client sees a sequence number gap, it considers itself to have lost synchronization with the channel and reverts to following the normal HTTP Cache-control directives.

(3) The invalidation server periodically multicasts out re-synchronization data (e.g., a list of objects' current Etags) to allow invalidation clients to re-synchronize the object freshness state.
(4) Once the invalidation client resynchronizes the freshness state of certain objects, it switches those objects from HTTP cache-control back to WCIP freshness guarantees.

(5) As more re-synchronization messages arrive, the invalidation client gradually reinstates all its objects back to WCIP freshness guarantees. In fact, a cache proxy may join the multicast channel and become gradually synchronized this way without ever directly contacting the invalidation server via unicast.

(6) Any IP-multicast reliability enhancing techniques (e.g., PGM) can be layered underneath WCIP to reduce losses and provide better WCIP performance.

3.3 Targeted Service

In the unicast and application-layer multicast delivery modes, each invalidation client has its own TCP connection with the invalidation server (or relay point). Therefore it's possible for the invalidation server (or relay point) to track what objects each invalidation client has and thus send invalidations and/or updates targeted to each invalidation client.

When an invalidation client connects to an invalidation server (or relay point), it can either indicate interest in all objects of the channel or send over a list of URLs for which it wishes to receive targeted invalidations and updates. This is called "channel registration". Later on, the invalidation client can send incremental messages to remove or add items to the list.

Whether an invalidation channel supports targeted service is a property of the channel. To support it, the invalidation server (or relay point) has to keep state for each client. To prevent the state from lingering after a client crashes, the invalidation server specifies a "channel lifetime", after which the invalidation client is required to refresh its state at the invalidation server through re-registration.
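
As an illustration only, the per-client state described above might be kept as in the following Python sketch. The class TargetedServiceTable and its methods are hypothetical names, not protocol elements; "interest in all objects" is modelled as None, and the channel lifetime is a plain number of seconds after which an un-refreshed registration is dropped.

```python
class TargetedServiceTable:
    """Hypothetical per-client interest lists kept by an invalidation
    server (or relay point) that offers targeted service."""

    def __init__(self, channel_lifetime):
        self.channel_lifetime = channel_lifetime  # seconds
        # client_id -> (expiry time, set of URLs, or None for "all objects")
        self._clients = {}

    def register(self, client_id, urls, now):
        # A (re-)registration replaces the old state and restarts the lifetime.
        interest = None if urls is None else set(urls)
        self._clients[client_id] = (now + self.channel_lifetime, interest)

    def update(self, client_id, add=(), remove=()):
        # Incremental add/remove of URLs; a no-op for "all objects" clients.
        _, interest = self._clients[client_id]
        if interest is not None:
            interest.update(add)
            interest.difference_update(remove)

    def expire(self, now):
        # Drop state of clients that did not re-register in time.
        stale = [c for c, (expiry, _) in self._clients.items() if expiry <= now]
        for c in stale:
            del self._clients[c]
        return stale

    def targets(self, url, now):
        # Clients that should receive an invalidation or update for this URL.
        self.expire(now)
        return sorted(c for c, (_, interest) in self._clients.items()
                      if interest is None or url in interest)
```

A server using such a table would call targets(url, now) for each changed object and send the invalidation only on the connections of the returned clients. A crashed client simply disappears from the table once its lifetime runs out.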
In IP multicast mode, the invalidation server cannot keep per-cache state because IP multicast goes to every subscriber. Therefore, it's better to use the IP multicast delivery mode for highly popular and highly correlated content. It has also been shown that this delivery mode works better if the channel carries not only cache invalidations but also object updates [6].

The information on targeted service can be a list of URLs and their Etag and Last-Modified times. However, more sophisticated target information can be exchanged between the invalidation client and server as long as they both can interpret it. E.g., instead of URLs, they can exchange URL regular expressions, or event dependencies such as "event A with parameter m corresponds to invalidating URLs x(m), y(m), z(m)", etc. In the future, WCIP will specify and carry such notifications.

3.4 Channel Security

Web caches are known for not being paranoid about security in that (1) they trust the DNS name lookup results, and (2) they trust the source IP address of an HTTP response. For example, the CERT fix against cache pollution on a transparent cache is to do a DNS name lookup to see if the source IP address of the HTTP response matches the hostname specified in the Host header of the HTTP request. In essence, web caches tend to trust the network infrastructure. If one can spoof IP addresses or poison DNS caches, one can poison web caches.

In contrast, content providers tend to be concerned about content integrity, besides freshness. With WCIP, web caches should also be concerned about the denial-of-service attack in which a malicious party keeps invalidating objects in a cache, preventing the cache from doing real work.

To accommodate the various security needs of the invalidation servers and clients, WCIP provides three channel security modes:

(1) IP-based weak security, i.e., the invalidation client accepts a channel message if the source IP address of the invalidation message matches the invalidation server name.
This is for invalidation servers and clients that both do not need strong security.

(2) Public-key-based strong security with mandatory verification, i.e., the invalidation client obtains the public key of the channel during channel registration. The invalidation server signs the channel messages with the channel's private key. The invalidation client MUST verify the signature and discard the message if the signature doesn't match. This is for when the invalidation server requires strong security for the channel. The invalidation clients have to comply. For unicast, the channel can simply be an SSL connection as in HTTPS. To prevent intermediate nodes from tampering with the channel information in the first place, the domain name of the channel MUST be identical to that of the object's origin server. Upon channel setup, the origin server MAY then redirect the invalidation client to the true invalidation server via HTTPS. See Section 5.1.

(3) Public-key-based strong security with optional verification, i.e., the invalidation client obtains the public key of the channel during channel registration. The invalidation server signs all the channel messages with the channel's private key. However, the invalidation client can choose to verify either the signature (strong) or the source IP address (weak). This is for when the invalidation server doesn't need strong security but wants to accommodate both clients that need strong security and clients that do not.

The authors cannot determine the necessity of this third option. Options 1 and 2 may be easier to support because they fit in the HTTP and HTTPS model well.

The above public-key solution ensures message integrity. To guard against message replay attacks, the Etag or Last-Modified of the updated object has to be part of the invalidation material.

4. Deployment Issues

4.1 Channel Relay and Aggregation

For one invalidation channel, there may be tens of thousands of invalidation clients. Some of them may not maintain a direct persistent connection to the invalidation server, but rather to a channel relay point, which then connects to the invalidation server or another relay point upstream.

The role of a channel relay point is to improve the scalability of a unicast-based channel using application-layer multicast. It can perform one-to-many channel relay and/or many-to-one channel aggregation.

(1) Channel Relay

The relay point has multiple clients but maintains only one connection to the invalidation server. See the example below.

             Invalidation Server
                     |
                     | conn0
                     |
             Channel Relay Point
                /    |    \
               /     |     \
        conn1 / conn2|      \ conn3
             /       |       \
            /        |        \
        Client1   Client2   Client3

Messages received on connection "conn0" are relayed immediately to "conn1", "conn2" and "conn3", after proper filtering for targeted service.

(2) Channel Aggregation

A relay point supports not only multiple clients but also multiple channels. Channel aggregation reduces the number of channels (i.e., TCP connections) the relay point has to maintain with each client. See the example below.

        Server1   Server2   Server3
            \        |        /
       conn1 \       |conn2  / conn3
              \      |      /
               \     |     /
             Channel Relay Point
                /    |    \
               /     |     \
        conn4 / conn5|      \ conn6
             /       |       \
            /        |        \
        Client1   Client2   Client3

Instead of maintaining 3 connections to each of its clients, the relay point presents one aggregated channel (and hence one connection) to each of its clients. Messages of all three channels are relayed to the same aggregated channel. The aggregated channel also has its own heartbeat. If one of the upper channels is down, the relay point removes objects of that upper channel from the aggregated channel.

A channel relay point can be set up via a cache hierarchy or a CDN.
Specifically, an invalidation client can discover and then connect to the relay point in one of the following ways.

(1) The origin server or replica origin server, being part of a CDN, returns channel information with the relay point's channel information.

(2) The relay point, being a configured outgoing proxy to a potential invalidation client, intercepts and replaces the channel information from the origin server with its own information. Note that this option, if applied by a transparent interception proxy, may be detrimental to security.

(3) When the invalidation client does a DNS name lookup of the channel source name, the DNS server of a CDN returns the IP address of a local channel relay point.

(4) When the invalidation client connects to the invalidation server, the invalidation server replies with a redirect message pointing to a channel offered by the relay point; then the invalidation client initiates registration to the relay point.

4.2 Detect Changes

Detecting changes is the job of the origin server and/or invalidation server. Web content may change because of updates from the content owner or updates from the content viewer. E.g., the content owner CNN.com updates its front page every 15 minutes, while Ebay updates its content whenever its customers post new auction items or bids. Therefore, changes may be detected in 4 ways.

(1) When the script runs that generates content and updates the web source file (e.g., a news article is updated with the latest financial information), the script notifies the invalidation server, which then sends out invalidations or delta-encoded updates to all participating caches.

(2) When a piece of data in the database is modified via the database interface (e.g., an addition to the inventory of books), a database trigger notifies the invalidation server of the event.
(3) When an HTTP request comes in (e.g., a POST request to add a new auction item), the origin server or its surrogate (reverse proxy) notifies the invalidation server of the event.

(4) The last but simplest way is for the invalidation server to poll the origin server periodically to find out if the object has changed. Given that there is only one invalidation server polling, the poll frequency can be very high, e.g., once every minute, offering decent cache consistency as well.

There are companies that are in the business of providing change detection of web content.

In some cases, an event described above may invalidate multiple URLs. Sometimes the participating cache may have the ability to interpret such events. To accommodate such cases, the invalidation message may carry the description of the event, besides carrying a list of invalidated URLs.

4.3 Form Channels

An invalidation channel may carry invalidations and updates for multiple objects. With targeted services, the participating proxy receives messages only for the objects that it's interested in. In this case, the sole purpose of having multiple objects share the same channel is to reduce the cost of channel management. The more objects per channel, the lower the cost of TCP connections and heartbeats. But with more objects per channel, there will be more cost for each client's targeted service state.

Moreover, targeted services may not always be available. E.g., the invalidation server may be so overloaded with client connections and per-client state that it decides to no longer provide targeted services to some clients. Besides, when the invalidation channel runs on top of IP multicast, there is intrinsically no targeted service.

Therefore, a channel needs to have a reasonable number of objects and preferably correlated objects, especially in the absence of targeted service.
This strategy for forming channels is similar to forming TV channels. This way, if a proxy is interested in some of the channel's objects, it is highly likely that it is or will be interested in the other objects in the channel as well. Example channels include CNNfn, ESPN, NAB, etc.

4.4 Join Channels

The decision to join a channel can be (1) configured by the caching proxy's administrator, (2) instructed by the CDN that the proxy is part of, or (3) decided dynamically. It is not the job of this protocol to specify the decision algorithms, but there are some common-sense ones. E.g., join a channel when the proxy has cached M objects belonging to that channel, or when the proxy has received N requests for objects belonging to that channel. The proxy's administrator can configure M and N.

Moreover, the proxy can employ a heuristic [7]: add an object for targeted service only if (1) it is cached and (2) a subsequent request actually uses the cached copy without finding that it has expired or been modified. This heuristic avoids objects that either are not very popular or are modified more frequently than they are accessed, even though they are cached in the meantime. This guideline can be applied to calculating M and N, too.

5. Protocol Overview and Message Format

Following is a brief description of the WCIP protocol in the most common and simple case:

1) In a normal HTTP request-and-response exchange, a caching proxy obtains invalidation channel information from the HTTP response header "Invalidated-By", returned by the origin server or its surrogate.

2) To join the channel, the caching proxy establishes a persistent TCP connection with the invalidation server. If the channel-URI specifies "wcips", the persistent connection is run on top of SSL, similar to HTTPS.

3) A registration request is sent to the invalidation server on the persistent connection.
In the body of the registration message, the proxy specifies the targeted service by listing the URLs it is interested in.

4) The invalidation server responds with 200 OK and specifies the lifetime of the channel registration. In the body of the response, for each URL in the targeted service, the invalidation server returns the latest Etag and Last-Modified time, allowing the invalidation client to instantly validate and/or invalidate all the listed URLs.

5) The invalidation server sends invalidations and heartbeats to the invalidation client for the lifetime of the registration. At the end of the lifetime, the invalidation client registers again. The invalidation server can set the lifetime to 0 to avoid ever actively sending anything to the invalidation client, in which case the invalidation client simply uses the registration message every time to validate a list of URLs, the so-called "volume validation" [8].

6) If the caching proxy has not heard from the invalidation server when an object is requested after the cached copy has expired, it sends another registration request on the same TCP connection to the invalidation server. Either it gets back a registration response and is thus able to validate or invalidate the object, plus a list of other objects, right away, or it times out waiting for a response and concludes that the invalidation server or network is down.

The following syntax specification uses the augmented Backus-Naur Form (BNF) as described in RFC 2234 [9]. Symbols already defined in RFC 2616 or RFC 2396 are not repeated here.

5.1 Channel Information

In a normal HTTP request-and-response exchange, a caching proxy obtains invalidation channel information from the HTTP entity headers "Invalidated-By" and "Channel-Object".
   Invalidated-By = "Invalidated-By" ":" Channel-URI
   Channel-URI    = wcip-channel | wcips-channel
   wcip-channel   = "wcip:" "//" host ":" port "/" channel-name
   wcips-channel  = "wcips:" "//" host ":" port "/" channel-name
   channel-name   = token

Here "host" is the hostname or IP address of the invalidation server. Note that the "port" field is mandatory for now, until we agree on and obtain the default ports.

   Channel-Object  = "Channel-Object" ":" 1#Object-info
   Object-info     = Object-name | Fresh-guarantee
   Object-name     = "name" "=" quoted-string
   Fresh-guarantee = "fresh" "=" delta-seconds

Here, Fresh-guarantee is the freshness guarantee of the object. Object-name uniquely identifies an object within the invalidation channel. A client may use different URLs to request the same object. Object-name ensures that responses to these multiple URLs are cached under the same Object-name.

Example:

   Invalidated-By: wcip://www.cnn.com:777/allpolitics
   Channel-Object: name="cnn/allpolitics/obj1", fresh=120

If the cache decides to join the invalidation channel, it SHOULD ignore the normal HTTP Cache-Control directives for objects covered by the channel, such as no-store, expires, and max-age. But the cache SHOULD still honor directives such as "private". (More on this to come.)

To prevent intermediate nodes from tampering with the channel information in the first place, the domain name of a wcips channel MUST be identical to that of the object's origin server (as in the object URL or Host header). Upon channel registration, the origin server MAY then redirect the invalidation client to the true invalidation server via HTTPS. See also Section 4.1.

5.2 Channel Setup

To participate in a channel, the proxy establishes a persistent TCP connection with the invalidation server. If the channel-URI specifies "wcips", the persistent connection is run on top of SSL, similar to HTTPS.
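As an illustration only, the following Python sketch shows how a proxy might turn a Channel-URI from the "Invalidated-By" header into the parameters it needs to open the connection. The class and function names (ChannelURI, parse_channel_uri) are assumptions made for this example and are not part of WCIP; the sketch relies only on the grammar in Section 5.1, including the currently mandatory port.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ChannelURI:
    """Hypothetical holder for the parts of a Channel-URI."""
    host: str
    port: int
    channel_name: str
    use_tls: bool  # True for "wcips", which runs on top of SSL


def parse_channel_uri(uri: str) -> ChannelURI:
    """Split a wcip:// or wcips:// URI into connection parameters."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wcip", "wcips"):
        raise ValueError(f"not a WCIP channel URI: {uri!r}")
    if parts.hostname is None or parts.port is None:
        # Section 5.1 makes the port mandatory for now.
        raise ValueError(f"host and port are both required: {uri!r}")
    channel_name = parts.path[1:]  # drop the single leading "/"
    if not channel_name or "/" in channel_name:
        raise ValueError(f"channel-name must be a single token: {uri!r}")
    return ChannelURI(parts.hostname, parts.port, channel_name,
                      parts.scheme == "wcips")


channel = parse_channel_uri("wcip://www.cnn.com:777/allpolitics")
print(channel.host, channel.port, channel.channel_name, channel.use_tls)
```

A real client would then open a TCP connection to (host, port) and wrap it in SSL when use_tls is set; that step is left out here because the draft does not fix any connection parameters beyond the URI.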
Once the connection is established, all channel messages are sent on this connection to ensure relative sequencing. In essence, a channel MUST maintain the invariant that the invalidation client never receives a heartbeat without first receiving all preceding invalidations sent to it.

The invalidation server MAY close the connection at any time. When the connection is broken unexpectedly, the invalidation client SHOULD re-register in order to re-synchronize the object freshness state.

5.2.1 Registration Request

On the persistent connection, the proxy sends a registration request as follows:

   Registration = wcip-method SP channel-URI SP WCIP-Version CRLF
   wcip-method  = "POST"
   WCIP-Version = "WCIP" "/" 1*DIGIT "." 1*DIGIT

Example:

   POST wcip://www.cnn.com:777/allpolitics WCIP/0.1
   POST wcips://www.cdn.com:888/amazon WCIP/0.1

If Content-Length is not 0, the request contains a body that specifies targeted service parameters, and the invalidation server MUST send messages only for objects that the proxy specifically requested. Otherwise, the invalidation server assumes that the invalidation client accepts any type of message.

5.2.2 Channel Header

The request MUST include a general header "Channel" to specify the desired channel lifetime, heartbeat interval, and message syntax. Right now, we specify one syntax type, "ObjectList". Going forward, more sophisticated syntaxes will be developed.

   Channel               = "Channel" ":" 1#Channel-info
   Channel-info          = registration-lifetime | heartbeat-interval | no-target | message-syntax | sequence-number
   registration-lifetime = "life" "=" delta-seconds
   heartbeat-interval    = "heartbeat" "=" delta-seconds
   no-target             = "no-target"
   message-syntax        = "syntax" "=" Syntax-name
   sequence-number       = "seqno" "=" 1*DIGIT
   Syntax-name           = "ObjectList" | token ; extension syntaxes

By default, targeted service is provided.
"no-target" indicates that targeted service is not requested or provided. "sequence-number" is only for use by the invalidation server in the IP-multicast deployment scenario (see Section 3.2.3). Example: POST wcip://cdn.net:88/ch1 WCIP/0.1 Date: Sat, 09 Sep 2000 01:27:36 GMT Connection: keep-alive Channel: life=36000, heartbeat=120, syntax=ObjectList Content-Length: 577 5.2.3 Body of the Registration Request Li & Cao & Dahlin Experimental - May 2001 17 Draft-danli-wrec-wcip-00.txt November 2000 Then, in the body of the registration message, the proxy specifies "exclude" and/or "include" of URLs. We define the message syntax in XML because it's easy to implement and provides extensibility. Following is the XML DTD for the message body: The "ObjectList" element contains one or more "action" elements, which each contain a list of objects. The registration request body uses the "action op" attribute to include or exclude objects for targeted service. Later the registration response body uses the "action state" attribute to indicate the freshness state of the object. The response may also use the "redirect" element and its "to" attribute to inform the invalidation client another WCIP channel URL, which carries the objects listed within the "action" element. Within each "action" element, each "object" element describes an object's unique name in the channel, its freshness guarantee (in seconds), whether updates are allowed for the object, the URL of the object, its last-modified time, and its Etag. Note that either "name" or "url" attribute MUST be present in the "object" element. The "last-modified" and "etag" attributes SHOULD be present so that the invalidation server knows whether to invalidate the object. Example: If no "update=yes" is given, the invalidation server MUST NOT send any object update to the invalidation client. 
If the invalidation client is being redirected to the current server from a different channel URL, the invalidation client SHOULD include the "redirect" element and its "from" attribute so that the invalidation server knows what additional channel it needs to subscribe to on behalf of the invalidation client.

If the invalidation client caches multiple objects under different URLs but the same Object-name, the invalidation client SHOULD send only one "object" element for them, specifying one of the URLs and the common Object-name. Later, when invalidations come, all copies under that Object-name MUST be invalidated.

After establishing the invalidation channel, the invalidation client can send "incremental" registration requests to include or exclude additional objects from the targeted service, provided that the registration is sent on the same persistent connection. Being able to use the same connection means that the last registration has not yet timed out and the invalidation server still maintains the invalidation client's targeted-service state (if any).

5.2.4 Registration Response

The invalidation server responds with 200 OK if it is able to accept the connection. It MUST include the "Channel" header to specify the registration lifetime. This is especially important in the case of targeted service because it allows the invalidation server to release the targeted-service state and close the connection after the lifetime expires. The lifetime MUST be no greater than that specified in the registration request (if any). The invalidation server also indicates and commits to a heartbeat interval that is preferably smaller than the lowest object freshness guarantee. For example:

   WCIP/0.1 200 OK
   Date: Sat, 09 Sep 2000 01:27:36 GMT
   Channel: life=18000, heartbeat=200, syntax=ObjectList

In the case of a CDN, the invalidation server MAY return a redirect message, as described in RFC 2616.
For example:

   WCIP/0.1 305 Use Proxy
   Date: Sat, 09 Sep 2000 01:27:36 GMT
   Location: wcip://proxy9.cdn.net:888/allsports

Otherwise, if the invalidation server cannot accept the registration, it returns an appropriate error status code as described in RFC 2616.

Before a successful registration response arrives, the invalidation client MUST treat cached objects according to their HTTP Cache-Control directives.

In the IP-multicast deployment scenario, the registration response MUST specify the "sequence-number" in the "Channel" header (see Section 3.2.3). The "sequence-number" MUST be included in every channel message and incremented for every channel message other than the heartbeat messages.

5.2.5 Body of the Registration Response

Along with a 200 OK response, the invalidation server sends back the freshness information of all the objects that the invalidation client is registered to receive. This information is sent in the response body, in the syntax indicated by the "syntax" directive of the "Channel" header in the response. The following example is in the XML ObjectList syntax defined by the XML DTD in the previous section.

Example:

The attribute state="fresh" indicates that the object is fresh based on the invalidation client's targeted service information in the registration request body. state="stale" means the object is stale based on that information. state="unknown" means that there is not enough information for the invalidation server to tell whether the object is fresh or stale. In any case, the invalidation server MUST specify the "last-modified" and/or the "etag" attributes so that the invalidation client can verify or infer whether an object has expired and what the new Etag and Last-Modified time are.

If the invalidation client asks for targeted service for some objects this channel does not cover, the invalidation server MUST respond with op="exclude" to reject it.
In addition, it MAY use the "redirect" element with its "to" attribute to tell the invalidation client where to get WCIP service for the objects listed within the "action" element.

If there is targeted service, only objects specified in the targeted service (in the registration request body) are returned in the registration response body. Otherwise, all objects covered by the invalidation channel are returned in the response body.

For the invalidation client, only objects returned in the response body without op="exclude" can be considered covered by the invalidation channel. If the channel later stops covering a particular object, the invalidation server MUST send an invalidation message with op="exclude" in the message body to remove that object from the invalidation clients' view of the channel.

5.3 Channel Messages

During the lifetime of the channel, the invalidation server sends invalidations, object updates, and heartbeats to the channel. All channel messages MUST include a "Channel" header, in which the "life" directive reflects the lifetime remaining for the channel.

When the invalidation client receives any of the above messages, all the objects covered by the invalidation channel are revalidated, under the assumption that it would otherwise have received an invalidation already, since the network and the invalidation server are up.

Revalidation means that, from the time of the message until "freshness guarantee" seconds later, the object can be served from the cache. The freshness guarantee is specified in the "Channel-Object" header as well as in the ObjectList. The time of the message is indicated in the "Date" header of the message.

To account for clock skew between the invalidation server and client, the invalidation client can convert the invalidation server's "Date" into its own time using the delta computed from the registration request and response.
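A minimal Python sketch of this conversion follows. It computes the local deadline as t1 + (t3 - t2) + freshness guarantee, where t1 is the "Date" of the client's last registration request, t2 the "Date" of the server's registration response, and t3 the "Date" of the current channel message. The function and parameter names are assumptions made for illustration, and the sample dates are invented.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def serve_until(request_date: str, response_date: str,
                message_date: str, fresh_seconds: int) -> datetime:
    """Return the client-clock time until which the object may be served."""
    t1 = parsedate_to_datetime(request_date)    # client clock
    t2 = parsedate_to_datetime(response_date)   # server clock
    t3 = parsedate_to_datetime(message_date)    # server clock
    # (t3 - t2) is elapsed time measured entirely on the server clock,
    # so adding it to t1 expresses the message time on the client clock.
    return t1 + (t3 - t2) + timedelta(seconds=fresh_seconds)


deadline = serve_until(
    "Sat, 09 Sep 2000 01:27:30 GMT",   # t1: client's registration request
    "Sat, 09 Sep 2000 01:27:36 GMT",   # t2: server's registration response
    "Sat, 09 Sep 2000 01:29:36 GMT",   # t3: current channel message
    120,                               # freshness guarantee in seconds
)
print(deadline.isoformat())
```

Note that this treats the request and response as simultaneous, so any network delay between them is folded into the estimated skew.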
For example: suppose t1 is the "Date" in the invalidation client's last registration request, t2 is the "Date" in the invalidation server's registration response, and t3 is the "Date" of the current channel message from the invalidation server. Then the invalidation client MAY serve the object from the cache until time "t1 + (t3 - t2) + freshness guarantee".

5.3.1 Invalidation Messages

An invalidation message can be either for a single object or a batch for multiple objects. Batch invalidation is used when a number of objects become stale within a short period (e.g., 1 minute). Otherwise, single-object invalidation MAY be used.

Single-object invalidation is a PURGE message similar to that defined for HTTP/1.1 DELETE, except that it applies to caching proxies only.

Batch invalidation uses a POST message and puts the object freshness information in the message body. The syntax of this information is specified by the "syntax" directive of the "Channel" header in the registration response. Following is an example in ObjectList syntax:

   POST wcip://cdn.net:88/ch1 WCIP/0.1
   Date: Sat, 09 Sep 2000 01:27:36 GMT
   Connection: keep-alive
   Channel: life=10000, heartbeat=120, syntax=ObjectList
   Content-Length: 127

When an object is invalidated, all cached copies under the same Object-name but different URLs MUST be invalidated as well.

The invalidation client MUST respond to the invalidation message with success or error status codes, as defined in HTTP/1.1.

In the IP-multicast deployment scenario, the invalidation message MUST specify the "sequence-number" in the "Channel" header (see Section 3.2.3). The "sequence-number" MUST be incremented.

5.3.2 Object Update Messages

If the invalidation client specifies "update=yes" for an object, the invalidation server MAY send a PUT message to the invalidation client whenever the object changes.
The invalidation client then replaces the cached copy with this up-to-date copy. The syntax and processing of this update message follow those defined for the HTTP/1.1 PUT message.

The invalidation client MUST respond to the update message with success or error status codes, as defined in HTTP/1.1.

If both sides support delta encoding, it is preferable to send a delta encoding as described in J. Mogul's Internet Draft [2].

In the IP-multicast deployment scenario, the update message MUST specify the "sequence-number" in the "Channel" header (see Section 3.2.3). The "sequence-number" MUST be incremented.

5.3.3 Heartbeat Messages

The invalidation server sends a heartbeat to the invalidation client whenever the channel is silent for a period equal to the heartbeat interval. The heartbeat is a POST message. Example:

   POST wcip://www.cdn.com/amazon WCIP/0.1
   Date: Sat, 09 Sep 2000 01:27:36 GMT
   Connection: keep-alive
   Channel: life=24000, heartbeat=120, syntax=ObjectList
   Content-Length: 0

The invalidation client MUST respond to the heartbeat message with success or error status codes, as defined in HTTP/1.1.

In the IP-multicast deployment scenario, the heartbeat message MUST specify the "sequence-number" in the "Channel" header (see Section 3.2.3). The "sequence-number" MUST be included but MUST NOT be incremented. Moreover, the heartbeat message MUST include an ObjectList message body to allow re-synchronization of object freshness state. The body length should not exceed the common MTU of all invalidation clients.

6. Security Considerations

See Section 3.4 on the options for a secure invalidation channel. See Section 5.1 for the syntax for a secure invalidation channel that runs on top of SSL.

For security reasons, it may be detrimental for a transparent interception proxy to replace the channel information from the origin server with its own information. See also Section 4.1.

7. References

[1] Mogul, J.C.; Douglis, F.; Feldmann, A.; Krishnamurthy, B., "Potential benefits of delta encoding and data compression for HTTP", ACM SIGCOMM 97 Conference.

[2] Mogul, J.C., et al., "Delta encoding in HTTP", Internet Draft, http://www.ics.uci.edu/pub/ietf/http/draft-mogul-http-delta-02.txt

[3] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[4] Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; Berners-Lee, T., "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[5] Cao, P.; Liu, C., "Maintaining strong cache consistency in the World Wide Web", 17th International Conference on Distributed Computing Systems, 27-30 May 1997. IEEE Transactions on Computers (April 1998), vol. 47, no. 4, p. 445-57.

[6] Li, D.; Cheriton, D. R., "Scalable Web Caching of Frequently Updated Objects using Reliable Multicast", 2nd USENIX Symposium on Internet Technologies and Systems (USITS'99), October 1999.

[7] Dilley, J.; Arlitt, M.; Perret, S.; Jin, T., "The Distributed Object Consistency Protocol", HP Labs Technical Report, http://www.hpl.hp.com/techreports/1999/HPL-1999-109.html, September 1999.

[8] Yin, J.; Alvisi, L.; Dahlin, M.; Lin, C., "Using leases to support server-driven consistency in large-scale systems", Proceedings of the 18th International Conference on Distributed Computing Systems, 26-29 May 1998, p. 285-94.

[9] Crocker, D. and Overell, P. (Editors), "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, Internet Mail Consortium and Demon Internet Ltd., November 1997.

8. Acknowledgments

The draft greatly benefited from the valuable comments from Ian Cooper, Mark Nottingham, and Carl Sutton.

9. Authors' Addresses

Dan Li
Cisco Systems, Inc.
Email: lidan@cisco.com

Pei Cao
Cisco Systems, Inc.
Email: cao@cisco.com

Mike Dahlin
University of Texas
Email: dahlin@cs.utexas.edu

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into