Internet Draft                                              Ingrid Melve
Expires: December 1999                                           UNINETT
Informational                                             Gary Tomlinson
WREC Working Group                                                Novell
                                                              Ian Cooper
                                                   Mirror Image Internet
                                                           June 25, 1999


             Internet Web Replication and Caching Taxonomy
                   draft-ietf-wrec-taxonomy-01.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

   This memo specifies standard terminology and the current taxonomy of the web replication and caching infrastructure deployed today. It introduces the standard concepts and protocols used today within this application domain. Currently deployed solutions employing these technologies are presented to establish a standard taxonomy. Research issues and known problems with HTTP proxy caching are covered in two accompanying documents and are not part of this document. This document presents open protocols and points to the published RFCs for each protocol.


Melve, Tomlinson, Cooper                                        [Page 1]

Replication and Caching Taxonomy                           June 25, 1999


Contents

   1.  Introduction
   2.  Terminology
   3.  Distributed System Relationships
   4.  Client to Replica Communication
   5.  Inter-Replica Communication
   6.  Client to Proxy Configuration
   7.  Inter-Cache Communication
   8.  Network Element Communication
   9.  Security Considerations
   10. Acknowledgements
   11. References
   12. Authors' Addresses

1. Introduction

   Since its introduction in 1990, the World-Wide Web has evolved from a simple client-server model into a sophisticated distributed architecture. This evolution has been driven largely by the scaling problems associated with exponential growth. Distinct paradigms and solutions have emerged to satisfy specific requirements. Two core infrastructural components being employed to meet the demands of this growth are replication and caching. In many cases, web caches and replicated services need to be able to coexist.

   Many protocols, both open and proprietary, are employed in web replication and caching today. The open protocols include DNS [21], Cache Digest [16], CARP [9], HTTP [6], ICP [10], PAC [7], SOCKS [19], TPACT [22], WPAD [8], and WCCP [18]. Additional protocols are being planned to address emerging solution requirements.

   This memo specifies standard terminology and the current taxonomy of the web replication and caching infrastructure deployed in the Internet today. The principal goal of this document is to establish a common understanding of, and a reference point for, this application domain.

   We also expect that this document will be used in the creation of a standard architectural framework for efficient, reliable, and predictable service in a web which includes both replicas and caches.

2. Terminology

   Where possible, existing definitions [5, 6] have been used in this document. Additional terminology has been agreed upon and defined in this document. All of the terminology used in this document is considered to be standardized with respect to IETF WREC working group RFCs.

   In this document a number of terms are used to refer to the roles played by participants in, and objects of, the HTTP communication. The following definitions are used in the HTTP/1.1 specification [6].
   However, some of these definitions have come to have different meanings within the Web caching community. In those cases, additional clarification is given.

   client
      An application program that establishes connections for the purpose of sending requests.

   user agent
      The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end user tools.

   server
      An application program that accepts connections in order to service requests by sending back responses. Any given program may be capable of being both a client and a server; our use of these terms refers only to the role being performed by the program for a particular connection, rather than to the program's capabilities in general. Likewise, any server may act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.

   origin server
      The server on which a given resource resides or is to be created.

   [Ed note; IAN: The following is subtly different from the definition given in HTTP/1.1. (Should we now revert to the definition in HTTP/1.1 and document the difference?) As a community we must be careful about which type of "transparent proxy" is being discussed.]

   proxy
      An intermediary system which acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them on, with possible translation, to other servers. A proxy MUST implement both the client and server requirements of the HTTP/1.1 specification [6]. A "transparent proxy" is a proxy that does not modify the request or response beyond what is required for proxy authentication and identification. A "non-transparent proxy" is a proxy that modifies the request or response in order to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering.
      Except where either transparent or non-transparent behavior is explicitly stated, the HTTP proxy requirements apply to both types of proxies.

      Note: The term "transparent proxy" as defined in [6] has a different meaning within the Web caching community. Unqualified references to the term elsewhere in this document (including the following paragraph) use the Web caching community definition, which is given later in this section.

      The condition requiring implementation of both the server and the client requirements of HTTP/1.1 is only appropriate for a non-transparent proxy.

   [Ed note; IAN: The following is also subtly different from HTTP/1.1. We should also consider comments from Joe Touch on whether we should distinguish types of tunnels.]

   tunnel
      An intermediary system which is acting as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed.

   [Ed note; IAN: The following has been slightly modified from HTTP/1.1 to consider server load. Need to consider the comment from Joe Touch regarding clarification of not using a cache when tunnelling.]

   cache
      A program's local store of response messages and the subsystem that controls its message storage, retrieval, and deletion. A cache stores cacheable responses in order to reduce the response time, server load and network bandwidth consumption on future, equivalent requests. Any client or server may include a cache, though a cache cannot be used by a server while it is acting as a tunnel.

   [Ed note; IAN: The following has been edited from RFC2616 to reference that document.]

   cacheable
      A response is cacheable if a cache is allowed to store a copy of the response message for use in answering subsequent requests.
      The rules for determining the cacheability of HTTP responses are defined in section 13 of [6]. Even if a resource is cacheable, there may be additional constraints on whether a cache can use the cached copy for a particular request.

   To these we add the following:

   authoritative reference
      The owner of the data; the content production system; possibly an origin server.

   content consumer
      The user or system that makes requests of an origin server (requests which may in turn be handled by a proxy).

   caching proxy
      A proxy with a cache, acting as a server to clients and as a client to servers.

   origin server accelerator
      An application of a caching proxy in which the proxy is placed closer to the origin server than to the content consumers, in order to off-load the handling of cacheable responses from the server and to reduce traffic within the server's network.

   surrogate
      [Ed note; IAN: need a definition.]

   network element
      A router or switch.
      [Ed note; IAN: This term probably needs a better name.]

   browser
      A special instance of a user agent that acts as a content presentation device for the content consumer.

   cluster
      A tightly coupled set of devices acting together to share load.

   reverse proxy
      An intermediary system which acts as both a server and a client for the purpose of serving requests on behalf of origin servers. Requests are serviced internally or by passing them on to the origin server they are representing. A reverse proxy must interpret and, if necessary, rewrite a request message before forwarding it. Reverse proxies are often used as server-side portals through network firewalls and as helper applications for off-loading requests from origin servers.
   [Ed note; IAN: leaving this as a placeholder until we can work out proxies/reverse proxies/surrogates and accelerators]

   The following definitions are added to describe caching device topology:

   user agent cache
      The cache within the user agent program.

   local caching proxy
      The caching proxy a user agent connects to.

   [Ed note; IAN: should this be renamed 'primary proxy'?]

   intermediate caching proxy
      Seen from the content consumer's view, any cache participating in the caching mesh that is not the user agent's local caching proxy.

   cache server
      A server that answers requests made by local and upper-level caching proxies, but which does not itself act as a proxy.

   cache array
   diffused array
   cache cluster
      A cluster of caching proxies, acting logically as one service and partitioning the URL name space across the array.

   caching mesh
      A loosely coupled set of co-operating proxy or caching servers, or clusters, acting independently but sharing cacheable content between themselves using inter-cache communication protocols (see Section 7).

   Efforts to insert proxies into the network in such a way that the content consumer is unaware of their presence have created a set of terms whose definitions may not be consistent with other uses. This section references prior definitions but also gives their meaning in the realm of Web caching.
   [Ed note; IAN: snooping, redirection, interception - need to clarify if we only need the first two]

   traffic redirection
      Redirection of traffic from a user agent or network element to a specific proxy, used to deploy Web caching without the need to manually reconfigure individual user agents, or to force the use of a proxy where such use would not otherwise occur.

   network traffic snooping
      The examination of network traffic within a network element to determine whether it should be redirected.

   transparent proxy (additional definition)
      The term "transparent proxy" is defined in [6] (and quoted above). However, in the realm of Web caching, it has come to denote a proxy which receives traffic as a result of network traffic snooping. The term typically describes both the proxy and the additional systems which perform the network traffic snooping. The use of the proxy is transparent to the client. Transparent proxies are used to remove the need to configure clients to use a proxy.

   proxy discovery
      The discovery and configuration for use of a proxy in an environment where the content consumer may be unaware of the proxy's existence. The use of the proxy is transparent to the content consumer, but not to the client.

   [Ed note; IAN: should we consider the ability of proxies to discover each other? Would this be better titled as "transparent proxy configuration"?]
   The following terms describe the roles of servers and caches in the realm of caching and replication:

   [Ed note; IAN: This section needs significant work]

   temporal domain, sparse working set cache
      A subset of the content from one or more origin servers, stored temporarily and collected from requests made by content consumers.

   persistent domain
      A collection of origin servers maintaining a persistent data set from the authoritative reference.

   replica origin server
      An origin server storing a persistent replica of a data set stored at the authoritative reference.

3. Distributed System Relationships

   [Ed note; GARY: Consider eliminating this big picture; it doesn't capture all of the relationships and is difficult to communicate]

   Diagram of the components that make up a web replication and caching infrastructure, with communication between the components.

   ------------------     -----------------     ------------------
   | Replica Origin |-----| Master Origin |-----| Replica Origin |
   |     Server     |     |     Server    |     |     Server     |
   ------------------     -----------------     ------------------
                   \              |              /
                    \             |             /
                     ---------------------------
                                  |  Client to
                                  |  Replica Server
                          -----------------
                          |   Top-Level   |
                          | Caching Proxy |
                          -----------------
                              /       \
                             /         \   Inter Cache
                            /           \  Communication
           -----------------             -----------------
           |  Upper-Level  |-------------|  Upper-Level  |
           | Caching Proxy |             | Caching Proxy |
           -----------------             -----------------
            / Inter Cache \               /             \  Inter Cache
           / Communication \             /               \  Communication
          /                 \           /                 \
    -----------------     -----------------     -----------------
    |  First Level  |-----| Caching Proxy |-----|  First Level  |
    | Caching Proxy |     |     Array     |     | Caching Proxy |
    -----------------     -----------------     -----------------
            |  Client to          |  Cache to
            |  Proxy Cache        |  Network Element
      -------------         -------------
      |  Client   |         |  Network  |
      -------------         |  Element  |
                            -------------
                                  |
                                  |
                            -------------
                            |  Client   |
                            -------------

3.1 Replication Relationships

   [Ed note; describe the replication system relationship domain]

3.1.1 Client to Replica

   [Ed note; recast this as relationship not the definition which follows in section 4]

   Client to Replica: cooperation and communication between clients (both browsers/user agents and proxy caches) and replica origin servers. Used to discover the optimal replica, typically on the basis of proximity.

                         Persistent Domain
                Complete Idempotent Set Replication

   ------------------     -----------------     ------------------
   | Replica Origin |     | Master Origin |     | Replica Origin |
   |     Server     |     |     Server    |     |     Server     |
   ------------------     -----------------     ------------------
                   \              |              /
                    \             |             /
                     ---------------------------
                                  |  Client to
                                  |  Replica Server
                          -----------------
                          |    Client     |
                          -----------------

3.1.2 Inter-Replica

   [Ed note; recast this as relationship not the definition which follows in section 5]

   Inter-Replica: cooperation and communication between replica origin servers. Used in replicating data sets between origin servers.

                         Persistent Domain
                Complete Idempotent Set Replication

   ------------------     -----------------     ------------------
   | Replica Origin |-----| Master Origin |-----| Replica Origin |
   |     Server     |     |     Server    |     |     Server     |
   ------------------     -----------------     ------------------

3.2 Caching Relationships

   [Ed note; describe the caching system relationship domain]

3.2.1 Client to Proxy

   [Ed note; recast this as relationship not the definition which follows in section 6]

   Client to Proxy: configuration, cooperation and communication between end user clients (browsers and applications) and a caching proxy.
                          Temporal Domain
                   Sparse Working Set Cache

    -----------------     -----------------     -----------------
    |  First Level  |     |  First Level  |     |  First Level  |
    | Caching Proxy |     | Caching Proxy |     | Caching Proxy |
    -----------------     -----------------     -----------------
                   \              |              /
                    \             |             /
                     ---------------------------
                                  |
                          -----------------
                          |    Client     |
                          -----------------

3.2.2 Reverse Proxy to Origin Server

   [Ed note; describe the accelerator relationship]

3.2.3 Inter-Cache

   [Ed note; recast this as relationship not the definition which follows in section 7]

   Inter-Cache: cooperation and communication between caching proxies.

                          Temporal Domain
                   Sparse Working Set Cache

                          -----------------
                          |   Top-Level   |
                          | Caching Proxy |
                          -----------------
                              /       \
                             /         \
                            /           \
           -----------------             -----------------
           |  Upper-Level  |-------------|  Upper-Level  |
           | Caching Proxy |             | Caching Proxy |
           -----------------             -----------------
            /             \               /             \
           /               \             /               \
          /                 \           /                 \
    -----------------     -----------------     -----------------
    |  First Level  |-----|  First Level  |-----|  First Level  |
    | Caching Proxy |     | Caching Proxy |     | Caching Proxy |
    -----------------     -----------------     -----------------

3.2.4 Network Element to Caching Proxy

   [Ed note; recast this as relationship not the definition which follows in section 8]

   Network Element to Caching Proxy: cooperation and communication between caching proxies and network elements, such as routers and switches. Generally used for transparent caching and/or diffused arrays.
                          Temporal Domain
                   Sparse Working Set Cache

    -----------------     -----------------     -----------------
    | Caching Proxy |     | Caching Proxy |     | Caching Proxy |
    |     Array     |     |     Array     |     |     Array     |
    -----------------     -----------------     -----------------
                   \              |              /
                    \             |             /
                     ---------------------------
                                  |
                            -------------
                            |  Network  |
                            |  Element  |
                            -------------
                                  |
                                  |
                            -------------
                            |  Client   |
                            -------------

Caching Proxies with Transparency

   [Ed note: Currently contains citations from NetApp document, need rewording to avoid specific products and concentrate on generic properties. Explain network elements and NATs and other ways interception may happen. Intro to usage and "normal" setup.]

   See [1, 2, 3, 4] for an introduction to caching proxies with transparency.

   The goal of intercepting web traffic is to provide a transparent web proxy, thus avoiding the need to configure each client individually. Transparency means that the user does not need to be aware of the proxy. The origin server sees connections coming from the proxy, not from the individual end user. Authentication based on the client IP address does not work if there is a transparent proxy cache in the path to the web server.

   A web cache is said to be transparent if clients can access the cache without the need to configure their browsers with either a proxy auto-configuration URL or a manual proxy setting. Transparent caches appear as a seamless part of the network infrastructure, rather than as a set of discrete proxy servers, and function much like a transparent firewall. Many ISPs and carriers desire transparent caches because they let them retrofit their networks with caching without action at the client. However, when deployed transparently, a web cache must be as fail-safe and scalable as the rest of the network. [2]

   A transparent cache acts much like a gateway or firewall -- it effectively sits between the users and the network.
   The advantage of transparent caching is that it eliminates the need to configure browsers to use caching. Another strength (and sometimes a weakness) is that it is impossible to bypass caching. [2]

   Conceptually, transparency works by modifying the TCP/IP stack of a cache so that it operates in "promiscuous mode" and effectively binds itself to all possible IP addresses. [2]

   We need to give a far more abstract definition, which includes the way that router and switch redirection, and within-router action, operate.

   Some of the problems:

   * only a limited number of ports can be captured;

   * "unexpected" data may appear on other ports (or even on well known ports), as experienced when various services are set up on port 80;

   * the well known problems with the use of HTTP for transport [20].

Out-of-path Transparent Caching Proxies

   An Out-of-path Transparent Caching Proxy performs the same proxy and caching functions as a Transparent Caching Proxy and is similarly transparent to the client. However, it does not lie on the forwarding path between a client and a server and does not perform web traffic interception. Instead it relies upon a redirecting network element in the path between client and server to intercept web traffic and redirect it to the proxy.

   One advantage of this method of transparent caching is that, in the case of cache failure, the network element can, provided it monitors the state of the caches, revert to forwarding web traffic directly to the server. It is also possible for the network element to distribute the web traffic load across a group of caches. This method of transparent caching generally requires a protocol to be run between the redirecting network element and the cache or caches.

4. Client to Replica Communication

   This section describes the cooperation and communication between clients (both user agents and proxy caches) and replica origin web servers.
   This communication is used to discover an optimal origin server replica with which a web client can establish service. Optimality is a policy-based decision, often based upon proximity, but it may be based on other criteria such as load.

4.1 Navigation Hyperlinks

   Authoritative reference:
      This memo.

   Description:
      The simplest of the client to replica communication mechanisms. It uses hyperlink URLs embedded in web pages that point to the mirror sites. The human user manually selects the link of the replica origin server he or she wishes to use.

   Security:
      Relies on the protocol security associated with the URL scheme.

   Deployment:
      Probably the most commonly deployed client to replica communication mechanism. Ubiquitous interoperability with humans.

   Submitter:
      Document editors.

4.2 URL Redirection

   Authoritative reference:
      This memo.

   Description:
      A simple and commonly used mechanism to connect web clients with origin server replicas is URL redirection. Clients are redirected to an optimal web server replica by means of the HTTP [6] response code 307 Temporary Redirect. A web client establishes HTTP communication with one of the web server replicas. The initially contacted replica origin web server can either serve the request itself or redirect the client to the proper replica. Refer to section 10.3.8 of the HTTP/1.1 specification, RFC 2616 [6], for information on HTTP response code 307.

   Security:
      Relies entirely upon HTTP security.

   Deployment:
      Observed at a number of large web sites. The extent of usage in the Internet is unknown at this time.

   Submitter:
      Document editors.
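   The redirection step can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The replica table, the host names under the reserved ".example" domain and the address-based selection policy are all hypothetical, and a real deployment would apply its own policy (proximity, load, or other criteria). The contacted replica either serves the request itself or answers with 307 Temporary Redirect and a Location header naming the chosen replica.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler
from ipaddress import ip_address, ip_network

# Hypothetical policy table: client network -> preferred replica base URL.
# Addresses are documentation ranges; host names are placeholders.
REPLICAS = {
    ip_network("192.0.2.0/24"): "http://eu.replica.example",
    ip_network("198.51.100.0/24"): "http://us.replica.example",
}
LOCAL_BASE = "http://www.example"  # the replica that received the request


def choose_replica(client_ip: str) -> str:
    """Return the base URL of the replica the policy prefers for client_ip."""
    addr = ip_address(client_ip)
    for network, base in REPLICAS.items():
        if addr in network:
            return base
    return LOCAL_BASE  # no better replica known: serve locally


def handle_request(client_ip: str, path: str) -> tuple[HTTPStatus, dict[str, str]]:
    """Decide whether to serve the request here or redirect it."""
    target = choose_replica(client_ip)
    if target == LOCAL_BASE:
        return HTTPStatus.OK, {}
    # 307: the resource temporarily resides at Location; the client should
    # keep using the original URI for future requests.
    return HTTPStatus.TEMPORARY_REDIRECT, {"Location": target + path}


class RedirectingHandler(BaseHTTPRequestHandler):
    """Wires the decision into Python's standard-library HTTP server."""

    def do_GET(self) -> None:
        status, headers = handle_request(self.client_address[0], self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if status == HTTPStatus.OK:
            self.wfile.write(b"served by this replica\n")
```

   To try it, the handler can be run with the standard library's HTTPServer, for example HTTPServer(("", 8080), RedirectingHandler).serve_forever(). A client in 192.0.2.0/24 is then redirected to the hypothetical European replica, and all other clients are served locally.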
4.3 DNS Redirection [21]

   Authoritative reference:
      Load balancing: RFC1794 DNS Support for Load Balancing
      Proximity: This memo
      [Ed note; it would have been nice to cite SONAR, but the draft has expired]

   Description:
      The Domain Name System (DNS) provides a more sophisticated client to replica communication mechanism. This is accomplished by DNS servers that order the addresses they return according to quality of service policies. When a web client resolves the name of a web server, the enhanced DNS server orders the IP addresses of the web server starting with the most suitable replica and ending with the least suitable replica.

   Security:
      Relies entirely upon DNS security.

   Deployment:
      Observed at a number of large web sites and large ISP web hosting services. The extent of usage in the Internet is unknown at this time.

   Submitter:
      Document editors.

5. Inter-Replica Communication

   This section describes the cooperation and communication between replica origin servers. It is used in replicating data sets between origin servers.

5.1 Batch Driven Mirror Replication

   Authoritative reference:
      This memo.

   Description:
      In this model, the replica web server to be updated initiates communication with a master origin web server. The communication is established at intervals based upon queued transactions which are scheduled for deferred processing. Scheduling policies vary, but the transfers generally recur at a specified time. Once communication is established, data sets are copied to the initiating replica web server.

   Security:
      Relies upon the protocol being used to transfer the data set. FTP and RDIST are the most common protocols observed.

   Deployment:
      Very common for mirror synchronization in the Internet.

   Submitter:
      Document editors.

5.2 Demand Driven Mirror Replication

   Authoritative reference:
      This memo.
   Description:
      In this model, the replica web server acquires content as it is demanded. This is generally done by web server accelerators (reverse proxies) operating as origin server replicas. When a web client requests a URL that is not in the data set of the replica origin server, the replica server attempts to acquire it from a master origin server and forwards it to the requesting web client.

   Security:
      Relies upon the protocol being used to transfer the URLs. FTP, Gopher, HTTP and ICP are the most common protocols observed.

   Deployment:
      Observed at several large web sites. The extent of usage in the Internet is unknown at this time.

   Submitter:
      Document editors.

5.3 Synchronized Replication

   Authoritative reference:
      This memo.
      [Ed note; there is no IETF protocol specified at this time. The editors are aware of at least two open source protocols, AFS and CODA, along with one expired IETF draft and one proprietary protocol, Novell NRS; none of these can be considered an authoritative reference]

   Description:
      In this model, the replicated origin servers cooperate using synchronization strategies and specialized replica protocols to keep the replica data sets coherent. Synchronization strategies range from tightly coherent (a few minutes) to loosely coherent (a few hours or more). Updates occur between replicas based upon the synchronization time constraints of the coherency model employed and are generally in the form of deltas only.

   Security:
      All of the known protocols use strong cryptographic key exchange methods, which are based either upon the Kerberos shared secret model or upon the public/private key RSA model.

   Deployment:
      Observed at a few sites, primarily on university campuses.

   Submitter:
      Document editors.

6. Client to Proxy Configuration

   This section describes the configuration, cooperation and communication between end user clients (browsers and applications) and a proxy.
6.1 Manual Proxy Configuration

   Authoritative reference:
      This memo.

   Description:
      Each user needs to configure his or her web client by typing in information pertaining to proxied protocols and local policies.

   Security:
      The potential for misconfiguration is high, as each user sets preferences individually.

   Deployment:
      Widely deployed; supported by all current browsers. Most browsers support other options as well.

   Submitter:
      Document editors.

6.2 Proxy Auto Configuration (PAC) [7]

   [Ed note: Does it really need to be submitted for Informational RFC?]

   Authoritative reference:
      No RFC published, no Internet-Draft.
      Navigator Proxy Auto-Config File Format. Available from http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html

   Description:
      A JavaScript file on a web server hands out information on where to find proxies. Clients need to be configured with the URL of this file. There is no bootstrap mechanism, so manual configuration is still necessary, but it is made easier by centralizing the script at one URL.

   Security:
      A common policy per organization is possible. PAC still requires manual configuration, but it is better than manual proxy configuration because administrators can update the proxy configuration without user intervention. Interoperability of PAC files is not as good as desired, since the popular browsers interpret the script slightly differently, which may lead to undesired effects.

   Deployment:
      Implemented in most web clients.

   Submitter:
      Document editors.

6.3 Cache Array Routing Protocol (CARP) v1.0 [9]

   [Ed note: Current draft expired. A new draft must be submitted and this section completed for this protocol to be considered in the Taxonomy]

   Authoritative reference:
      Expired Internet-Draft draft-vinod-carp-v1-03.txt, work in progress.

   Description:
      Clients may use CARP directly as a hash-function-based proxy selection mechanism.
      They need to be configured with the location of the cluster information.

   Security:

   Deployment:

   Submitter:

6.4 Web Proxy Auto-Discovery Protocol (WPAD) [8]

   Authoritative reference:
      Internet Draft, work in progress.
      [Ed note; I-D submission anticipated by 6/25/99]

   Description:
      WPAD uses a collection of pre-existing Internet resource discovery mechanisms to perform web proxy auto-discovery. The only goal of WPAD is to locate the PAC URL; WPAD does not specify which proxies will be used. WPAD gets the client to the PAC URL, and the PAC script then chooses the proxies.

      The WPAD protocol specifies the following:

      + how to use each mechanism for the specific purpose of web proxy auto-discovery
      + the order in which the mechanisms should be performed
      + the minimal set of mechanisms which must be attempted by a WPAD-compliant web client

      The resource discovery mechanisms used by WPAD are as follows:

      + Dynamic Host Configuration Protocol (DHCP)
      + Service Location Protocol (SLP)
      + "Well Known Aliases" using DNS A records
      + DNS SRV records
      + "service: URLs" in DNS TXT records

   Security:
      Relies upon DNS and HTTP security.

   Deployment:
      Implemented in web clients and caching proxy servers. More than two independent implementations.

   Submitter:
      Josh Cohen, Microsoft, joshco@microsoft.com

7. Inter-Cache Communication

   [Ed note: INGRID. Review and chase submissions (push Duane)]

   This section describes the cooperation and communication between caching proxies.

7.1 Internet Cache Protocol (ICP) [10, 11, 12, 13, 14]

   Authoritative reference:
      RFC 2186 Internet Cache Protocol (ICP), version 2

   Description:
      ICP is used by caches to query other caches about web objects, to see whether a web object is present at the other cache. ICP uses UDP. Since UDP is unreliable, an estimate of network congestion and availability may be derived from ICP packet loss.
      Together with round-trip times, this rudimentary loss measurement provides a load balancing method for caches.

   Security:
      ICP does not convey information about the HTTP headers associated with a web object. HTTP headers may include access control and cache directives. Since caches ask for objects and then download them using HTTP, false cache hits may occur (one example is an object that is present in the cache but is not accessible to the sibling cache). ICP suffers from all the security problems of UDP.

   Deployment:
      Widely deployed. Most current cache implementations support ICP in one form or another.

   Submitter:
      Document editors.

7.2 Hyper Text Caching Protocol (HTCP/0.0) [15]

   [Ed note: Current draft expired. A new draft must be submitted for this protocol to be considered in the Taxonomy. Based upon reviewers' comments, the editors would like to drop this protocol from current Taxonomy consideration, due to its experimental nature]

   Authoritative reference:
      Expired Internet Draft draft-vixie-htcp-proto-03.txt, work in progress.

   Description:
      HTCP is a protocol for discovering HTTP caches and cached data, managing sets of HTTP caches, and monitoring cache activity. HTCP includes HTTP headers, while ICPv2 does not. HTTP headers are vital information for web proxy caches.

   Security:
      Optionally uses MD5 shared secret authentication. When this option is not used, the protocol is subject to attack.

   Deployment:
      Implemented in caching proxies (two independent implementations).

   Submitter:
      Document editors.

7.3 Cache Array Routing Protocol (CARP) v1.0 [9]

   [Ed note: Current draft expired. A new draft must be submitted and this section completed for this protocol to be considered in the Taxonomy]

   Authoritative reference:
      Work in progress: Internet-Draft draft-vinod-carp-v1-03.txt

   Description:
      CARP is a hashing function for dividing URL space among a cluster of proxy caches.
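   CARP's division of URL space can be illustrated with highest-random-weight (rendezvous) hashing, the approach CARP follows. Each array member is scored against the URL, and the highest-scoring member owns that URL. The Python sketch below is only an illustration under stated assumptions: MD5 stands in for the hash function defined in the CARP draft, and the member names and load factors are hypothetical.

```python
import hashlib


def _hash32(text: str) -> int:
    """32-bit hash of text. MD5 is a stand-in; CARP v1.0 defines its own hash."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "big")


def select_member(url: str, members: dict[str, float]) -> str:
    """Pick the array member responsible for url (highest-random-weight hashing).

    members maps a hypothetical proxy name to a load factor; a larger factor
    gives that member a larger share of the URL space.
    """
    if not members:
        raise ValueError("proxy array is empty")
    best_name, best_score = "", -1.0
    for name, load_factor in members.items():
        # The score combines the URL and the member name, so every client
        # holding the same membership table computes the same winner.
        score = _hash32(name + "|" + url) * load_factor
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

   Every client that holds the same membership table computes the same owner, so each URL is cached by only one member. Removing a member moves only the URLs that member owned, and all other URLs keep their owner.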
      CARP also defines a Proxy Array Membership Table and ways to
      download this information. An HTTP client agent (either a proxy
      server or a client browser) which implements CARP v1.0 can route
      each request to the member of the Proxy Array responsible for that
      URL. Because requests are partitioned across the proxies in this
      way, duplication of cache contents is eliminated and global cache
      hit rates may be improved.

   Security:

   Deployment:
      Implemented in caching proxy servers. More than two independent
      implementations exist.

   Submitter:

7.4 Cache Digest [16]

   [Ed note: Does it really need to be submitted for Informational RFC?]

   Authoritative reference:
      No RFC published, no Internet-Draft.
      Cache Digest specification:
      http://squid.nlanr.net/Squid/CacheDigest/cache-digest-v5.txt
      Squid Digest FAQ entry:
      http://squid.nlanr.net/Squid/FAQ/FAQ-16.html

   Description:
      Cache Digests are a response to the problems of latency and
      congestion associated with previous inter-cache communication
      mechanisms such as the Internet Cache Protocol (ICP) [10, 11] and
      the Hyper Text Caching Protocol (HTCP) [15]. Unlike these
      protocols, Cache Digests support peering between cache servers
      without a request-response exchange for every lookup. Instead, a
      summary of the contents of a server (the Digest) is fetched by the
      other servers which peer with it. Using Cache Digests, it is
      possible to determine with a relatively high degree of accuracy
      whether a given URL is cached by a particular server. Cache Digests
      are both an exchange protocol and a data format [16a, 16b].

   Security:
      If the contents of a Digest are sensitive, the Digest should be
      protected from access by unauthorized parties. Any method which
      would normally be applied to secure an HTTP connection can be
      applied to Cache Digests.
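      The sensitivity of a Digest follows from its structure. A Cache
      Digest is a Bloom filter over the URLs a cache holds, so anyone who
      obtains it can test arbitrary URLs against it. The simplified
      Python sketch below assumes each key is the MD5 hash of the URL
      alone, split into four 32-bit words that select four bit
      positions; the actual key derivation and sizing rules are defined
      by the specification [16a] and may differ.

```python
from __future__ import annotations

import hashlib


class Digest:
    """Simplified Cache Digest: a Bloom filter with four hash functions."""

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url: str) -> list[int]:
        # Assumption: key = MD5(URL); its 128 bits are split into four
        # 32-bit words, each reduced modulo the digest size in bits.
        key = hashlib.md5(url.encode("utf-8")).digest()
        return [int.from_bytes(key[i:i + 4], "big") % self.num_bits for i in range(0, 16, 4)]

    def add(self, url: str) -> None:
        """Record that the cache holds this URL."""
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, url: str) -> bool:
        """True: probably cached (false positives possible). False: not in the digest."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))
```

      A lookup can yield a false positive, but never a false negative for
      a URL that was added.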

      A 'Trojan horse' attack is currently possible in a cache mesh:
      cache A can build a fake peer Digest for cache B and serve it to
      B's peers if requested. In this way, A can direct traffic toward or
      away from B. The impact of this problem is minimized by the 'pull'
      model of transferring Cache Digests from one server to another.

      Cache Digests provide knowledge about peer cache content at the URL
      level. Hence, they do not dictate a particular level of policy
      management and can be used to implement various policies at any
      level (user, organization, etc.).

   Deployment:
      Cache Digests are supported in Squid; several commercial vendors
      are looking into Digest support.
      Cache meshes:
      + NLANR mesh
      + TF-CACHE mesh (European academic networks)

   Submitter:
      Alex Rousskov, NLANR, rousskov@nlanr.net

7.5 Cache Pre-filling [23]

   Authoritative reference:
      Internet Draft, Work in progress.

   Description:
      Cache pre-filling is a push-caching implementation. It is
      particularly well adapted to IP-multicast networks because it
      allows preselected URLs to be inserted, in a single transmission,
      into all the caches that belong to the targeted multicast group.
      Different implementations of cache pre-filling already exist,
      especially in satellite contexts. However, there is still no
      standard for this kind of push-caching, and vendors offer solutions
      based either on dedicated equipment or on public domain caches
      extended with a pre-filling module.

   Security:
      Relies on the inter-cache protocols being employed.

   Deployment:
      Observed in two commercial content distribution service providers.

   Submitter:
      Ivan Lovric, France Telecom, ivan.lovric@cnet.francetelecom.fr

8. Network Element Communication

   This section describes the cooperation and communication between
   caching proxies and network elements. Examples include routers and
   switches.
   These mechanisms are generally used for transparent caching and/or
   diffused arrays.

8.1 Web Cache Coordination Protocol (WCCP)

   Authoritative reference:
      Internet Draft [18], Work in progress.

   Description:
      WCCP V1 runs between a router functioning as a redirecting network
      element and out-of-path transparent caching proxies. The protocol
      allows one or more caching proxies to register themselves with a
      single router to receive redirected web traffic. It also allows one
      of the proxies, the designated proxy, to dictate to the router how
      redirected web traffic is distributed across the caching proxies.

   Security:
      WCCP V1 has no security features.

   Deployment:
      Network elements: WCCP V1 is deployed on a wide range of Cisco
      routers.
      Caching proxies: WCCP V1 is deployed on a number of vendors'
      caches.

   Submitter:
      David Forster, Cisco, dforster@cisco.com

8.2 Transparent Proxy Agent Control Protocol (TPACT)

   Authoritative reference:
      [Ed note: anticipated submission] Internet Draft [22]
      [Ed note: I-D submission anticipated by 6/25/99] Work in progress.

   Description:
      TPACT runs between a network element (router or switch) functioning
      as a redirecting network element and out-of-path transparent
      caching proxies. The protocol allows one or more caching proxies to
      register themselves with a single network element to receive
      redirected web traffic. All of the participating caching proxies
      operate as a quorum in dictating how web traffic is distributed
      across the group.

   Security:
      MD5 is optionally employed for authentication. Sequence numbers are
      employed to protect against replay attacks.

   Deployment:
      Network elements: TPACT is prototyped and being evaluated on
      multiple vendors' L4 switches.
      Caching proxies: TPACT is prototyped and being evaluated on
      multiple vendors' caches.
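      The two protections listed under Security above, MD5
      authentication and sequence numbers, combine on the receiving side
      roughly as in the Python sketch below. The message layout (a 32-bit
      sequence number, the body, then a 16-byte tag) and the use of
      HMAC-MD5 are assumptions made for illustration; the TPACT draft
      [22] defines the actual encoding.

```python
from __future__ import annotations

import hashlib
import hmac
import struct

TAG_LEN = 16  # size of an MD5-based tag in bytes


def sign(secret: bytes, seq: int, body: bytes) -> bytes:
    """Hypothetical layout: 32-bit sequence number, body, HMAC-MD5 tag."""
    msg = struct.pack("!I", seq) + body
    return msg + hmac.new(secret, msg, hashlib.md5).digest()


class Receiver:
    """Accepts only authentic messages with strictly increasing sequence numbers."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret
        self.last_seq = -1

    def accept(self, packet: bytes) -> bytes | None:
        """Return the message body if it is authentic and fresh, else None."""
        if len(packet) < 4 + TAG_LEN:
            return None
        msg, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
        expected = hmac.new(self.secret, msg, hashlib.md5).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # wrong key or tampered message
        (seq,) = struct.unpack("!I", msg[:4])
        if seq <= self.last_seq:
            return None  # replayed (or reordered) message
        self.last_seq = seq
        return msg[4:]
```

      Authentication alone would not stop an attacker from resending a
      captured, validly signed message; the sequence-number check is what
      rejects such replays.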

   Submitter:
      John Martin, Network Appliance, jmartin@netapp.com

8.3 SOCKS [19]

   Authoritative reference:
      RFC 1928, SOCKS Protocol Version 5

   Description:
      SOCKS is primarily used as a proxy-cache-to-firewall protocol.
      Although firewalls do not fit the narrow definition of network
      elements as routers and switches, they are an integral part of the
      network infrastructure. When used in conjunction with a firewall,
      SOCKS provides an authenticated tunnel between the proxy cache and
      the firewall.

   Security:
      An extensive framework provides for multiple authentication
      methods. Currently, SSL, CHAP, DES and 3DES are known to be
      available.

   Deployment:
      SOCKS has been widely deployed in the Internet.

   Submitter:
      Document editors.

9. Security Considerations

   [Ed note: INGRID. Send to list, more information needed]

   Information on security in each protocol is provided in the
   description of the protocol, and in the accompanying RFC for each
   protocol. Refer also to section 15 of HTTP/1.1bis,
   draft-ietf-http-v11-spec-rev-06.txt.

   Man-in-the-middle attacks
      Refer to HTTP/1.1bis, section 15.7. HTTP proxies are men in the
      middle, and therefore the perfect place for a man-in-the-middle
      attack.

   Denial of service

   Individual protocols
      See the documentation for each protocol for a discussion of its
      security issues.

   Trusted parties
      You need to trust your proxy.

   Poor configuration
      It is quite easy to create a poor configuration which will harm
      service for end users.

   Privacy
      Logs from proxies need to be kept secure, as they provide
      information about users and user behavior patterns. A proxy log is
      even more sensitive than a web server log, as all requests from the
      user population go through the proxy.

      Logs from replication servers may need to be amalgamated to produce
      aggregated statistics for a service; transporting logs across
      borders may have legal implications. Log handling is restricted by
      law in some countries.

      Requirements for object security and privacy are the same in a web
      replication and caching system as in the Internet at large. The
      only reliable solution is strong cryptography. End-to-end
      encryption may prevent objects from being cached, as is the case
      with SSL-encrypted web sessions.

   Communication

   Transient copies
      Whether transient copies, such as those kept in replication and
      caching systems, are legal is still an open question. The legal
      implications of replication and caching are subject to local law.

10. Acknowledgements

   [Ed note: No decision made on authors list. Submitters of individual
   entries are acknowledged in the text. Need to sort out how to give
   credit where it is due.]

   David Forster, Cisco, dforster@cisco.com, provided information on
   out-of-path transparent caching proxies.

   Thanks to Alex Rousskov, David Forster, Josh Cohen and John Martin
   for protocol information; to John Dilley, Ivan Lovric and Joe Touch
   for terminology and taxonomy information; and to David Forster, Josh
   Cohen, Henrik Nordstrom and Patrick McManus for their help in
   defining proxy transparency.

11. References

   [1]  Duane Wessels. Squid FAQ: Transparent Caching/Proxying. National
        Laboratory for Applied Network Research. Available from
        http://squid.nlanr.net/Squid/FAQ/FAQ-17.html

   [2]  Peter Danzig and Karl L. Swartz. Transparent, Scalable,
        Fail-Safe Web Caching. Network Appliance, Inc. Available from
        http://www.netapp.com/technology/level3/3033.html

   [3]  Bert Williams. Transparent Web Caching Solutions. Alteon
        Networks. Available from Transparent Web Caching Solutions

   [4]  Tony Hain. Architectural Implications of NAT. Internet
        Architecture Board. Internet Draft (Work in Progress).
        Available from
        ftp://ftp.nordu.net/internet-drafts/draft-iab-nat-implications-02.txt

   [5]  Ingrid Melve, Lars Slettjord, Ton Verschuren and Henny Bekker.
        "Web caching architecture". Technical report, European Union
        RE1004-M4.3.

   [6]  Fielding, et al. Hypertext Transfer Protocol -- HTTP/1.1. IETF
        RFC 2616. Available from http://www.rfc-editor.org/rfc/rfc2616.txt

   [7]  Netscape, Inc. Navigator Proxy Auto-Config File Format. Available
        from
        http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html

   [8]  Paul Gauthier, J. Cohen, Martin Dunsmuir and Charles Perkins. The
        Web Proxy Auto-Discovery Protocol. Internet Draft. Available from
        http://www.ietf.org/internet-drafts/draft-ietf-wrec-wpad-00.txt

   [9]  Vinod Valloppillil and Keith W. Ross. Cache Array Routing
        Protocol. Internet Draft (Work in Progress). Available from
        ftp://ftp.nordu.net/internet-drafts/draft-vinod-carp-v1-03.txt

   [10] D. Wessels and K. Claffy. Internet Cache Protocol (ICP), version
        2. RFC 2186. Available from ftp://ftp.nordu.net/rfc/rfc2186.txt

   [11] D. Wessels and K. Claffy. Application of Internet Cache Protocol
        (ICP), version 2. RFC 2187. Available from
        ftp://ftp.nordu.net/rfc/rfc2187.txt

   [12] Ivan Lovric. Internet Cache Protocol Extension. Internet Draft
        (Work in Progress). Available from
        ftp://ftp.nordu.net/internet-drafts/draft-lovric-icp-ext-01.txt

   [13] Duane Wessels. ICP Home Page. National Laboratory for Applied
        Network Research. Available from
        http://ircache.nlanr.net/Cache/ICP/

   [14] University of Southern California. Internet Cache Protocol
        Specification 1.4. Available from
        http://excalibur.usc.edu/icpdoc/icp.html

   [15] Paul Vixie and Duane Wessels. Hyper Text Caching Protocol
        (HTCP/0.0). Internet Draft (Work in Progress). Available from
        ftp://ftp.nordu.net/internet-drafts/draft-vixie-htcp-proto-03.txt

   [16] Alex Rousskov and Duane Wessels. Cache Digests. National
        Laboratory for Applied Network Research. Available from:
        [16a] Cache Digest specification
              http://squid.nlanr.net/Squid/CacheDigest/cache-digest-v5.txt
        [16b] Squid Digest FAQ entry
              http://squid.nlanr.net/Squid/FAQ/FAQ-16.html

   [17] Berners-Lee, et al. Hypertext Transfer Protocol -- HTTP/1.0. IETF
        RFC 1945. Available from http://www.rfc-editor.org/rfc/rfc1945.txt

   [18] Cisco Web Cache Coordination Protocol V1.0. Internet Draft.
        Available from
        http://www.ietf.org/internet-drafts/draft-ietf-wrec-web-pro-00.txt

   [19] Leech, et al. SOCKS Protocol Version 5. RFC 1928. Available from
        http://www.rfc-editor.org/rfc/rfc1928.txt

   [20] Keith Moore. On the use of HTTP as a Substrate for Other
        Protocols. Internet Draft (Work in Progress). Available from
        ftp://ftp.nordu.net/internet-drafts/draft-iesg-using-http-00.txt

   [21] Brisco, T. DNS Support for Load Balancing. RFC 1794. Available
        from http://www.rfc-editor.org/rfc/rfc1794.txt

   [22] Cerpa, et al. Transparent Proxy Agent Control Protocol. Internet
        Draft. Available from
        http://www.ietf.org/internet-drafts/draft-ietf-wrec-tpact-00.txt

   [23] Goutard, et al. Pre-filling a cache - A satellite overview.
        Internet Draft. Available from
        http://www.ietf.org/internet-drafts/draft-lovric-francetelecom-satellites-00.txt

12. Authors' Addresses

   Ingrid Melve
   UNINETT
   Tempeveien 22
   Trondheim, NORWAY
   Phone: +47 73 55 79 07
   Email: Ingrid.Melve@uninett.no

   Gary Tomlinson
   Novell, Inc.
   122 East 1700 South
   Provo, Utah 84606
   USA
   Phone: +1 801 861 7021
   Email: garyt@novell.com

   Ian Cooper
   Mirror Image Internet, Inc.
   18 Commerce Way, Suite 4800
   Woburn, MA 01801
   USA
   Phone: +1 800 353 2923
   Email: ian@mirror-image.com