Internet Draft                                          Ingrid Melve
Expires: December 1999                                       UNINETT
Informational                                         Gary Tomlinson
WREC Working Group                                            Novell
                                                          Ian Cooper
                                               Mirror Image Internet
                                                       June 25, 1999


             Internet Web Replication and Caching Taxonomy

                    draft-ietf-wrec-taxonomy-01.txt


Status of this Memo


   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This memo specifies standard terminology and the current taxonomy of
   web replication and caching infrastructure deployed today. It
   introduces standard concepts and protocols used today within this
   application domain. Currently deployed solutions employing these
   technologies are presented to establish a standard taxonomy.

   Research issues and known problems with HTTP proxy caching are
   covered in two accompanying documents and are not part of this
   document. This document presents open protocols and points to
   published RFCs for each protocol.


Melve, Tomlinson, Cooper                                        [Page 1]


Replication and Caching Taxonomy       June 25, 1999


Contents

       1. Introduction
       2. Terminology
       3. Distributed System Relationships
       4. Client to Replica Communication
       5. Inter-Replica Communication
       6. Client to Proxy Configuration
       7. Inter-Cache Communication
       8. Network Element Communication
       9. Security Considerations
      10. Acknowledgements
      11. References
      12. Authors' Addresses

1. Introduction

   Since its introduction in 1990, the World-Wide Web has evolved from a
   simple client-server model into a sophisticated distributed
   architecture. This evolution has been driven largely by the
   scaling problems associated with exponential growth. Distinct
   paradigms and solutions have emerged to satisfy specific
   requirements.  Two core infrastructural components being employed to
   meet the demands of this growth are replication and caching. In many
   cases, there is a need for web caches and replicated services to be
   able to coexist.

   There are many protocols, both open and proprietary, employed in web
   replication and caching today.  A majority of the open protocols
   include DNS[21], CacheDigest[16], CARP[9], HTTP[6], ICP[10], PAC[7],
   SOCKS[19], TPACT[22], WPAD[8], and WCCP[18]. Additional protocols are
   being planned to address emerging solution requirements.

   This memo specifies standard terminology and the current taxonomy of
   web replication and caching infrastructure deployed in the Internet
   today. The principal goal of this document is to establish a common
   understanding and reference point of this application domain.

   We also expect that this document will be used in the creation of a
   standard architectural framework for efficient, reliable, and
   predictable service in a web which includes both replicas and caches.

2. Terminology

   Where possible, existing definitions [5, 6] have been used in this
   document.  Additional terminology has been agreed upon and defined in
   this document.  All of the terminology used in this document is
   considered to be standardized with respect to IETF WREC working group
   RFCs.

   In this document a number of terms are used to refer to the roles
   played by participants in, and objects of, the HTTP communication.
   The following definitions are used in the HTTP/1.1 specification [6].
   However, these definitions may have come to have differing meaning
   within the Web caching community. In those cases, additional
   clarification is given:

      client
             An application program that establishes connections for the
             purpose of sending requests.

      user agent
             The client which initiates a request. These are often
             browsers, editors, spiders (web-traversing robots), or
             other end user tools.

      server
             An application program that accepts connections in order to
             service requests by sending back responses. Any given
             program may be capable of being both a client and a server;
             our use of these terms refers only to the role being
             performed by the program for a particular connection,
             rather than to the program's capabilities in
             general. Likewise, any server may act as an origin server,
             proxy, gateway, or tunnel, switching behavior based on the
             nature of each request.

      origin server
             The server on which a given resource resides or is to be
             created.

      [Ed note; IAN: The following is subtly different from the
                     definition given in HTTP/1.1.  (Should we now
                     revert to the definition in HTTP/1.1 and document
                     the difference?)  As a community we must be
                     careful about which type of "transparent proxy" is
                     being discussed.]
      proxy
             An intermediary system which acts as both a server and a
             client for the purpose of making requests on behalf of
             other clients. Requests are serviced internally or by
             passing them on, with possible translation, to other
             servers. A proxy MUST implement both the client and server
             requirements of this specification. A "transparent proxy"
             is a proxy that does not modify the request or response
             beyond what is required for proxy authentication and
             identification. A "non-transparent proxy" is a proxy that
             modifies the request or response in order to provide some
             added service to the user agent, such as group annotation
             services, media type transformation, protocol reduction,
             or anonymity filtering. Except where either transparent or
             non-transparent behavior is explicitly stated, the HTTP
             proxy requirements apply to both types of proxies.

      Note:  The term "transparent proxy" given in [6] has a different
             meaning within the Web caching community.  Further
             unspecified references in this document (including the
             following paragraph) are to the Web caching community
             definition, which is given later.

             The condition requiring implementation of both server and
             client requirements of HTTP/1.1 is only appropriate for a
             non-transparent proxy.


      [Ed note; IAN: The following is also subtly different from
                     HTTP/1.1.  Should also consider comments from Joe
                     Touch on whether we should distinguish types of
                     tunnels.]
      tunnel
             An intermediary system which is acting as a blind relay
             between two connections. Once active, a tunnel is not
             considered a party to the HTTP communication, though the
             tunnel may have been initiated by an HTTP request. The
             tunnel ceases to exist when both ends of the relayed
             connections are closed.

      [Ed note; IAN: The following has been slightly modified from
                     HTTP/1.1 to consider server load. Need to consider
                     comment from Joe Touch regarding clarification of
                     not using a cache when tunnelling.]
      cache
             A program's local store of response messages and the
             subsystem that controls its message storage, retrieval, and
             deletion. A cache stores cacheable responses in order to
             reduce the response time, server load and network
             bandwidth consumption on future, equivalent requests. Any
             client or server may include a cache, though a cache
             cannot be used by a server while it is acting as a tunnel.


      [Ed note; IAN: The following has been edited from RFC2616 to
                     reference that document.]
      cacheable
             A response is cacheable if a cache is allowed to store a
             copy of the response message for use in answering
             subsequent requests. The rules for determining the
             cacheability of HTTP responses are defined in section 13
             of [6]. Even if a resource is cacheable, there may be
             additional constraints on whether a cache can use the
             cached copy for a particular request.


   To these we add the following:

      authoritative reference
             the owner of data; content production system; possibly an
             origin server

      content consumer
             the user or system that makes requests of an origin server
             (which may in turn be handled by a proxy).

      caching proxy
             A proxy with a cache, acting as server to clients, and
             a client to servers

      origin server accelerator
             an application of a caching proxy where the proxy is
             placed closer to the origin server than to the content
             consumers in order to off-load the handling of cacheable
             responses from the server; also as a means to reduce
             traffic within the server's network.

      surrogate
             [Ed note; IAN: need a definition.]

      network element
             router or switch
             [Ed note; IAN: This term probably needs a better name.]

      browser
             a special instance of a user agent that acts as a content
             presentation device for a content consumer

      cluster
             a tightly coupled set of devices acting together to share
             load

      reverse proxy
             An intermediary system which acts as both a server and a
             client for the purpose of serving requests on behalf of
             origin servers. Requests are serviced internally or by
             passing them on to the origin server they are representing.
             A reverse proxy must interpret and, if necessary, rewrite a
             request message before forwarding it. Reverse proxies are
             often used as server-side portals through network firewalls
             and as helper applications for off-loading requests from
             origin servers.
             [Ed note; IAN: leaving this as a placeholder until we can
                            work out proxies/reverse proxies/surrogates
                            and accelerators]


   The following definitions are added to describe caching device
   topology:


      user agent cache
             the cache within the user agent program

      local caching proxy
             the caching proxy a user agent connects to
             [Ed note; IAN: should this be renamed 'primary proxy'?]

      intermediate caching proxy
             seen from the content consumer's view, all caches
             participating in the caching mesh that are not the user
             agent's local caching proxy

      cache server
             a server to requests made by local and upper level caching
             proxies, but which does not act as a proxy

      cache array
      diffused array
      cache cluster
             a cluster of caching proxies, acting logically as one
             service and partitioning the URL name space across the
             array

      caching mesh
             a loosely coupled set of co-operating proxy- or caching-
             servers, or clusters, acting independently but sharing
             cacheable content between themselves using inter-cache
             communication protocols (see Section 7)
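
   As an illustration of how a cache array might partition the URL
   name space, the following minimal Python sketch maps each URL to one
   array member by hashing.  It is an assumption made for illustration
   only: the function name and member names are hypothetical, and plain
   modulo hashing is used here rather than the algorithm of any
   protocol listed in this memo (such as CARP [9]).

```python
import hashlib


def pick_array_member(url: str, members: list[str]) -> str:
    """Map a URL to exactly one member of a (hypothetical) cache array."""
    if not members:
        raise ValueError("cache array has no members")
    # A stable digest is used instead of Python's built-in hash(), which is
    # salted per process and would give different answers on each run.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(members)
    return members[index]


if __name__ == "__main__":
    array = ["cache-a", "cache-b", "cache-c"]  # hypothetical member names
    for u in ("http://www.example.com/", "http://www.example.com/img/logo.gif"):
        print(u, "->", pick_array_member(u, array))
```

   With plain modulo hashing, adding or removing a member remaps most
   URLs, which is one reason deployed arrays tend to use more careful
   schemes.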


   Moves to insert proxies into the network in a manner such that the
   content consumer is unaware of their presence have created a set of
   terms whose definitions may not be consistent with other uses.  This
   section references prior definitions but also gives their meaning in
   the realm of Web caching.

      [Ed note; IAN: snooping, redirection, interception - need to
                     clarify if we only need the first two]

      traffic redirection
             redirection of traffic from a user agent or network
             element to a specific proxy, used to deploy Web-caching
             without the need to manually reconfigure individual user
             agents, or to force the use of a proxy where such use
             would not otherwise occur

      network traffic snooping
             the examination of network traffic within a network
             element to determine whether it should be redirected

      transparent proxy (additional definition)
             the term "transparent proxy" is defined in [6] (and quoted
             above).  However, in the realm of Web caching, this has
             come to define a proxy which receives traffic as a result
             of network traffic snooping. The term typically
             describes the use of a proxy and the additional systems
             which perform network traffic snooping.  The use of
             the proxy is transparent to the client. Transparent
             proxies are used to remove the need for configuration of
             clients to use a proxy.

      proxy discovery
             this describes the discovery and configuration for use of
             a proxy in an environment where the content consumer may
             be unaware of the proxy's existence.  The use of the proxy
             is transparent to the content consumer, but not to the
             client.
             [Ed note; IAN: should we consider the ability of proxies
                            to discover each other?  Would this be
                            better titled as "transparent proxy
                            configuration"?]


   The following terms describe the roles of servers and caches in the
   realm of caching and replication:

      [Ed note; IAN: This section needs significant work]

      temporal domain, sparse working set cache
             a subset of the content from one or more origin servers,
             stored temporarily and collected from requests made by
             content consumers

      persistent domain
             a collection of origin servers maintaining a persistent
             data set from the authoritative reference

      replica origin server
             origin server storing a persistent replica of a data set
             stored at the authoritative reference


3. Distributed System Relationships


   [Ed note; GARY: Consider eliminating this big picture, it doesn't
   capture all of the relationships and is difficult to communicate]

   Diagram of the components that make up a web replication and caching
   infrastructure, with communication between the components.


    ------------------     -----------------     ------------------
    | Replica Origin |-----| Master Origin |-----| Replica Origin |
    |     Server     |     |    Server     |     |     Server     |
    ------------------     -----------------     ------------------
             \                    |                      /
              \                   |                     /
               -----------------------------------------
                                  |                 Client to
                           -----------------        Replica Server
                           |   Top-Level   |
                           | Caching Proxy |
                           -----------------
                             /            \      Inter Cache
                            /              \     Communication
              -----------------           -----------------
              |  Upper-Level  |-----------|  Upper-Level  |
              | Caching Proxy |           | Caching Proxy |
              -----------------           -----------------
                     /         Inter Cache       \
                    /         Communication       \  Inter Cache
                   /                               \  Communication
                  /                                 \
                 /         ------------------        \
                /         ------------------|         \
    -----------------     ----------------- ||    -----------------
    |  First Level  |-----| Caching Proxy | |-----|  First Level  |
    | Caching Proxy |     |    Array      |--     | Caching Proxy |
    -----------------     -----------------       -----------------
            | Client to         |
            | Proxy Cache       |   Cache to Network Element
       -------------       ------------
       |  Client   |       | Network  |
       -------------       | Element  |
                           ------------
                                |
                                |
                           ------------
                           |  Client  |
                           ------------


 3.1 Replication Relationships

   [Ed note; describe the replication system relationship domain]

  3.1.1 Client to Replica

   [Ed note; recast this as relationship not the definition which
   follows in section 4] Client to Replica: cooperation and
   communication between clients (both browser/user agents and proxy
   caches) and replica origin servers.  Used to discover optimal replica
   proximity.


                    Persistent Domain
                      Complete Idem-Potent Set Replication
    ------------------     -----------------     ------------------
    | Replica Origin |     | Master Origin |     | Replica Origin |
    |     Server     |     |    Server     |     |     Server     |
    ------------------     -----------------     ------------------
             \                    |                      /
              \                   |                     /
               -----------------------------------------
                                  |                 Client to
                           -----------------        Replica Server
                           |     Client    |
                           |               |
                           -----------------

  3.1.2 Inter-Replica


   [Ed note; recast this as relationship not the definition which
   follows in section 5] Inter-Replica: cooperation and communication
   between replica origin servers.  Used in replicating data sets
   between origin servers.

                   Persistent Domain
                      Complete Idem-Potent Set Replication
    ------------------     -----------------     ------------------
    | Replica Origin |-----| Master Origin |-----| Replica Origin |
    |     Server     |     |    Server     |     |     Server     |
    ------------------     -----------------     ------------------


 3.2 Caching Relationships

   [Ed note; describe the caching system relationship domain]

  3.2.1 Client to Proxy

   [Ed note; recast this as relationship not the definition which
   follows in section 6] Client to Proxy: configuration, cooperation and
   communication between end user clients (browsers and applications)
   and a caching proxy.

                        Temporal Domain
                          Sparse Working Set Cache
    -----------------     -----------------     -----------------
    |  First Level  |     |  First Level  |     |  First Level  |
    | Caching Proxy |     | Caching Proxy |     | Caching Proxy |
    -----------------     -----------------     -----------------
             \                    |                      /
              \                   |                     /
               -----------------------------------------
                                  |
                           -----------------
                           |     Client    |
                           -----------------

  3.2.2 Reverse Proxy to Origin Server

   [Ed note; describe the accelerator relationship]

  3.2.3 Inter-Cache

   [Ed note; recast this as relationship not the definition which
   follows in section 7] Inter-Cache: cooperation and communication
   between caching proxies.


                        Temporal Domain
                          Sparse Working Set Cache
                           -----------------
                           |   Top-Level   |
                           | Caching Proxy |
                           -----------------
                             /            \
                            /              \
              -----------------           -----------------
              |  Upper-Level  |-----------|  Upper-Level  |
              | Caching Proxy |           | Caching Proxy |
              -----------------           -----------------
                     / \                    /    \
                    /   \                  /      \
                   /     \                /        \
                  /       \              /          \
                 /         \            /            \
                /           \          /              \
    -----------------     -----------------       -----------------
    |  First Level  |-----|  First Level  |-------|  First Level  |
    | Caching Proxy |     | Caching Proxy |       | Caching Proxy |
    -----------------     -----------------       -----------------

  Network Element to Caching Proxy

   [Ed note; recast this as relationship not the definition which
   follows in section 8] Network Element to Proxy Cache: cooperation and
   communication between caching proxy and network elements.  Examples
   include routers and switches.  Generally used for transparent caching
   and/or diffused arrays.

                        Temporal Domain
                          Sparse Working Set Cache
    -----------------     -----------------     -----------------
    | Caching Proxy |     | Caching Proxy |     | Caching Proxy |
    |     Array     |     |     Array     |     |     Array     |
    -----------------     -----------------     -----------------
             \                    |                      /
              \                   |                     /
               -----------------------------------------
                                  |
                             --------------
                             |  Network   |
                             |  Element   |
                             --------------
                                  |
                                  |
                              ------------
                              |  Client  |
                              ------------

  Caching Proxies with Transparency

   [Ed note: Currently contains citations from NetApp document, need
   rewording to avoid specific products and concentrate on generic
   properties. Explain network elements and NATs and other ways
   interception may happen. Intro to usage and "normal" setup.]

   Reference [1,2,3,4] for introduction to caching proxies with
   transparency.

   The goal of intercepting web traffic is to provide a transparent web
   proxy, thus avoiding the hassle of individually configuring each
   client.

   Transparency means that the user does not need to be aware of the
   proxy.

   The origin server sees connections coming from the proxy, not from
   the individual end user. Authentication based on client IP address
   does not work if there is a transparent proxy cache on the path to
   the web server.

   A web cache is said to be transparent if clients can access the cache
   without the need to configure their browsers, using either a proxy
   auto-configuration URL or a manual proxy setting. Transparent caches
   appear as a seamless part of the network infrastructure, rather than
   a set of discrete proxy servers, and function much like a transparent
   firewall. Many ISPs and carriers desire transparent caches because
   they let them retrofit their networks with caching without action at
   the client. However, when deployed transparently, a web cache must be
   as fail-safe and scalable as the rest of the network. [2]

   A transparent cache acts much like a gateway or firewall -- it
   effectively sits between the users and the network. The advantage of
   transparent caching is that it eliminates the need to configure
   browsers to use caching. Another strength (and sometimes a weakness)
   is that it is impossible to bypass caching. [2]

   Conceptually, transparency works by modifying the TCP/IP stack of a
   cache so that it operates in "promiscuous mode" and effectively binds
   itself to all possible IP addresses. [2]

   We need to give a far more abstract definition which includes the way
   that router and switch redirection, and within-router action,
   operate.


   Comment on some of the problems:
        * limited number of ports which can be captured
        * "unexpected" data on other ports (or even on well known
          ports), as experienced by setting up various services on
          port 80
        * well known problems with use of HTTP for transport [20]


  Out-of-path Transparent Caching Proxies

   An Out-of-path Transparent Caching Proxy performs the same proxy and
   caching functions as a Transparent Caching Proxy and is similarly
   transparent to the client. However, it does not lie on the forwarding
   path between a client and a server and does not perform web traffic
   interception. Instead it relies upon a redirecting network element in
   the path between client and server to intercept and redirect web
   traffic to it. One advantage of this method of transparent caching is
   that in the case of cache failure the network element can, provided
   it monitors the state of the caches, revert to forwarding web traffic
   directly to the server. It is also possible for the network element
   to distribute the web traffic load across a group of caches. This
   method of transparent caching generally requires a protocol to be run
   between the redirecting network element and the cache or caches.
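
   The redirection decision described above can be sketched in Python
   as follows.  This is only an illustrative model under stated
   assumptions: the class and method names are hypothetical, web
   traffic is identified by destination port 80 alone, and the protocol
   that reports cache state to the network element is not modelled
   (mark() simply stands in for its result).

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class RedirectingElement:
    """Hypothetical network element that redirects web traffic to caches."""
    caches: list[str]                               # configured cache addresses
    healthy: set[str] = field(default_factory=set)  # caches currently known to be up

    def mark(self, cache: str, up: bool) -> None:
        """Record the outcome of a health check for one cache."""
        if up:
            self.healthy.add(cache)
        else:
            self.healthy.discard(cache)

    def next_hop(self, dst_ip: str, dst_port: int) -> str:
        """Return the address that should receive traffic for dst_ip:dst_port."""
        if dst_port != 80:
            return dst_ip              # not web traffic: forward unchanged
        live = [c for c in self.caches if c in self.healthy]
        if not live:
            return dst_ip              # no usable cache: forward to the server
        # Spread load by destination so one server's content stays on one cache.
        return live[zlib.crc32(dst_ip.encode("ascii")) % len(live)]
```

   Hashing on the destination address is one possible way to share
   load; it keeps requests for a given server on the same cache, which
   avoids storing the same content on every member of the group.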

4. Client to Replica Communication

   This section describes the cooperation and communication between
   clients (both user agents and proxy caches) and replica origin web
   servers.  Used to discover a optimal web origin server replica for a
   web client to establish service with.  Optimality is a policy based
   decision, often based upon proximity, but may be based on other
   criteria such as load.

 4.1 Navigation Hyperlinks

      Authoritative reference:
             This memo.

      Description:
             The simplest of client to replica communication
             mechanisms.  This utilizes hyperlink URLs embedded in web
             pages that point to the mirror sites.  The human user
             manually selects the link of the replica origin server
             they wish to use.

      Security:
             Relies on the protocol security associated with the URL
             scheme.


      Deployment:
             Probably the most commonly deployed client to replica
             communication mechanism.  Ubiquitous interoperability
             with humans.

      Submitter:
             Document editors.


 4.2 URL Redirection

      Authoritative reference:
             This memo.

      Description:
             A simple and commonly used mechanism to connect web
             clients with origin server replicas is to use URL
             redirection.  Clients are redirected to an optimal web
             server replica via the use of the HTTP [6] protocol
             response code 307 Temporary Redirect. A web client
             establishes HTTP communication with one of the web server
             replicas.  The initially contacted replica origin web
             server can either choose to accept the service or redirect
             the client to the proper replica. Refer to section 10.3.8
             in HTTP/1.1 RFC2616 for information on HTTP response code
             307.

      Security:
             Relies entirely upon HTTP security.

      Deployment:
             Observed at a number of large web sites.  Extent of usage
             in the Internet is unknown at this time.

      Submitter:
             Document editors.
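
   The redirection step can be sketched as a small Python function that
   decides between serving a request and answering with 307 Temporary
   Redirect.  The sketch makes several assumptions for illustration:
   the replica host names are invented, and a "client region" label
   stands in for whatever policy a real site uses to judge which
   replica is optimal.

```python
# Hypothetical replica set; the keys are policy labels, not part of HTTP.
REPLICAS = {
    "eu": "http://eu.mirror.example.com",
    "us": "http://us.mirror.example.com",
}
THIS_REPLICA = "us"  # the replica that received the request


def choose_replica(client_region: str) -> str:
    """Pick a replica for the client; fall back to this one if unknown."""
    return client_region if client_region in REPLICAS else THIS_REPLICA


def respond(path: str, client_region: str) -> tuple[int, dict[str, str]]:
    """Return (status code, response headers) for a request to this replica."""
    target = choose_replica(client_region)
    if target == THIS_REPLICA:
        return 200, {}                     # accept the service locally
    # Otherwise point the client at the chosen replica, same path.
    return 307, {"Location": REPLICAS[target] + path}
```

   A client that receives the 307 response repeats its request against
   the URL in the Location header, so the choice of replica costs one
   extra round trip to the initially contacted server.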

 4.3 DNS Redirection [21]

      Authoritative reference:
             Load balancing: RFC1794 DNS Support for Load Balancing
             Proximity: This memo

   [Ed note;  it would have been nice to cite SONAR, but draft has
   expired]

      Description:
             The Domain Name System (DNS) provides a more
             sophisticated client to replica communication mechanism.
             This is accomplished by DNS servers that order the
             addresses they return according to quality of service
             policies. When a web client resolves the name of a web
             server, the enhanced DNS server orders the IP addresses of
             the web server, starting with the most optimal replica and
             ending with the least optimal replica.

      Security:
             Relies entirely upon DNS security.

      Deployment:
             Observed at a number of large web sites and large ISP web
             hosted services.  Extent of usage in the Internet is
             unknown at this time.

      Submitter:
             Document editors.


5. Inter-Replica Communication

   This section describes the cooperation and communication between
   replica origin servers.  These mechanisms are used to replicate data
   sets between origin servers.

 5.1 Batch Driven Mirror Replication

      Authoritative reference:
             This memo.

      Description:
             In this model, the replica web server to be updated
             initiates communication with a master origin web server.
             The communication is established at intervals based upon
             queued transactions which are scheduled for deferred
             processing. The scheduling policies vary, but transfers
             generally recur at a specified time.  Once communication
             is established, data sets are copied to the initiating
             replica web server.
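
             One scheduled run of such a mirror might look like the
             sketch below.  It assumes, purely for illustration, that
             both data sets are reachable as local directories; a real
             mirror would pull the files over a transfer protocol, and
             the function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def mirror_once(master: Path, replica: Path) -> List[str]:
    """Copy files from `master` that are missing or different in `replica`.

    Returns the relative paths that were copied during this run.
    """
    copied = []
    for src in sorted(master.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(master)
        dst = replica / rel
        if dst.is_file() and dst.read_bytes() == src.read_bytes():
            continue  # replica already holds an identical copy
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

             The replica would invoke this from its own scheduler, for
             example once per night.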

      Security:
             Relies upon the protocol being used to transfer the data
             set. FTP and RDIST are the most common protocols observed.

      Deployment:
             Very common for mirror synchronization in the Internet.


      Submitter:
             Document editors.

 5.2 Demand Driven Mirror Replication

      Authoritative reference:
             This memo.

      Description:
             In this model, the replica web server acquires the content
             as needed due to demand.  This is generally done by web
             server accelerators (reverse proxy) operating as origin
             server replicas.  When a web client requests a URL that is
             not in the data set of the replica origin server, the
             replica server attempts to acquire it from a master origin
             server and forwards it on to the requesting web client.
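
             The acquire-on-miss behaviour can be sketched as below.
             The class name and the fetch hook are hypothetical; the
             hook stands in for whatever protocol is used to reach the
             master origin server.

```python
from typing import Callable, Dict


class DemandDrivenReplica:
    """Replica that fills its data set from a master on first request."""

    def __init__(self, fetch_from_master: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_master       # hypothetical transfer hook
        self._data_set: Dict[str, bytes] = {}
        self.master_fetches = 0               # counts trips to the master

    def get(self, url: str) -> bytes:
        if url not in self._data_set:
            # Miss: acquire the object from the master origin server.
            self._data_set[url] = self._fetch(url)
            self.master_fetches += 1
        # Serve the local copy to the requesting web client.
        return self._data_set[url]
```

             The sketch omits expiry and validation, which a real
             accelerator needs in order to keep its copies fresh.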

      Security:
             Relies upon the protocol being used to transfer the URLs.
             FTP, Gopher, HTTP and ICP are the most common protocols
             observed.

      Deployment:
             Observed at several large web sites. Extent of usage in
             the Internet is unknown at this time.

      Submitter:
             Document editors.


 5.3 Synchronized Replication

      Authoritative reference:
             This memo. [Ed note; there is no IETF protocol specified at
                         this time.  The editors are aware of at least
                         two open source protocols, AFS and CODA, along
                         with one expired IETF draft
                         <draft-leach-cifs-v1-spec-01.txt> and one
                         proprietary protocol Novell NRS; none of which
                         can be considered an authoritative reference]

      Description:
             In this model, the replicated origin servers cooperate
             using synchronized strategies and specialized replica
             protocols to keep the replica data sets coherent.
             Synchronization strategies range from tightly coherent (a
             few minutes) to loosely coherent (a few hours or more).
             Updates occur between replicas based upon the
             synchronization time constraints of the coherency model
             employed and are generally in the form of deltas only.

      Security:
             All of the known protocols utilize strong cryptographic key
             exchange methods, which are either based upon the Kerberos
             shared secret model or the public/private key RSA model.

      Deployment:
             Observed at a few sites, primarily at university campuses.

      Submitter:
             Document editors.

6. Client to Proxy Configuration

   This section describes the configuration, cooperation and
   communication between end user clients (browsers and applications)
   and a proxy.


 6.1 Manual Proxy Configuration

      Authoritative reference:
             This memo.

      Description:
             Each user needs to configure their web client by typing in
             information pertaining to proxied protocols and local
             policies.

      Security:
             The potential for misconfiguration is high, as each user
             individually sets preferences.

      Deployment:
             Widely deployed, used in all current browsers. Most
             browsers support other options as well.

      Submitter:
             Document editors.


 6.2 Proxy Auto Configuration (PAC) [7]

   [Ed note: Does it really need to be submitted for Informational RFC?]

      Authoritative reference:
             No RFC published, no Internet-Draft
             Navigator Proxy Auto-Config File Format. Available from
             http://home.netscape.com/eng/mozilla/2.0/
               relnotes/demo/proxy-live.html

      Description:
             A JavaScript page on a web server hands out information on
             where to find proxies. Clients need to point at the URL of
             this page. There is no bootstrap mechanism, so manual
             configuration of that URL is necessary.

             Manual configuration is made easier by centralizing the
             script at one URL.
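
             Real PAC files are JavaScript that defines a function
             FindProxyForURL(url, host) returning a string such as
             "DIRECT" or "PROXY host:port".  The decision logic of a
             typical script is rendered in Python below purely as an
             illustration; the proxy name, port and local domain are
             hypothetical.

```python
from urllib.parse import urlsplit


def find_proxy_for_url(url: str, host: str) -> str:
    """Mimic the decision a PAC script makes for one request."""
    # Plain host names and the (hypothetical) local domain bypass the proxy.
    if "." not in host or host.endswith(".example.com"):
        return "DIRECT"
    # Web traffic goes to the proxy, falling back to a direct connection.
    if urlsplit(url).scheme in ("http", "https"):
        return "PROXY proxy.example.com:8080; DIRECT"
    return "DIRECT"
```

             Because every client fetches the same script, changing the
             returned proxy in one place changes it for all clients.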

      Security:
             A common policy per organization is possible. PAC still
             requires manual configuration, but is better than "manual
             proxy configuration" because with PAC administrators can
             update the proxy configuration without user intervention.

             Interoperability of PAC files is not as good as desired,
             since the more popular browsers interpret the script
             slightly differently, and this may lead to undesired
             effects.

      Deployment:
             Implemented in most web clients.

      Submitter:
             Document editors.


 6.3 Cache Array Routing Protocol (CARP) v1.0 [9]

   [Ed note: Current draft expired. A new draft must be submitted and
   this section completed for this protocol to be considered in the
   Taxonomy]

      Authoritative reference:
             Expired Internet-Draft draft-vinod-carp-v1-03.txt
             Work in progress.

      Description:
             Clients may use CARP directly as a hash function based
             proxy selection mechanism. They need to be configured with
             the location of the cluster information.

      Security:


      Deployment:

      Submitter:


 6.4 Web Proxy Auto-Discovery Protocol (WPAD) [8]


      Authoritative reference:
             Internet Draft <draft-ietf-wrec-wpad-00.txt>

   [Ed note; I-D submission anticipated by 6/25/99]
             Work in progress.

      Description:
             WPAD uses a collection of pre-existing Internet resource
             discovery mechanisms to perform web proxy auto-discovery.

             The only goal of WPAD is to locate the PAC URL. WPAD does
             not specify which proxies will be used. WPAD gets you to
             the PAC URL, and the PAC script chooses the proxies for
             you.

             The WPAD protocol specifies the following:

             + how to use each mechanism for the specific purpose of
               web proxy auto-discovery
             + the order in which the mechanisms should be performed
             + the minimal set of mechanisms which must be attempted
               by a WPAD compliant web client

             The resource discovery mechanisms utilized by WPAD are as
             follows:

             + Dynamic Host Configuration Protocol DHCP
             + Service Location Protocol SLP
             + "Well Known Aliases" using DNS A records
             + DNS SRV records
             + "service: URLs" in DNS TXT records
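
             As an illustration of one of these mechanisms, the sketch
             below lists the URLs a client might probe when using the
             "Well Known Aliases" approach.  The alias "wpad", the path
             "/wpad.dat" and the rule for where to stop walking up the
             domain hierarchy are assumptions of this example; the WPAD
             draft is the authority on the actual procedure.

```python
from typing import List


def wpad_candidates(client_fqdn: str, min_labels: int = 2) -> List[str]:
    """List PAC URLs to try for the well known alias mechanism.

    The client drops its own host label, then walks up the domain
    hierarchy, stopping before domains with fewer than `min_labels`
    labels (an assumed cut-off).
    """
    labels = client_fqdn.rstrip(".").split(".")[1:]  # drop the host label
    urls = []
    while len(labels) >= min_labels:
        urls.append("http://wpad." + ".".join(labels) + "/wpad.dat")
        labels = labels[1:]
    return urls
```

             The first URL that yields a PAC script ends the search.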

      Security:
             Relies upon DNS and HTTP security.

      Deployment:
             Implemented in web clients and caching proxy servers. More
             than two independent implementations.

      Submitter:
             Josh Cohen, Microsoft, joshco@microsoft.com


7. Inter-Cache Communication

   [Ed note: INGRID.  Review and chase submissions (push Duane)]

   This section describes the cooperation and communication between
   caching proxies.

 7.1 Internet Cache Protocol (ICP) [10, 11, 12, 13, 14]

      Authoritative reference:
             RFC 2186 Internet Cache Protocol (ICP), version 2

      Description:
             ICP is used by caches to query other caches about web
             objects, to see if a web object is present at the other
             cache.

             ICP uses UDP. Since UDP is unreliable, an estimate of
             network congestion and availability may be calculated
             from ICP packet loss. This rudimentary loss measurement,
             together with round trip times, provides a load balancing
             method for caches.
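
             The sketch below encodes a single ICP query, following the
             message layout given in RFC 2186 (a 20 octet header, then
             a requester address and a null-terminated URL).  The
             function name is hypothetical, and the address fields are
             simply left as zero in this example.

```python
import struct

ICP_OP_QUERY = 1   # opcode for a query
ICP_VERSION = 2    # ICP version 2
# opcode, version, message length, request number,
# options, option data, sender host address
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(request_number: int, url: str) -> bytes:
    """Encode a query message asking a peer cache about `url`."""
    # Payload: requester host address (zero here) + NUL-terminated URL.
    payload = b"\x00\x00\x00\x00" + url.encode("ascii") + b"\x00"
    header = ICP_HEADER.pack(
        ICP_OP_QUERY,
        ICP_VERSION,
        ICP_HEADER.size + len(payload),  # total message length
        request_number,
        0,                               # options
        0,                               # option data
        b"\x00\x00\x00\x00",             # sender host address
    )
    return header + payload
```

             The resulting bytes would be sent to the peer in one UDP
             datagram; a missing reply is what feeds the loss estimate
             mentioned above.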

      Security:
             ICP does not convey information about HTTP headers
             associated with a web object. HTTP headers may include
             access control and cache directives. Since caches ask for
             objects, and then download the objects using HTTP, false
             cache hits may occur (for example, an object is present
             in a cache but not accessible to a sibling cache).

             ICP suffers from all the security problems of UDP.

      Deployment:
             Widely deployed. Most current cache implementations support
             ICP in one form or another.

      Submitter:
             Document editors.

 7.2 Hyper Text Caching Protocol (HTCP/0.0) [15]


   [Ed note: Current draft expired. A new draft must be submitted for
   this protocol to be considered in the Taxonomy. Based upon reviewers'
   comments, the editors would like to drop this protocol from current
   Taxonomy consideration, due to its experimental nature]

      Authoritative reference:
             Expired Internet Draft draft-vixie-htcp-proto-03.txt,
             Work in Progress

      Description:
             HTCP is a protocol for discovering HTTP caches and cached
             data, managing sets of HTTP caches, and monitoring cache
             activity.

             HTCP includes HTTP headers, while ICPv2 does not. HTTP
             headers are vital information for web proxy caches.

      Security:
             Optionally uses MD5 shared secret authentication.  When
             authentication is not used, the protocol is subject to
             attack.

      Deployment:
             Implemented in caching proxies (two independent
             implementations)

      Submitter:
             Document editors.

 7.3 Cache Array Routing Protocol (CARP) v1.0 [9]

   [Ed note: Current draft expired. A new draft must be submitted and
   this section completed for this protocol to be considered in the
   Taxonomy]

      Authoritative reference:
             Work in Progress: Internet-Draft draft-vinod-carp-v1-03.txt

      Description:
             CARP is a hashing function for dividing URL-space among a
             cluster of proxy caches. Included in CARP is the definition
             of a Proxy Array Membership Table, and ways to download
             this information.

             An HTTP client agent (either a proxy server or a client
             browser) which implements CARP v1.0 can allocate and
             intelligently route requests for the correct URLs to any
             member of the Proxy Array. Due to the resulting sorting of
             requests through these proxies, duplication of cache
             contents is eliminated and global cache hit rates may be
             improved.
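
             The idea of dividing URL-space by hashing can be sketched
             as below.  The sketch does not reproduce the hash function
             or the load factors defined in the CARP draft; it
             substitutes MD5 in a generic "highest score wins" scheme,
             and the member names are hypothetical.

```python
import hashlib
from typing import List


def _score(member: str, url: str) -> int:
    """Combine a member name and a URL into one comparable score."""
    digest = hashlib.md5((member + "|" + url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_member(members: List[str], url: str) -> str:
    """Pick the array member with the highest combined score for `url`."""
    return max(members, key=lambda member: _score(member, url))
```

             Every client that knows the same member list sends a given
             URL to the same proxy, and removing one member only moves
             the URLs that were assigned to it.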


      Security:

      Deployment:
             Implemented in caching proxy servers. More than two
             independent implementations.

      Submitter:


 7.4 Cache Digest [16]

   [Ed note: Does it really need to be submitted for Informational RFC?]

      Authoritative reference:
             No RFC published, no Internet-Draft
             Cache Digest specification
             http://squid.nlanr.net/Squid/CacheDigest/
                    cache-digest-v5.txt
             Squid Digest FAQ entry
             http://squid.nlanr.net/Squid/FAQ/FAQ-16.html

      Description:
             Cache Digests are a response to the problems of latency
             and congestion associated with previous inter-cache
             communications mechanisms such as the Internet Cache
             Protocol (ICP) [10, 11] and the Hyper Text Caching Protocol
             [15]. Unlike most of these protocols, Cache Digests
             support peering between cache servers without a
             request-response exchange taking place. Instead, a summary
             of the contents of the server (the Digest) is fetched by
             other servers which peer with it. Using Cache Digests it
             is possible to determine with a relatively high degree of
             accuracy whether a given URL is cached by a particular
             server.

             Cache Digests are both an exchange protocol and a data
             format [16a,16b].
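
             A summary of this kind can be built as a Bloom filter
             style bit array, as sketched below.  The array size, the
             number of hash positions and the use of MD5 slices are
             assumptions for illustration; the data format actually
             used is defined by the Cache Digest specification [16a].

```python
import hashlib
from typing import Iterator


class Digest:
    """Fixed-size bit array summarizing the URLs a cache holds."""

    def __init__(self, bits: int = 1024, hashes: int = 4) -> None:
        if not 1 <= hashes <= 4:
            raise ValueError("an MD5 value yields at most four 4-byte slices")
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, url: str) -> Iterator[int]:
        digest = hashlib.md5(url.encode("utf-8")).digest()
        for i in range(self.hashes):
            # Each 4-byte slice of the MD5 value selects one bit.
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))
```

             A peer that has fetched the array can test a URL locally,
             which is why no per-request exchange is needed; the price
             is an occasional false hit.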

      Security:
             If the contents of a Digest are sensitive, they should be
             protected from unauthorized access. Any methods which
             would normally be applied to secure an HTTP connection can
             be applied to Cache Digests.

             A 'Trojan horse' attack is currently possible in a cache
             mesh: Cache A can build a fake peer Digest for cache B and
             serve it to B's peers if requested. This way A can direct
             traffic toward/from B. The impact of this problem is
             minimized by the 'pull' model of transferring Cache
             Digests from one server to another.

             Cache Digests provide knowledge about peer cache content
             on a URL level. Hence, they do not dictate a particular
             level of policy management and can be used to implement
             various policies on any level (user, organization, etc.).

      Deployment:
             Cache Digests are supported in Squid; several commercial
             vendors are looking into Digest support.

             Cache Meshes:
             + NLANR Mesh
             + TF-CACHE mesh (European Academic networks)

      Submitter:
             Alex Rousskov, NLANR, rousskov@nlanr.net

 7.5 Cache Pre-filling [23]

      Authoritative reference:
             Internet Draft
             <draft-lovric-francetelecom-satellites-00.txt>
             Work in progress.

      Description:
             Cache pre-filling is a push-caching implementation. It is
             particularly well adapted to IP-multicast networks because
             it allows preselected URLs to be inserted at one time
             into all the caches that belong to the targeted multicast
             group.  Different implementations of cache pre-filling
             already exist, especially in satellite contexts.  However,
             there is still no standard for this kind of push-caching,
             and vendors propose solutions based either on dedicated
             equipment or on public domain caches extended with a
             pre-filling module.

      Security:
             Relies on the inter cache protocols being employed.

      Deployment:
             Observed in two commercial content distribution service
             providers.

      Submitter:
             Ivan Lovric, France Telecom,
             ivan.lovric@cnet.francetelecom.fr


8. Network Element Communication

   This section describes the cooperation and communication between
   caching proxies and network elements.  Examples of network elements
   include routers and switches.  These protocols are generally used
   for transparent caching and/or diffused arrays.

 8.1 Web Cache Coordination Protocol (WCCP)


      Authoritative reference:
             Internet Draft <draft-ietf-wrec-web-pro-00.txt> [18]
             Work in progress.

      Description:
             WCCP V1 runs between a router functioning as a redirecting
             network element and out-of-path transparent caching
             proxies. The protocol allows one or more caching proxies
             to register themselves with a single router to receive
             redirected web traffic. It also allows one of the proxies,
             the designated proxy, to dictate to the router how
             redirected web traffic is distributed across the caching
             proxies.

      Security:
             WCCP V1 has no security features.

      Deployment:
             Network elements: WCCP V1 is deployed on a wide range of
             Cisco routers.
             Caching proxies: WCCP V1 is deployed on a number of
             vendors' caches.

      Submitter:
             David Forster, Cisco, dforster@cisco.com


 8.2 Transparent Proxy Agent Control Protocol (TPACT)

      Authoritative reference: [Ed note; anticipated submission]
             Internet Draft <draft-ietf-wrec-tpact-00.txt> [22]
             [Ed note; I-D submission anticipated by 6/25/99]
             Work in progress.

      Description:
             TPACT runs between a network element (router or switch)
             functioning as a redirecting network element and
             out-of-path transparent caching proxies. The protocol
             allows one or more caching proxies to register themselves
             with a single network element to receive redirected web
             traffic. All of the participating caching proxies operate
             as a quorum in dictating how web traffic is distributed
             across the group.

      Security:
             MD5 is optionally employed for authentication.  Sequence
             numbers are employed as security against replay attacks.

      Deployment:
             Network elements: TPACT is prototyped and being evaluated
             on multiple vendor L4 switches.
             Caching proxies: TPACT is prototyped and being evaluated
             on multiple vendor caches.

      Submitter:
             John Martin, Network Appliance, jmartin@netapp.com


 8.3 SOCKS [19]

      Authoritative reference:
             RFC1928 SOCKS Protocol Version 5

      Description:
             SOCKS is primarily used as a proxy cache to firewall
             protocol.  Although firewalls don't conform to the
             narrowly defined network element definition of routers and
             switches, they are an integral part of the network
             infrastructure.  When used in conjunction with a firewall,
             SOCKS provides an authenticated tunnel between the proxy
             cache and the firewall.

      Security:
             An extensive framework provides for multiple authentication
             methods.  Currently, SSL, CHAP, DES and 3DES are known to
             be available.

      Deployment:
             SOCKS has been widely deployed in the Internet.

      Submitter:
             Document editors.


9. Security Considerations


   [Ed note: INGRID.  Send to list, more information needed]

   Information on security in each protocol is provided in the
   description of the protocol, and in the accompanying RFC for each
   protocol.


   Refer to section 15 in HTTP/1.1bis,
   draft-ietf-http-v11-spec-rev-06.txt

 Man in the middle attacks

   Refer to HTTP/1.1bis, chapter 15.7

   HTTP proxies are men-in-the-middle, which makes them the perfect
   place for a man-in-the-middle attack.

 Denial of service

 Individual protocols

   See documentation for each protocol for discussion of security
   issues.

 Trusted parties

   You need to trust your proxy.

 Stupid configuration

   It is quite easy to have a stupid configuration which will harm
   service for end users.

 Privacy

   Logs from proxies need to be kept secure, as they provide information
   about users and end user patterns.  A proxy log is even more
   sensitive than a web server log, as all requests from the user
   population go through the proxy.  Logs from replication servers may
   need to be amalgamated to get aggregated statistics from a service;
   transporting logs across borders may have legal implications.  Log
   handling is restricted by law in some countries.

   Requirements for object security and privacy are the same in a web
   replication and caching system as they are in the Internet at large.
   The only reliable solution is strong cryptography.  End to end
   encryption may, however, prevent objects from being cached, as is
   the case with SSL encrypted web sessions.


   Communication

 Transient copies

   The legislatures of the world have not yet settled whether transient
   copies, like those kept in replication and caching systems, are
   legal.  The legal implications of replication and caching are
   subject to local law.


10. Acknowledgements

   [Ed note: No decision made on authors list. Submitters of individual
   entries are acknowledged in the text. Need to sort out how to give
   credits where they are due.]

   David Forster, Cisco, dforster@cisco.com provided info on Out-of-path
   Transparent Caching Proxies.

   Alex Rousskov, David Forster, Josh Cohen and John Martin for protocol
   information.

   John Dilley, Ivan Lovric and Joe Touch for terminology and taxonomy
   information.

   David Forster, Josh Cohen, Henrik Nordstrom and Patrick McManus for
   their help in defining proxy transparency.

11. References

   [1] Duane Wessels.  Squid FAQ: Transparent Caching/Proxying.
   National Laboratory for Applied Network Research.  Available from:
   http://squid.nlanr.net/Squid/FAQ/FAQ-17.html

   [2] Peter Danzig and Karl L. Swartz.  Transparent, Scalable, Fail-
   Safe Web Caching.  Network Appliance, Inc. Available from
   http://www.netapp.com/technology/level3/3033.html

   [3] Bert Williams. Transparent Web Caching Solutions.  Alteon
   Networks.  Available from Transparent Web Caching Solutions

   [4] Tony Hain.  Architectural Implications of NAT. Internet
   Architecture Board. Internet Draft (Work in Progress). Available from
   ftp://ftp.nordu.net/internet-drafts/draft-iab-nat-implications-02.txt


   [5] Ingrid Melve, Lars Slettjord, Ton Verschuren, Henny Bekker,
   Technical report European Union RE1004-M4.3 "Web caching
   architecture"

   [6] Fielding, et al. Hypertext Transfer Protocol -- HTTP/1.1.  IETF
   RFC2616. Available from http://www.rfc-editor.org/rfc/rfc2616.txt

   [7] Netscape, Inc. Navigator Proxy Auto-Config File Format.
   Available from
   http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html

   [8] Paul Gauthier, J. Cohen, Martin Dunsmuir and Charles Perkins.
   The Web Proxy Auto-Discovery Protocol. Internet Draft. Available from
   http://www.ietf.org/internet-drafts/draft-ietf-wrec-wpad-00.txt

   [9] Vinod Valloppillil and Keith W. Ross.  Cache Array Routing
   Protocol. Internet Draft (Work in Progress) Available from
   ftp://ftp.nordu.net/internet-drafts/draft-vinod-carp-v1-03.txt

   [10] D. Wessels and K. Claffy. Internet Cache Protocol (ICP), version
   2. RFC2186. Available from ftp://ftp.nordu.net/rfc/rfc2186.txt

   [11] D. Wessels and K. Claffy. Application of Internet Cache Protocol
   (ICP), version 2, RFC2187. Available from
   ftp://ftp.nordu.net/rfc/rfc2187.txt

   [12] Ivan Lovric. Internet Cache Protocol Extension. Internet Draft
   (Work in Progress) Available from
   ftp://ftp.nordu.net/internet-drafts/draft-lovric-icp-ext-01.txt

   [13] Duane Wessels. ICP Home Page, National Laboratory for Applied
   Network Research.  Available from http://ircache.nlanr.net/Cache/ICP/

   [14] University of Southern California.  Internet Cache Protocol
   Specification 1.4. Available from
   http://excalibur.usc.edu/icpdoc/icp.html

   [15] Paul Vixie and Duane Wessels. Hyper Text Caching Protocol
   (HTCP/0.0). Internet Draft (Work in Progress) Available from
   ftp://ftp.nordu.net/internet-drafts/draft-vixie-htcp-proto-03.txt

   [16] Alex Rousskov and Duane Wessels. Cache Digests.  National
   Laboratory for Applied Network Research.  Available from:
   [16a] Cache Digest specification
   http://squid.nlanr.net/Squid/CacheDigest/cache-digest-v5.txt
   [16b] Squid Digest FAQ entry
   http://squid.nlanr.net/Squid/FAQ/FAQ-16.html


   [17] Berners-Lee, et al. Hypertext Transfer Protocol -- HTTP/1.0 IETF
   RFC1945 Available from http://www.rfc-editor.org/rfc/rfc1945.txt

   [18] Cisco Web Cache Coordination Protocol V1.0. Internet Draft.
   Available from
   http://www.ietf.org/internet-drafts/draft-ietf-wrec-web-pro-00.txt

   [19] Leech, et al. SOCKS Protocol Version 5, RFC1928 Available from
   http://www.rfc-editor.org/rfc/rfc1928.txt

   [20] Keith Moore, On the use of HTTP as a Substrate for Other
   Protocols. Internet Draft (Work in Progress) Available from
   ftp://ftp.nordu.net/internet-drafts/draft-iesg-using-http-00.txt

   [21] Brisco, T. DNS Support for Load Balancing.  RFC1794. Available
   from http://www.rfc-editor.org/rfc/rfc1794.txt

   [22] Cerpa, et al. Transparent Proxy Agent Control Protocol.
   Internet Draft.  Available from
   http://www.ietf.org/internet-drafts/draft-ietf-wrec-tpact-00.txt

   [23] Goutard, et al.  Pre-filling a cache - A satellite overview.
   Internet Draft.  Available from http://www.ietf.org/internet-drafts/
   draft-lovric-francetelecom-satellites-00.txt


12. Authors' Addresses

   Ingrid Melve
   UNINETT
   Tempeveien 22, Trondheim, NORWAY
   Phone: +47 73 55 79 07
   Email: Ingrid.Melve@uninett.no

   Gary Tomlinson
   Novell, Inc.
   122 East 1700 South
   Provo, Utah 84606 USA
   Phone: +1 801 861 7021
   Email: garyt@novell.com

   Ian Cooper
   Mirror Image Internet, Inc.
   18 Commerce Way, Suite 4800
   Woburn, MA 01801 USA
   Phone: +1 800 353 2923
   Email: ian@mirror-image.com

