Network Working Group                                      M. Nottingham
Internet-Draft                                       Akamai Technologies
Expires: January 5, 2001                                    July 7, 2000


        Requirements for Demand-Driven Surrogate Origin Servers
                     draft-nottingham-surrogates-00

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.
   This Internet-Draft will expire on January 5, 2001.

Copyright Notice

   Copyright (C) The Internet Society (2000). All Rights Reserved.

Abstract

   This document states requirements for demand-driven surrogate origin
   servers, also known as reverse proxies and Web accelerators. 


Nottingham              Expires January 5, 2001                 [Page 1]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


1. Introduction

   A surrogate origin server (also known as a reverse proxy or HTTP
   accelerator) is a device that authoritatively serves requests on
   behalf of an origin server (known as its master origin server)[1].

   Demand-driven surrogate origin servers are populated by the traffic
   flowing through them; when a client requests an object which is not
   resident, they will fetch it from the master origin server.

   It may be useful to conceptualize a demand-driven surrogate as an
   origin server that happens to be populated via the HTTP on the back
   end.

   In many ways, they are similar to proxy/caches, and often leverage
   proxy/cache software. However, surrogates serve content
   authoritatively, and therefore take the role of an origin server,
   not a proxy, to downstream clients.

   Unfortunately, the use of a proxy/cache as a surrogate origin server
   introduces several problems in protocol implementation, due to this
   changing of roles. This document attempts to rectify such
   inconsistencies.

   Additionally, master origin server administrators usually have a
   greater degree of control over the activity and use of surrogates
   than they would over proxies. Because of this close relationship,
   more precise control over the behavior of the surrogate can be given
   to the administrator.

   This document specifies acceptable mechanisms for doing so.

1.1 Requirements

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119[4].

   An implementation is not compliant if it fails to satisfy one or
   more of the MUST or REQUIRED level requirements. An implementation
   that satisfies all the MUST or REQUIRED level and all the SHOULD
   level requirements is said to be "unconditionally compliant"; one
   that satisfies all the MUST level requirements but not all the
   SHOULD level requirements is said to be "conditionally compliant".

1.2 Terminology

   This document uses terms defined and explained in the WREC
   Taxonomy[1], and the HTTP/1.1 specification[2]. The reader is


Nottingham              Expires January 5, 2001                 [Page 2]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


   expected to be familiar with both.

   In this document, the term "surrogate" is shorthand for a
   demand-driven surrogate origin server, unless explicitly stated
   otherwise. Similarly, "origin server" refers to a surrogate's master
   origin server.

2. Overview of Demand-Driven Surrogate Origin Servers

2.1 Uses and Characteristics

   In normal operation, demand-driven surrogate origin servers are
   deployed and maintained by (or on behalf of) the publisher of a Web
   site, rather than directly for end users (as a proxy would be). This
   is often done for a number of reasons, including (a non-exhaustive
   list):
   o  Reduction of load on the master origin server
   o  Reduction of network traffic to the master origin server
   o  Distribution of objects, in order to improve perceived latency by
      storing them closer to end users
   o  Introduction of content transformation or other value-added
      services

   Surrogate deployments may vary in several ways, including:
   o  Proximity - surrogates may be deployed close to the master origin
      server to reduce load on it, or near end users to reduce network
      traffic and improve perceived latency.
   o  Selection of surrogate objects - entire Web sites may be routed
      through surrogates, or a subset of a site's objects may be
      nominated for publication through them, depending on the effect
      desired, and the nature of the surrogate.
   o  Number of surrogates - surrogates may be deployed in any number.
      Localized surrogates may use any of a number of mechanisms to
      distribute requests between them, while distributed surrogates
      usually use wide-area DNS load balancing.

   By their nature, surrogates are never the parent or child of other
   surrogates. However, they MAY have such relationships with
   proxy/caches.

2.2 General Operation

2.2.1 Configuration

   In order to accept and properly handle requests on behalf of a
   master origin server, a surrogate needs to be aware of its master's
   identity, and the profile of traffic that will be served on its
   behalf.


Nottingham              Expires January 5, 2001                 [Page 3]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


   Additionally, it may be desirable to configure surrogates with other
   information, including:
   o  Any encryption or authentication information required by the
      master origin server
   o  Default object handling information, including coherence
   o  Specific object handling information
   o  Other special instructions to the surrogate

   Surrogates may be configured by a variety of mechanisms, including
   manual, out-of-band, or vendor-specific.

   Some types of surrogate configuration may be communicated in-band,
   by HTTP headers described in this document. However, such
   information is not neccessarily limited to that form of
   communication.

   Manual and out-of-band configuration mechanisms may vary in
   implementation; specification of them is out of scope for this
   document.

2.2.2 Request Handling

   A surrogate is configured to forward traffic to a master origin
   server, so that the hostname of the surrogate may be used in
   published URLs.

   A surrogate MAY be configured to forward traffic to multiple master
   origin servers by using the Host request header to differentiate
   requests. In this scenario, requests without a Host header SHOULD be
   replied to with a 502 Gateway Error response status code.

   Surrogates MUST accept Absolute-URI[3] as well as Relative-URI
   requests and forward them to the master origin server, as
   configured. They MUST NOT forward Absolute-URI requests to origin
   servers that they have not been configured to serve.

   Surrogates MAY use encryption (SSL or TLS) on downstream, upstream
   or both connections.

2.3 Origin Server to Surrogate Optimizations

   Surrogates serve content on behalf of nominated origin servers,
   implying that the origin server administrator has access to
   configure, monitor and receive logs from the surrogate.

   Because of this, a greater degree of trust exists between them than
   there would be between an origin server and third-party proxies.
   This allows modification or extension of the relationship between
   them, to offer greater control and functionality.


Nottingham              Expires January 5, 2001                 [Page 4]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


2.3.1 Separation of Coherence

   Origin server administrators are wary of trusting third-party caches
   to keep objects coherent, because they do not always implement
   coherence in a predictable or correct manner.

   Surrogate coherence behavior can be both predicted and tested by
   origin server administrators. However, there is still need to be
   able to describe object coherence to downstream caches.

   This leads to the need for separate coherence mechanisms; one
   between the master origin server and surrogates, and another between
   surrogates and their clients.

   This is accomplished by defining new, surrogate-specific mechanisms,
   while traditional coherence mechanisms retain their meaning for
   downstream caches. While the new mechanisms are introduced as HTTP
   headers here, they MAY also be communicated by separate
   configuration of the surrogate.

2.3.2 Protocol Feature Manipulation

   Surrogates MAY add end-to-end protocol features that are not
   supported by the origin server, in order to offer greater
   functionality to downstream clients.  For example, a surrogate could
   add ETag validators to objects, to improve downstream cacheability.

   Surrogates may also implement hop-by-hop mechanisms (such as
   transfer encoding for compression and persistent connections) that
   are lacking on the master origin server, to offer improved quality
   of service to their clients.

   When offering extended end-to-end features, surrogates MUST defer to
   support on the origin server; if a feature is present there, it
   cannot be overridden by the surrogate implementation.

2.4 Problems Introduced by Use of Proxies as Origin Servers

2.4.1 Dates and Age Calculation

   In HTTP/1.1[2] The Date response header is required to reflect the
   time that an object is generated on its origin server. Since
   surrogates serve content authoritatively, objects obtained from them
   can always be considered fresh, and SHOULD contain a current Date
   header.

   Passing non-current Date headers causes downstream caches to handle
   objects with an overly conservative freshness lifetime, if it is
   derived from either Cache-Control: max-age or some heuristic-based


Nottingham              Expires January 5, 2001                 [Page 5]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


   freshness algorithms.

2.4.2 Interpretation of Proxy-Specific Information

   Request headers such as Pragma: no-cache and some Cache-Control
   headers, if honored by surrogates, may cause excessive and
   unnecessary load on the master origin server.

2.4.3 Logging

   Proxy-specific log formats may not be appropriate for use by a
   surrogate. In particular, master origin servers often log
   information such as the User-Agent and Referer presented by the
   client.

   Surrogates SHOULD be capable of logging such information, in a
   manner compatible with common origin server logs.

3. Specific Requirements

   Requirements for a surrogate are the same as those for a gateway or
   proxy in HTTP/1.1[2], except as noted.

3.1 Protocol Version Interpretation

   Implementations MUST satisfy the requirements of RFC 2145[5],
   including those behaviors specific to proxies.

3.2 Methods

   A surrogate MUST NOT accept CONNECT requests, or forward them to the
   master origin server.

   TRACE requests MAY be responded to as if max-forwards=0 were
   present, to keep the surrogate's relationship with the origin server
   private.

3.3 Status Codes

3.3.1 Redirections

   Surrogates receiving redirections (301, 302 and 307 status codes)
   SHOULD resolve them and serve the resulting object to clients.

   If surrogate-specific coherence is specified in a redirect, but not
   available for the resulting object, it SHOULD be applied to the
   object.


Nottingham              Expires January 5, 2001                 [Page 6]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


3.3.2 Error Conditions

   Surrogates MUST NOT change the semantics of 4xx and 5xx series
   status codes obtained from origin servers. However, these responses
   MAY be cached for a short period.

   401 Unauthorized status codes MAY be generated to propagate HTTP
   authentication; see "Working with Protocol Extensions".

   Surrogates SHOULD send a 502 Bad Gateway error when
   surrogate-specific directives are incomplete, contradict themselves
   or don't parse correctly.

   A 504 Gateway Timeout response SHOULD be sent under any of the
   following conditions:
   o  DNS failure when resolving the origin server
   o  no route to origin server
   o  refused connection to origin server
   o  connection timeout to origin server

   However, a surrogate MAY be configured to use a cached resource, a
   different resource, or redirect to a different location under these
   conditions.

3.4 Cache Coherence and Correctness

   The RECOMMENDED mechanism for assuring coherence on surrogates is
   use of Surrogate-Control request and response headers.

   Surrogates MAY be configured to fall back to HTTP cache coherence
   (such as Expires and Cache-Control response headers), if
   surrogate-specific mechanisms are not available.

   Surrogate origin servers MAY also be configured to use a heuristic
   freshness algorithm to ensure coherence if no other freshness
   information is available.

   Because surrogates separate upstream and downstream coherence, they
   MAY also implement proprietary mechanisms for assuring coherence
   with the master origin server.

3.5 End-to-End Headers

   Because a surrogate assumes the role of an origin server in
   downstream connections, the scope of end-to-end headers is changed.
   Although many headers can be propagated from the origin server, some
   must be changed in order to ensure protocol compliance, and others
   can be changed to enhance or optimize downstream connections.


Nottingham              Expires January 5, 2001                 [Page 7]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


3.5.1 Age

   Surrogates MUST strip any Age header from responses before
   forwarding them to clients.

   Surrogates MUST NOT add Age headers to responses.

   Age headers SHOULD be used by surrogates in Age calculations, when
   determining coherence with the master origin server.

3.5.2 Cache-Control Request Header

   Cache-Control headers in requests MUST NOT be honored by surrogates.

3.5.3 Cache-Control Response Header

   By default, Cache-Control headers in responses from a master origin
   server MUST NOT be honored by surrogates, and MUST be forwarded to
   clients.

   Surrogates SHOULD be able to be configured to honor Cache-Control
   response headers.

3.5.4 Date

   Surrogate origin servers MUST serve a current Date header with each
   response; they MUST NOT serve a cached Date header.

3.5.5 ETag

   If none are present, a surrogate MAY insert weak ETags as
   validators, if separate coherence with the master origin server has
   been established.

3.5.6 Expires

   By default, Expires response headers SHOULD NOT be honored by
   surrogates, unless configured to do so. Surrogates MUST forward
   Expires headers to clients.

   It has been observed that that if a Cache-Control: max-age response
   header is set, many origin servers will set a complimentary Expires:
   value, to duplicate the intended freshness effect for HTTP/1.0
   clients. To accommodate this, surrogates SHOULD recalculate the
   Expires header to match the delta communicated in Cache-Control:
   max-age, but only if both are present in a response, and are
   equivalent.

   Some older Web servers have been observed to set an Expires header


Nottingham              Expires January 5, 2001                 [Page 8]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


   based on an offset from the Date, without setting a Cache-Control:
   max-age header. This is problematic, as it is difficult to
   distinguish these responses from those which wish to expire content
   at an absolute date.  Surrogates MAY compensate for this by
   considering objects which specify an Expires without a
   Cache-Control: max-age directive stale when the Expires time is
   reached; however, this may have undesirable effects in some
   situations.

3.5.7 Host

   Surrogate origin servers MUST replace any Host header in requests
   with the name of the appropriate master origin server before
   forwarding it.

3.5.8 Last-Modified

   Last-Modified response headers MUST NOT be modified by a surrogate.

3.5.9 Pragma

   Surrogate origin servers MUST NOT honor Pragma request directives.

3.5.10 Proxy-Authenticate

   Surrogates MUST NOT include a Proxy-Authenticate header in responses
   to clients.

3.5.11 Proxy-Authorization

   Surrogates MUST ignore Proxy-Authorization headers in requests from
   clients.

3.5.12 Server

   Surrogates MAY set their own Server response header, replacing any
   present.

3.5.13 Via

   Surrogates SHOULD append a Via header to requests, as outlined in
   RFC2616[2].

3.6 Surrogate-Control HTTP Headers

   Surrogate-specific HTTP headers allow specification of metadata in
   requests or responses to the surrogate. These can be though of as
   analogies of cache-affecting headers such as Cache-Control.


Nottingham              Expires January 5, 2001                 [Page 9]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


   Surrogate-Specific headers MUST be consumed before forwarding a
   request or response.

3.6.1 Surrogate-Control Request Header

   Surrogate-Control request directives have similar semantics and
   effects as Cache-Control request headers. Defined directives are: 

   no-cache 
      Has same meaning as a Cache-Control: max-age request directive to
      a proxy. 
   only-if-cached 
      Has same meaning as a Cache-Control: only-if-cached request
      directive to a proxy. 

3.6.2 Surrogate-Control Response Header

   Surrogate-Control response directives have similar semantics and
   effects as Cache-Control response headers. Defined directives are: 

   max-age 
      Has same meaning as a Cache-Control: max-age response directive
      to a proxy. 
   no-cache 
      Has same meaning as a Cache-Control: no-cache response directive
      to a proxy. 
   must-revalidate 
      Has same meaning as a Cache-Control: must-revalidate response
      directive to a proxy. 

   Surrogates SHOULD require some form of client authentication when
   honoring Surrogate-Control response directives.

3.7 Surrogate-Generated Headers

3.7.1 X-Forwarded-For Request Header

   Surrogates SHOULD be capable of adding a header that denotes the
   client which requested the object.

3.7.2 X-Served-For Response Header

   Surrogates MAY add a response header which denotes the name of the
   master origin server, if it is not obvious in the Request-URI, in
   order to enable third parties to identify the source of the content.

4. Working with Protocol Extensions


Nottingham              Expires January 5, 2001                [Page 10]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


4.1 HTTP Authentication

   Surrogates receiving responses with WWW-Authenticate headers MUST
   NOT serve them without assuring that the client has presented proper
   credentials.

   HTTP Authentication may also be used to prevent access to the origin
   server by unauthorised clients, while allowing unauthenticated
   access to the objects through the surrogate. To accomplish this, a
   surrogate MAY be configured to send Authorization request headers,
   with a predetermined authentication realm.

5. Controlling Effects of Upstream Proxies

   Surrogates SHOULD append appropriate Cache-Control and Pragma
   request headers to assure that any intermediate proxy/caches do not
   serve a response without validation on the master origin server.

6. Security Considerations

6.1 Surrogate to Origin Authentication and Security

   Surrogates SHOULD allow use of SSL on the connection to the origin
   server, while serving objects unencrypted, to increase security
   between them.

   They SHOULD also support at least one of the following
   authentication mechanisms for origin server access:
   o  Client-Side SSL Certificates
   o  HTTP Authentication into a specific realm (see "HTTP
      Authentication")
   o  Cookie-based authentication (using cookie value as shared secret)

6.2 Knowledge of Surrogate/Origin Relationship

   It may or may not be necessary to hide the relationship between
   surrogates and origin servers, depending on the nature of their use.

   Surrogates SHOULD allow configuration to accomplish this.
   Specifically, this includes all HTTP headers that identify responses
   as coming from a surrogate, TRACE requests, and error responses and
   warnings that identify the surrogate.

References

   [1]  Cooper, I., Melve, I. and G. Tomlinson, "Internet Web
        Replication and Caching Taxonomy", November 1999.

   [2]  Fielding, R., Gettys, J., Mogul, J. C., Frystyk, H., Masinter,


Nottingham              Expires January 5, 2001                [Page 11]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


        L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol
        - HTTP/1.1", RFC 2616, June 1999.

   [3]  Berners-Lee, T., Fielding, R.T. and L. Masinter, "Uniform
        Resource Identifiers (URI): Generic Syntax", RFC 2396, August
        1998.

   [4]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", RFC 2119, March 1997.

   [5]  Fielding, R., Gettys, J., Mogul, J. C. and H. Frystyk, "Use and
        Intepretation of HTTP Version Numbers", RFC 2145, May 1997.


Author's Address

   Mark Nottingham
   Akamai Technologies
   Suite 703, 1400 Fashion Island Bvld
   San Mateo, CA  94404
   US

   EMail: mnot@akamai.com
   URI:   http://www.akamai.com/

Appendix A. Acknowledgements

   The author gratefully acknowledges the contributions of: John
   Dilley, John Martin, Joel Wein, Peter Danzig, Chuck Neerdaels, and,
   David Karger.


Nottingham              Expires January 5, 2001                [Page 12]

Internet-Draft    Demand-Driven Surrogate Origin Servers       July 2000


Full Copyright Statement

   Copyright (C) The Internet Society (2000). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC editor function is currently provided by the
   Internet Society.


Nottingham              Expires January 5, 2001                [Page 13]