WREC Working Group                                     Nicolas Saillard
INTERNET-DRAFT                                              Ivan Lovric
draft-saillard-cache-mesh-design-00.txt            
February 26, 1999
Expires August 26, 1999


                       Cache Mesh Evaluation and Design


Status of this Memo

     This document is an Internet-Draft and is in full conformance
     with all provisions of Section 10 of RFC2026.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as
     Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-
     Drafts as reference material or to cite them other than as
     "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.

Abstract

This Draft describes a general method of determining a caching solution
adapted to a given traffic.

This method uses a simulation software developed at France Telecom 
which allows different existing architectures, protocols and also 
different streams of traffic to be simulated. Using this software, we 
explain how to design completely a new cache mesh and to calculate its 
parameters in order to maximize performances and minimize costs.


FRANCE TELECOM                 Expires:  August 1999          [Page 1]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999

Table of contents

0 Overview
1 How to determine a caching solution
  1.1 Characterizing a real system
      1.1.1 Architecture and measurements
      1.1.2 Cost-benefit function
  1.2 Capturing traffic characteristics
  1.3 Running the simulation
      1.3.1 Validity of using the simulator
      1.3.2 Parameters step by step
      1.3.3 Validity of the solution
      1.3.4 Anticipation of further evolutions of traffic
  1.4 Interpreting results
2 The simulation software
  2.1 The controller
      2.1.1 HTTP protocol
      2.1.2 Time management
      2.1.3 Measurements
  2.2 Elements of a Web architecture represented as objects
      2.2.1 Client
      2.2.2 Server
      2.2.3 Cache
      2.2.4 Link  
  2.3 Graphical User Interface
3 First results
  3.1 Problem to solve
  3.2 Simulation software used
      3.2.1 The controller 
      3.2.2 Client
      3.2.3 Server
      3.2.4 Cache
      3.2.5 Evolutions of the software
  3.3 Experiments
      3.3.1 Cost function
      3.3.2 Traffic model
      3.3.3 Validation
      3.3.4 Simulations
      3.3.5 Results
4 Conclusion and further work
5 References
6 Acknowledgments
7 Authors' Addresses


FRANCE TELECOM                 Expires:  August 1999          [Page 2]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


0 Overview

In a network dedicated to the web, caching solutions permit latencies 
to be reduced. To configure such a solution, many different choices are
available. Even in a single cache solution, we have to determine the 
cache size, and much of the criteria in the configuration file which 
affect the replacement and consistency policy. With more complex 
architectures, we will have to choose between cooperation protocols.

But, our choice must take into account the actual network topology, the
clients' traffic, the latencies due to the Internet and all the 
evolutions of all these factors. This problem is complex and many 
solutions are empirically determined. Several single cache solutions 
are proposed by cache constructors in function of the context of 
utilization ( isp, intranet, university ) and the number of clients. 
Generally these solutions are oversized to make sure that the solution 
will work but without giving any estimation of its viability. It would 
be more interesting to be able to determine a solution adapted to a 
given traffic and that ensures good performances for a given period, 
for the lower cost. This solution would also be able to evolve for a 
lower cost with traffic.

In order to determine such a solution, we describe a method based on a 
simulation software. Then we use this method and the simulation 
software to determine the size of a parent cache.

1 How to determine a caching solution

The objective of using the simulator is to find a solution of caches 
adapted to a given traffic. This solution has to provide the best 
services for the lowest cost. That is why the first step is to define 
the existing architecture.

1.1 Characterizing a real system

For our objectives we need a description of the actual architecture 
in many points. Then to determine a better architecture, we need to
evaluate architectures by means of a cost-benefit analysis.

1.1.1 Architecture and measurements

The complete description of the actual web architecture is composed of:

- Each node description (cache, server, ..)
- Each link description
- Each protocol description


FRANCE TELECOM                 Expires:  August 1999          [Page 3]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


Another important point is the traffic. That is why we need trace 
files of caches in order to get informations on streams of requests 
that arrive in input to caches. These log files must be captured over 
specific periods ( a day for example )and on different locations, and 
measurements must be made for the same periods. These measurements 
could be :

- Latency
- Bandwidth gain
- hit rate (byte or request)
...

The list of these measurements depends on the objective of the new 
architecture.
Some of these measurements can be performed directly by analyzing the 
log traces, others require specific tools.

With all these data, we dispose of knowledge of the actual system and 
of its performances.

1.1.2 Cost-benefit function

The cost-benefit function [7] is the function which calculates the 
benefit cost ratio for a given configuration and for performance
measurements. This function helps us to evaluate each configuration 
and to determine the best caching solution. This function is dependent 
on the objectives and on the actual architecture.

The benefits for this function are the increase of quality of service 
that we determine by using measurements. The costs are administration 
cost and material cost, taking into account the existing architecture.

1.2 Capturing traffic characteristics

We want to replay the captured traces on different configurations of 
caches in order to know which of these configurations is the best 
adapted to the traces. But using a log file is not flexible and for 
our simulations, it would be more efficient to play a shorter part of 
the traffic and to divide it, or to play it for a given period. That 
is why we prefer to simulate this traffic. This generation of traffic 
must reproduce the characteristics of the real traffic which are 
constant from one log file to an other. A random process must be used 
in order to avoid adding specific characteristic. With such a system
 we aim at determining a caching solution adapted to the global 
characteristic of the traffic and not to a specific log file.


FRANCE TELECOM                 Expires:  August 1999          [Page 4]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


Characteristics we want to generate can vary with the given problem 
and are:

- Size of document
- Popularity of document
- Spatial locality of document
- Reference locality of document
- Cacheability of document
- Type of document
- Network distance of server which contains the document
- Type of protocol (FTP, HTTP)
- Number of clients
- Time-To-Live of document
...

The goal is to find a model that is accurate enough to assist Web 
cache design and configuration.

All these characteristics can present different values and there are a 
lot of models of traffic that capture part of these characteristics 
with different qualities of reproduction. A lot of work on modeling 
web traffic has already been made like in [5,6], for developing server 
and cache benchmarks. The choice among all these different models is 
determined by the problem we want to solve and the objects ( caches , 
protocols, .. ) we want to use in the simulation. The difficulty of 
this modeling is the exponential growth of the complexity, along with 
the increase of the number of criteria taken into account. The 
objective is to take the smallest model that is valid for the 
measurements we want to conduct.

To validate the choice of a model, we have to make a first simulation 
with it on the existing architecture and to compare the results with a
log file simulation.

1.3 Running the simulation

Now, using the model of traffic determined before, we have to simulate 
the traffic on different architectures in order to find the 
architecture which maximizes the cost-benefit function. We propose here
a method to achieve this objective.

1.3.1 Validity of using the simulator

To validate the use of the simulator, the first point is to see how 
precisely we can reproduce the real architecture. For this, we will use
clients that can play the different log files for the simulated 
architecture and make measurements during the simulation. Comparing 
these measurements with real ones allows us to know how realistic the 
simulation is and which measurements can be taken into account for our 
experiments.


FRANCE TELECOM                 Expires:  August 1999          [Page 5]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


1.3.2 Parameters step by step

Simulations should be made on different configurations for a given 
period. To have significant measurements, the period must be longer 
than the time needed to fill the different caches. A pre-filling 
algorithm can be used to get results faster.

To obtain the optimal configuration for the cost-benefit function, we 
will have to modify one or several of these parameters of the 
architecture:

- Architecture design
- Number of Caches
- Cooperation protocol
- Network bandwidth
- Size of caches
- Power of caches
- Replacement policy
- Coherency policy
...

So, to obtain an optimal result, we have to evaluate a lot of possible 
configurations which modify all or part of these different parameters. 
The best methodology is to make comparisons between configurations 
which differ on only one parameter.

That is why, the simulation must be divided into different steps. 
There will be a step for each parameter we wish to modify. During each 
step, measurements will be performed and cost-benefit function 
calculated. For each of these steps, we fix all parameters except one.
Different values of this parameter will be simulated in order to get 
the best value for this parameter (determined by the cost-benefit 
function) under the current configuration. Then, for the following 
step, we will set this parameter to this optimal value and will repeat
the simulation on another parameter on the new configuration. Even if
we have a local optimum, we hope to converge to an optimal 
configuration by modifying one criteria after another. This process 
allows us to work once again on a parameter that has already been 
determined, but it is important to minimize the number of steps.

This method can take a long time to apply but, if we have an idea of 
the lowest set of parameters of the configuration to modify, it will 
be successful. In general, we will concentrate our effort on the most 
significant parameters. This method will generally be more useful in 
answering a precise question with less parameters to modify rather than
in determining the best caching solution for a complex system. To 
answer such a difficult question, it will be useful to decompose the 
problem in separate basic questions.


FRANCE TELECOM                 Expires:  August 1999          [Page 6]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


1.3.3 Validity of the solution

To validate the configuration determined by the experiments we have to 
use the simulator for playing real log files on the determined 
configuration. If these values do not correspond, it is certainly 
because the traffic model is not as realistic as it would be for the 
set of parameters that has been modified. A last experiment in real 
conditions of this configuration, if it is possible, will validate the 
result. Then if measurements are confirmed, we are sure of the 
adequacy of the new configuration to the traffic.

1.3.4 Anticipation of further evolutions of traffic

When a new caching configuration is installed, its objective is not 
only to provide the best caching services but also to stay a good 
caching solution for a given time. Because of the exponential growth 
of web traffic, our experiments must take into account evolutions of 
the traffic.

By analyzing traffic, from one year to another for example, we can 
detect how different criteria are evolving. So, it is possible to 
anticipate these evolutions by changing some of the different 
characteristics of the chosen model. Cost-benefit function can also be 
modified in order to take into account the time objectives of the new 
configuration.

1.4 Interpreting results

With this method, we can obtain a range of cost-benefit values for 
different caching meshes. A single optimal value corresponds to the 
best architecture. In other case, if we have several optimum or none 
optimum, we will have to choice by taking into account the quality of 
services objectives and the cost objectives.

2 The simulation software

Different simulation softwares have already been used, like in [2,4] in
order to answer a specific question like: which replacement policy 
gives the highest hit rate ?

We present here a general model of simulator which could permit the 
simulation of all different kinds of traffic and caching architectures.

The different components of an architecture of cooperating caches can 
be represented very well by objects, which justify the use of an object
oriented language, like JAVA, which also simplifies objects 
communication. The other object of the software is the controller which
coordinates all the simulation.


FRANCE TELECOM                 Expires:  August 1999          [Page 7]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


2.1 The controller

This object of the simulation has to treat all nodes and all the events
 (in a stack) which are requests. The object controller is responsible 
for the activation of the architecture of the simulator and for the 
generation of all of the objects constituting this architecture: client
 workstations, caches, servers and links. It also permits us to define 
measurements to be conducted.

2.1.1 HTTP protocol

HTTP protocol is defined in document [9].

For each request, between two nodes of the architecture, which is sent 
to the event stack, the controller opens a thread and keeps the 
connection alive until the reply has not been returned. The parameters 
that affect the connection are described in the link object between the
two nodes.
So there are three steps for each request during the thread life: 
- sending the request by the link to the upper node
- waiting for the reply of the upper node
- sending the answer by the link to the lower node

For the simulation, an HTTP request is composed of:

- Request Identification number
- protocol: HTTP
- URL identification number
- server addressed
- client
- link concerned
- receiver (server itself or a low-level cache)
- emission date

An HTTP answer is for the simulation composed of:

- Request Identification number
- protocol: HTTP
- URL identification number
- document size
- server addressed
- client
- link concerned
- receiver
- emission date

Different methods of request management can be implemented. The 
simplest way is to wait for a reply before sending another request 
without considering bandwidth limitations. But in this case, 
measurements we can conduct are limited.


FRANCE TELECOM                 Expires:  August 1999          [Page 8]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


The controller might have to treat other requests than HTTP requests. 
For example ICP [8] requests will have specific parameters.

2.1.2 Time Management

To simulate bottlenecks, we need to introduce time management. In this
case, the controller has to define the time clock with his events 
stack. A controller method will permit all nodes to access to this 
time.

All nodes and links will have a temporization parameter which is 
constant and temporization methods which calculate time for replying to
 the request, thus simulating and taking into account parameters like 
bandwidth limitation, maximum number of connections supported and 
document size.

2.1.3 Measurements

All measurements are controlled by the controller. The measurements 
being made by each object of the simulation give local performance 
information. The controller decides when and how often to collect these
values, which are used to calculate performances values for the whole 
system. 

2.2 Elements of a Web architecture represented as objects

The software has to describe the following components of an 
architecture of cooperating caches:

- client workstation
- cache
- server
- link

To each of these elements corresponds a specific class. The global 
class controller, permits us to launch the application and to generate
 all objects constituting the architecture. Common parameters and 
measurements can be implemented in a parent class. Different instances 
of these classes can be used in the same simulation.

2.2.1 Client

Each client object aims to simulate a number of real users sending HTTP
requests towards a Web server. Two different ways of simulating 
clients can be implemented. The first one and the easiest is to play a 
real log file. The second is to implement a model of traffic 
(cf. 1.2 ). Different realistic levels of models can be implemented. 
For each client object, parameters can be:


FRANCE TELECOM                 Expires:  August 1999          [Page 9]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


- Number of clients
- Diversity of URL
- Popularity of URL
- Redundancy rate
- Spatial locality
- Temporal locality
- Inter-reference times
- Off times
...

Some of these parameters are bound and lot of models exist. The most 
used is Zipf Law [3] which defines the popularity of URLs. If different
streams of traffic have to be simulated on different points of the 
architecture, we have to model interference between the streams, with 
for example the percentage of common URLs.

The principal measurement that can be conducted at client level is the 
mean latency.

2.2.2 Server

The server object replies to requests of one or several clients and has
to simulate a population of servers. The server object not only 
simulates a server on the Internet but also the entire part of the 
network between the server and the simulated intranet. 

The answer of the server is the size of the document. This size can 
follow a non linear law based on measurements done in real 
experimentations. All parameters of a server are:

- Size of URLs
- Mean size
- cacheability of document
- type of url (image, text, dynamic)
- Time-to-live of document
- Temporization time
...

Temporization time is an important value which permits us to simulate 
the network distance to a server, and bottlenecks on the Internet. 
Otherwise, it is possible to simulate an overloaded server by modifying
the parameter temporization which indicates to the thread the time to 
wait before sending the reply. 

No specific measurements have to be performed on the server side 
because the objective is to design our own network, not the Internet.

2.2.3 Cache 

The cache is the most complex element of the objects of the simulator. 
It has the possibility to function with cooperating capabilities or 
not. Most common parameters are:


FRANCE TELECOM                 Expires:  August 1999          [Page 10]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


- parents
- replacement policy (LRU, LFU, SIZE, ... )
- consistency policy
- Size of cache
- Size of memory
- activation of traces
- Size range of cacheable documents
- Maximum number of connection supported
...

When receiving a HTTP request, the cache verifies the presence of the 
requested URL in its internal content. In case of missing, the request 
is transmitted to the upper level toward the parent cache or the 
server. Every received request generates a thread responsible for the 
treatment of the request. Once the request is transmitted to the upper 
level, the thread waits for the reply coming from the upper level. 
After receiving the reply from the upper level, the cache verifies the 
size of the reply.

In the case of ICP cooperation, when receiving a client request that 
does not match with the local content, the cache generates an ICP 
request for each of its sibling caches. The simplest way to implement 
the ICP protocol is the following one: Each sibling cache will reply an
"ICP_HIT" if it contains the requested document and an "ICP_MISS" 
otherwise. The first "ICP_HIT" received determines the cache which will
received the HTTP request. If no "ICP_HIT" are received, the request 
will be transmitted to the upper level.

2.2.4 Link

The link object defines the bandwidth between two nodes.

2.3 Graphical User Interface

The graphical user interface permits us to change each parameter of 
each object and to see performances measurements.

Graphical user interface is very easy to use and allows us to 
comfortably modify parameters of the system in real time. It is thus 
possible to either activate or disactivate objects of the architecture 
without being obliged to recompile the code in JAVA, which permits 
tests to be done quickly. 

3 First results

In this part, we present our work in progress. By testing the above 
method, and developing a simulator, we not only want to determine the 
best caching solution, but also to increase our understanding of web 
caching meshes.

That is why, our first work was to study a basic problem.


FRANCE TELECOM                 Expires:  August 1999          [Page 11]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


3.1 Problem to solve


                       _______/\______
                      /               \
                     <  World Wide Web >
                     /\_______________/\
                    /                   \
                   /                     \
              ____/______              ___\_______
              |         |              |         |
              | Cache 1 |              | Cache 2 |
              | Size I  |              | Size I  |
              |_________|              |_________|
                  
                  / \                      / \
                 /| |\                    /| |\
                  | |                      | |
                Stream of                Stream of
                Requests 1               Requests 2


                Figure 1 : Starting Configuration


We have two caches of the same size (100 MB) which address directly the
Internet. We would like to keep these two symmetric caches at a low 
level and to add cache size at an upper level. We have the choice of 
the amount of cache to be placed at this upper level. The objective is
to find the optimal parent cache size if it exists.

For this first study we restrict the performance measurements to the 
hit rate.

3.2 Simulation software used

In this part we present our first version of a simulator which was
elaborated in order to answer the question: How to determine a parent 
cache size ?

For this problem we need to simulate two clients, three caches and one 
server. Since the only measurement is the hit rate, we do not need to 
manage time and to simulate links.

The simulation software in its present version has been written using 
the JDK1.1.7 of SUN. The software consists in 2 client workstations, 2
sibling caches, 1 parent cache, 1 Web server.

3.2.1 Controller

This first version of the controller object waits for the reply of each
sent request before sending another one. Time management will be 
implemented in a further version.

FRANCE TELECOM                 Expires:  August 1999          [Page 12]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


Measurements conducted are hit rate of each cache.

The controller object contains a "Reinit" button that reinitializes in 
real time all of the hit rate indicators for each cache when the user 
clicks over. This button permits us to become independent from the 
initial conditions of working of the system when caches begin to fill 
their storage space. Once the caches are filled and the user pushes on 
this button, hit rates stabilize more quickly because the system 
functions then in permanent mode and converge more quickly.

3.2.2 Client

Two types of client have been developed. The first one plays a log file
and the second sends requests following general version of Zipf Law [3]
which add a parameter named alpha.

3.2.3 Server

The server object only contains every document size. These sizes can be
initialized by using a log file.

3.2.4 Cache

The cache replacement policy implemented here is LRU. Parameters used 
by the cache are:

- size min: minimum size of an object that can be stored in the cache
- size max: maximum size of an object that can be stored in the cache 
- total size: the cache size
- button on/off: activates or disactivates the cache. If disactivated, 
the cache functions as a proxy and does not store any URL. 

Otherwise, the cache object shows its hit rate during its use. 

3.2.5 Evolutions of the software 

ICP cooperation protocol has already been implemented in the present 
version with the possibility to activate or disactivate the local 
storage of the ICP replies. But we will need to add time management in 
order to have more realistic results.

Network bandwidth are not treated yet in the software which means that 
latency times do not take into account this bandwidth that is yet very 
important in the case of weak bandwidths (as those gotten in a PSTN 
network) or with important transit signal in the case of satellite 
links. It is foreseen therefore in the future extensions of the 
software to add a class of object "link" that will simulate this 
bandwidth of links between objects.

Graphical user interface will also integrate the possibility to 
dynamically modify the architecture while adding objects and links 
dynamically between objects.


FRANCE TELECOM                 Expires:  August 1999          [Page 13]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


3.3 Experiments

The two low-level caches are two Squid [10] under Linux of 100 MB cache
size. The objective for this first experiment is to solve a simple 
problem in order to show how the simulator can help us in resolving a 
problem.

3.3.1 Cost-benefit function

For this experiment, the objective is to obtain a better hit rate by 
adding a parent cache . The benefit is the hit rate of the parent 
cache. The cost depends on the cache size we have added. We have 
considered, for this experiment, that this cost is proportional to the 
addition of cache size at the parent level.

           Parent Hit Rate
F  = __________________________
          Parent Cache Size

3.3.2 Traffic model

We have captured the two streams of request that are directed to the 
two caches over a period of 15 days. Characteristics we would like to 
reproduce in order to study Hit rate of the parent cache are:

- Document size distribution
- Popularity of documents
- Number of common urls between the two streams
- Percentage of cacheable documents

In order to simplify the problem we have only considered cacheable 
traffic which represents 55% of the total traffic for our two Squid. 
Non cacheable traffic does not affect number of hit either for child 
caches and the parent.

Analysis of this traffic gives the following results :

Total traffic:
95794 unique url requested
Mean size of documents is 13136 bytes
About 10% of requests are common in the two streams.

First stream of request:
 167790 requests
 62657 URLs
The stream fits the general Zipf's law with alpha = 0.67 (cf. 3.2.2)

Second stream of request:
 74811 requests
 38974 urls
The stream fits the general Zipf's law with alpha = 0.55


FRANCE TELECOM                 Expires:  August 1999          [Page 14]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


For this simulation, we have considered that all documents have the 
same size (the mean size) and that the two streams of requests follow 
general Zipf law with the corresponding alpha parameter. We have take
care of respect of redundancy rate of each stream. For each request of 
the second stream, two requests of the first stream are sent. 
Proportion of common URLs has been respected. 

For this former simulation we did not try to get a realistic traffic, 
in further experiments we will refine the model in order to validate 
our results for other measurements. This experiment is simple but can 
help us in making opinion of what can be the answer to our question. 
And we prefer having at first time, informal answer with no realistic 
model rather than using a realistic model that can not give us results 
because of its complexity.

7 % of urls are common between the two streams which represent for each
stream about 10 % of requests on common urls. 

3.3.3 Validation

With this model, using the simulator for two caches of 100MB, we 
achieved for the two caches respectively : 36% and 30% hit rate.

Whereas, for log file simulation we obtain 55 % and 44 % which 
correspond to the real hit rate of our two Squid. The objective is not 
to achieve the real hit rate but that the comparison of hit rate on the
simulation are valid in a real context. To achieve a real hit rate we 
would have to add reference locality to this model which is not 
captured by Zipf model [11]. We have calculated the hit rate for 
different size of cache, and the result is that the hit rate of our 
simulation follows the same evolution than hit rate obtained by playing
logs. 

When using a parent cache, we will have to validate that these 
differences of hit rate will have no bad effect on the parent level.

3.3.4 Simulations

Here, we present the simulations we have performed.

Given X, The parent cache size, Figure 2 presents the simulated 
architecture.

These experiments have been made with different values of X.


FRANCE TELECOM                 Expires:  August 1999          [Page 15]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


                 _______/\______       
                /               \    
               <  World Wide Web >     
                \_______________/ 
                        |
                    ____|______ 
                    | Parent  |  
                    | Cache   |  
            |------>| Size X  |<------|
            |       |_________|       |     
        ____|_______              ____|______
        |          |              |          |
        | Cache 1  |              | Cache 2  |
        |Size 100MB|              |Size 100MB|
        |__________|              |__________|
              
            / \                      / \
           /| |\                    /| |\
            | |                      | |
         Stream of                 Stream of
         Requests 1                Requests 2


       Figure 2  : Configuration with a parent cache


The results are the following :

  X (in MB)  Parent hit rate        F(X) (cost-benefit function)
    25          0.01156        6.07548378376
    50          0.02267        5.95651970277
    75          0.03514        6.15538567668
   100          0.04472        5.87559020436
   125          0.05724        6.01622570231
   150          0.07789        6.82191660538
   175          0.09874        7.41303993689
   200          0.12175        7.99747329107
   225          0.14120        8.24473801671
   250          0.16582        8.71365958982
   275          0.18342        8.76267415973
   300          0.20139        8.81922127053
   325          0.21969        8.88058185076
   350          0.23489        8.81669975107
   375          0.24953        8.74178753153
   400          0.25915        8.5114581022
   425          0.27058        8.36406473512


The cost-benefit function presents an optimal value for a size about 
325 MB. In order to valid this result we have performed the same 
experiment by using log files :

FRANCE TELECOM                 Expires:  August 1999          [Page 16]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


  X (in MB)  Parent hit rate        F(X) (cost-benefit function)
   125          0.02540        2.03266453203
   325          0.08009        2.46445867141
   425          0.09759        2.29624641084

These three values show that this optimum is present on this version of
the experiment. Nevertheless, parent hit rate achieved is different but
its evolution seems to be the same. By calculating more values using 
log traces, we have confirmed this result.

3.3.5 Results

With these experiments, we have been able to draw a graphic, which 
represents the cost-quality rate for different sizes of parent cache. 
This value grows with this cache size in order to become maximal for a 
parent cache size which is about 1.5 times bigger than the total child 
cache size. And then this value decreases slowly.

The first result is that, as we predicted, dimensioning a parent cache 
is a difficult question. And under our conditions (traffic model and 
cost-benefit function) , using a parent cache is interesting only if 
we are able to dimension it correctly.

4 Conclusion and further work

With our first experiment, we have shown that such a method and 
simulator can help us in designing cache meshes.

A lot of experiments show that fixing a cache size is important because
hit rate is not proportional to this size. In fact hit rate is 
proportional to the logarithm of cache size. That is why increasing 
size is not a good solution to increase hit rate if we take into 
account the cost of increasing this size.

We used our simulator to evaluate a solution. The use of a parent 
cache, whether we are using ICP or not between low level caches, 
can be a solution, and sizing this parent cache is the same problem as 
for a single cache. But traffic in input for this cache is dependent on
a child cache configuration. That is why lower cache size is an 
important criteria for dimensioning upper cache size. The simulations 
we have performed for two low level caches have helped us to confirm 
this hypothesis. The result is that, for our traffic characteristics 
and under our cost-benefit function, the best hit rate is achieved 
with a parent cache size 1.5 times bigger than total child cache size.

For this precise problem of sizing a parent cache, our objectives are 
to understand how the number of cache, size of low-level caches, hit 
rate of low-level caches and the use of the ICP cooperation affect the 
optimal size of a parent cache. Our final objective is to answer the 
following question : 


FRANCE TELECOM                 Expires:  August 1999          [Page 17]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


For a given total cache system size, what is the best solution ?

- sharing new cache size between the low-level caches.
- creating a parent cache with this size.
- a mixed solution.

This method of designing a cache mesh do not avoid testing under real 
conditions the determined configuration. The model chosen depends on 
the questions we would like to answer. In our previous problem, if we 
have wanted to compare another solution which uses an intelligent load
balancing solution, we would have to use another model of traffic. 
Another point is that our model was suited to compare hit rate but it 
would be interesting to measure a load on the different levels of 
caches in order to quantify the power of each cache which is dependent
on cache size. This would also affect the cost-benefit function.

By using the simulator, our first objective is to increase our 
knowledge of cache meshes in order to answer more complex problem. 
For this task, we will have to extend the simulator abilities. We have
noticed that the traffic model chosen depends on measurements we want 
to conduct and parameters of the traffic we would like to modify. It 
will be interesting to establish a list of dependences between 
characteristics of traffics, parameters of a caching architecture and 
measurements. An other important part of this work is to determine
realistic cost-benefit functions.

5 References

[1] Bradley M.Duska and David Marwood and Michael J.Feeley, "The 
Measured Access Characteristics of World-Wide-Web Client Proxy Caches",
 In Proceedings of 1997 USENIX Symposium on Internet Technology and 
systems, december 1997.

[2] Guillaume Pierre, "Conception d'un systeme de cache Web adapte aux 
specificites des utilisateurs", In proceedings of the NoTeRe '98 
colloquium, October 1998.

[3] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenher, 
"On the Implications of Zipf's Law for Web Caching", In proceedings of 
the third International WWW Caching Workshop, June 1998.

[4]  Pei Cao and Sandy Irani, "Cost-Aware WWW Proxy Caching Algorithms"
, In Proceedings of the 1997 USENIX Symposium on Internet Technology 
and Systems, pp. 193-206, December 1997.

[5] Paul Barford and Mark Crovella, "Generating Representative Web 
Workloads for Network and Server Performance Evaluation", In 
Proceedings of the SIGMETRICS '98 conference, June 1998.

[6] Stephen Manley, Michael Courage and Margo Seltzer, "A Self-Scaling 
and Self-Configuring Benchmark for Web Servers", Harvard University, 
1998.


FRANCE TELECOM                 Expires:  August 1999          [Page 18]
INTERNET-DRAFT     Cache Mesh Evaluation and Design     February 1999


[7] Ton Verschuren, Andre de Jong, Henny Bekker, Ingrid Melve, "Web 
Caching Meshes: Hit or Miss? ", INET'98, Geneva, Juli 1998.

[8] D. Wessels, K., Claffy, "Internet Cache Protocol (ICP), version 2",
RFC 2186, National Laboratory for Applied Network Research/UCSD, 
September 1997.

[9] Fielding, R., et. al, "Hypertext Transfer Protocol -- HTTP/1.1", 
RFC 2068, UC Irvine, January 1997.

[10] Wessels D., "The Squid Internet Object Cache", National Laboratory
for Applied Network Rsearch, http://squid.nlanr.net/Squid/ .

[11] Virgilio Almeida, Azer Bestravos, Mark Crovella and Adriana de 
Oliveira, "Characterizing Reference Locality in the WWW", In IEEE 
International Conference in Parallel and Distributed Information 
Systems, Miami Beach, Florida, December 1996.

6 Acknowledgments

The authors wish to thank in particular, Anthony Dauguet for his work 
on the simulation software. We also wish to thank Cedric Goutard, Sally
Lenoir and Betty Prehu for helping us in writing this document.

7 Authors' Addresses

Nicolas Saillard
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures BP 6243
14066 Caen Cedex
France
Phone: +33 2 31 75 92 47
Fax: +33 2 31 73 56 26
E-mail: nicolas.saillard@cnet.francetelecom.fr

Ivan Lovric
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures BP 6243
14066 Caen Cedex
France
Phone: +33 2 31 75 91 25
Fax: +33 2 31 73 56 26
E-mail: ivan.lovric@cnet.francetelecom.fr


FRANCE TELECOM                 Expires:  August 1999          [Page 19]