WREC Working Group Nicolas Saillard INTERNET-DRAFT Ivan Lovric draft-saillard-cache-mesh-design-00.txt February 26, 1999 Expires August 26, 1999 Cache Mesh Evaluation and Design Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This Draft describes a general method of determining a caching solution adapted to a given traffic. This method uses a simulation software developed at France Telecom which allows different existing architectures, protocols and also different streams of traffic to be simulated. Using this software, we explain how to design completely a new cache mesh and to calculate its parameters in order to maximize performances and minimize costs. FRANCE TELECOM Expires: August 1999 [Page 1] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 Table of contents 0 Overview 1 How to determine a caching solution 1.1 Characterizing a real system 1.1.1 Architecture and measurements 1.1.2 Cost-benefit function 1.2 Capturing traffic characteristics 1.3 Running the simulation 1.3.1 Validity of using the simulator 1.3.2 Parameters step by step 1.3.3 Validity of the solution 1.3.4 Anticipation of further evolutions of traffic 1.4 Interpreting results 2 The simulation software 2.1 The controller 2.1.1 HTTP protocol 2.1.2 Time management 2.1.3 Measurements 2.2 Elements of a Web architecture represented as objects 2.2.1 Client 2.2.2 Server 2.2.3 Cache 2.2.4 Link 2.3 Graphical User Interface 3 First results 3.1 Problem to solve 3.2 Simulation software used 3.2.1 The controller 3.2.2 Client 3.2.3 Server 3.2.4 Cache 3.2.5 Evolutions of the software 3.3 Experiments 3.3.1 Cost function 3.3.2 Traffic model 3.3.3 Validation 3.3.4 Simulations 3.3.5 Results 4 Conclusion and further work 5 References 6 Acknowledgments 7 Authors' Addresses FRANCE TELECOM Expires: August 1999 [Page 2] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 0 Overview In a network dedicated to the web, caching solutions permit latencies to be reduced. To configure such a solution, many different choices are available. Even in a single cache solution, we have to determine the cache size, and much of the criteria in the configuration file which affect the replacement and consistency policy. With more complex architectures, we will have to choose between cooperation protocols. But, our choice must take into account the actual network topology, the clients' traffic, the latencies due to the Internet and all the evolutions of all these factors. This problem is complex and many solutions are empirically determined. Several single cache solutions are proposed by cache constructors in function of the context of utilization ( isp, intranet, university ) and the number of clients. Generally these solutions are oversized to make sure that the solution will work but without giving any estimation of its viability. It would be more interesting to be able to determine a solution adapted to a given traffic and that ensures good performances for a given period, for the lower cost. This solution would also be able to evolve for a lower cost with traffic. In order to determine such a solution, we describe a method based on a simulation software. Then we use this method and the simulation software to determine the size of a parent cache. 1 How to determine a caching solution The objective of using the simulator is to find a solution of caches adapted to a given traffic. This solution has to provide the best services for the lowest cost. That is why the first step is to define the existing architecture. 1.1 Characterizing a real system For our objectives we need a description of the actual architecture in many points. Then to determine a better architecture, we need to evaluate architectures by means of a cost-benefit analysis. 1.1.1 Architecture and measurements The complete description of the actual web architecture is composed of: - Each node description (cache, server, ..) - Each link description - Each protocol description FRANCE TELECOM Expires: August 1999 [Page 3] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 Another important point is the traffic. That is why we need trace files of caches in order to get informations on streams of requests that arrive in input to caches. These log files must be captured over specific periods ( a day for example )and on different locations, and measurements must be made for the same periods. These measurements could be : - Latency - Bandwidth gain - hit rate (byte or request) ... The list of these measurements depends on the objective of the new architecture. Some of these measurements can be performed directly by analyzing the log traces, others require specific tools. With all these data, we dispose of knowledge of the actual system and of its performances. 1.1.2 Cost-benefit function The cost-benefit function [7] is the function which calculates the benefit cost ratio for a given configuration and for performance measurements. This function helps us to evaluate each configuration and to determine the best caching solution. This function is dependent on the objectives and on the actual architecture. The benefits for this function are the increase of quality of service that we determine by using measurements. The costs are administration cost and material cost, taking into account the existing architecture. 1.2 Capturing traffic characteristics We want to replay the captured traces on different configurations of caches in order to know which of these configurations is the best adapted to the traces. But using a log file is not flexible and for our simulations, it would be more efficient to play a shorter part of the traffic and to divide it, or to play it for a given period. That is why we prefer to simulate this traffic. This generation of traffic must reproduce the characteristics of the real traffic which are constant from one log file to an other. A random process must be used in order to avoid adding specific characteristic. With such a system we aim at determining a caching solution adapted to the global characteristic of the traffic and not to a specific log file. FRANCE TELECOM Expires: August 1999 [Page 4] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 Characteristics we want to generate can vary with the given problem and are: - Size of document - Popularity of document - Spatial locality of document - Reference locality of document - Cacheability of document - Type of document - Network distance of server which contains the document - Type of protocol (FTP, HTTP) - Number of clients - Time-To-Live of document ... The goal is to find a model that is accurate enough to assist Web cache design and configuration. All these characteristics can present different values and there are a lot of models of traffic that capture part of these characteristics with different qualities of reproduction. A lot of work on modeling web traffic has already been made like in [5,6], for developing server and cache benchmarks. The choice among all these different models is determined by the problem we want to solve and the objects ( caches , protocols, .. ) we want to use in the simulation. The difficulty of this modeling is the exponential growth of the complexity, along with the increase of the number of criteria taken into account. The objective is to take the smallest model that is valid for the measurements we want to conduct. To validate the choice of a model, we have to make a first simulation with it on the existing architecture and to compare the results with a log file simulation. 1.3 Running the simulation Now, using the model of traffic determined before, we have to simulate the traffic on different architectures in order to find the architecture which maximizes the cost-benefit function. We propose here a method to achieve this objective. 1.3.1 Validity of using the simulator To validate the use of the simulator, the first point is to see how precisely we can reproduce the real architecture. For this, we will use clients that can play the different log files for the simulated architecture and make measurements during the simulation. Comparing these measurements with real ones allows us to know how realistic the simulation is and which measurements can be taken into account for our experiments. FRANCE TELECOM Expires: August 1999 [Page 5] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 1.3.2 Parameters step by step Simulations should be made on different configurations for a given period. To have significant measurements, the period must be longer than the time needed to fill the different caches. A pre-filling algorithm can be used to get results faster. To obtain the optimal configuration for the cost-benefit function, we will have to modify one or several of these parameters of the architecture: - Architecture design - Number of Caches - Cooperation protocol - Network bandwidth - Size of caches - Power of caches - Replacement policy - Coherency policy ... So, to obtain an optimal result, we have to evaluate a lot of possible configurations which modify all or part of these different parameters. The best methodology is to make comparisons between configurations which differ on only one parameter. That is why, the simulation must be divided into different steps. There will be a step for each parameter we wish to modify. During each step, measurements will be performed and cost-benefit function calculated. For each of these steps, we fix all parameters except one. Different values of this parameter will be simulated in order to get the best value for this parameter (determined by the cost-benefit function) under the current configuration. Then, for the following step, we will set this parameter to this optimal value and will repeat the simulation on another parameter on the new configuration. Even if we have a local optimum, we hope to converge to an optimal configuration by modifying one criteria after another. This process allows us to work once again on a parameter that has already been determined, but it is important to minimize the number of steps. This method can take a long time to apply but, if we have an idea of the lowest set of parameters of the configuration to modify, it will be successful. In general, we will concentrate our effort on the most significant parameters. This method will generally be more useful in answering a precise question with less parameters to modify rather than in determining the best caching solution for a complex system. To answer such a difficult question, it will be useful to decompose the problem in separate basic questions. FRANCE TELECOM Expires: August 1999 [Page 6] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 1.3.3 Validity of the solution To validate the configuration determined by the experiments we have to use the simulator for playing real log files on the determined configuration. If these values do not correspond, it is certainly because the traffic model is not as realistic as it would be for the set of parameters that has been modified. A last experiment in real conditions of this configuration, if it is possible, will validate the result. Then if measurements are confirmed, we are sure of the adequacy of the new configuration to the traffic. 1.3.4 Anticipation of further evolutions of traffic When a new caching configuration is installed, its objective is not only to provide the best caching services but also to stay a good caching solution for a given time. Because of the exponential growth of web traffic, our experiments must take into account evolutions of the traffic. By analyzing traffic, from one year to another for example, we can detect how different criteria are evolving. So, it is possible to anticipate these evolutions by changing some of the different characteristics of the chosen model. Cost-benefit function can also be modified in order to take into account the time objectives of the new configuration. 1.4 Interpreting results With this method, we can obtain a range of cost-benefit values for different caching meshes. A single optimal value corresponds to the best architecture. In other case, if we have several optimum or none optimum, we will have to choice by taking into account the quality of services objectives and the cost objectives. 2 The simulation software Different simulation softwares have already been used, like in [2,4] in order to answer a specific question like: which replacement policy gives the highest hit rate ? We present here a general model of simulator which could permit the simulation of all different kinds of traffic and caching architectures. The different components of an architecture of cooperating caches can be represented very well by objects, which justify the use of an object oriented language, like JAVA, which also simplifies objects communication. The other object of the software is the controller which coordinates all the simulation. FRANCE TELECOM Expires: August 1999 [Page 7] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 2.1 The controller This object of the simulation has to treat all nodes and all the events (in a stack) which are requests. The object controller is responsible for the activation of the architecture of the simulator and for the generation of all of the objects constituting this architecture: client workstations, caches, servers and links. It also permits us to define measurements to be conducted. 2.1.1 HTTP protocol HTTP protocol is defined in document [9]. For each request, between two nodes of the architecture, which is sent to the event stack, the controller opens a thread and keeps the connection alive until the reply has not been returned. The parameters that affect the connection are described in the link object between the two nodes. So there are three steps for each request during the thread life: - sending the request by the link to the upper node - waiting for the reply of the upper node - sending the answer by the link to the lower node For the simulation, an HTTP request is composed of: - Request Identification number - protocol: HTTP - URL identification number - server addressed - client - link concerned - receiver (server itself or a low-level cache) - emission date An HTTP answer is for the simulation composed of: - Request Identification number - protocol: HTTP - URL identification number - document size - server addressed - client - link concerned - receiver - emission date Different methods of request management can be implemented. The simplest way is to wait for a reply before sending another request without considering bandwidth limitations. But in this case, measurements we can conduct are limited. FRANCE TELECOM Expires: August 1999 [Page 8] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 The controller might have to treat other requests than HTTP requests. For example ICP [8] requests will have specific parameters. 2.1.2 Time Management To simulate bottlenecks, we need to introduce time management. In this case, the controller has to define the time clock with his events stack. A controller method will permit all nodes to access to this time. All nodes and links will have a temporization parameter which is constant and temporization methods which calculate time for replying to the request, thus simulating and taking into account parameters like bandwidth limitation, maximum number of connections supported and document size. 2.1.3 Measurements All measurements are controlled by the controller. The measurements being made by each object of the simulation give local performance information. The controller decides when and how often to collect these values, which are used to calculate performances values for the whole system. 2.2 Elements of a Web architecture represented as objects The software has to describe the following components of an architecture of cooperating caches: - client workstation - cache - server - link To each of these elements corresponds a specific class. The global class controller, permits us to launch the application and to generate all objects constituting the architecture. Common parameters and measurements can be implemented in a parent class. Different instances of these classes can be used in the same simulation. 2.2.1 Client Each client object aims to simulate a number of real users sending HTTP requests towards a Web server. Two different ways of simulating clients can be implemented. The first one and the easiest is to play a real log file. The second is to implement a model of traffic (cf. 1.2 ). Different realistic levels of models can be implemented. For each client object, parameters can be: FRANCE TELECOM Expires: August 1999 [Page 9] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 - Number of clients - Diversity of URL - Popularity of URL - Redundancy rate - Spatial locality - Temporal locality - Inter-reference times - Off times ... Some of these parameters are bound and lot of models exist. The most used is Zipf Law [3] which defines the popularity of URLs. If different streams of traffic have to be simulated on different points of the architecture, we have to model interference between the streams, with for example the percentage of common URLs. The principal measurement that can be conducted at client level is the mean latency. 2.2.2 Server The server object replies to requests of one or several clients and has to simulate a population of servers. The server object not only simulates a server on the Internet but also the entire part of the network between the server and the simulated intranet. The answer of the server is the size of the document. This size can follow a non linear law based on measurements done in real experimentations. All parameters of a server are: - Size of URLs - Mean size - cacheability of document - type of url (image, text, dynamic) - Time-to-live of document - Temporization time ... Temporization time is an important value which permits us to simulate the network distance to a server, and bottlenecks on the Internet. Otherwise, it is possible to simulate an overloaded server by modifying the parameter temporization which indicates to the thread the time to wait before sending the reply. No specific measurements have to be performed on the server side because the objective is to design our own network, not the Internet. 2.2.3 Cache The cache is the most complex element of the objects of the simulator. It has the possibility to function with cooperating capabilities or not. Most common parameters are: FRANCE TELECOM Expires: August 1999 [Page 10] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 - parents - replacement policy (LRU, LFU, SIZE, ... ) - consistency policy - Size of cache - Size of memory - activation of traces - Size range of cacheable documents - Maximum number of connection supported ... When receiving a HTTP request, the cache verifies the presence of the requested URL in its internal content. In case of missing, the request is transmitted to the upper level toward the parent cache or the server. Every received request generates a thread responsible for the treatment of the request. Once the request is transmitted to the upper level, the thread waits for the reply coming from the upper level. After receiving the reply from the upper level, the cache verifies the size of the reply. In the case of ICP cooperation, when receiving a client request that does not match with the local content, the cache generates an ICP request for each of its sibling caches. The simplest way to implement the ICP protocol is the following one: Each sibling cache will reply an "ICP_HIT" if it contains the requested document and an "ICP_MISS" otherwise. The first "ICP_HIT" received determines the cache which will received the HTTP request. If no "ICP_HIT" are received, the request will be transmitted to the upper level. 2.2.4 Link The link object defines the bandwidth between two nodes. 2.3 Graphical User Interface The graphical user interface permits us to change each parameter of each object and to see performances measurements. Graphical user interface is very easy to use and allows us to comfortably modify parameters of the system in real time. It is thus possible to either activate or disactivate objects of the architecture without being obliged to recompile the code in JAVA, which permits tests to be done quickly. 3 First results In this part, we present our work in progress. By testing the above method, and developing a simulator, we not only want to determine the best caching solution, but also to increase our understanding of web caching meshes. That is why, our first work was to study a basic problem. FRANCE TELECOM Expires: August 1999 [Page 11] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 3.1 Problem to solve _______/\______ / \ < World Wide Web > /\_______________/\ / \ / \ ____/______ ___\_______ | | | | | Cache 1 | | Cache 2 | | Size I | | Size I | |_________| |_________| / \ / \ /| |\ /| |\ | | | | Stream of Stream of Requests 1 Requests 2 Figure 1 : Starting Configuration We have two caches of the same size (100 MB) which address directly the Internet. We would like to keep these two symmetric caches at a low level and to add cache size at an upper level. We have the choice of the amount of cache to be placed at this upper level. The objective is to find the optimal parent cache size if it exists. For this first study we restrict the performance measurements to the hit rate. 3.2 Simulation software used In this part we present our first version of a simulator which was elaborated in order to answer the question: How to determine a parent cache size ? For this problem we need to simulate two clients, three caches and one server. Since the only measurement is the hit rate, we do not need to manage time and to simulate links. The simulation software in its present version has been written using the JDK1.1.7 of SUN. The software consists in 2 client workstations, 2 sibling caches, 1 parent cache, 1 Web server. 3.2.1 Controller This first version of the controller object waits for the reply of each sent request before sending another one. Time management will be implemented in a further version. FRANCE TELECOM Expires: August 1999 [Page 12] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 Measurements conducted are hit rate of each cache. The controller object contains a "Reinit" button that reinitializes in real time all of the hit rate indicators for each cache when the user clicks over. This button permits us to become independent from the initial conditions of working of the system when caches begin to fill their storage space. Once the caches are filled and the user pushes on this button, hit rates stabilize more quickly because the system functions then in permanent mode and converge more quickly. 3.2.2 Client Two types of client have been developed. The first one plays a log file and the second sends requests following general version of Zipf Law [3] which add a parameter named alpha. 3.2.3 Server The server object only contains every document size. These sizes can be initialized by using a log file. 3.2.4 Cache The cache replacement policy implemented here is LRU. Parameters used by the cache are: - size min: minimum size of an object that can be stored in the cache - size max: maximum size of an object that can be stored in the cache - total size: the cache size - button on/off: activates or disactivates the cache. If disactivated, the cache functions as a proxy and does not store any URL. Otherwise, the cache object shows its hit rate during its use. 3.2.5 Evolutions of the software ICP cooperation protocol has already been implemented in the present version with the possibility to activate or disactivate the local storage of the ICP replies. But we will need to add time management in order to have more realistic results. Network bandwidth are not treated yet in the software which means that latency times do not take into account this bandwidth that is yet very important in the case of weak bandwidths (as those gotten in a PSTN network) or with important transit signal in the case of satellite links. It is foreseen therefore in the future extensions of the software to add a class of object "link" that will simulate this bandwidth of links between objects. Graphical user interface will also integrate the possibility to dynamically modify the architecture while adding objects and links dynamically between objects. FRANCE TELECOM Expires: August 1999 [Page 13] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 3.3 Experiments The two low-level caches are two Squid [10] under Linux of 100 MB cache size. The objective for this first experiment is to solve a simple problem in order to show how the simulator can help us in resolving a problem. 3.3.1 Cost-benefit function For this experiment, the objective is to obtain a better hit rate by adding a parent cache . The benefit is the hit rate of the parent cache. The cost depends on the cache size we have added. We have considered, for this experiment, that this cost is proportional to the addition of cache size at the parent level. Parent Hit Rate F = __________________________ Parent Cache Size 3.3.2 Traffic model We have captured the two streams of request that are directed to the two caches over a period of 15 days. Characteristics we would like to reproduce in order to study Hit rate of the parent cache are: - Document size distribution - Popularity of documents - Number of common urls between the two streams - Percentage of cacheable documents In order to simplify the problem we have only considered cacheable traffic which represents 55% of the total traffic for our two Squid. Non cacheable traffic does not affect number of hit either for child caches and the parent. Analysis of this traffic gives the following results : Total traffic: 95794 unique url requested Mean size of documents is 13136 bytes About 10% of requests are common in the two streams. First stream of request: 167790 requests 62657 URLs The stream fits the general Zipf's law with alpha = 0.67 (cf. 3.2.2) Second stream of request: 74811 requests 38974 urls The stream fits the general Zipf's law with alpha = 0.55 FRANCE TELECOM Expires: August 1999 [Page 14] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 For this simulation, we have considered that all documents have the same size (the mean size) and that the two streams of requests follow general Zipf law with the corresponding alpha parameter. We have take care of respect of redundancy rate of each stream. For each request of the second stream, two requests of the first stream are sent. Proportion of common URLs has been respected. For this former simulation we did not try to get a realistic traffic, in further experiments we will refine the model in order to validate our results for other measurements. This experiment is simple but can help us in making opinion of what can be the answer to our question. And we prefer having at first time, informal answer with no realistic model rather than using a realistic model that can not give us results because of its complexity. 7 % of urls are common between the two streams which represent for each stream about 10 % of requests on common urls. 3.3.3 Validation With this model, using the simulator for two caches of 100MB, we achieved for the two caches respectively : 36% and 30% hit rate. Whereas, for log file simulation we obtain 55 % and 44 % which correspond to the real hit rate of our two Squid. The objective is not to achieve the real hit rate but that the comparison of hit rate on the simulation are valid in a real context. To achieve a real hit rate we would have to add reference locality to this model which is not captured by Zipf model [11]. We have calculated the hit rate for different size of cache, and the result is that the hit rate of our simulation follows the same evolution than hit rate obtained by playing logs. When using a parent cache, we will have to validate that these differences of hit rate will have no bad effect on the parent level. 3.3.4 Simulations Here, we present the simulations we have performed. Given X, The parent cache size, Figure 2 presents the simulated architecture. These experiments have been made with different values of X. FRANCE TELECOM Expires: August 1999 [Page 15] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 _______/\______ / \ < World Wide Web > \_______________/ | ____|______ | Parent | | Cache | |------>| Size X |<------| | |_________| | ____|_______ ____|______ | | | | | Cache 1 | | Cache 2 | |Size 100MB| |Size 100MB| |__________| |__________| / \ / \ /| |\ /| |\ | | | | Stream of Stream of Requests 1 Requests 2 Figure 2 : Configuration with a parent cache The results are the following : X (in MB) Parent hit rate F(X) (cost-benefit function) 25 0.01156 6.07548378376 50 0.02267 5.95651970277 75 0.03514 6.15538567668 100 0.04472 5.87559020436 125 0.05724 6.01622570231 150 0.07789 6.82191660538 175 0.09874 7.41303993689 200 0.12175 7.99747329107 225 0.14120 8.24473801671 250 0.16582 8.71365958982 275 0.18342 8.76267415973 300 0.20139 8.81922127053 325 0.21969 8.88058185076 350 0.23489 8.81669975107 375 0.24953 8.74178753153 400 0.25915 8.5114581022 425 0.27058 8.36406473512 The cost-benefit function presents an optimal value for a size about 325 MB. In order to valid this result we have performed the same experiment by using log files : FRANCE TELECOM Expires: August 1999 [Page 16] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 X (in MB) Parent hit rate F(X) (cost-benefit function) 125 0.02540 2.03266453203 325 0.08009 2.46445867141 425 0.09759 2.29624641084 These three values show that this optimum is present on this version of the experiment. Nevertheless, parent hit rate achieved is different but its evolution seems to be the same. By calculating more values using log traces, we have confirmed this result. 3.3.5 Results With these experiments, we have been able to draw a graphic, which represents the cost-quality rate for different sizes of parent cache. This value grows with this cache size in order to become maximal for a parent cache size which is about 1.5 times bigger than the total child cache size. And then this value decreases slowly. The first result is that, as we predicted, dimensioning a parent cache is a difficult question. And under our conditions (traffic model and cost-benefit function) , using a parent cache is interesting only if we are able to dimension it correctly. 4 Conclusion and further work With our first experiment, we have shown that such a method and simulator can help us in designing cache meshes. A lot of experiments show that fixing a cache size is important because hit rate is not proportional to this size. In fact hit rate is proportional to the logarithm of cache size. That is why increasing size is not a good solution to increase hit rate if we take into account the cost of increasing this size. We used our simulator to evaluate a solution. The use of a parent cache, whether we are using ICP or not between low level caches, can be a solution, and sizing this parent cache is the same problem as for a single cache. But traffic in input for this cache is dependent on a child cache configuration. That is why lower cache size is an important criteria for dimensioning upper cache size. The simulations we have performed for two low level caches have helped us to confirm this hypothesis. The result is that, for our traffic characteristics and under our cost-benefit function, the best hit rate is achieved with a parent cache size 1.5 times bigger than total child cache size. For this precise problem of sizing a parent cache, our objectives are to understand how the number of cache, size of low-level caches, hit rate of low-level caches and the use of the ICP cooperation affect the optimal size of a parent cache. Our final objective is to answer the following question : FRANCE TELECOM Expires: August 1999 [Page 17] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 For a given total cache system size, what is the best solution ? - sharing new cache size between the low-level caches. - creating a parent cache with this size. - a mixed solution. This method of designing a cache mesh do not avoid testing under real conditions the determined configuration. The model chosen depends on the questions we would like to answer. In our previous problem, if we have wanted to compare another solution which uses an intelligent load balancing solution, we would have to use another model of traffic. Another point is that our model was suited to compare hit rate but it would be interesting to measure a load on the different levels of caches in order to quantify the power of each cache which is dependent on cache size. This would also affect the cost-benefit function. By using the simulator, our first objective is to increase our knowledge of cache meshes in order to answer more complex problem. For this task, we will have to extend the simulator abilities. We have noticed that the traffic model chosen depends on measurements we want to conduct and parameters of the traffic we would like to modify. It will be interesting to establish a list of dependences between characteristics of traffics, parameters of a caching architecture and measurements. An other important part of this work is to determine realistic cost-benefit functions. 5 References [1] Bradley M.Duska and David Marwood and Michael J.Feeley, "The Measured Access Characteristics of World-Wide-Web Client Proxy Caches", In Proceedings of 1997 USENIX Symposium on Internet Technology and systems, december 1997. [2] Guillaume Pierre, "Conception d'un systeme de cache Web adapte aux specificites des utilisateurs", In proceedings of the NoTeRe '98 colloquium, October 1998. [3] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenher, "On the Implications of Zipf's Law for Web Caching", In proceedings of the third International WWW Caching Workshop, June 1998. [4] Pei Cao and Sandy Irani, "Cost-Aware WWW Proxy Caching Algorithms" , In Proceedings of the 1997 USENIX Symposium on Internet Technology and Systems, pp. 193-206, December 1997. [5] Paul Barford and Mark Crovella, "Generating Representative Web Workloads for Network and Server Performance Evaluation", In Proceedings of the SIGMETRICS '98 conference, June 1998. [6] Stephen Manley, Michael Courage and Margo Seltzer, "A Self-Scaling and Self-Configuring Benchmark for Web Servers", Harvard University, 1998. FRANCE TELECOM Expires: August 1999 [Page 18] INTERNET-DRAFT Cache Mesh Evaluation and Design February 1999 [7] Ton Verschuren, Andre de Jong, Henny Bekker, Ingrid Melve, "Web Caching Meshes: Hit or Miss? ", INET'98, Geneva, Juli 1998. [8] D. Wessels, K., Claffy, "Internet Cache Protocol (ICP), version 2", RFC 2186, National Laboratory for Applied Network Research/UCSD, September 1997. [9] Fielding, R., et. al, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, UC Irvine, January 1997. [10] Wessels D., "The Squid Internet Object Cache", National Laboratory for Applied Network Rsearch, http://squid.nlanr.net/Squid/ . [11] Virgilio Almeida, Azer Bestravos, Mark Crovella and Adriana de Oliveira, "Characterizing Reference Locality in the WWW", In IEEE International Conference in Parallel and Distributed Information Systems, Miami Beach, Florida, December 1996. 6 Acknowledgments The authors wish to thank in particular, Anthony Dauguet for his work on the simulation software. We also wish to thank Cedric Goutard, Sally Lenoir and Betty Prehu for helping us in writing this document. 7 Authors' Addresses Nicolas Saillard France Telecom Centre National des Etudes en Telecommunications 42, rue des Coutures BP 6243 14066 Caen Cedex France Phone: +33 2 31 75 92 47 Fax: +33 2 31 73 56 26 E-mail: nicolas.saillard@cnet.francetelecom.fr Ivan Lovric France Telecom Centre National des Etudes en Telecommunications 42, rue des Coutures BP 6243 14066 Caen Cedex France Phone: +33 2 31 75 91 25 Fax: +33 2 31 73 56 26 E-mail: ivan.lovric@cnet.francetelecom.fr FRANCE TELECOM Expires: August 1999 [Page 19]