Network Working Group                                      J. Mansigian
INTERNET-DRAFT                                               Consultant
Expire in six months                                         March 1997
Draft: Version 01

             Clearing the Traffic Jam at Internet Servers
         A Network Layer View Of Network Traffic Consolidation

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

The cause of the typically glacial response from popular Internet World Wide Web servers is seldom a lack of network bandwidth or any deficit in the client's equipment. The abysmal performance arises because the accessed server spends an inordinate amount of time managing two problems: an unnecessarily large number of transport connections and the transmission of masses of redundant data without optimization. This work addresses both problems.

This document presents an introduction to the concepts and architecture of network traffic consolidation. It is not intended to describe a complete protocol with every ancillary feature but rather to focus on performance driven core ideas that could become a part of emerging commercially featured protocols. The scope of network traffic consolidation is confined to file level interactions between Internet World Wide Web servers and their clients.
Data is delivered to clients without client specific change. The goal of network traffic consolidation is to make an overburdened file server behave very much as if it were servicing a light flow of file requests. The methods of network traffic consolidation achieve this goal by actually making the server's file request flow light.

Network traffic consolidation acts on both the input and output flows of client server data. The input processing of network traffic consolidation is called request reduction. The output processing is called multicast response. Input and output processing operate asynchronously.

Request reduction is implemented by a multi-threaded process resident on the server platform. This process views a busy server's request flow for each file through a series of time windows of small, uniform interval. Request reduction divides all input requests into two classifications - those requests that can be consolidated and those that cannot. Requests that cannot be consolidated are passed without delay to the server. Requests that can be consolidated are treated differently. Conceptually, a thread of the request reduction process gathers into one group all file requests that fall in the same time window, request the same file, and originate from different clients. A common example would be multiple HTTP requests from different clients requesting the same HTML document file occurring in the same time window. What is actually constructed are two data structures. One is a copy of the common request data with a system generated key placed in the originator's address field. The other is a list of all of the requesting clients, called the multicast distribution list. This list is keyed by the system generated key for later retrieval. The thread then places the keyed multicast distribution list in a queue that resides in shared memory and passes the request to the server.
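As a sketch, the two data structures built during consolidation might look like the following. This is illustrative Python only; the draft defines no concrete API, so every name here (make_key, consolidate, the dictionary fields) is an assumption.

```python
# Hypothetical sketch of request reduction grouping.  All identifiers
# are illustrative; the draft does not specify an implementation.
import itertools
import queue

_key_counter = itertools.count(1)

def make_key():
    # Stand-in for the draft's pseudo-address key generator.
    return "NTC-KEY-%d" % next(_key_counter)

# Stands in for the shared-memory queue consumed by multicast response.
distribution_lists = queue.Queue()

def consolidate(window_requests):
    """Collapse same-file requests from one time window into a single
    consolidated request plus a keyed multicast distribution list."""
    key = make_key()
    template = dict(window_requests[0])      # copy of the common request data
    template["originator"] = key             # key replaces the client address
    mdl = {"key": key,
           "clients": [r["originator"] for r in window_requests]}
    distribution_lists.put(mdl)              # for the multicast response process
    return template                          # passed to the server as-is

reqs = [{"file": "/index.html", "originator": c}
        for c in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
consolidated = consolidate(reqs)
```

Because the consolidated request carries the key in the originator field, the server handles it exactly like any single-client request.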
A consolidated request is indistinguishable from an individual request to the server, so no server request processing logic needs to change. A direct advantage of request reduction is a dramatic decrease in the frequency of server interruption with an attendant improvement in server performance.

Multicast response is implemented as a thin layer which invokes a qualified third party implementation of a reliable multicast protocol ( RMP ) of the customer's choice. The multicast thin layer is a simple single threaded driver that is responsible for invoking the RMP to service consolidated requests. The multicast response process receives a reply from the server and tries to locate a multicast distribution list that matches the reply's destination address. If it cannot, then the reply is from a non-consolidated request and the unicast service of the RMP is used to send the reply. If a match is found, then the multicast response process takes the server's response as input and consumes a member from the queue of multicast distribution lists. These two inputs, along with configuration data, are used by the driver in its invocation of the RMP. Once the multicast response driver initiates the RMP it has no further responsibility with respect to the request except to invoke the RMP primitive that deletes and disestablishes all RMP data structures and remote RMP processes associated with the multicast group after a configured amount of time. All flow control, error retry, and error messages come from the RMP. An important advantage of multicast response is that the server needs to put only one copy of the file on the wire no matter how many client requests compose the consolidated request.

Introduction

The Internet is used by millions of people every day for a variety of purposes. There is no sign that growth of interest in using the Internet is abating.
The demands being placed on the Internet today could not have been anticipated decades ago when its predecessor, Arpanet, was designed. However the effects of these early design decisions are still very much with the Internet.

The Paradigm Shift and its Effects

The 1990s brought to the Internet a crush of new users from diverse backgrounds. This influx was mostly the result of the meteoric rise of the World Wide Web precipitated by the widespread availability of easy to use graphically interfaced clients such as Mosaic and Netscape. There were two important changes that resulted from this burgeoning movement. One was explosive growth in activity both over the network and at the host interface. The other was that the predominant form of communication shifted from a peer-to-peer model to a mixture of peer-to-peer and client server modalities.

In the early days of internetworking the peer-to-peer model of network use was exemplified by collaborating researchers sending email and experiment data to each other. Peer-to-peer traffic remains very important today but is no longer unchallenged as the predominant form of communication on the Internet. The ascendancy of the World Wide Web has created massive client server traffic on the Internet that differs qualitatively from the previous network traffic in important ways.

The first difference is that in today's internetworked client server model the data is no longer necessarily unique. Colleagues sending each other email or collaborating by exchanging files of work related data almost always make progress from one communication to the next, so the data being transmitted is essentially unique. Intimate use of the Internet by a small number of communicants sending unique data was the predominant style before the 1990s. This was the culture of the Internet before the public embraced it. This state of affairs contrasts sharply with the current rage for accessing HTML pages from the World Wide Web.
Popular Web pages change very slowly relative to their access rate and therefore closely approach being constant data. The number of clients that access popular Web pages is spectacular. On another front, the emergence of commercial and public databases accessible from the Internet has brought about the commoditization of online information. The commodity data of these databases also tends to change slowly relative to its access rate and is therefore another major source of nearly constant data which did not exist before. Like the Web pages, many of these information files also experience heavy demand from a growing public audience.

Another important way in which the new Internet traffic differs from its precursor has to do with the temporal clustering of requests. With the phenomenal growth in client activity in recent years the percentage of requests that arrive almost simultaneously at servers has also increased dramatically.

The confluence of data redundancy, temporal clustering of requests, and heavy traffic in the new Internet are crucial factors that affect the client's perception of overall performance when accessing Web pages. They also provide the basis for optimization.

The Problems and Their Causes

Frequent Host Interruption

Network based hosts on which server processes execute are controlled by general purpose operating systems. The host system does not perform efficiently when interrupts occur too frequently. Protocols based on individual request and response, in conjunction with an environment of hundreds or thousands of clients a minute accessing the host, produce such a dense pattern of interrupts that the host's performance is seriously degraded.

LAN Saturation

The LANs that Internet based hosts are connected to are adversely and unnecessarily affected by the passage of large numbers of individual requests onto the LAN when data redundancy of the requests is high.
Every packet that arrives at the LAN must have its Internet address resolved to a physical address. Carrying request packets that are to be processed individually keeps the LAN unnecessarily loaded. The degrading effects of LAN saturation go beyond shackling performance delivered to remote clients. Local clients running transient applications ( e.g. word processors ) on hosts connected to the LAN also experience a loss in quality of service.

Host Interface Burdened by Redundant Output

The current state of the art for the internetworked client server model has the server or a proxy copying data onto the wire as many times as it is requested regardless of conditions. Conditions may include many clients making requests for the same data within a brief time interval. However, current protocols used to distribute the server's output cannot exploit these conditions to optimize the transfer of data from a memory buffer to the network media. As a server becomes more popular and develops tighter temporal clustering of same file requests, the time it takes to output the data increases at a faster than linear rate. The rate is superlinear because the system degrades as the interrupt pattern becomes more dense. The use of on server caches and mirror servers cannot address the fact that the data is transferred to the network substrate as many times as it is requested.

Conclusion To The Introduction

The individually focused request and response paradigm at the core of the current client server model fails under massive public use because of inefficiency bred of treating every request and every response as an individual piece of work regardless of the presence of conditions that allow optimization. The solution lies in the direction of revised input and output processing that exploits patterns of data redundancy, temporal clustering, and the efficiencies of multicast delivery. This approach cannot be wholly transparent below the application layer.
Transport and network layer protocols different from those commonly used today must be employed in the new client server model.

Client To Server

Basis for Request Reduction

The basis for advantageous request reduction is the high frequency arrival of the same request semantics from different clients. The busiest Web sites today receive HTTP hits at a sustained rate of 300 per second. Given that most clients will use the same entry point to the site and the same few layers of the site's HTML document hierarchy, there exist, within a small time window such as two seconds, scores of requests for the same HTML file. Even if we scale down from the busiest Web sites by an order of magnitude, the sixty or so HTTP hits inside the time window provide a sufficient basis for successful request reduction.

Distribution of Request Reduction Responsibilities

The request reduction process runs on the server's platform. It is implemented as a multi-threaded daemon that receives incoming client requests from an RMP connection that has endpoint code running at the client's host and at the host of the request reduction process. The request reduction process consists of the following threads:

   One manager thread.  Classifies input requests.  Starts and stops
   service threads.  Provides overall process control.

   One listener thread.  Receives client requests and places them in
   the manager thread's input queue.

   Many unicast service thread(s).  Implement the server side unicast
   RMP endpoint for requests that are not consolidated.

   Many time window service thread(s).  Implement the server side
   multicast RMP endpoint for consolidated requests.

The request reduction process acts as a filter that removes and processes the file requests it is responsible for and passes the rest of the request traffic directly through to the server.

How Request Reduction Sees Time Flow

A request reduction time window service thread divides time into small windows of configured interval.
All time windows associated with the same file use the same configured interval. Time windows of different files are free to be configured with different time intervals. There can and almost certainly will be more than one time window for the same file when measured over a longer time. The temporal flow of time windows for the same file may or may not be continuous. A new time window for a file starts at the release time of the previous time window for that file if there are one or more pending requests for the file. If not, the continuity is broken and the time window for that file will reappear when the next request for that file is received. All time windows for the same file are non-overlapping with each other. Time windows of different files may and almost certainly will overlap each other's temporal boundaries.

Request Reduction Operating Cycle

When an input request arrives at the listener thread it is immediately put into the manager thread's queue of input requests. The manager thread goes through this queue, classifies each request, and takes appropriate action.

Case 1: Request Is Not A Consolidation Candidate

The manager thread has examined the request to see if it is a file request with no client specific processing. It has not passed this test, so the manager thread starts a unicast thread to service the request's reply and passes the request to the server.

Case 2: Request Is A Consolidation Candidate For A New Time Window

The request examined by the manager thread is a consolidation candidate. Its semantics have been examined to see if they match the semantics of the request held by any existing time window. No match was found. The manager thread starts a new time window service thread to service this request.
This involves creating a new time window by allocating a time window structure, starting a new timer, setting the reduction count variable to zero, allocating a memory buffer for the new request, moving the new request into this buffer, and incrementing the time window's reduction count variable by one. Another buffer, associated one-to-one with the newly allocated time window, is allocated. This buffer contains the list of addresses of clients that share the same time window membership. Put another way, it is a list of addresses of clients that have made the same request at nearly the same time. This list is called the multicast distribution list. It is keyed by a system generated unique key that binds it to the consolidated request.

Case 3: Request Is A Consolidation Candidate For An Existing Time Window

The request examined by the manager thread is a consolidation candidate. Its semantics have been examined to see if they match the semantics of the request held by any existing time window. A match was found. The client address of the newly arrived request is inserted into the matching time window's multicast distribution list and the time window's reduction count variable is incremented by one.

When a time window's release comes due, either because of elapsed time or because the reduction count variable has exceeded a configured maximum, the following happens.

1) The time window service thread generates, for reference by the co-resident multicast response process, a unique request key that identifies the consolidated request and its multicast distribution list.

2) The time window service thread inserts the multicast distribution list into a queue kept in shared memory. ( The other co-resident process, multicast response, is this queue's consumer. )

3) The time window service thread creates a consolidated request.
This consists of the common request data held in the time window's request buffer with the client's address field filled in with the system generated unique key that references the list of clients that should receive a reply to the consolidated request.

4) The consolidated request is passed to the server. ( Note that this request looks exactly like an ordinary request to the server. )

Key Generation Issues

It is important that the key generator does not select a key value to identify a multicast distribution list which collides with the originator address of a request which was bypassed as a candidate for consolidation. To ensure that this does not occur, Network Traffic Consolidation will have assigned to it one Class B Internet id from which the key generator will make the key values that uniquely identify each multicast distribution list. Since these key values are pseudo-addresses that are never seen outside of the NTC processes that reside on one platform, it is possible for every installation of NTC to re-use the same Internet id as the root from which all key values are derived. The only real issue is to ensure that no client can ever have the same address as an NTC key value.

Server To Client

IP Multicast in a Nutshell

Multicast communication involves the sending of packets from one source to many destinations. Network routers that run the multicast router daemon copy received packets onto those interfaces that are part of a shortest path distribution tree pruned of superfluous links. This pruned distribution tree provides just one path from the packet's source to each destination. Destinations are referenced by a special type of IP address known as a group address or Class D Internet address. Recipients of multicast packets have a standard command interface that allows them to join and leave a group address, thus controlling what transmissions they will receive. The architecture of IP multicast is defined by RFC 1112.
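The join and leave interface mentioned above can be exercised through the standard BSD socket options for IP multicast. The sketch below shows a receiver joining and then leaving an illustrative Class D group address; the group and port values are arbitrary, and the join is guarded because hosts without multicast routing may refuse it.

```python
# Sketch of the standard IP multicast group membership interface
# (RFC 1112 model) using BSD socket options.  GROUP and PORT are
# illustrative values, not part of the draft.
import socket
import struct

GROUP = "239.1.2.3"          # illustrative Class D group address
PORT = 5000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# ip_mreq: group address followed by local interface (INADDR_ANY).
mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                   socket.inet_aton("0.0.0.0"))

try:
    # Join: the kernel begins delivering datagrams sent to GROUP.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    # Leave: the receiver controls its own membership, as noted above.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
    joined = True
except OSError:
    # Hosts without a multicast-capable route may refuse the join.
    joined = False
sock.close()
```

The key point for NTC is that membership is receiver driven: the sender transmits one copy to the group address and never tracks who has joined.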
A representative implementation is MOSPF, defined by RFC 1584 and further discussed in RFC 1585.

Reliable Multicast Protocols

Reliable Multicast Protocols, or RMPs, are built upon the network layer IP Multicast service that has become widely deployed on the Internet.

Qualifying A Reliable Multicast Protocol

Network Traffic Consolidation allows customers to select the RMP service of their choice from a list of supported RMPs that meet the following qualification criteria.

   Receiver initiated design, to achieve distributed state management
   that scales to many receivers without the sender becoming a
   bottleneck.

   IP multicast is the network layer service used by the RMP.

   Supports dynamic join and leave of members from a multicast group.

   Completely and transparently manages flow control, error
   retransmission, and error messages.

   Native support for unicast transmission instead of reliance on TCP
   as a second transport.

   Distributed modular organization that allows error retransmission
   from a local data store instead of the sender, to achieve reduced
   client wait on a retransmission, better use of network bandwidth,
   and a server free from having to buffer sent data.

Multicast Response

The multicast response driver is a very simple single threaded process. It receives a server reply through a call from the NTC service API and tries to find a multicast distribution list that matches the reply's destination address. If it cannot match the address then it simply calls the RMP's unicast service to handle delivery of the server's reply. This is the case of a non-consolidated request being fulfilled. If there is a match then a consolidated request is being processed. In this case the multicast response driver makes a multicast invocation of the RMP using the server provided reply data, the consolidated request's multicast distribution list, and any pertinent configuration data.
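The driver's dispatch decision can be sketched as follows. The RMP calls are stand-ins, since the draft deliberately leaves the RMP implementation to the customer; all function and variable names are illustrative.

```python
# Hypothetical sketch of the multicast response driver's dispatch.
# rmp_unicast / rmp_multicast are stand-ins for a real RMP's services.
pending_lists = {}   # pseudo-address key -> multicast distribution list

sent = []            # records what "went on the wire", for illustration

def rmp_unicast(dest, data):
    sent.append(("unicast", [dest], data))

def rmp_multicast(clients, data):
    # One copy on the wire regardless of how many receivers.
    sent.append(("multicast", list(clients), data))

def on_server_reply(destination, data):
    mdl = pending_lists.pop(destination, None)
    if mdl is None:
        # No matching list: reply to a non-consolidated request.
        rmp_unicast(destination, data)
    else:
        # Matching list: reply to a consolidated request.
        rmp_multicast(mdl["clients"], data)

pending_lists["NTC-KEY-1"] = {"clients": ["10.0.0.1", "10.0.0.2"]}
on_server_reply("NTC-KEY-1", b"<html>...</html>")   # consolidated
on_server_reply("10.0.0.9", b"<html>...</html>")    # ordinary client
```

Because the key space is drawn from a reserved Class B id, the lookup can never mistake a real client address for a consolidated request's pseudo-address.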
The only other thing that the multicast response driver does is to invoke an RMP primitive to release all process and data resources associated with the request after a configured amount of time. The timeout is calculated to be long enough to allow nearly 100% of transmissions to complete in their entirety under adverse conditions.

Advantages Of Network Traffic Consolidation

Network Traffic Consolidation moves in the direction of bounding the number of requests that a server will receive for a given file in a given span of time. This makes server load more predictable and, more importantly, protects the server from being overwhelmed by too many requests in a given span of time.

Although HTTP data can be served very well by Network Traffic Consolidation, this technology is future safe in the sense that it is general enough to process all highly redundant data records regardless of format.

Network Traffic Consolidation addresses the bottleneck at the point where busy servers, be they primary or proxy servers, transfer data from host memory to network media. This transfer consumes significant CPU resources on popular servers that regularly have dozens of clients simultaneously requesting the same few high level HTML files.

The multicast mode of transmission used by the server to client processing of Network Traffic Consolidation preserves network bandwidth when compared to the current unicast method of serving clients.

Network Traffic Consolidation reduces the number of software interrupts received by network hosts for a given rate of client requests.

Network Traffic Consolidation scales exceptionally well. The worst area of Web site overload involves accessing the first few levels of HTML document files. There is more redundant data access here than anywhere else. Because of the hierarchical structure of a Web site nearly everyone enters from a common top page and there is a slow moving concentration of traffic at levels near the top page that gradually works downward.
In Network Traffic Consolidation, because every like-intentioned request in the same small time window is consolidated into one request, the greatest improvement over the conventional one request one response mode of service is seen during heavy load.

Security Considerations

Since the identities of the clients involved in a consolidated request are masked by a pseudo-address key, the server is not able to enforce any client specific restrictions to data access. In this sense, a consolidated request is a Trojan Horse. This is an area that needs to be addressed.

References

S. Deering, "Host Extensions for IP Multicasting", STD 5, RFC 1112, Stanford University, August 1989

Rajendra Yavatkar, James Griffioen, Madhu Sudam, "A Reliable Dissemination Protocol for Interactive Collaborative Applications", University of Kentucky, December 1996, http://www.dcs.uky.edu/~griff/papers/tmtp-mm95/main.html

Nils Seifert, "Multicast Transport Protocol Version 2", Berlin, October 1995, http://www.cs.tu-berlin.de/~nilss/mtp/protocol.html

Alex Koifman, Steve Zabele, "A Reliable Adaptive Multicast Protocol", RFC 1458, TASC, May 1993

S. Armstrong, A. Freier, K. Marzullo, "Multicast Transport Protocol", RFC 1301, Xerox, Apple, Cornell, Feb 1993

J. Moy, "Multicast Extensions to OSPF", RFC 1584, Proteon Inc., March 1994

J. Moy, "MOSPF: Analysis and Experience", RFC 1585, Proteon Inc., March 1994

T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol - HTTP/1.0", RFC 1945, MIT/LCS, UC Irvine, DEC, May 1996

R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee, "Hypertext Transfer Protocol - HTTP/1.1", RFC 2068, January 1997

S.E. Spero, "Analysis of HTTP Performance Problems", http://sunsite.unc.edu/mdma-release/http-prob.html

Author's Address

Joseph Mansigian
155 Marlin Rd.
New Britain, CT 06053
Phone: (860) 223-5869
EMail: jman@connix.com