Network Working Group                                      J. Mansigian
INTERNET-DRAFT                                               Consultant
Expire in six months                                         March 1997
Draft: Version 01

             Clearing the Traffic Jam at Internet Servers
         A Network Layer View Of Network Traffic Consolidation

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

The cause of the typically glacial response from popular Internet World Wide Web servers is seldom a lack of network bandwidth or any deficit in the client's equipment. The abysmal performance arises because the accessed server spends an inordinate amount of time managing two problems: an unnecessarily large number of transport connections and the transmission of masses of redundant data without optimization. This work addresses both problems.

This document presents an introduction to the concepts and architecture of network traffic consolidation. It is not intended to describe a complete protocol with every ancillary feature but rather to focus on performance driven core ideas that could become a part of emerging commercially featured protocols. The scope of network traffic consolidation is confined to file level interactions between Internet World Wide Web servers and their clients.
Data is delivered to clients without client specific change. The goal of network traffic consolidation is to make an overburdened file server behave very much as if it were servicing a light flow of file requests. The methods of network traffic consolidation achieve this goal by actually making the server's file request flow light.

Network traffic consolidation acts on both the input and output flows of client server data. The input processing of network traffic consolidation is called request reduction. The output processing is called multicast response. Input and output processing operate asynchronously.

Request reduction is implemented by a multi-threaded process resident on the server platform. This process views a busy server's request flow for each file through a series of time windows of small, uniform interval. Request reduction divides all input requests into two classifications - those requests that can be consolidated and those that cannot. Requests that cannot be consolidated are passed without delay to the server. Requests that can be consolidated are treated differently. Conceptually, a thread of the request reduction process gathers into one group all file requests that fall in the same time window, request the same file, and originate from different clients. A common example would be multiple HTTP requests from different clients requesting the same HTML document file occurring in the same time window. What is actually constructed are two data structures. One is a copy of the common request data with a system generated key placed in the originator's address field. The other is a list of all of the requesting clients, called the multicast distribution list. This list is keyed by the system generated key for later retrieval. The thread then places the keyed multicast distribution list in a queue that resides in shared memory and passes the request to the server.
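As a sketch, the two data structures built during consolidation might look like the following. This is illustrative Python only; the draft defines no concrete API, so every name here (make_key, consolidate, the dictionary fields) is an assumption.

```python
# Hypothetical sketch of request reduction grouping.  All identifiers
# are illustrative; the draft does not specify an implementation.
import itertools
import queue

_key_counter = itertools.count(1)

def make_key():
    # Stand-in for the draft's pseudo-address key generator.
    return "NTC-KEY-%d" % next(_key_counter)

# Stands in for the shared-memory queue consumed by multicast response.
distribution_lists = queue.Queue()

def consolidate(window_requests):
    """Collapse same-file requests from one time window into a single
    consolidated request plus a keyed multicast distribution list."""
    key = make_key()
    template = dict(window_requests[0])      # copy of the common request data
    template["originator"] = key             # key replaces the client address
    mdl = {"key": key,
           "clients": [r["originator"] for r in window_requests]}
    distribution_lists.put(mdl)              # for the multicast response process
    return template                          # passed to the server as-is

reqs = [{"file": "/index.html", "originator": c}
        for c in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
consolidated = consolidate(reqs)
```

Because the consolidated request carries the key in the originator field, the server handles it exactly like any single-client request.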
A consolidated request is indistinguishable from an individual request to the server, so no server request processing logic needs to change. A direct advantage of request reduction is a dramatic decrease in the frequency of server interruption with an attendant improvement in server performance.

Multicast response is implemented as a thin layer which invokes a qualified third party implementation of a reliable multicast protocol ( RMP ) of the customer's choice. The multicast thin layer is a simple single threaded driver that is responsible for invoking the RMP to service consolidated requests. The multicast response process receives a reply from the server and tries to locate a multicast distribution list that matches the reply's destination address. If it cannot, then the reply is from a non-consolidated request and the unicast service of the RMP is used to send the reply. If a match is found, then the multicast response process takes the server's response as input and consumes a member from the queue of multicast distribution lists. These two inputs, along with configuration data, are used by the driver in its invocation of the RMP. Once the multicast response driver initiates the RMP it has no further responsibility with respect to the request except to invoke the RMP primitive that deletes and disestablishes all RMP data structures and remote RMP processes associated with the multicast group after a configured amount of time. All flow control, error retry, and error messages come from the RMP. An important advantage of multicast response is that the server needs to put only one copy of the file on the wire no matter how many client requests compose the consolidated request.

Introduction

The Internet is used by millions of people every day for a variety of purposes. There is no sign that growth of interest in using the Internet is abating.
The demands being placed on the Internet today could not have been anticipated decades ago when its predecessor, Arpanet, was designed. However the effects of these early design decisions are still very much with the Internet.

The Paradigm Shift and its Effects

The 1990s brought to the Internet a crush of new users from diverse backgrounds. This influx was mostly the result of the meteoric rise of the World Wide Web precipitated by the widespread availability of easy to use graphically interfaced clients such as Mosaic and Netscape. There were two important changes that resulted from this burgeoning movement. One was explosive growth in activity both over the network and at the host interface. The other was that the predominant form of communication shifted from a peer-to-peer model to a mixture of peer-to-peer and client server modalities.

In the early days of internetworking the peer-to-peer model of network use was exemplified by collaborating researchers sending email and experiment data to each other. Peer-to-peer traffic remains very important today but is no longer unchallenged as the predominant form of communication on the Internet. The ascendancy of the World Wide Web has created massive client server traffic on the Internet that differs qualitatively from the previous network traffic in important ways.

The first difference is that in today's internetworked client server model the data is no longer necessarily unique. Colleagues sending each other email or collaborating by exchanging files of work related data almost always make progress from one communication to the next, so the data being transmitted is essentially unique. Intimate use of the Internet by a small number of communicants sending unique data was the predominant style before the 1990s. This was the culture of the Internet before the public embraced it. This state of affairs contrasts sharply with the current rage for accessing HTML pages from the World Wide Web.
Popular Web pages change very slowly relative to their access rate and therefore closely approach being constant data. The number of clients that access popular Web pages is spectacular. On another front, the emergence of commercial and public databases accessible from the Internet has brought about the commoditization of online information. The commodity data of these databases also tends to change slowly relative to its access rate and is therefore another major source of nearly constant data which did not exist before. Like the Web pages, many of these information files also experience heavy demand from a growing public audience.

Another important way in which the new Internet traffic differs from its precursor has to do with the temporal clustering of requests. With the phenomenal growth in client activity in recent years the percentage of requests that arrive almost simultaneously at servers has also increased dramatically.

The confluence of data redundancy, temporal clustering of requests, and heavy traffic in the new Internet are crucial factors that affect the client's perception of overall performance when accessing Web pages. They also provide the basis for optimization.

The Problems and Their Causes

Frequent Host Interruption

Network based hosts on which server processes execute are controlled by general purpose operating systems. The host system does not perform efficiently when interrupts occur too frequently. Protocols based on individual request and response, in conjunction with an environment of hundreds or thousands of clients a minute accessing the host, produce such a dense pattern of interrupts that the host's performance is seriously degraded.

LAN Saturation

The LANs that Internet based hosts are connected to are adversely and unnecessarily affected by the passage of large numbers of individual requests onto the LAN when data redundancy of the requests is high.
Every packet that arrives at the LAN must have its Internet address resolved to a physical address. Carrying request packets that are to be processed individually keeps the LAN unnecessarily loaded. The degrading effects of LAN saturation go beyond shackling performance delivered to remote clients. Local clients running transient applications ( e.g. word processors ) on hosts connected to the LAN also experience a loss in quality of service.

Host Interface Burdened by Redundant Output

The current state of the art for the internetworked client server model has the server or a proxy copying data onto the wire as many times as it is requested regardless of conditions. Conditions may include many clients making requests for the same data within a brief time interval. However, current protocols used to distribute the server's output cannot exploit these conditions to optimize the transfer of data from a memory buffer to the network media. As a server becomes more popular and develops tighter temporal clustering of same file requests, the time it takes to output the data increases at a faster than linear rate. The rate is superlinear because the system degrades as the interrupt pattern becomes more dense. The use of on server caches and mirror servers cannot address the fact that the data is transferred to the network substrate as many times as it is requested.

Conclusion To The Introduction

The individually focused request and response paradigm at the core of the current client server model fails under massive public use because of inefficiency bred of treating every request and every response as an individual piece of work regardless of the presence of conditions that allow optimization. The solution lies in the direction of revised input and output processing that exploits patterns of data redundancy, temporal clustering, and the efficiencies of multicast delivery. This approach cannot be wholly transparent below the application layer.
Transport and network layer protocols different from those commonly used today must be employed in the new client server model.

Client To Server

Basis for Request Reduction

The basis for advantageous request reduction is the high frequency arrival of the same request semantics from different clients. The busiest Web sites today receive HTTP hits at a sustained rate of 300 per second. Given that most clients will use the same entry point to the site and the same few layers of the site's HTML document hierarchy, there exist, within a small time window such as two seconds, scores of requests for the same HTML file. Even if we scale down from the busiest Web sites by an order of magnitude, the sixty or so HTTP hits inside the time window provide a sufficient basis for successful request reduction.

Distribution of Request Reduction Responsibilities

The request reduction process runs on the server's platform. It is implemented as a multi-threaded daemon that receives incoming client requests from an RMP connection that has endpoint code running at the client's host and at the host of the request reduction process. The request reduction process consists of the following threads:

   One manager thread.  Classifies input requests.  Starts and stops
   service threads.  Provides overall process control.

   One listener thread.  Receives client requests and places them in
   the manager thread's input queue.

   Many unicast service thread(s).  Implement the server side unicast
   RMP endpoint for requests that are not consolidated.

   Many time window service thread(s).  Implement the server side
   multicast RMP endpoint for consolidated requests.

The request reduction process acts as a filter that removes and processes the file requests it is responsible for and passes the rest of the request traffic directly through to the server.

How Request Reduction Sees Time Flow

A request reduction time window service thread divides time into small windows of configured interval.
All time windows associated with the same file use the same configured interval. Time windows of different files are free to be configured with different time intervals. There can and almost certainly will be more than one time window for the same file when measured over a longer time. The temporal flow of time windows for the same file may or may not be continuous. A new time window for a file starts at the release time of the previous time window for that file if there are one or more pending requests for the file. If not, the continuity is broken and the time window for that file will reappear when the next request for that file is received. All time windows for the same file are non-overlapping with each other. Time windows of different files may and almost certainly will overlap each other's temporal boundaries.

Request Reduction Operating Cycle

When an input request arrives at the listener thread it is immediately put into the manager thread's queue of input requests. The manager thread goes through this queue, classifies each request, and takes appropriate action.

Case 1: Request Is Not A Consolidation Candidate

The manager thread has examined the request to see if it is a file request with no client specific processing. It has not passed this test, so the manager thread starts a unicast thread to service the request's reply and passes the request to the server.

Case 2: Request Is A Consolidation Candidate For A New Time Window

The request examined by the manager thread is a consolidation candidate. Its semantics have been examined to see if they match the semantics of the request held by any existing time window. No match was found. The manager thread starts a new time window service thread to service this request.
This involves creating a new time window by allocating a time window structure, starting a new timer, setting the reduction count variable to zero, allocating a memory buffer for the new request, moving the new request into this buffer, and incrementing the time window's reduction count variable by one. Another buffer, associated one-to-one with the newly allocated time window, is allocated. This buffer contains the list of addresses of clients that share the same time window membership. Put another way, it is a list of addresses of clients that have made the same request at nearly the same time. This list is called the multicast distribution list. It is keyed by a system generated unique key that binds it to the consolidated request.

Case 3: Request Is A Consolidation Candidate For An Existing Time Window

The request examined by the manager thread is a consolidation candidate. Its semantics have been examined to see if they match the semantics of the request held by any existing time window. A match was found. The client address of the newly arrived request is inserted into the matching time window's multicast distribution list and the time window's reduction count variable is incremented by one.

When a time window's release comes due, either because of elapsed time or because the reduction count variable has exceeded a configured maximum, the following happens.

1) The time window service thread generates, for reference by the co-resident multicast response process, a unique request key that identifies the consolidated request and its multicast distribution list.

2) The time window service thread inserts the multicast distribution list into a queue kept in shared memory. ( The other co-resident process, multicast response, is this queue's consumer. )

3) The time window service thread creates a consolidated request.
This consists of the common request data held in the time window's request buffer with the client's address field filled in with the system generated unique key that references the list of clients that should receive a reply to the consolidated request.

4) The consolidated request is passed to the server. ( Note that this request looks exactly like an ordinary request to the server. )

Key Generation Issues

It is important that the key generator does not select a key value to identify a multicast distribution list which collides with the originator address of a request which was bypassed as a candidate for consolidation. To ensure that this does not occur, Network Traffic Consolidation will have assigned to it one Class B Internet id from which the key generator will make the key values that uniquely identify each multicast distribution list. Since these key values are pseudo-addresses that are never seen outside of the NTC processes that reside on one platform, it is possible for every installation of NTC to re-use the same Internet id as the root from which all key values are derived. The only real issue is to ensure that no client can ever have the same address as an NTC key value.

Server To Client

IP Multicast in a Nutshell

Multicast communication involves the sending of packets from one source to many destinations. Network routers that run the multicast router daemon copy received packets onto those interfaces that are part of a shortest path distribution tree pruned of superfluous links. This pruned distribution tree provides just one path from the packet's source to each destination. Destinations are referenced by a special type of IP address known as a group address or Class D Internet address. Recipients of multicast packets have a standard command interface that allows them to join and leave a group address, thus controlling what transmissions they will receive. The architecture of IP multicast is defined by RFC 1112.
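The join and leave interface mentioned above can be exercised through the standard BSD socket options for IP multicast. The sketch below shows a receiver joining and then leaving an illustrative Class D group address; the group and port values are arbitrary, and the join is guarded because hosts without multicast routing may refuse it.

```python
# Sketch of the standard IP multicast group membership interface
# (RFC 1112 model) using BSD socket options.  GROUP and PORT are
# illustrative values, not part of the draft.
import socket
import struct

GROUP = "239.1.2.3"          # illustrative Class D group address
PORT = 5000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# ip_mreq: group address followed by local interface (INADDR_ANY).
mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                   socket.inet_aton("0.0.0.0"))

try:
    # Join: the kernel begins delivering datagrams sent to GROUP.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    # Leave: the receiver controls its own membership, as noted above.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
    joined = True
except OSError:
    # Hosts without a multicast-capable route may refuse the join.
    joined = False
sock.close()
```

The key point for NTC is that membership is receiver driven: the sender transmits one copy to the group address and never tracks who has joined.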
A representative implementation is MOSPF, defined by RFC 1584 and further discussed in RFC 1585.

Reliable Multicast Protocols

Reliable Multicast Protocols, or RMPs, are built upon the network layer IP Multicast service that has become widely deployed on the Internet.

Qualifying A Reliable Multicast Protocol

Network Traffic Consolidation allows customers to select the RMP service of their choice from a list of supported RMPs that meet the following qualification criteria.

   Receiver initiated design, to achieve distributed state management
   that scales to many receivers without the sender becoming a
   bottleneck.

   IP multicast is the network layer service used by the RMP.

   Supports dynamic join and leave of members from a multicast group.

   Completely and transparently manages flow control, error
   retransmission, and error messages.

   Native support for unicast transmission instead of reliance on TCP
   as a second transport.

   Distributed modular organization that allows error retransmission
   from a local data store instead of the sender, to achieve reduced
   client wait on a retransmission, better use of network bandwidth,
   and a server free from having to buffer sent data.

Multicast Response

The multicast response driver is a very simple single threaded process. It receives a server reply through a call from the NTC service API and tries to find a multicast distribution list that matches the reply's destination address. If it cannot match the address then it simply calls the RMP's unicast service to handle delivery of the server's reply. This is the case of a non-consolidated request being fulfilled. If there is a match then a consolidated request is being processed. In this case the multicast response driver makes a multicast invocation of the RMP using the server provided reply data, the consolidated request's multicast distribution list, and any pertinent configuration data.
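The driver's dispatch decision can be sketched as follows. The RMP calls are stand-ins, since the draft deliberately leaves the RMP implementation to the customer; all function and variable names are illustrative.

```python
# Hypothetical sketch of the multicast response driver's dispatch.
# rmp_unicast / rmp_multicast are stand-ins for a real RMP's services.
pending_lists = {}   # pseudo-address key -> multicast distribution list

sent = []            # records what "went on the wire", for illustration

def rmp_unicast(dest, data):
    sent.append(("unicast", [dest], data))

def rmp_multicast(clients, data):
    # One copy on the wire regardless of how many receivers.
    sent.append(("multicast", list(clients), data))

def on_server_reply(destination, data):
    mdl = pending_lists.pop(destination, None)
    if mdl is None:
        # No matching list: reply to a non-consolidated request.
        rmp_unicast(destination, data)
    else:
        # Matching list: reply to a consolidated request.
        rmp_multicast(mdl["clients"], data)

pending_lists["NTC-KEY-1"] = {"clients": ["10.0.0.1", "10.0.0.2"]}
on_server_reply("NTC-KEY-1", b"<html>...</html>")   # consolidated
on_server_reply("10.0.0.9", b"<html>...</html>")    # ordinary client
```

Because the key space is drawn from a reserved Class B id, the lookup can never mistake a real client address for a consolidated request's pseudo-address.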
The only other thing that the multicast response driver does is to invoke an RMP primitive to release all process and data resources associated with the request after a configured amount of time. The timeout is calculated to be long enough to allow nearly 100% of transmissions to complete in their entirety under adverse conditions.

Advantages Of Network Traffic Consolidation

Network Traffic Consolidation moves in the direction of bounding the number of requests that a server will receive for a given file in a given span of time. This makes server load more predictable and, more importantly, protects the server from being overwhelmed by too many requests in a given span of time.

Although HTTP data can be served very well by Network Traffic Consolidation, this technology is future safe in the sense that it is general enough to process all highly redundant data records regardless of format.

Network Traffic Consolidation addresses the bottleneck at the point where busy servers, be they primary or proxy servers, transfer data from host memory to network media. This transfer consumes significant CPU resources on popular servers that regularly have dozens of clients simultaneously requesting the same few high level HTML files.

The multicast mode of transmission used by the server to client processing of Network Traffic Consolidation preserves network bandwidth when compared to the current unicast method of serving clients.

Network Traffic Consolidation reduces the number of software interrupts received by network hosts for a given rate of client requests.

Network Traffic Consolidation scales exceptionally well. The worst area of Web site overload involves accessing the first few levels of HTML document files. There is more redundant data access here than anywhere else. Because of the hierarchical structure of a Web site nearly everyone enters from a common top page and there is a slow moving concentration of traffic at levels near the top page that gradually works downward.
In Network Traffic Consolidation, because every like-intentioned request in the same small time window is consolidated into one request, the greatest improvement over the conventional one request one response mode of service is seen during heavy load.

Security Considerations

Since the identities of the clients involved in a consolidated request are masked by a pseudo-address key, the server is not able to enforce any client specific restrictions to data access. In this sense, a consolidated request is a Trojan Horse. This is an area that needs to be addressed.

References

S. Deering, "Host Extensions for IP Multicasting", STD 5, RFC 1112, Stanford University, August 1989

Rajendra Yavatkar, James Griffioen, Madhu Sudam, "A Reliable Dissemination Protocol for Interactive Collaborative Applications", University of Kentucky, December 1996, http://www.dcs.uky.edu/~griff/papers/tmtp-mm95/main.html

Nils Seifert, "Multicast Transport Protocol Version 2", Berlin, October 1995, http://www.cs.tu-berlin.de/~nilss/mtp/protocol.html

Alex Koifman, Steve Zabele, "A Reliable Adaptive Multicast Protocol", RFC 1458, TASC, May 1993

S. Armstrong, A. Freier, K. Marzullo, "Multicast Transport Protocol", RFC 1301, Xerox, Apple, Cornell, Feb 1993

J. Moy, "Multicast Extensions to OSPF", RFC 1584, Proteon Inc., March 1994

J. Moy, "MOSPF: Analysis and Experience", RFC 1585, Proteon Inc., March 1994

T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol - HTTP/1.0", RFC 1945, MIT/LCS, UC Irvine, DEC, May 1996

R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee, "Hypertext Transfer Protocol - HTTP/1.1", RFC 2068, January 1997

S.E. Spero, "Analysis of HTTP Performance Problems", http://sunsite.unc.edu/mdma-release/http-prob.html

Author's Address

Joseph Mansigian
155 Marlin Rd.
New Britain, CT 06053
Phone: (860) 223-5869
EMail: jman@connix.com