INTERNET-DRAFT                                               L. Westberg
draft-westberg-loadcntr-01.txt                             Z. R. Turanyi
Expires: Dec. 1999                                              Ericsson
                                                            Jun 14, 1999

                   Load Control of Real-Time Traffic

                   A 2-bit resource allocation scheme


                          Status of this Memo

   This document is an Internet Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026. Internet Drafts are
   working documents of the Internet Engineering Task Force (IETF), its
   Areas, and its Working Groups. Note that other groups may also
   distribute working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as "work in
   progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.txt

Abstract

   The purpose of this memo is to present a new resource allocation
   scheme for diffserv networks based on two-bit-markings in packet
   headers.  It provides the ability to control the traffic load in the
   network without the use of any signalling protocol. It was designed
   to be very simple to implement. Core routers need to keep no per flow
   state.  The scheme is aimed mainly at large aggregation areas where
   the number of flows or edge devices is high.

1. Background and Motivation

   With the introduction of differentiated services [RFC2475] into the
   IP protocol, it became possible to provide large scale real-time
   services. The basic idea of diffserv is not to classify packets at
   each router, but only at the edges. The result (the required packet
   treatment) is stored and carried in the packet headers. Core routers
   can carry out appropriate scheduling based on this result.


Turanyi, Westberg                                               [Page 1]

draft-westberg-loadcntr-01                            Expires: Dec. 1999


   The current definition of diffserv, however, does not contain any
   simple and scalable solution to the problem of resource provisioning
   and control. A number of approaches already exist to the problem
   [Berson97, Guerin97, Stoica99, Bernet99]. The scheme presented in
   this document does not require any state aggregation and aims at
   extreme simplicity and low cost of implementation together with good
   scaling properties.

   Our load control scheme uses bit markings in the packet headers to
   gather information about the load level along various paths of the
   network. In doing so, per flow state can again be shifted to the
   edges. This approach is especially useful in large aggregation areas,
   where the high speed of interfaces and the large number of flows call
   for simple and scalable solutions.

   This load control scheme is not an end-to-end scheme. Therefore it is
   possible to use it in a domain independently of the neighbouring
   domains. It is also possible to use it if only a subset of the
   routers supports it within the given domain. There are many proposed
   ways to provide end-to-end QoS, including RSVP/Intserv,
   overprovisioning, statistical provisioning via SLAs, etc.
   Combinations of them are also possible. Load control can interact
   with such end-to-end solutions and be part of the QoS chain as well.
   It is outside the scope of this document to specify the interaction
   between load control and other methods; we restrict ourselves mainly
   to the packet markings. Some example applications are nevertheless
   given in section 5.

   Load control operates in a DS domain, where edge devices keep per
   flow state and do per flow processing while core routers do not. Its
   main purpose is to provide a simple and scalable solution to the
   resource-provisioning problem. The required complexity in the edge
   devices grows linearly with the number of flows passing them, while
   core device complexity does not grow with the number of flows or the
   number of edge devices. The aim was to make it possible to implement
   per packet work in hardware easily.

2. Overview

   Load control is achieved by two actions: admission control of
   incoming requests and the drop of admitted flows in cases of serious
   congestion.

   When a new request arrives at the edge of the DS domain, a probe
   packet is sent through the domain before the admission decision is
   made. The core routers mark the header of the probe packet if they do
   not have enough resources to serve a new request. When the probe
   packet arrives at the edge, its header reflects the status of
   resources along the path and the edges can make the admission control
   decision accordingly.

   Under normal circumstances, admission control is enough to control
   the load in the network. Nevertheless, when exceptional events (such
   as routing changes) cause too much traffic to be re-routed over a
   link, the resulting serious congestion may compromise the quality of
   all flows on that link. In that case, the correct behaviour is to
   drop already admitted flows to provide some quality to others. Thus,
   when serious congestion occurs, the core routers mark the header of
   all packets and by doing so notify the edges about the serious
   congestion. Then the edges can start dropping flows until the packet
   markings stop.

   The above elements may be augmented with an optional central entity,
   which collects measurements on the performance of the network and
   adjusts admission criteria based on long term performance
   measurements. This entity can also be used to implement network
   policies. Such a load control server may be essential to the high
   performance of the system.

3. Operation of Load Control

   The load control scheme we are proposing has two variants.

   'Simple marking' refers to a measurement-based admission scheme where
   routers measure the actual traffic and base the admission decision on
   the measurements. We call the other variant 'Unit-based
   reservations'. This variant provides a way for sources to keep their
   reservations even if they generate less or no traffic. This is done
   in a soft state manner by periodically sending specially marked
   reservation refreshment packets. Simple marking and unit-based
   reservations are described in sections 3.1 and 3.2, respectively.

   We assume a DS domain where connection requests arrive at the edges
   of the domain via either RSVP or other means. The requests may also
   be generated directly at the edges in a gateway, which provides
   connection to other types of networks, or in hosts that are connected
   directly to the domain. In this section, we use the gateway as an
   example and the IP domain is used to connect e.g. telephony networks.

3.1 Simple Marking

   In the following we describe how admission control is performed with
   simple marking and how the scheme can drop already admitted flows in
   case of serious congestion.

3.1.1 Admission Control

   In what follows, we assume that core routers have some means to
   communicate the onset of resource exhaustion to the edges via a
   simple marking in the packet header. Possible places for such a
   marking may be the two reserved bits in the DS field [RFC2474] or a
   remappable DS codepoint. See section 4.1 for further details.

   The basic idea of simple marking is that core routers measure the
   traffic and if they encounter a near exhaustion of resources, they
   mark passing packets, thereby notifying the gateways about the
   congestion. The gateways then reject incoming flow requests.

   The router can use any algorithm to decide when to signal the
   exhaustion of resources. It can send such blocking signals whenever
   it wishes to stop accepting new traffic.

   Using the above marking feature, the gateways at the edge can check
   the availability of resources along the path of a new flow in the
   following way. Before establishing the flow, the initiating gateway
   sends a probe packet into the network. The probe packet passes
   through the same routers as the actual traffic will and is exposed to
   the marking function of the routers. When it reaches the destination
   gateway, its header will reflect the status of the resources along
   that path.

   When the destination gateway receives the probe packet, it copies the
   indication from the header into the payload of a reverse probe packet
   and sends it back to the initiating party.

   The reverse probe packet then returns to the initiating gateway and
   upon receipt, its header will reflect the congestion status on the
   backward path.

   If load control is applied in a network with bi-directional flows,
   then the initiating gateway blocks the flow if either of the probe
   packets has been marked. If neither has been marked, it assumes that
   there is enough capacity in the DS domain and proceeds with the flow.
   If the flows are unidirectional, then the initiating gateway bases
   its decision on only one of the directions. If the communication is
   unidirectional and can be blocked by the receiver, then no reverse
   probe packet is needed.
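
   The decision rule above can be sketched as follows. This is only an
   illustration in Python, not part of the scheme; the function and
   parameter names are assumptions made for the example. For
   unidirectional flows the caller is assumed to pass the marking of the
   one relevant direction as forward_marked.

```python
def admit_flow(forward_marked: bool,
               reverse_marked: bool = False,
               bidirectional: bool = True) -> bool:
    """Return True if the initiating gateway should proceed with the flow.

    forward_marked: status of the forward path, as copied into the
                    payload of the reverse probe packet.
    reverse_marked: status of the backward path, as seen in the header
                    of the reverse probe packet.
    """
    if bidirectional:
        # Block if either direction signalled resource exhaustion.
        return not (forward_marked or reverse_marked)
    # Unidirectional flow: only the direction carrying the flow matters
    # (assumed here to be reported in forward_marked).
    return not forward_marked


if __name__ == "__main__":
    print(admit_flow(False, False))                      # both paths free
    print(admit_flow(False, True))                       # backward path busy
    print(admit_flow(False, True, bidirectional=False))  # backward ignored
```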

   After establishing the flow, both gateways ignore the markings in
   incoming packets. This means that once a flow has been established,
   it will not be dropped by this mechanism. Instead, other newly
   arriving flows are blocked.

   To make the scheme more robust against packet loss, the initiating
   gateway starts a timer when it sends the first probe packet. If any
   of the probe packets is lost, it simply retransmits on timeout.
   Selecting a precise value for the timer is not crucial. If the timer
   fires too late, only the flow setup latency will increase. Setting
   the timer value too low will cause only unnecessary retransmissions
   of probe packets, but the semantics of the scheme will not change.
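
   A possible shape for the retransmission logic is sketched below in
   Python. The callback send_and_wait, the default timer value and the
   retry limit max_tries are assumptions made for the example; this
   document does not prescribe a retry limit or a timer value.

```python
from typing import Callable, Optional


def probe_path(send_and_wait: Callable[[float], Optional[bool]],
               timeout_s: float = 0.5,
               max_tries: int = 3) -> Optional[bool]:
    """Probe a path, retransmitting the probe packet on timeout.

    send_and_wait(timeout_s) is a hypothetical callback that sends one
    probe packet and returns True (marked), False (unmarked), or None
    if no reverse probe packet arrived before the timer fired.
    """
    for _ in range(max_tries):
        result = send_and_wait(timeout_s)
        if result is not None:
            return result      # reverse probe packet received
    return None                # every attempt timed out
```

   A timer that is too long only delays the result and one that is too
   short only causes extra calls to send_and_wait; the value returned is
   the same in both cases.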

   Note that this is not a new signalling protocol. Any packet will do
   as a forward or reverse probe packet if it has at least one bit in
   the payload as a placeholder for the returned congestion status.
   Thus, message exchanges of existing protocols can be fully reused.

   For bi-directional flows the forward and backward path of the flow
   can differ between the ingress and egress gateways. The routing
   protocol is required, however, to deliver all packets of a flow,
   including the probe packet, on the same path during normal operation.

   Packets belonging to flows that have been rejected must either be
   dropped by the ingress or sent marked with another (typically best
   effort) DSCP.

3.1.2 Drop of Already Admitted Connections

   If a routing change occurs in the network either due to link failure,
   topology change or traffic management action, several already
   admitted flows may be re-routed over a link, which has insufficient
   capacity to carry all of them. This may result in excessive packet
   losses and the quality of all flows may be seriously compromised.

   If the core routers are able to signal this serious congestion
   situation to the gateways then the gateways can solve the problem by
   dropping enough flows to save the remaining. The natural place to
   drop flows is the gateway where all information (bandwidth usage,
   policy, etc.) is present.

   If a core router detects serious congestion on one of its
   interfaces, it starts marking all packets leaving that interface, not
   only probe packets. If the egress gateway receives a marked packet
   which is not a probe packet (this can be determined from the packet
   content), it can interpret it as a sign of serious congestion along
   the path and it should notify the ingress gateway. The means of this
   notification may vary depending on the type of gateways used. In case
   of our example, the egress application can notify the ingress
   application via some type of application level signalling. Such
   notifications should not be sent too often to prevent the unnecessary
   load on the network and on the ingress gateway.

   When an egress gateway receives a marked packet which is not a probe
   packet, it must somehow determine which ingress gateway to notify.
   How this can be done depends on the application and is for further
   study. See sections 5.1 and 5.2 for examples.

   When we wish to use simple marking with the ability to drop already
   admitted flows, we need 3 codepoints:

      - regular packet
      - probe packet
      - marked packet

   In case of serious congestion both regular and probe packets are
   marked, while in blocking state only probe packets are marked. If the
   drop of admitted connections feature is not needed, then to save
   codepoints non-probe packets can be sent as "marked" and the
   "regular" codepoint is not needed.

3.2 Unit-Based Reservations

   While measurement-based admission control has important advantages
   over non-measurement-based algorithms, it has its disadvantages as
   well.  Unit-based reservations make it possible for the sources to
   keep their bandwidth reservation independently of the amount of
   traffic generated. Although the admission scheme is very similar to
   the simple marking case, this is a fundamental difference.

   The basic idea of unit-based reservations is that sources
   periodically mark some of their data packets as refreshment packets.
   The length of the refreshment period must be globally the same in the
   DS domain. The refreshment packets are used to refresh the
   reservations in a soft-state manner. Each refreshment packet reserves
   one "unit" of the resources for one refreshment period. Core routers
   simply count the number of refreshment packets in a refreshment
   interval and thus estimate the number of reserved units. If the
   router runs out of units it goes into blocking state and starts
   marking probe packets.

   With simple marking, probe packets are used only to probe the
   network for free resources. In contrast, here a probe packet, when
   passed unmarked, actually reserves one unit of resources for one
   refreshment period. Thus, after the probe packet has passed along the
   path unmarked, the source needs to send the first reservation
   refreshment packet only one refreshment period later.

   If a refreshment (or probe) packet is lost, for that refreshment
   period the unit will be reserved only upstream from the loss. The
   downstream routers will thus underestimate the number of reserved
   units. Refreshment and probe packets should therefore be protected
   from losses as much as possible.

   Upon accepting a request for a unit (passing a probe packet unmarked)
   core routers can increment the unit number estimate immediately. If
   the probe packet gets marked later in a congested router, the
   upstream routers will not be notified immediately and keep the
   reservation falsely. However, in the next refreshment interval no
   refreshment packet will arrive for the unit, so the reservation will
   be released automatically.
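
   One way a core router could keep this estimate without per flow state
   is sketched below in Python. The class and method names are
   assumptions made for the example. So is the estimator itself, which
   adds the units newly admitted in the current period to the packets
   counted in the previous period; the document leaves the exact
   algorithm open.

```python
class UnitEstimator:
    """Per-interface estimate of reserved units (no per flow state)."""

    def __init__(self, capacity_units: int) -> None:
        self.capacity_units = capacity_units
        self.previous = 0    # packets counted in the last full period
        self.current = 0     # packets counted so far in this period
        self.new_units = 0   # units newly admitted in this period

    def estimate(self) -> int:
        # Units counted last period plus units admitted since then.
        return self.previous + self.new_units

    def on_refreshment(self) -> None:
        self.current += 1

    def on_probe(self) -> bool:
        """Return True if the probe packet must be marked."""
        if self.estimate() >= self.capacity_units:
            return True          # blocking state: mark the probe
        self.current += 1        # reserve the unit immediately
        self.new_units += 1
        return False

    def end_of_period(self) -> None:
        # Units that were neither refreshed nor newly admitted in the
        # period that just ended are released here.
        self.previous = self.current
        self.current = 0
        self.new_units = 0
```

   A unit whose probe packet was marked further downstream is counted
   for one period only: no refreshment packet follows, so it drops out
   at the next call to end_of_period.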

   Similar to simple marking, in blocking state the core routers mark
   only probe packets and not regular packets, as edge devices need
   congestion information only for probe packets. On the other hand, in
   case of serious congestion core routers start marking regular packets
   as well.

   Each flow can occupy any number of units. The unit is not necessarily
   a simple bandwidth value; it can be defined in terms of any resource
   the router has. It can take the form of effective bandwidth as well.
   Also, in an IP telephony network it may mean one call with very well
   known statistical multiplexing properties. The definition of the unit
   may vary from application to application and is outside the scope of
   this document.

   Simplicity does not come free, as all resource requests must be
   expressed as integer multiples of the unit. This imposes a certain
   inflexibility and requires careful selection of the unit. If the unit
   is too small, many probe packets are needed for most flows. If it is
   too large, resources are wasted. A partial solution to the problem
   can be found in section 3.5.
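
   The trade-off can be illustrated with a short calculation in Python.
   The function names are assumptions made for the example, and the unit
   is assumed to be a plain bandwidth value in kbit/s, which is only one
   of the possible definitions of the unit.

```python
def units_needed(demand_kbps: int, unit_kbps: int) -> int:
    """Smallest integer number of units that covers the demand."""
    return -(-demand_kbps // unit_kbps)   # ceiling division on integers


def over_reservation_kbps(demand_kbps: int, unit_kbps: int) -> int:
    """Bandwidth reserved beyond the demand because of unit rounding."""
    return units_needed(demand_kbps, unit_kbps) * unit_kbps - demand_kbps


if __name__ == "__main__":
    for unit in (16, 64, 256):
        print(unit, units_needed(90, unit), over_reservation_kbps(90, unit))
```

   A small unit keeps the over-reservation low but needs one probe
   packet per unit; a large unit needs fewer probe packets but reserves
   more than the flow uses.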

   When we wish to use unit-based reservations, we need 4 codepoints:

      - regular packet
      - probe packet
      - marked packet
      - refreshment packet

3.3 Coexistence with Non-Load Control Capable Devices

   If the load-control domain contains core routers that are not load
   control capable, the scheme continues to work. The only requirement
   put on such routers is not to change the DS field of real-time
   packets. Naturally, no control can be exercised over those links that
   are connected to an interface of a non-load control capable router.
   However, if these links are overprovisioned enough, then QoS will not
   be compromised.

   Edge devices must be load-control aware in order to be able to check
   resource availability and refuse flows or to return probe packets.
   However, if a given edge device will never participate in real-time
   traffic, there is no need for load control functionality in that edge
   device. Thus, only those edge devices need to be load control aware
   that really use it.

3.4 Admission Precision of Simple Marking

   Simple marking is basically a measurement-based admission control
   scheme, where flows do not say anything about their traffic. In
   addition, flow departure is not signalled explicitly.

   When the network carries more types of flows with different bandwidth
   requirements, the core routers do not know the bandwidth requirement
   of the incoming flows. They simply declare if they accept more flows
   or not irrespective of the bandwidth demands of the new flow. Thus
   the marking algorithm in the routers should conservatively expect
   always the largest type of flow the network carries and start
   rejecting when there is not enough bandwidth left for one such flow.
   On the positive side, this will result in fair rejection among
   different flow types, but on the negative side, some bandwidth will
   be wasted. However, if the links of our domain can carry at least
   several hundred flows even of the most bandwidth-demanding type, then
   this is not a significant waste.

   For example, if our network carries voice (16 kbit/s), real-time
   gaming (90 kbit/s), music (256 kbit/s) and video (1 Mbit/s) flows,
   then the waste due to this conservative approach will be at most the
   size of a video flow, which is less than 1% on an OC-3 interface. If
   the links cannot carry that many flows of the most demanding type,
   then there is no need for the scalability benefits of load control
   and more stateful approaches (e.g. RSVP) can be used for that type of
   flow.
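
   The figure quoted above can be checked with a short calculation. The
   Python sketch below assumes the nominal OC-3 line rate of 155.52
   Mbit/s (the usable payload rate is somewhat lower); the names are
   chosen for the example only.

```python
OC3_LINE_RATE_MBPS = 155.52   # nominal OC-3 line rate (assumption)
LARGEST_FLOW_MBPS = 1.0       # video flow from the example above


def worst_case_waste(link_mbps: float, largest_flow_mbps: float) -> float:
    """Upper bound on the wasted fraction of the link capacity."""
    return largest_flow_mbps / link_mbps


if __name__ == "__main__":
    waste = worst_case_waste(OC3_LINE_RATE_MBPS, LARGEST_FLOW_MBPS)
    print(f"{waste:.2%}")
```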

   When a core router declares the acceptance of a new flow (passing a
   probe packet unmarked), it has two options. Either it corrects its
   measurements to take the newly admitted flow into account until its
   traffic reaches the router, or it does not make any correction.

   In the latter case, the router may falsely accept some flows between
   the time of the acceptance and the arrival of actual flow data.
   Numerical examples are provided in Appendix A to illustrate the
   magnitude of this error.

   In the former case corrections are used to prevent the above
   mentioned false admissions. However, it is hard to determine the
   amount of correction to add and also the time till which the
   correction lasts. One conservative approach would be to count on the
   largest type of flow and correct accordingly. The timeframe could be
   the typical round-trip-time edge-to-edge. Also if the probe packet is
   marked later on the path, then the flow will be rejected and no
   correction would have been necessary. This error is small, however,
   if the blocking in the network is a few percent only. Corrections
   also require more processing in core routers per probe packet.

   Note that this problem does not exist with unit-based reservations,
   as probe packets in that case actually reserve one unit of resources
   for one refreshment period.

3.5 Class of Traffic

   With both simple marking and unit-based reservations we can
   convey further information to the core routers by dividing real-time
   traffic into classes and by using different DS codepoints for
   different classes. If the DSCPs denote not only the PHB the flow
   shall receive, but the bandwidth demand of the flow as well, core
   routers may mark packets more intelligently, resulting in less
   resource waste or greater flexibility.

   In the simple marking case we have two options. First, different
   acceptance levels can be used for the different classes and each
   probe packet is marked according to the acceptance level of its
   class. Thus, the operator does not have to set the acceptance level
   to a value conservative enough to fit the largest type of flow. Each
   class may have its custom value depending on the resource
   requirements of the class. Second, the router can separately measure
   the traffic in each class and make marking decisions depending on the
   resource usage in the class of the probe packet. This makes it
   possible to limit the resource usage on a per class basis.

   If the markings are implemented by the use of different DSCPs, then
   "regular" and "marked" packets in all classes can use the same DSCP,
   bringing the total DSCP need of the scheme to 2+1/class, or 1+1/class
   if regular packets are not used. If markings are implemented in the
   two unused bits of the DS field, then the DSCP need is 1/class. For
   these implementation options see section 4.1.

   In the unit-based case the major benefit is that the size of the unit
   can be different in different classes, making it possible to allocate
   resources with finer granularity. If the markings are implemented by
   the use of different DSCPs, then "regular" and "marked" packets in
   all classes can use the same DSCP, bringing the total DSCP need of
   the scheme to 2+2/class. Again, if markings are implemented in the
   two unused bits of the DS field, then the DSCP need is 1/class.
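
   The DSCP counts given in this section can be summarised in a small
   Python helper. The function name, the parameter names and the variant
   labels "simple" and "unit" are assumptions made for the example.

```python
def dscp_need(classes: int, variant: str,
              regular_used: bool = True,
              reserved_bits: bool = False) -> int:
    """Number of DSCPs needed for the given number of traffic classes.

    variant: "simple" for simple marking, "unit" for unit-based
             reservations.
    reserved_bits: True if markings are carried in the two unused bits
                   of the DS field instead of separate DSCPs.
    """
    if reserved_bits:
        return classes                    # 1/class
    if variant == "simple":
        shared = 2 if regular_used else 1 # "regular" and "marked"
        return shared + classes           # plus one "probe" per class
    if variant == "unit":
        return 2 + 2 * classes            # plus "probe" and
                                          # "refreshment" per class
    raise ValueError("unknown variant: " + variant)
```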


4. Implementation Issues

4.1 Codepoints

   For the different variations of load control (simple marking without
   dropping of admitted flows, simple marking with dropping of admitted
   flows, and unit-based reservations), we need 2, 3 and 4 codepoints
   per PHB, respectively. The total number of codepoints needed depends
   on the number of real-time PHBs to use load control with. (See
   section 3.5.)

   For the encoding of those codepoints into the DS field, we have two
   options. The first is to use different DSCPs from the 64 possible
   values of the lower 6 bits for each load control codepoint needed.
   Thus if the simple marking is used for the EF PHB, then we need 3
   DSCPs: EF-regular, EF-probe and EF-marked. In this case the bit
   patterns used for these in a given domain are completely independent
   from each other and follow all the rules specified in [RFC2474].

   Another option is to use the two currently reserved bits in the DS
   field in conjunction with the DS codepoints that denote PHBs for
   real-time traffic. The 4 values covered by the two bits can be
   assigned to load control codepoints. The interpretation of the two
   bits may remain unspecified for other codepoints, so as not to
   prevent a possible ECN deployment [RFC2481].

   The proposed default bit value assignment is the following.

                    DS field   load control
                    01234567   codepoint
                    -----------------------
                    xxxxxx00   regular
                    xxxxxx01   probe
                    xxxxxx10   marked
                    xxxxxx11   refreshment
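
   The default assignment above can be handled with simple bit
   operations on the DS field octet. The Python sketch below is an
   illustration only; the names are assumptions made for the example.
   Bit 0 in the table is taken to be the most significant bit of the
   octet, as in [RFC2474], so the two load control bits are the two
   least significant bits.

```python
from enum import IntEnum


class LoadControl(IntEnum):
    REGULAR = 0b00
    PROBE = 0b01
    MARKED = 0b10
    REFRESHMENT = 0b11


def split_ds_field(ds: int) -> tuple:
    """Split a DS field octet into (DSCP, load control codepoint)."""
    return ds >> 2, LoadControl(ds & 0b11)


def set_load_control(ds: int, codepoint: LoadControl) -> int:
    """Return the DS field octet with its two low bits replaced."""
    return (ds & 0b11111100) | int(codepoint)


if __name__ == "__main__":
    ds = 0b10111001   # DSCP 101110 with load control bits 01 (probe)
    print(split_ds_field(ds))
    print(bin(set_load_control(ds, LoadControl.MARKED)))
```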

4.2 Behaviour of the Core Routers

   Irrespective of the application of the load control scheme, on the
   protocol level core routers need to behave the same. This behaviour
   is specified below for both variations.

4.2.1 Simple Marking

   The router continuously maintains a state indicating whether it is
   willing to accept more flows. The means of determining this state is
   out of the scope of this document. If the state is accepting, the
   router passes all packets unchanged. If the state is rejection, then
   the router changes the marking of incoming "probe" packets to
   "marked". All other markings remain intact.

   If the router is capable of determining serious congestion state and
   the "regular" codepoint is allocated, then upon detecting serious
   congestion, the router marks both "regular" and "probe" packets as
   "marked".

4.2.2 Unit-Based Reservations

   The router uses the "refreshment" and "probe" markings in packets to
   maintain its estimation of reserved resources. A refreshment packet
   signals already admitted resource usage, while a probe packet signals
   a new request. When passed unmarked, both reserve one unit for one
   refreshment period.

   If the state of the router is accepting, then it passes all packets
   unchanged. If the state is rejection, it marks all incoming "probe"
   packets as "marked" and leaves all other packets unchanged.

   If the router is capable of detecting serious congestion, and it
   happens, then the router marks both "regular" and "probe" packets as
   "marked".

   The router always leaves the marking in "refreshment" packets
   unchanged.
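
   The per packet remarking rules of sections 4.2.1 and 4.2.2 can be
   summarised in one function. The Python sketch below is an
   illustration only; the state names and the function name are
   assumptions made for the example, and how the router determines its
   state is left open, as in the text.

```python
ACCEPTING = "accepting"
REJECTION = "rejection"
SERIOUS_CONGESTION = "serious congestion"


def remark(codepoint: str, state: str) -> str:
    """Return the codepoint a core router writes into a passing packet.

    codepoint is one of "regular", "probe", "marked", "refreshment".
    """
    if state == SERIOUS_CONGESTION and codepoint in ("regular", "probe"):
        return "marked"
    if state == REJECTION and codepoint == "probe":
        return "marked"
    # "refreshment" and "marked" packets are never changed, and in the
    # accepting state nothing is changed at all.
    return codepoint
```

   The function needs only the router state and the codepoint of the
   packet at hand, so no per flow state is involved.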

4.3 Behaviour of the Edges

   The behaviour of the edges depends highly on the application or
   signalling protocol that uses the load control scheme. Below we
   describe only the few elements of the edge behaviour that are
   necessary for interworking with the core routers.

4.3.1 Ingress Behaviour

4.3.1.1 Simple Marking

   Probe packets sent by the ingress should be marked as "probe". If the
   "regular" codepoint is allocated then other packets should be marked
   as "regular" otherwise as "marked".

4.3.1.2 Unit-Based Reservations

   The ingress should generate the required number of refreshment
   packets during the refreshment period for each flow it has. If there
   are not enough data packets to mark as "refreshment", then the
   ingress must generate dummy packets and mark those. The emitted
   "refreshment" packets should be as uniformly distributed through the
   refreshment interval as possible to minimise the errors of clock rate
   differences between devices.

   When a new reservation is needed, the ingress should send the
   appropriate number of packets marked as "probe".

   Otherwise all data packets should be sent as "regular".
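
   The ingress could plan its refreshment packets as sketched below in
   Python. The function names are assumptions made for the example, and
   so is the choice to place each packet at the middle of its share of
   the period; any sufficiently uniform spacing would serve.

```python
def refreshment_offsets(units: int, period_s: float) -> list:
    """Send times, relative to the start of the refreshment period,
    for one refreshment packet per reserved unit, evenly spread."""
    return [(i + 0.5) * period_s / units for i in range(units)]


def dummy_packets_needed(units: int, data_packets: int) -> int:
    """Dummy packets to generate when the flow has too few data
    packets in the period to carry all refreshment marks."""
    return max(0, units - data_packets)


if __name__ == "__main__":
    print(refreshment_offsets(4, 2.0))
    print(dummy_packets_needed(4, 1))
```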

4.3.2 Egress Behaviour

4.3.2.1 Simple Marking

   If the egress receives a probe packet which is "marked", it means
   that the network has insufficient capacity along the path between the
   ingress and egress. The egress should take care of blocking the call.
   (Block the call itself or notify the ingress.)

   If the "regular" codepoint is used and the egress receives a "marked"
   packet, which is not a probe packet, it shall start dropping flows if
   that feature is enabled. If it cannot drop flows, it should notify
   the ingress to do so.

   The egress has nothing special to do with packets marked "regular".

4.3.2.2 Unit-Based Reservations

   If the egress receives a packet marked as "probe", it shall take care
   of accepting the flow belonging to the probe packet.

   If the egress receives a "marked" packet, and it is a probe packet,
   the egress shall take care of rejecting the flow belonging to the
   probe packet. If it is not a probe packet, then the egress shall
   start dropping flows. If it cannot drop flows, it should notify the
   ingress to do so.
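
   The egress rules for unit-based reservations can be written as a
   small dispatch function. The Python sketch below is an illustration
   only; the function name and the returned action labels are
   assumptions made for the example. Whether a packet is a probe packet
   is assumed to be known from the packet content, since a probe packet
   that has been marked no longer carries the "probe" codepoint.

```python
def egress_action(codepoint: str, is_probe_packet: bool,
                  can_drop_flows: bool = True) -> str:
    """Return the action the egress takes for an arriving packet."""
    if codepoint == "probe":
        return "accept flow"        # probe passed unmarked
    if codepoint == "marked":
        if is_probe_packet:
            return "reject flow"    # resources exhausted on the path
        # A marked non-probe packet signals serious congestion.
        return "drop flows" if can_drop_flows else "notify ingress"
    return "none"                   # other codepoints need no action
```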

   The egress has nothing special to do with packets marked "regular" or
   "refreshment".

5. Example Applications

   Below we give some example applications of the load control scheme to
   illustrate its operation.

5.1 Media Transport Network

   In this section we discuss a set of applications whose key property
   is that the edge nodes are the sources and destinations of the
   real-time IP packets. That is, they are not only routers that forward
   packets from other sources into the DS domain or forward packets to
   other destinations from the DS domain. The set of egress nodes then
   includes hosts as well as media-gateway-like devices. The application
   that is used as the example in section 3 is one such application. The
   egress and ingress nodes are assumed to share common per flow state.

   In some cases there is some form of direct signalling between the
   nodes, such as SIP (without proxies). Packets of that signalling can
   be used as probe packets. However, if such direct signalling is not
   in place, then the nodes must generate dummy packets to act as
   (forward or reverse) probe packets. For example, specially treated
   ICMP Echo request and reply packets would suffice.

   In this context, the egress node can always identify the ingress node
   by looking at the source IP address of the incoming packets. This
   makes it very easy for the egress to notify the ingress about
   successful or failed probe packets or serious congestion along the
   path.

5.2 Interworking with RSVP/Intserv

   Load control can also be used in diffserv regions (backbones) that
   connect RSVP/Intserv regions. Such an interoperation is described in
   detail in [Bernet99]. Throughout this section the definitions and
   terms of that document are used. For load control to be able to
   operate, border routers of the diffserv region must be RSVP-aware to
   detect the arrival of new connections. No RSVP functionality is
   needed inside the diffserv region, however.

   PATH messages can be used as probe packets to gather congestion
   information along the path between the two border routers. When a new
   RSVP path state is installed at the egress border, the collected
   admission state of the path (collected in the packet of the PATH
   message) is also stored. If a RESV message for the installed state
   arrives within a time period while the congestion state can be
   considered valid, then the egress border can perform the admission
   control for the diffserv network as well. If the first RESV message
   arrives too late, then the egress border must solicit a new (dummy)
   probe packet from the ingress to determine the current congestion
   state.
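
   The egress behaviour described above can be sketched in Python as
   follows. This is a minimal sketch under our own assumptions, not a
   specified procedure: the class and method names are hypothetical, the
   length of the validity period (<validity_s>) is an arbitrary
   placeholder, and only the first RESV message of a session is
   considered.

```python
import time


class EgressBorder:
    """Hypothetical egress-border bookkeeping for probe results."""

    def __init__(self, validity_s=1.0, clock=time.monotonic):
        self.validity_s = validity_s  # assumed validity period (placeholder)
        self.clock = clock            # injectable clock, eases testing
        self.path_state = {}          # session -> (probe_marked, stored_at)

    def on_path(self, session, probe_marked):
        """Store the admission state carried by the PATH probe packet."""
        self.path_state[session] = (probe_marked, self.clock())

    def on_resv(self, session):
        """Decide on the first RESV of a session with installed path state.

        Returns "admit", "reject", or "probe" (solicit a new dummy probe
        from the ingress because the stored state is too old).
        """
        probe_marked, stored_at = self.path_state[session]
        if self.clock() - stored_at > self.validity_s:
            return "probe"
        return "reject" if probe_marked else "admit"
```

   A marked PATH probe leads to "reject", an unmarked one to "admit",
   and a RESV that arrives after the validity period leads to "probe".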

   When the egress receives a "marked" packet that is neither a PATH
   message nor a dummy probe packet, this signals serious congestion
   along the path. The identity of the ingress router can be easily
   determined from the path state, but in this case the egress can
   itself decide which reservations to drop. The ingress can be notified
   via ResvTear messages, while the receiver end systems get ResvErr
   messages.


   If unit-based reservations are used in addition to the above, the
   ingress router is responsible for generating refreshment marks in the
   data packets of the flows at the appropriate rate.

5.3 VPN

   Unit-based reservations can also be used to provision resources in a
   DS domain that is used to provide VPN "tunnels" between customer
   sites. With the load control scheme, the size of these tunnels can be
   modified easily and quickly, so tunnel size selection can be a very
   dynamic process. Note that tunnels are not necessarily real-time
   tunnels: packets of any DSCP can travel on them, receiving the
   appropriate PHB. Even best-effort tunnels can be reserved this way.
   Provisioning can be done on a per DSCP basis or in aggregates as the
   service provider wishes.

   This way the service provider solves the provisioning problem in a
   distributed way, instead of using statistical provisioning or a
   central bandwidth broker. Load control can quickly take routing
   changes into account, prohibiting the extension of tunnels when, due
   to failures, there is not enough capacity along the path, even though
   there would be under normal circumstances.

6. Security Considerations

   We are proposing to use bit markings in packet headers (DS field) to
   reserve resources within a diffserv domain. This poses security
   problems similar to those of using the DS field to differentiate
   packets in general [RFC2475].

   If the interior of the DS domain fully contains a tunnel, then by
   copying the outer marking into the inner header at de-capsulation,
   load control can be exercised over the links of the tunnel as well.
   The procedure is similar to the one described in [RFC2481]. As IPSec
   [RFC2402, 2406] does not allow the copying of the DS field from the
   outer to the inner header at de-capsulation, load control cannot be
   exercised over regions where IPSec tunnels are used.

7. Multicast

   TBW

Appendix A. Effect of Delays on Admission

   When passing a probe packet unmarked without correcting our estimate
   of the free resources, we in fact admit a flow without immediately
   reserving resources for it. The reservation is made implicitly
   later, by the arriving traffic or refreshment packets of the flow.


   During the time between admission and the arrival of the traffic of
   the flow, new requests can be admitted without taking the previously
   admitted flow into account.  To illustrate the effects of this delay
   we took an old and simple Markovian example. Flows are identical with
   an average flow holding time of 180 seconds and flow arrivals and
   departures follow a Poisson process. Let the link be able to carry N
   calls and let the delay be T. The link starts refusing flows when the
   measured traffic exceeds N-H calls. We can say that space of size H
   is put aside to cater for the errors caused by the delay.

   If the link is properly dimensioned, then the usual blocking ratio
   should not exceed 1%. However, in a mass-call-like situation (such as
   on New Year's Eve) it can be considerably higher. In this example 50%
   blocking was chosen to demonstrate the extreme load case, so the
   offered traffic is roughly twice the link capacity.

   QoS violation occurs if during time T the difference between the
   number of arriving and departing flows is larger than H. Under the
   above assumptions the chance of QoS violation can be calculated.
   Naturally, the larger H is, the smaller the chance of a QoS
   violation. The required value of H can be determined for a low target
   QoS violation probability (e.g. 10e-5).

   The following table presents the value of H as a function of link
   size (N), delay length (T) and load (causing 1% or 50% blocking).

         |    1ms    |   10ms    |   100ms   |   500ms   |     1s    |
         | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% |
   ------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      50 |  2  |  2  |  2  |  3  |   3 |   4 |   4 |   5 |   5 |   7 |
     100 |  2  |  2  |  3  |  3  |   4 |   4 |   4 |   7 |   6 |   9 |
     500 |  2  |  3  |  3  |  4  |   4 |   7 |   9 |  13 |  12 |  18 |
    1000 |  3  |  3  |  4  |  4  |   5 |   9 |  12 |  18 |  16 |  25 |
    5000 |  3  |  4  |  5  |  7  |  12 |  18 |  24 |  44 |  33 |  69 |
   10000 |  4  |  4  |  7  |  9  |  16 |  25 |  33 |  69 |  47 | 113 |

   Relative to the link size, the required safety margin is highest for
   small links, as less statistical multiplexing is possible there.
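
   As a rough illustration of the calculation described in this
   appendix, the following Python sketch finds the smallest H for which
   the chance of a QoS violation stays below a target. It is a sketch
   under our own simplifying assumptions, not the exact calculation
   behind the table: the numbers of arrivals and departures during T are
   taken as independent Poisson counts, the link is assumed to be nearly
   full (so flows depart at a rate of N/180 per second), and the offered
   load is expressed by a hypothetical <overload> factor (about 1 for
   the 1% blocking case, about 2 for the 50% case). The function names
   are ours, and the results need not match the table entry by entry.

```python
import math


def poisson_pmf(mean, kmax):
    """Return [P(X = k) for k = 0..kmax] for X ~ Poisson(mean)."""
    pmf = [math.exp(-mean)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * mean / k)
    return pmf


def required_margin(n_flows, delay_s, overload=1.0, holding_s=180.0,
                    target=1e-5):
    """Smallest H with P(arrivals - departures > H) <= target during T.

    Assumes overload >= 1, i.e. arrivals are at least as frequent as
    departures.
    """
    departures_mean = n_flows * delay_s / holding_s  # link assumed full
    arrivals_mean = overload * departures_mean
    # Truncate both distributions far out in the tail.
    kmax = int(arrivals_mean + 12 * math.sqrt(arrivals_mean) + 30)
    arrivals = poisson_pmf(arrivals_mean, kmax)
    departures = poisson_pmf(departures_mean, kmax)

    # departures_cdf[d] = P(D <= d)
    departures_cdf = []
    running = 0.0
    for p in departures:
        running += p
        departures_cdf.append(running)

    for h in range(kmax + 1):
        # P(A - D > h) = sum over a > h of P(A = a) * P(D <= a - h - 1)
        tail = sum(arrivals[a] * departures_cdf[a - h - 1]
                   for a in range(h + 1, kmax + 1))
        if tail <= target:
            return h
    return kmax


if __name__ == "__main__":
    for n in (50, 1000, 10000):
        print(n, required_margin(n, 1.0), required_margin(n, 1.0, overload=2.0))
```

   Under these assumptions the margin grows with both the delay and the
   offered load, which is the qualitative behaviour shown in the table.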

Appendix B. A Simple Algorithm for Core Routers

   In this appendix, we present an algorithm for core routers, which use
   unit-based reservations. The algorithm is simple, so it can be easily
   implemented in hardware by simple counters. Its inputs are the
   refreshment interval and the number of flows allowed on the link. The
   latter is denoted by <threshold>. (We assume flows with similar
   characteristics (e.g. voice) and that one flow sends one refreshment
   packet per refreshment interval.) If the network uses more DSCPs for
   real-time traffic, then a separate copy of the algorithm may be run
   for each DSCP, resulting in per DSCP admission.

   The algorithm counts the number of refreshment and admitted probe
   packets in refreshment intervals (<count>). The result of the
   counting is an upper bound on the number of units reserved on the
   link, as some reservations may have gone by the end of the
   refreshment interval. The value of this counter is used in the next
   interval to decide on admission (<last>). When a new reservation is
   admitted this value is increased to take the new reservation into
   account. If this value is well above the admission limit, then we
   start sending serious congestion notifications by marking regular
   packets as well.

      On initialisation:
         last = 0
         count = 0

      On arrival of a refreshment packet
         count++

      On arrival of a probe packet
         if last < threshold then
            last ++
            count ++
         else
            Mark Packet
         endif

      On arrival of a regular packet
         if last > threshold*1.1 then
            Mark Packet
         endif

      On the end of the refreshment interval
         last = count
         count = 0
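
   For readers who prefer executable code, the counters above can be
   sketched in Python as follows. This is only an illustration, not a
   reference implementation: the class and method names are ours, each
   handler returns True when the packet is to be marked, and regular
   packets are marked when <last> is more than 10% above <threshold>,
   following the prose description of serious congestion notification.

```python
class CoreRouterCounter:
    """Per-link (or per-DSCP) counters for unit-based reservations."""

    def __init__(self, threshold, congestion_factor=1.1):
        self.threshold = threshold                  # flows allowed on the link
        self.congestion_factor = congestion_factor  # 1.1 as in the pseudocode
        self.last = 0    # estimate carried over from the previous interval
        self.count = 0   # packets counted in the current interval

    def on_refreshment(self):
        """A refreshment packet renews one existing unit."""
        self.count += 1

    def on_probe(self):
        """Return True if the probe must be marked (flow refused)."""
        if self.last < self.threshold:
            self.last += 1    # account for the new reservation at once
            self.count += 1
            return False
        return True

    def on_regular(self):
        """Return True if a regular packet must be marked."""
        return self.last > self.threshold * self.congestion_factor

    def end_of_interval(self):
        """Close the refreshment interval and start a new count."""
        self.last = self.count
        self.count = 0


if __name__ == "__main__":
    router = CoreRouterCounter(threshold=2)
    print([router.on_probe() for _ in range(3)])  # [False, False, True]
```

   With a threshold of 2, the first two probes pass unmarked and the
   third one is marked.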

Appendix C. Simulation Results

   The purpose of the simulations described in this appendix is to give
   some insight into the performance of load control. The simulation cases
   are by no means representative and the scheme may work differently in
   other situations. In section C.1 the simple marking case is
   demonstrated with a pure measurement-based admission algorithm by
   using a single link with both constant bit-rate and on/off sources.
   In appendix C.2 the unit-based reservation method is shown, using the
   algorithm in appendix B.


   The serious congestion signalling is not used in any of the examples,
   only admission control.

   We simulated a very simple network of one link. This can be viewed as
   the single bottleneck in the domain. The link had a 2 Mbit/s
   throughput, 50% of which was designated to carry real-time traffic.
   The round trip propagation delay was set to 100 ms. The real-time
   flows arrived according to a Poisson process; the holding time was
   exponentially distributed with a mean of 90 seconds. The arrival rate
   of flows was set to produce approximately 50% blocking. Only
   real-time traffic was simulated, so scheduling was simple FIFO.

C.1 Simple Marking

C.1.1 Constant Bit-Rate Sources

   In the first case, flows emitted a 40-byte packet every 20 ms,
   producing a constant 16 kbit/s load. The 1 Mbit/s capacity assigned
   to this traffic can thus carry 62.5 flows. From the table in Appendix
   A, we can see that space for 4 calls should be set aside in addition
   to the 62.5. After an initial transient of 5 minutes, we simulated
   2.5 hours.

   During the 2.5 hour simulation time, utilisation was measured over
   5-minute intervals. Utilisation was also measured in 20 ms slots, and
   the percentage of slots in which it was above 1.064 Mbit/s (66.5
   calls) was counted.

      min/avg/max of the utilisation was: 881 / 899 / 914 kbit/s
      min/avg/max of the violation ratio was: 98.96% / 99.78% / 100%

C.1.2 On/Off Sources

   In the second simulation case on/off sources were used. During an
   "off" period no packets were emitted, while in the "on" state the
   behaviour is the same as in the previous case: 40-byte packets
   20 ms apart. The lengths of the on and off periods were both drawn
   from a Pareto distribution with a shape parameter of 1.1 and a mean
   of 5 seconds. The average bit-rate of the sources is thus 8
   kbit/s. The flow arrival rate has been doubled to produce ~50%
   blocking, as the link is capable of carrying nearly twice the number
   of flows. The same set of measurements was carried out as in the
   previous case.

      min/avg/max of the utilisation was: 808 / 819 / 837 kbit/s
      min/avg/max of the violation ratio was: 98.98% / 99.40% / 99.70%

   It can be seen that although the measurement-based approach was not
   able to prevent the overuse of the real-time resources in this high
   load case, it is a viable alternative. In no case did the 20 ms
   measurements exceed 1.15 Mbit/s, so the overuse means just
   temporarily borrowing from the resources provisioned for the lower
   priority traffic.

C.1.3 The Router Algorithm

   The measurement-based admission control (mbac) algorithm used by the
   router is presented here only for completeness of the simulation
   description. The marking strategy was the same for both types of
   traffic. The router counts the number of bytes transmitted in every
   20 ms interval and calculates the average bit rate in these 20 ms
   slots. Then it smooths these values in time through an exponentially
   weighted moving average (ewma) filter. The window size of the ewma
   was set to 9 seconds, i.e. when a unit step function is fed through
   it, the output reaches 0.63 after 9 seconds. The algorithm also
   calculated the histogram of the differences between the original slot
   values and the filtered values. The histogram was collected in 1000
   bins over the range of -1 to +1 Mbit/s. The 99% quantile of the
   histogram was calculated every 100 seconds. The router marks all
   passing packets if the sum of the output of the ewma filter and the
   calculated quantile is greater than 1 Mbit/s. The router makes no
   correction to its measurements when a new flow is accepted.

   Thus, the target violation probability was set to 1%, which was in
   fact fulfilled in the long run.

   On arrival of a new packet only counters are incremented. In every 20
   ms a new value for the ewma must be calculated, a marking decision
   must be made for the next 20 ms and the value of one bin in the
   histogram must be increased. In every 100 seconds the 99% quantile
   value must be looked up in the histogram and the histogram must be
   initialised.
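
   The per-packet, per-slot and per-100-second work described above can
   be sketched in Python as follows. This is an illustrative sketch, not
   the simulator's code: the class and method names are ours, the ewma
   weight is derived as 1 - exp(-slot/window) so that a unit step
   reaches about 0.63 after 9 seconds, the difference is taken against
   the already updated filter value, out-of-range differences are
   clamped to the outermost bins, and the quantile is reported as the
   upper edge of its bin. All of these details are our assumptions.

```python
import math


class MbacMarker:
    """Measurement-based marking decision for one link."""

    def __init__(self, limit_bps=1_000_000, slot_s=0.020, window_s=9.0,
                 bins=1000, span_bps=1_000_000, quantile=0.99):
        self.limit_bps = limit_bps
        self.slot_s = slot_s
        self.alpha = 1.0 - math.exp(-slot_s / window_s)  # assumed weight
        self.bins = bins
        self.span_bps = span_bps              # histogram covers +/- span
        self.bin_width = 2 * span_bps / bins
        self.quantile = quantile
        self.hist = [0] * bins
        self.bytes_in_slot = 0
        self.ewma_bps = 0.0
        self.margin_bps = 0.0                 # last computed quantile
        self.marking = False                  # decision for current slot

    def on_packet(self, size_bytes):
        """Count the packet; return True if it is to be marked."""
        self.bytes_in_slot += size_bytes
        return self.marking

    def end_of_slot(self):
        """Every 20 ms: update ewma, histogram and marking decision."""
        rate = self.bytes_in_slot * 8 / self.slot_s
        self.bytes_in_slot = 0
        self.ewma_bps += self.alpha * (rate - self.ewma_bps)
        diff = rate - self.ewma_bps
        index = int((diff + self.span_bps) // self.bin_width)
        self.hist[min(max(index, 0), self.bins - 1)] += 1
        self.marking = self.ewma_bps + self.margin_bps > self.limit_bps

    def end_of_quantile_period(self):
        """Every 100 s: look up the quantile and reset the histogram."""
        total = sum(self.hist)
        if total:
            needed = self.quantile * total
            seen = 0
            for index, n in enumerate(self.hist):
                seen += n
                if seen >= needed:
                    # Upper edge of the bin holding the quantile.
                    self.margin_bps = (-self.span_bps
                                       + (index + 1) * self.bin_width)
                    break
        self.hist = [0] * self.bins
```

   Note that only on_packet() runs per packet, and it touches a single
   counter, in line with the cost description above.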

   The interested reader can read more about the design rationale of the
   above algorithm in [Gross99].

C.2 Unit-Based Reservations

   In this section we demonstrate the unit-based reservation scheme. The
   routers use the simple algorithm of Appendix B, except that they never
   mark regular packets. The simulation setup is otherwise the same as
   in the previous section. The traffic inside the flows does not
   influence the admission algorithm, so during simulation sources send
   only probe and refreshment packets. The definition of the unit is a
   peak bit-rate of 16 kbit/s. The flow number threshold was set to 62
   flows, resulting in nearly the same target utilisation of 1 Mbit/s as in
   appendix C.1. The length of the refreshment period was varied
   between 100 ms and 10 seconds. The actual number of flows on the link
   never exceeded 62 (no violation), so only the utilisation values are
   shown in kbit/s.

                      | interval | min | avg | max |
                      +----------+-----+-----+-----+
                      |    --    | 968 | 972 | 976 |
                      | 100 ms   | 952 | 954 | 959 |
                      |  1 sec.  | 941 | 946 | 949 |
                      |  2 sec.  | 927 | 933 | 936 |
                      |  4 sec.  | 908 | 913 | 920 |
                      |  7 sec.  | 861 | 870 | 879 |
                      | 10 sec.  | 827 | 837 | 852 |

   The first line shows the utilisation value for the case when the
   source limits itself to 62 flows, that is blocking is not done by the
   network, but by the source. This emulates the case when the
   refreshment period is infinitely short or when a stateful approach is
   used, like RSVP. The utilisation is not 100% due to the burstiness of
   the arrivals.

   It can be seen that as the refreshment packets get less frequent,
   more resources are wasted, as the resources allocated to departing
   flows remain allocated till the end of the next refreshment period.
   The result is not only lower average utilisation, but lower maximal
   utilisation as well. When the refreshment period is 10 seconds long,
   the highest utilisation experienced was 952 kbit/s, which is 3 units
   below the limit.

   This motivates the use of as short a refreshment period as possible.
   However, too short a refreshment period will increase the effects of
   clock differences between edge and core devices (which were not taken
   into account during simulation). It also decreases the chance of
   finding a packet to mark as refreshment if the flow is currently
   transmitting below its reserved rate.

References

   [RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to add Explicit
             Congestion Notification (ECN) to IP", RFC 2481, January
             1999.

   [RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition
             of the Differentiated Services Field (DS Field) in the IPv4
             and IPv6 Headers", RFC 2474, December 1998.


   [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and
             W. Weiss, "An Architecture for Differentiated Services",
             RFC 2475, December 1998.

   [RFC2205] Braden, R., Zhang, L., Berson, S., Herzog, S., Jamin, S.,
             "Resource Reservation Protocol (RSVP) Version 1 Functional
             Specification", RFC 2205, Proposed Standard, September 1997

   [RFC2402] Kent, S. and R. Atkinson, "IP Authentication Header", RFC
             2402, November 1998.

   [RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security
             Payload (ESP)", RFC 2406, November 1998.

   [Bernet99] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L.,
             Speer, M., Braden, R., "Interoperation of RSVP/Intserv and
             Diffserv Networks", Internet Draft, March 1999

   [Stoica99] Stoica, I., et al "Per Hop Behaviors Based on Dynamic
             Packet States", Internet Draft, February 1999

   [Berson97] Berson, S. and Vincent, R., "Aggregation of Internet
             Integrated Services State", Internet Draft, December 1997.

   [Guerin97] Guerin, R., Blake, S. and Herzog, S.,"Aggregating RSVP
             based QoS Requests", Internet Draft, November 1997.

   [Gross99] Grossglauser, M., Tse, D. N. C., "A Time-Scale
             Decomposition Approach to Measurement-Based Admission
             Control", Infocom '99

Authors' Addresses

   Lars Westberg
   Ericsson Research
   Kistagangen 26
   SE-164 80 Stockholm
   Sweden
   EMail: Lars.Westberg@era-t.ericsson.se

   Zoltan R. Turanyi
   Ericsson Telecommunications
   Budapest, Laborc u. 1
   H-1037
   Hungary
   EMail: Zoltan.Turanyi@eth.ericsson.se

