INTERNET-DRAFT                                           L. Westberg
draft-westberg-loadcntr-01.txt                         Z. R. Turanyi
Expires: Dec. 1999                                          Ericsson
                                                        Jun 14, 1999

                 Load Control of Real-Time Traffic
                A 2-bit resource allocation scheme

Status of this Memo

This document is an Internet Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.txt

Abstract

The purpose of this memo is to present a new resource allocation scheme for diffserv networks based on two-bit markings in packet headers. It provides the ability to control the traffic load in the network without the use of any signalling protocol. It was designed to be very simple to implement. Core routers need to keep no per flow state. The scheme is mainly aimed at large aggregation areas where the number of flows or edge devices is high.

1. Background and Motivation

With the introduction of differentiated services [RFC2475] into the IP protocol, it became possible to provide large scale real-time services. The basic idea of diffserv is not to classify packets at each router, but only at the edges. The result (the required packet treatment) is stored and carried in the packet headers. Core routers can carry out appropriate scheduling based on this result.

Turanyi, Westberg [Page 1] draft-westberg-loadcntr-01 Expires: Dec.
1999

The current definition of diffserv, however, does not contain any simple and scalable solution to the problem of resource provisioning and control. A number of approaches to the problem already exist [Berson97, Guerin97, Stoica99, Bernet99]. The scheme presented in this document does not require any state aggregation and aims at extreme simplicity and low cost of implementation together with good scaling properties.

Our load control scheme uses bit markings in the packet headers to gather information about the load level along various paths of the network. In doing so, per flow state can again be shifted to the edges. This approach is especially useful in large aggregation areas, where the high speed of interfaces and the large number of flows call for simple and scalable solutions.

This load control scheme is not an end-to-end scheme. Therefore it is possible to use it in a domain independently of the neighbouring domains. It is also possible to use it if only a subset of the routers within the given domain supports it.

There are many proposed ways to provide end-to-end QoS, including RSVP/Intserv, overprovisioning, statistical provisioning via SLAs, etc. Combinations of them are also possible. Load control can interact with such end-to-end solutions and be part of the QoS chain as well. It is outside the scope of this document to specify the interaction between load control and other methods; we restrict ourselves mainly to the packet markings. Some example applications are given in section 5, though.

Load control operates in a DS domain, where edge devices keep per flow state and do per flow processing while core routers do not. Its main purpose is to provide a simple and scalable solution to the resource-provisioning problem. The required complexity in the edge devices grows linearly with the number of flows passing them, while core device complexity does not grow with the number of flows or the number of edge devices.
The aim was to make it possible to implement per packet work in hardware easily.

2. Overview

Load control is achieved by two actions: admission control of incoming requests and the drop of admitted flows in cases of serious congestion.

When a new request arrives at the edge of the DS domain, a probe packet is sent through the domain before the admission control decision is made. The core routers mark the header of the probe packet if they do not have enough resources to serve a new request. When the probe packet arrives at the edge, its header reflects the status of resources along the path and the edges can make the admission control decision accordingly.

Under normal circumstances, admission control is enough to control the load in the network. Nevertheless, when exceptional events (such as routing changes) cause too much traffic to be re-routed over a link, the resulting serious congestion may compromise the quality of all flows on that link. In that case, the correct behaviour is to drop already admitted flows to provide some quality to others. Thus, when serious congestion occurs, the core routers mark the header of all packets and by doing so notify the edges about the serious congestion. Then the edges can start dropping flows until the packet markings stop.

The above elements may be augmented with an optional central entity, which collects measurements on the performance of the network and adjusts admission criteria based on long term performance measurements. This entity can also be used to implement network policies. Such a load control server may be essential to the high performance of the system.

3. Operation of Load Control

The load control scheme we are proposing has two variants. 'Simple marking' refers to a measurement-based admission scheme where routers measure the actual traffic and base the admission decision on the measurements. We call the other variant 'Unit-based reservations'.
This variant provides a way for sources to keep their reservations even if they generate less or no traffic. This is done in a soft state manner by periodically sending specially marked reservation refreshment packets. Simple marking and unit-based reservations are described in sections 3.1 and 3.2 respectively.

We assume a DS domain where connection requests arrive at the edges of the domain via either RSVP or other means. The requests may also be generated directly at the edges in a gateway, which provides connection to other types of networks, or in hosts that are connected directly to the domain. In this section, we use the gateway as an example and the IP domain is used to connect e.g. telephony networks.

3.1 Simple Marking

In the following we describe how admission control is performed with simple marking and how the scheme can drop already admitted flows in case of serious congestion.

3.1.1 Admission Control

In what follows, we assume that core routers have some means to communicate the onset of resource exhaustion to the edges via a simple marking in the packet header. Possible places for such a marking may be the two reserved bits in the DS field [RFC2474] or a remappable DS codepoint. See section 4.1 for further details.

The basic idea of simple marking is that core routers measure the traffic and if they encounter a near exhaustion of resources, they mark passing packets, thereby notifying the gateways about the congestion. The gateways then reject incoming flow requests. The router can use any algorithm to decide when to signal the exhaustion of resources. It can send such blocking signals whenever it wishes to stop accepting new traffic.

Using the above marking feature, the gateways at the edge can check the availability of resources along the path of a new flow in the following way. Before establishing the flow, the initiating gateway sends a probe packet into the network.
The probe packet passes through the same routers as the actual traffic will and is exposed to the marking function of the routers. When it reaches the destination gateway, its header will reflect the status of the resources along that path.

When the destination gateway receives the probe packet, it copies the indication from the header into the payload of a reverse probe packet and sends it back to the initiating party. The reverse probe packet then returns to the initiating gateway and upon receipt, its header will reflect the congestion status on the backward path.

If load control is applied in a network with bi-directional flows, then the initiating gateway blocks the flow if either of the probe packets has been marked. If neither has been marked, it assumes that there is enough capacity in the DS domain and proceeds with the flow. If the flows are unidirectional, then the initiating gateway bases its decision on only one of the directions. If the communication is unidirectional and can be blocked by the receiver, then no reverse probe packet is needed.

After establishing the flow, both gateways ignore the markings in incoming packets. This means that once a flow has been established, it will not be dropped by this mechanism. Instead, other newly arriving flows are blocked.

To make the scheme more robust against packet loss, the initiating gateway starts a timer when it sends the first probe packet. If any of the probe packets is lost, it simply retransmits on timeout. Selecting a precise value for the timer is not crucial. If the timer fires too late, only the flow setup latency will increase. Setting the timer value too low will cause only unnecessary retransmissions of probe packets, but the semantics of the scheme will not change.

Note that this is not a new signalling protocol.
Any packet will do as a forward or reverse probe packet if it has at least one bit in the payload as a placeholder for the returned congestion status. Thus, message exchanges of existing protocols can be fully reused.

For bi-directional flows the forward and backward path of the flow can differ between the ingress and egress gateways. The routing protocol is required, however, to deliver all packets of a flow, including the probe packet, on the same path during normal operation.

Packets belonging to flows that have been rejected must either be dropped by the ingress or sent marked with another (typically best effort) DSCP.

3.1.2 Drop of Already Admitted Connections

If a routing change occurs in the network due to link failure, topology change or traffic management action, several already admitted flows may be re-routed over a link which has insufficient capacity to carry all of them. This may result in excessive packet losses and the quality of all flows may be seriously compromised.

If the core routers are able to signal this serious congestion situation to the gateways, then the gateways can solve the problem by dropping enough flows to save the remaining ones. The natural place to drop flows is the gateway, where all information (bandwidth usage, policy, etc.) is present.

If a core router detects serious congestion on one of its interfaces, it starts marking all packets leaving that interface, not only probe packets. If the egress gateway receives a marked packet which is not a probe packet (this can be determined from the packet content), it can interpret it as a sign of serious congestion along the path and it should notify the ingress gateway. The means of this notification may vary depending on the type of gateways used. In case of our example, the egress application can notify the ingress application via some type of application level signalling.
Such notifications should not be sent too often, to prevent unnecessary load on the network and on the ingress gateway.

When an egress gateway receives a marked packet which is not a probe packet, it must somehow determine which ingress gateway to notify. How this can be done depends on the application and is for further study. See sections 5.1 and 5.2 for examples.

When we wish to use simple marking with the ability to drop already admitted flows, we need 3 codepoints:

- regular packet
- probe packet
- marked packet

In case of serious congestion both regular and probe packets are marked, while in blocking state only probe packets are marked. If the drop of admitted connections feature is not needed, then to save codepoints non-probe packets can be sent as "marked" and the "regular" codepoint is not needed.

3.2 Unit-Based Reservations

While measurement-based admission control has important advantages over non-measurement-based algorithms, it has its disadvantages as well. Unit-based reservations make it possible for the sources to keep their bandwidth reservation independently of the amount of traffic generated. Although the admission scheme is very similar to the simple marking case, this is a fundamental difference.

The basic idea of unit-based reservations is that sources periodically mark some of their data packets as refreshment packets. The length of the refreshment period must be globally the same in the DS domain. The refreshment packets are used to refresh the reservations in a soft-state manner. Each refreshment packet reserves one "unit" of the resources for one refreshment period. Core routers simply count the number of refreshment packets in a refreshment interval and thus estimate the number of reserved units. If the router runs out of units it goes into blocking state and starts marking probe packets.
With simple marking, probe packets were used only to probe the network for free resources. In contrast, here a probe packet, when passed unmarked, actually reserves one unit of resources for one refreshment period. Thus, after the probe packet has passed along the path unmarked, the source needs to send the first reservation refreshment packet only one refreshment period later.

If a refreshment (or probe) packet is lost, for that refreshment period the unit will be reserved only upstream from the loss. The downstream routers will thus underestimate the number of reserved units. Refreshment and probe packets should therefore be protected from losses as much as possible.

Upon accepting a request for a unit (passing a probe packet unmarked) core routers can increment the unit number estimate immediately. If the probe packet gets marked later in a congested router, the upstream routers will not be notified immediately and will keep the reservation falsely. However, in the next refreshment interval no refreshment packet will arrive for the unit, so the reservation will be released automatically.

Similar to simple marking, in blocking state the core routers mark only probe packets and not regular packets, as edge devices need congestion information only for probe packets. On the other hand, in case of serious congestion core routers start marking regular packets as well.

Each flow can occupy any number of units. The unit is not necessarily a simple bandwidth value; it can be defined in terms of any resource the router has. It can take the form of effective bandwidth as well. Also, in an IP telephony network it may mean one call with very well known statistical multiplexing properties. The definition of the unit may vary from application to application and is outside the scope of this document.
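
Since each unmarked probe packet reserves exactly one unit, an ingress has to round a flow's demand up to a whole number of units. The following is a minimal sketch of that arithmetic only, under the assumption that the unit is a plain bandwidth value in kbit/s (the text allows other definitions). The function names and the rates used in the usage note are hypothetical and chosen for illustration.

```python
def units_needed(demand_kbps: int, unit_kbps: int) -> int:
    """Smallest whole number of units that covers the demand.

    Assumption: the unit is a plain bandwidth value in kbit/s.
    This is also the number of probe packets the ingress would send.
    """
    if demand_kbps < 0 or unit_kbps <= 0:
        raise ValueError("demand must be >= 0 and unit must be > 0")
    # Integer ceiling division avoids floating point rounding.
    return (demand_kbps + unit_kbps - 1) // unit_kbps


def overallocation_kbps(demand_kbps: int, unit_kbps: int) -> int:
    """Bandwidth reserved beyond the demand because of rounding up."""
    return units_needed(demand_kbps, unit_kbps) * unit_kbps - demand_kbps
```

With an assumed 16 kbit/s unit, a 90 kbit/s flow would need 6 units and over-reserve 6 kbit/s; with an assumed 64 kbit/s unit the same flow would need only 2 units but over-reserve 38 kbit/s.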
Simplicity does not come free, as all resource requests must be expressed in terms of integer multiples of the unit. This imposes a certain inflexibility and requires careful selection of the unit. If the unit is too small, many probe packets are needed for most flows. If it is too large, resources are wasted. A partial solution to the problem can be found in section 3.5.

When we wish to use unit-based reservations, we need 4 codepoints:

- regular packet
- probe packet
- marked packet
- refreshment packet

3.3 Coexistence with Non-Load Control Capable Devices

If the load-control domain contains core routers that are not load control capable, the scheme continues to work. The only requirement put on such routers is not to change the DS field of real-time packets. Naturally, no control can be exercised over those links that are connected to an interface of a non-load control capable router. However, if these links are overprovisioned enough, then QoS will not be compromised.

Edge devices must be load-control aware in order to be able to check resource availability and refuse flows or to return probe packets. However, if a given edge device will never participate in real-time traffic, there is no need for load control functionality in that edge device. Thus, only those edge devices that really use load control need to be load control aware.

3.4 Admission Precision of Simple Marking

Simple marking is basically a measurement-based admission control scheme, where flows do not say anything about their traffic. In addition, flow departure is not signalled explicitly.

When the network carries several types of flows with different bandwidth requirements, the core routers do not know the bandwidth requirement of the incoming flows. They simply declare whether they accept more flows or not, irrespective of the bandwidth demands of the new flow.
Thus the marking algorithm in the routers should conservatively always expect the largest type of flow the network carries and start rejecting when there is not enough bandwidth left for one such flow. On the positive side, this will result in fair rejection among different flow types, but on the negative side, some bandwidth will be wasted. However, if the links of our domain can carry at least several hundred flows of even the most bandwidth-demanding type, then this is not a significant waste.

For example, if our network carries voice (16 kbit/s), real-time gaming (90 kbit/s), music (256 kbit/s) and video (1 Mbit/s) flows, then the waste due to this conservative approach will be at most the size of a video flow, which is less than 1% on an OC-3 interface (155.52 Mbit/s).

If the links cannot carry that many flows of the most demanding type, then there is no need for the scalability benefits of load control and more stateful approaches (e.g. RSVP) can be used for that type of flow.

When a core router declares the acceptance of a new flow (passing a probe packet unmarked), it has two options. Either it corrects its measurements to take the newly admitted flow into account until its traffic reaches the router, or it does not make any correction. In the latter case, the router may falsely accept some flows between the time of the acceptance and the arrival of actual flow data. Numerical examples are provided in Appendix A to illustrate the magnitude of this error.

In the former case corrections are used to prevent the above mentioned false admissions. However, it is hard to determine the amount of correction to add and also the time for which the correction lasts. One conservative approach would be to count on the largest type of flow and correct accordingly. The timeframe could be the typical round-trip-time edge-to-edge.
Also, if the probe packet is marked later on the path, then the flow will be rejected and no correction would have been necessary. This error is small, however, if the blocking in the network is a few percent only. Corrections also require more processing in core routers per probe packet.

Note that this problem does not exist with unit-based reservations, as probe packets in that case actually reserve one unit of resources for one refreshment period.

3.5 Class of Traffic

In both the simple marking and the unit-based reservation case we can convey further information to the core routers by dividing real-time traffic into classes and by using different DS codepoints for different classes. If the DSCPs denote not only the PHB the flow shall receive, but the bandwidth demand of the flow as well, core routers may mark packets more intelligently, resulting in less resource waste or greater flexibility.

In the simple marking case we have two options. First, different acceptance levels can be used for the different classes and each probe packet is marked according to the acceptance level of its class. Thus, the operator does not have to set the acceptance level to a value conservative enough to fit the largest type of flow. Each class may have its custom value depending on the resource requirements of the class.

Second, the router can separately measure the traffic in each class and make marking decisions depending on the resource usage in the class of the probe packet. This makes it possible to limit the resource usage on a per class basis.

If the markings are implemented by the use of different DSCPs, then "regular" and "marked" packets in all classes can use the same DSCP, making the total DSCP need of the scheme 2 plus 1 per class, or 1 plus 1 per class if regular packets are not used. If markings are implemented in the two unused bits of the DS field, then the DSCP need is 1 per class. For these implementation options see section 4.1.
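
The DSCP counts for the simple marking case can be restated as a small calculation. This is only a sketch of the arithmetic given above; the function name and its parameters are hypothetical and not part of the scheme.

```python
def simple_marking_dscp_need(num_classes: int,
                             regular_used: bool = True,
                             use_reserved_bits: bool = False) -> int:
    """Number of DSCPs needed by simple marking with traffic classes.

    Hypothetical helper that restates the counts given in the text.
    """
    if num_classes < 1:
        raise ValueError("at least one class is required")
    if use_reserved_bits:
        # Markings live in the two unused bits: one DSCP per class.
        return num_classes
    # "marked" (and "regular", if used) are shared by all classes,
    # while each class needs its own "probe" DSCP.
    shared = 2 if regular_used else 1
    return shared + num_classes
```

For three classes this gives 5 DSCPs with the "regular" codepoint, 4 without it, and 3 when the two unused bits carry the markings.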
In the unit-based case the major benefit is that the size of the unit can be different in different classes, making it possible to allocate resources with finer granularity.

If the markings are implemented by the use of different DSCPs, then "regular" and "marked" packets in all classes can use the same DSCP, making the total DSCP need of the scheme 2 plus 2 per class. Again, if markings are implemented in the two unused bits of the DS field, then the DSCP need is 1 per class.

4. Implementation Issues

4.1 Codepoints

For the different variations of load control (simple marking without dropping of admitted flows, simple marking with dropping of admitted flows, and unit-based reservation), we need 2, 3 and 4 codepoints per PHB respectively. The total number of codepoints needed depends on the number of real-time PHBs to use load control with. (See section 3.5.)

For the encoding of those codepoints into the DS field, we have two options. The first is to use different DSCPs from the 64 possible values of the lower 6 bits for each load control codepoint needed. Thus if simple marking is used for the EF PHB, then we need 3 DSCPs: EF-regular, EF-probe and EF-marked. In this case the bit patterns used for these in a given domain are completely independent from each other and follow all the rules specified in [RFC2474].

Another option is to use the two currently reserved bits in the DS field in conjunction with the DS codepoints that denote PHBs for real-time traffic. The 4 values covered by the two bits can be assigned to load control codepoints. The interpretation of the two bits may remain unspecified for other codepoints, so as not to prevent a possible ECN deployment [RFC2481].

The proposed default bit value assignment is the following.

   DS field    load control
   01234567    codepoint
   -----------------------
   xxxxxx00    regular
   xxxxxx01    probe
   xxxxxx10    marked
   xxxxxx11    refreshment

4.2 Behaviour of the Core Routers

Irrespective of the application of the load control scheme, on the protocol level core routers need to behave the same. This behaviour is specified below for both variations.

4.2.1 Simple Marking

The router continuously maintains a state that indicates whether it is willing to accept more flows or not. The means of determining this state is out of the scope of this document. If the state is accepting, the router passes all packets unchanged. If the state is rejecting, then the router changes the marking of incoming packets from "probe" to "marked". All other markings remain intact.

If the router is capable of determining serious congestion state and the "regular" codepoint is allocated, then upon detecting serious congestion, the router marks both "regular" and "probe" packets as "marked".

4.2.2 Unit-Based Reservations

The router uses the "refreshment" and "probe" markings in packets to maintain its estimate of reserved resources. A refreshment packet signals already admitted resource usage, while a probe packet signals a new request. When passed unmarked, both reserve one unit for one refreshment period.

If the state of the router is accepting, then it passes all packets unchanged. If the state is rejecting, it marks all incoming "probe" packets as "marked" and leaves all other packets unchanged.

If the router is capable of detecting serious congestion, and serious congestion occurs, then the router marks both "regular" and "probe" packets as "marked".

The router always leaves the marking in "refreshment" packets unchanged.

4.3 Behaviour of the Edges

The behaviour of the edges highly depends on the application or signalling protocol that uses the load control scheme.
Below we only describe a few elements of the edge behaviour that are necessary for interworking with the core routers.

4.3.1 Ingress Behaviour

4.3.1.1 Simple Marking

Probe packets sent by the ingress should be marked as "probe". If the "regular" codepoint is allocated, then other packets should be marked as "regular", otherwise as "marked".

4.3.1.2 Unit-Based Reservations

The ingress should generate the required number of refreshment packets during the refreshment period for each flow it has. If there are not enough data packets to mark as "refreshment", then the ingress must generate dummy packets and mark those. The emitted "refreshment" packets should be as uniformly distributed through the refreshment interval as possible to minimise the errors of clock rate differences between devices.

When a new reservation is needed, the ingress should send the appropriate number of packets marked as "probe". Otherwise all data packets should be sent as "regular".

4.3.2 Egress Behaviour

4.3.2.1 Simple Marking

If the egress receives a probe packet which is "marked", it means that the network has insufficient capacity along the path between the ingress and egress. The egress should take care of blocking the call (block the call itself or notify the ingress).

If the "regular" codepoint is used and the egress receives a "marked" packet which is not a probe packet, it shall start dropping flows if that feature is enabled. If it cannot drop flows, it should notify the ingress to do so.

The egress has nothing special to do with packets marked "regular".

4.3.2.2 Unit-Based Reservations

If the egress receives a packet marked as "probe", it shall take care of accepting the flow belonging to the probe packet.

If the egress receives a "marked" packet, and it is a probe packet, the egress shall take care of rejecting the flow belonging to the probe packet.
If it is not a probe packet, then the egress shall start dropping flows. If it cannot drop flows, it should notify the ingress to do so.

The egress has nothing special to do with packets marked "regular" or "refreshment".

5. Example Applications

Below we give some example applications of the load control scheme to illustrate its operation.

5.1 Media Transport Network

In this section we discuss a set of applications whose key property is that the edge nodes are the sources and destinations of the real-time IP packets. That is, they are not only routers that forward packets from other sources into the DS domain or forward packets to other destinations from the DS domain. The set of egress nodes then includes hosts as well as media gateway like devices. The application that is used as the example in section 3 is one such application.

The egress and ingress nodes are assumed to keep common per flow state. In some cases there is some form of direct signalling between the nodes, such as SIP (without proxies). Packets of that signalling can be used as probe packets. However, if such direct signalling is not in place, then the nodes must generate dummy packets to act as (forward or reverse) probe packets. Specially treated ICMP Echo request and reply packets would suffice, for example.

In this context, the egress node can always identify the ingress node by looking at the source IP address of the incoming packets. This makes it very easy for the egress to notify the ingress about successful or failed probe packets or serious congestion along the path.

5.2 Interworking with RSVP/Intserv

Load control can also be used in diffserv regions (backbones) that connect RSVP/Intserv regions. Such an interoperation is described in detail in [Bernet99]. Throughout this section the definitions and terms of that document are used.

For load control to be able to operate, border routers of the diffserv region must be RSVP-aware to detect the arrival of new connections. No RSVP functionality is needed inside the diffserv region, however.

PATH messages can be used as probe packets to gather congestion information along the path between the two border routers. When a new RSVP path state is installed at the egress border, the collected admission state of the path (collected in the packet of the PATH message) is also stored. If a RESV message for the installed state arrives within a time period during which the congestion state can be considered valid, then the egress border can perform the admission control for the diffserv network as well. If the first RESV message arrives too late, then the egress border must solicit a new (dummy) probe packet from the ingress to determine the current congestion state.

When the egress receives a "marked" packet which is neither a PATH message nor a dummy probe packet, this signals the serious congestion state along the path. The identity of the ingress router can be easily determined from the path state, but in this case the egress can itself decide on the drop of certain reservations. The ingress can be notified via ResvTear messages while the receiver end systems get ResvErr messages.

If unit-based reservations are used in addition to the above, the ingress router is responsible for generating refreshment marks in the data packets of the flows at the appropriate rate.

5.3 VPN

Unit-based reservations can also be used to provision resources in a DS domain that is used to provide VPN "tunnels" between customer sites. By the use of the load control scheme, it is easy and fast to modify the size of these tunnels, thus tunnel size selection can be a very dynamic process.

Note that tunnels are not necessarily real-time tunnels; packets of any DSCP can travel on them, receiving the appropriate PHB.
Even best-effort tunnels can be reserved this way. Provisioning can be done on a per DSCP basis or in aggregates as the service provider wishes.

This way the service provider, instead of using statistical provisioning or a central bandwidth broker functionality, solves the problem of provisioning in a distributed way. Load control can quickly take routing changes into account, prohibiting the extension of tunnels when, due to failures, there is not enough capacity along the path, even though under normal circumstances there would be.

6. Security Considerations

We are proposing to use bit markings in packet headers (DS field) to reserve resources within a diffserv domain. This poses similar security problems as the use of the DS field to differentiate packets in general [RFC2475].

If the interior of the DS domain fully contains a tunnel, then by copying the outer marking into the inner header at de-capsulation, load control can be exercised over the links of the tunnel as well. The procedure is similar to the one described in [RFC2481].

As IPSec [RFC2402, 2406] does not allow the copying of the DS field from the outer to the inner header at de-capsulation, load control cannot be exercised over regions where IPSec tunnels are used.

7. Multicast

TBW

Appendix A. Effect of Delays on Admission

When passing a probe packet unmarked without correcting our estimate of the free resources, we in fact admit a flow without immediately reserving resources for it. The reservation will be implicitly done later by the arriving traffic or refreshment packets of the flow. During the time between admission and the arrival of the traffic of the flow, new requests can be admitted without taking the previously admitted flow into account.

To illustrate the effects of this delay we took an old and simple Markovian example.
Flows are identical, with an average flow holding time of 180 seconds, and flow arrivals and departures follow a Poisson process. Let the link be able to carry N calls and let the delay be T. The link starts refusing flows when the measured traffic exceeds N-H calls. We can say that space of size H is put aside to cater for the errors caused by the delay.

If the link is properly dimensioned, the usual blocking ratio should not exceed 1%. However, in a mass-call-like situation (such as occurs on New Year's Eve, for example) it can be considerably higher. In this example 50% blocking was chosen to demonstrate the extreme load case, thus the offered traffic is roughly twice the link capacity.

A QoS violation occurs if, during time T, the difference between the number of arriving and departing flows is larger than H. Under the above assumptions the chance of a QoS violation can be calculated. Naturally, the larger H is, the smaller the chance of a QoS violation. The required value of H can be determined for a low value of QoS violation probability (e.g. 10e-5). The following table presents the value of H as a function of link size (N, in calls), delay length (T) and load (causing 1% or 50% blocking).

          |    1ms    |   10ms    |   100ms   |   500ms   |    1s     |
      N   | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% |
   -------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
       50 |   2 |   2 |   2 |   3 |   3 |   4 |   4 |   5 |   5 |   7 |
      100 |   2 |   2 |   3 |   3 |   4 |   4 |   4 |   7 |   6 |   9 |
      500 |   2 |   3 |   3 |   4 |   4 |   7 |   9 |  13 |  12 |  18 |
     1000 |   3 |   3 |   4 |   4 |   5 |   9 |  12 |  18 |  16 |  25 |
     5000 |   3 |   4 |   5 |   7 |  12 |  18 |  24 |  44 |  33 |  69 |
    10000 |   4 |   4 |   7 |   9 |  16 |  25 |  33 |  69 |  47 | 113 |

The required safety margin, relative to the link size, is the highest for small links, as less statistical multiplexing is possible there.

Appendix B. A Simple Algorithm for Core Routers

In this appendix, we present an algorithm for core routers that use unit-based reservations.
The algorithm is simple, so it can easily be implemented in hardware with simple counters. Its inputs are the refreshment interval and the number of flows allowed on the link. The latter is denoted by "threshold". (We assume flows with similar characteristics (e.g. voice) and that each flow sends one refreshment packet per refreshment interval.) If the network uses several DSCPs for real-time traffic, then a separate copy of the algorithm may be run for each DSCP, resulting in per-DSCP admission.

The algorithm counts the number of refreshment and admitted probe packets in each refreshment interval ("count"). The result of the counting is an upper bound on the number of units reserved on the link, as some reservations may have ended by the end of the refreshment interval. The value of this counter is used in the next interval to decide on admission ("last"). When a new reservation is admitted, this value is increased to take the new reservation into account. If this value is high above the admission limit, then we start sending serious congestion notifications by marking regular packets as well.

   On initialisation:
      last = 0
      count = 0

   On arrival of a refreshment packet:
      count++

   On arrival of a probe packet:
      if last < threshold then
         last++
         count++
      else
         Mark Packet
      endif

   On arrival of a regular packet:
      if last > threshold*1.1 then
         Mark Packet
      endif

   At the end of the refreshment interval:
      last = count
      count = 0

Appendix C. Simulation Results

The purpose of the simulations described in this appendix is to give some insight into the performance of load control. The simulation cases are by no means representative and the scheme may work differently in other situations.

In Appendix C.1 the simple marking case is demonstrated with a pure measurement-based admission algorithm, using a single link with both constant bit-rate and on/off sources. In Appendix C.2 the unit-based reservation method is shown, using the algorithm in Appendix B.
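
As a non-normative illustration, the counters of Appendix B could be sketched in Python roughly as follows. The class and method names (CoreRouterCounter, on_probe, and so on) are hypothetical and chosen only for this sketch. The sketch also assumes that regular packets are marked when "last" exceeds the threshold by more than 10%, following the description of serious congestion notification in Appendix B.

```python
from dataclasses import dataclass


@dataclass
class CoreRouterCounter:
    """Per-link (or per-DSCP) counters; names are illustrative only."""
    threshold: int                 # number of flows (units) allowed on the link
    overload_factor: float = 1.1   # assumed serious-congestion limit
    last: int = 0                  # estimate carried over from the previous interval
    count: int = 0                 # refreshment + admitted probe packets so far

    def on_refreshment(self) -> None:
        """A refreshment packet renews one unit for the current interval."""
        self.count += 1

    def on_probe(self) -> bool:
        """Return True if the probe packet must be marked (flow refused)."""
        if self.last < self.threshold:
            self.last += 1         # account for the new reservation at once
            self.count += 1
            return False
        return True

    def on_regular(self) -> bool:
        """Return True if a regular packet must be marked (serious congestion)."""
        return self.last > self.threshold * self.overload_factor

    def on_interval_end(self) -> None:
        """Close the refreshment interval and start a new count."""
        self.last = self.count
        self.count = 0


if __name__ == "__main__":
    link = CoreRouterCounter(threshold=2)
    # Two probes are admitted, the third one is marked.
    print([link.on_probe() for _ in range(3)])   # [False, False, True]
```

One instance would be kept per link, or per DSCP when admission is done per DSCP, and on_interval_end() would be called once per refreshment interval.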
The serious congestion signalling is not used in any of the examples, only admission control.

We simulated a very simple network of one link. This can be viewed as the single bottleneck in the domain. The link had a throughput of 2 Mbit/s, 50% of which was designated to carry real-time traffic. The round-trip propagation delay was set to 100 ms. The real-time flows arrived according to a Poisson process; the holding time was exponential with a mean of 90 seconds. The arrival rate of flows was set to produce approximately 50% blocking. Only real-time traffic was simulated, so scheduling was simple FIFO.

C.1 Simple Marking

C.1.1 Constant Bit-Rate Sources

In the first case, flows emitted 40-byte packets every 20 ms, producing a constant 16 kbit/s load. The 1 Mbit/s capacity assigned to this traffic can thus carry 62.5 flows. From the table in Appendix A, we can see that 4 calls should be reserved in addition to the 62.5.

After an initial transient of 5 minutes, we simulated 2.5 hours. During the 2.5-hour simulation time, utilisation was measured over 5-minute intervals. Utilisation was also measured in 20 ms slots, and the percentage of slots in which it was above 1.064 Mbit/s (66.5 calls) was counted.

   min/avg/max of the utilisation was:      881 / 899 / 914 kbit/s
   min/avg/max of the violation ratio was:  98.96% / 99.78% / 100%

C.1.2 On/Off Sources

In the second simulation case on/off sources were used. During an "off" period no packets were emitted, while in the "on" state the behaviour was the same as in the previous case: 40-byte packets 20 ms apart. The lengths of the on and off periods were both drawn from a Pareto distribution with a shape parameter of 1.1 and a mean of 5 seconds. The average bit-rate of the sources is thus 8 kbit/s. The flow arrival rate was doubled to produce ~50% blocking, as the link is capable of carrying nearly twice the number of flows.
The same set of measurements was carried out as in the previous case.

   min/avg/max of the utilisation was:      808 / 819 / 837 kbit/s
   min/avg/max of the violation ratio was:  98.98% / 99.40% / 99.70%

It can be seen that although the measurement-based approach was not able to prevent the overuse of the real-time resources in this high load case, it is a viable alternative. In no case did the 20 ms measurements exceed 1.15 Mbit/s, so the overuse means just temporary borrowing from the resources provisioned for the lower priority traffic.

C.1.3 The Router Algorithm

The measurement-based admission control (MBAC) algorithm used by the router is presented here only for completeness of the simulation description. The marking strategy was the same for both types of traffic.

The router counts the number of bytes transmitted in every 20 ms interval and calculates the average bit rate in these 20 ms slots. It then smooths these values in time with an exponentially weighted moving average (EWMA) filter. The window size of the EWMA was set to 9 seconds, i.e. when a unit step function is fed through it, the output reaches 0.63 after 9 seconds. The algorithm also calculated the histogram of the difference between the original slot values and the filtered values. The histogram was collected in 1000 bins over the range of -1 to +1 Mbit/s. The 99% quantile of the histogram was calculated every 100 seconds. The router marks all passing packets if the sum of the output of the EWMA filter and the calculated quantile is greater than 1 Mbit/s.

The router makes no correction to its measurements when a new flow is accepted. Thus, the target violation probability was set to 1%, which was in fact fulfilled in the long run.

On arrival of a new packet only counters are incremented. Every 20 ms a new value for the EWMA must be calculated, a marking decision must be made for the next 20 ms and the value of one bin in the histogram must be increased.
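
The per-slot work described in this section can be sketched roughly as follows, in Python and purely as an illustration. The names (MbacMarker, end_of_slot), the EWMA gain of 1 - exp(-slot/window), the choice to take the difference after the filter update, and the initial quantile value of zero are assumptions of this sketch and are not taken from the simulator.

```python
import math

SLOT = 0.020                 # measurement slot length, seconds
WINDOW = 9.0                 # EWMA time constant, seconds
LIMIT = 1_000_000            # real-time capacity, bit/s
HIST_RANGE = 1_000_000       # histogram covers -1 .. +1 Mbit/s
BINS = 1000
SLOTS_PER_QUANTILE = 5000    # 100 s / 20 ms
# Assumed gain: a unit step then reaches 1 - 1/e (about 0.63) after WINDOW.
ALPHA = 1.0 - math.exp(-SLOT / WINDOW)


class MbacMarker:
    """Illustrative measurement-based marker for one link."""

    def __init__(self):
        self.ewma = 0.0
        self.quantile = 0.0          # assumption: no margin before first update
        self.hist = [0] * BINS
        self.slots = 0

    def end_of_slot(self, bytes_sent):
        """Process one slot; return True if packets in the next slot are marked."""
        rate = bytes_sent * 8 / SLOT
        self.ewma += ALPHA * (rate - self.ewma)
        diff = rate - self.ewma      # slot value minus filtered value
        idx = int((diff + HIST_RANGE) * BINS / (2 * HIST_RANGE))
        self.hist[min(max(idx, 0), BINS - 1)] += 1
        self.slots += 1
        if self.slots == SLOTS_PER_QUANTILE:
            self.quantile = self._quantile(0.99)
            self.hist = [0] * BINS   # start a fresh histogram
            self.slots = 0
        return self.ewma + self.quantile > LIMIT

    def _quantile(self, q):
        """Upper edge of the first bin at which the cumulative count reaches q."""
        target = q * sum(self.hist)
        running = 0
        for i, n in enumerate(self.hist):
            running += n
            if running >= target:
                return -HIST_RANGE + (i + 1) * 2 * HIST_RANGE / BINS
        return float(HIST_RANGE)
```

Feeding a constant 1 Mbit/s (2500 bytes per slot) for 450 slots, i.e. 9 seconds, brings the filter output to about 0.63 Mbit/s, matching the window definition given above.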
Every 100 seconds the 99% quantile value must be looked up in the histogram and the histogram must be reinitialised. The interested reader can read more about the design rationale of the above algorithm in [Gross99].

C.2 Unit-Based Reservations

In this section we demonstrate the unit-based reservation scheme. The routers use the simple algorithm in Appendix B, except that they never mark regular packets. The simulation setup is otherwise the same as in the previous section. The traffic inside the flows does not influence the admission algorithm, so during the simulation the sources send only probe and refreshment packets. The unit is defined as a peak bit-rate of 16 kbit/s. The flow number threshold was set to 62 flows, resulting in nearly the same target utilisation of 1 Mbit/s as in Appendix C.1. The length of the refreshment period was varied between 100 ms and 10 seconds. The actual number of flows on the link never exceeded 62 (no violation), so only the utilisation values are shown, in kbit/s.

   | interval | min | avg | max |
   +----------+-----+-----+-----+
   |    --    | 968 | 972 | 976 |
   |  100 ms  | 952 | 954 | 959 |
   |  1 sec.  | 941 | 946 | 949 |
   |  2 sec.  | 927 | 933 | 936 |
   |  4 sec.  | 908 | 913 | 920 |
   |  7 sec.  | 861 | 870 | 879 |
   | 10 sec.  | 827 | 837 | 852 |

The first line shows the utilisation value for the case when the source limits itself to 62 flows, that is, blocking is not done by the network but by the source. This emulates the case when the refreshment period is infinitely short or when a stateful approach, like RSVP, is used. The utilisation is not 100% due to the burstiness of the arrivals.

It can be seen that as the refreshment packets get less frequent, more resources are wasted, as the resources allocated to departing flows remain allocated until the end of the next refreshment period. The result is not only lower average utilisation, but lower maximal utilisation as well.
When the refreshment period is 10 seconds long, the highest utilisation experienced was 952 kbit/s, which is 3 units below the limit.

This motivates the use of as short a refreshment period as possible. However, too short a refreshment period will increase the effects of clock differences between edge and core devices (which were not taken into account during simulation). It also decreases the chance of finding a packet to mark as refreshment if the flow is currently transmitting below its reserved rate.

References

[RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to add Explicit Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[RFC2474] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.

[RFC2205] Braden, R., Zhang, L., Berson, S., Herzog, S. and S. Jamin, "Resource Reservation Protocol (RSVP) Version 1 Functional Specification", RFC 2205, Proposed Standard, September 1997.

[RFC2402] Kent, S. and R. Atkinson, "IP Authentication Header", RFC 2402, November 1998.

[RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload (ESP)", RFC 2406, November 1998.

[Bernet99] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L., Speer, M. and R. Braden, "Interoperation of RSVP/Intserv and Diffserv Networks", Internet Draft, March 1999.

[Stoica99] Stoica, I., et al., "Per Hop Behaviors Based on Dynamic Packet States", Internet Draft, February 1999.

[Berson97] Berson, S. and R. Vincent, "Aggregation of Internet Integrated Services State", Internet Draft, December 1997.

[Guerin97] Guerin, R., Blake, S. and S. Herzog, "Aggregating RSVP based QoS Requests", Internet Draft, November 1997.

[Gross99] Grossglauser, M., Tse, D. N.
          C., "A Time-Scale Decomposition Approach to Measurement-Based Admission Control", Infocom '99.

Authors' Addresses

   Lars Westberg
   Ericsson Research
   Kistagangen 26
   SE-164 80 Stockholm
   Sweden
   EMail: Lars.Westberg@era-t.ericsson.se

   Zoltan R. Turanyi
   Ericsson Telecommunications
   Budapest, Laborc u. 1
   H-1037
   Hungary
   EMail: Zoltan.Turanyi@eth.ericsson.se