Internet Engineering Task Force                                Haitao Wu
Internet Draft                                               Keping Long
Expires: February 2001                                     Shiduan Cheng
                                          Beijing Univ. of Posts & Tele.
                                                                 Jian Ma
                                                  Nokia China R&D Center
                                                             August 2000

        A Direct Congestion Control Scheme for Non-responsive Flow
                 Control in Diff-Serv IP Networks

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   For potential updates to the above required-text see: http://www.ietf.org/ietf/1id-guidelines.txt

Abstract

   This draft considers the potentially negative impact of the increasing deployment of non-congestion-controlled, or non-responsive, traffic on the Internet. Unresponsive traffic can cause severe unfairness toward responsive TCP traffic and significant service degradation. Differentiated Services (DS) [5,6], recently proposed by the IETF, aims to provide scalable service differentiation in the Internet that can be used for differentiated charging. We argue that this could increase customers' incentive to use unresponsive flows in order to obtain better service assurance against competing traffic. We further argue that the responsiveness of a flow should be identified not by its transport protocol but by how it reacts to congestion in the network.
   However, the traffic conditioner (TC) at the boundary of a DS domain has no knowledge of the dynamic traffic conditions inside the DS network. To remedy these problems, we propose a general direct congestion control scheme that regulates the traffic conditioner at the boundary of a DS domain using congestion information generated at interior nodes, in order to provide fairness between responsive and unresponsive traffic. This mechanism gives a network provider much stronger control over the traffic entering the DS domain. As a result, better resource utilization and fairer resource sharing between different traffic types can be achieved.

   The pdf version of this document is available at: http://wuht.topcool.net/publications.htm

Wu, Long, Cheng, Ma         Expires: February 2001              [Page 1]

Draft-wuht-diffserv-dccs-00                                  August 2000

1. Introduction

   In the traditional IP network model, all user packets compete equally for network resources, so no Quality of Service (QoS) guarantee can be provided. With the development of new Internet applications such as voice, video and the WWW, the demand for QoS is growing stronger.

   An architecture for Differentiated Services, recently proposed by the IETF, provides a scalable means of delivering IP QoS based on the handling of traffic aggregates. Traffic classification state is conveyed by IP-layer packet marking using the DS field. Packets are classified and marked to receive a particular per-hop behavior (PHB) at the nodes along their path. Sophisticated classification and traffic conditioning, including marking, policing and shaping operations, need only be implemented at network boundaries or hosts. Within the DS domain, core routers forward packets according to the DS codepoint (DSCP) value in the packet header. A detailed description of Diff-serv is given in the architecture document [6] and the DS field definition [5].

   Compared to Integrated Services (Int-serv), Diff-serv is more scalable to implement.
   This is achieved by handling aggregated traffic with a small number of PHBs within the core network rather than on a per-flow basis. However, many recent studies have shown that significant unfairness still exists between responsive and unresponsive flows in a Diff-serv network [2,3,4,10,11].

   To remedy this problem, we introduce a general direct congestion control scheme that regulates the traffic conditioner at the edge of a DS domain in order to provide fairness between responsive and unresponsive traffic flows. In addition, this scheme follows the original idea of Differentiated Services: keep the operation of core routers simple, and make edge routers smart enough to control the traffic entering the DS domain.

2. Motivation and Related Work

   So far, the IETF has defined only two sets of PHBs [7,8]: the Expedited Forwarding (EF) PHB and the Assured Forwarding (AF) PHB group. Since EF is used to build a low-loss, low-latency, low-jitter, assured-bandwidth, end-to-end service through DS domains, it requires strict policing and shaping. Therefore, one EF customer cannot significantly affect another as long as the DS network has enough resources to support all EF requirements.

   The definition of AF, in contrast, is much looser; in fact, there are no quantitative requirements for the AF PHB. There are four independent AF classes, with three drop precedence levels in each class. An IP packet that belongs to AF class x and has drop precedence y is marked with the AF codepoint AFxy. In this draft, DP0 denotes the drop precedence with the lowest drop probability within an AF class, DP2 the drop precedence with the highest drop probability, and DP1 the one in between.

   Recent studies have shown that, under various conditions, existing Diff-serv mechanisms may suffer from unfairness and inefficient resource utilization, and thereby fail to achieve the desired QoS [4,11,12].
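   To make the AFxy notation and this draft's DP0..DP2 convention concrete, the following Python sketch computes the recommended AF codepoints. It relies on the codepoint layout recommended in RFC 2597 [7] (class in the three high-order bits, drop precedence in the next two, last bit zero) and on the assumption that DPn corresponds to RFC 2597 drop precedence n+1. The function names are hypothetical and only illustrate the mapping.

```python
"""Recommended DSCPs for the AF PHB group, using this draft's DP0..DP2 naming.

Assumption: DP0/DP1/DP2 map to RFC 2597 drop precedences 1/2/3, so DPn of
AF class x is the codepoint AFx(n+1).
"""


def _check(af_class: int, dp: int) -> None:
    """Reject classes outside AF1..AF4 and precedences outside DP0..DP2."""
    if af_class not in (1, 2, 3, 4):
        raise ValueError("AF class must be 1..4")
    if dp not in (0, 1, 2):
        raise ValueError("drop precedence must be DP0..DP2")


def af_name(af_class: int, dp: int) -> str:
    """Return the RFC 2597 name, e.g. af_name(1, 0) == "AF11"."""
    _check(af_class, dp)
    return f"AF{af_class}{dp + 1}"


def af_dscp(af_class: int, dp: int) -> int:
    """Return the 6-bit DSCP recommended for AF class `af_class` at DP`dp`."""
    _check(af_class, dp)
    # Bits 5..3 carry the class, bits 2..1 the RFC 2597 drop precedence,
    # bit 0 is zero: DSCP = 8 * class + 2 * precedence.
    return (af_class << 3) | ((dp + 1) << 1)


if __name__ == "__main__":
    for c in range(1, 5):
        print("  ".join(f"{af_name(c, d)}={af_dscp(c, d):06b}" for d in range(3)))
```

   Running the script prints one row per AF class; the first row is "AF11=001010  AF12=001100  AF13=001110", i.e., DP0, DP1 and DP2 of AF class 1.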
   Much work has been done to alleviate the unfairness between responsive TCP and non-responsive UDP traffic by mapping in-profile and out-of-profile TCP and UDP packets to different drop precedences within an AF class. In [11], Seddigh points out that in an over-provisioned network the share of excess bandwidth depends on the mapping of out-of-profile packets, while in an under-provisioned network fair degradation for TCP and UDP cannot be achieved by using different drop precedences. Goyal [4] argues that fair allocation of excess network bandwidth between congestion-sensitive and congestion-insensitive flows can be achieved if packets are 'colored' differently, but that if the network operates close to its capacity, even three drop precedences (colors) cannot achieve fairness.

   Seddigh [11] also suggests that TCP and UDP may coexist fairly if they are placed in separate queues or AF classes. We argue, however, that a core router in a DS domain has no knowledge of the bandwidth reserved for the current TCP and UDP flows, and since the traffic is dynamic in nature, it cannot decide how to allocate link bandwidth between them. Moreover, there is no per-flow isolation inside the core of a DS network; even if a core router knew the reserved bandwidth of the aggregates, it could not judge how much of the resources an individual flow within an aggregate should receive.

   Although some form of call admission control (CAC) may help alleviate these problems, we argue that CAC is necessary but not sufficient. Since the problem stems from the dynamic nature of network load and capacity, and from the different ways in which transport protocols react to congestion (as indicated by packet loss), only a dynamic control mechanism at the DS boundary can solve it fundamentally.

   Chow [12] points out that these problems arise because there is no dynamic control at the Diff-serv boundary, and the network relies on transport protocols to react to congestion.
   He also proposes a framework similar to the Resource Management (RM) cells in ATM networks, in which boundary nodes periodically obtain information from the core of the network and update their TCs using that information. The main drawbacks of this scheme are that core routers need to maintain all the state information and that boundary routers must send probe traffic periodically. Nevertheless, it is significant progress, since the boundary can adjust its TC according to the traffic dynamics in the DS network.

   We argue that the responsiveness of a flow cannot be identified by its transport protocol, but only by its behavior, i.e., its reaction to congestion in the DS network. Therefore, the DS boundary needs additional dynamic information from the interior network, reporting the behavior of unresponsive or non-TCP-friendly flows, in order to regulate its TC. To remedy these problems, we propose a direct congestion control scheme for controlling non-responsive or non-TCP-friendly flows in a DS network, so as to achieve fairness between responsive and non-responsive flows.

3. Direct Congestion Control Scheme (DCCS)

   In this draft, "core router" means an interior node of a DS domain that forwards packets according to the DSCP in the packet header in order to implement a particular PHB. "Edge router" means an ingress/egress node of a DS domain that performs TC functions on traffic entering the DS domain; it may also be a host, if that host can perform TC functions on its own traffic entering the DS domain and is directly connected to a core router. We design a general direct congestion control scheme to overcome the issues discussed in the preceding sections.
   The basic idea of DCCS is that when packets of a given PHB experience congestion, core routers generate congestion control messages and send them directly to edge routers, which then adaptively regulate the Traffic Conditioner (TC) of the corresponding aggregates or flows according to the received messages. When congestion occurs at an edge router, the edge router can adjust its TC directly, and no control messages are generated. This mechanism conforms to the essential idea of Diff-serv: push complex conditioning functions to the edge of the network, and let core routers simply forward packets according to the DSCP marked by edge routers to implement a particular PHB. Figure 1 shows the prototype of DCCS.

   Figure 1. Prototype of Direct Congestion Control Scheme (DCCS)

3.1 Core router requirements

   In addition to its basic packet forwarding function, a core router is extended with a load monitoring function. This adds little overhead, since most core routers are expected to use RIO [2,3] or Multi-RED [11] to implement different drop precedences, following current recommendations. Furthermore, RED [1] is the active queue management mechanism recommended by the IETF, and most routers currently in use already implement it.

   According to the relevant RFCs (e.g., [7]), different independent PHBs will likely be implemented by different queues, whereas an AF class with multiple drop precedences will be implemented by a single FIFO queue with RIO or Multi-RED enabled. This simplifies our scheme, since RED is sensitive to incipient congestion and drops or marks packets with a probability that depends on the current average queue length. Therefore, core routers can send congestion control messages according to the current load.

   When a core router experiences congestion for packets of the lowest drop precedence (e.g., DP0 packets of AF class x), a congestion control message for the affected aggregate or flows should be generated and sent to the edge routers that perform the TC function for that aggregate or those flows; which edge routers these are depends on the granularity of the implementation. Other core routers receiving such a control packet should forward it toward the ingress edge router as a network control message.

   It may be argued that this mechanism adds overhead to the core routers of a DS domain. However, since control messages are generated only during congestion and can directly and adaptively control the TC at boundary routers, we believe they will alleviate congestion effectively. Considering this effectiveness, the cost is well worth paying, and the granularity of the implementation can be adjusted to reduce the overhead.

3.2 Edge router requirements

   A boundary router generally performs TC functions to ensure that traffic entering a DS domain conforms to the rules specified in the Traffic Conditioning Specification (TCS), in accordance with the domain's service provisioning policy. In DCCS, we propose that these TC functions adapt dynamically to the control messages from core routers. The general rules of this Dynamic TC (DTC) are:

   (1) Under normal conditions, the DTC performs the TC functions as usual;

   (2) When it receives a congestion control message from a core router, it should dynamically adjust its TC functions for the corresponding aggregates or flows;

   (3) If it uses a mechanism that can distinguish micro-flows within an aggregate and treat them fairly, it can regulate an individual micro-flow directly rather than the whole aggregate;

   (4) An edge router should recognize a control message whose destination is a host or network that accesses the DS domain through this router, and should terminate that message.

3.3 Congestion control message creation and handling

   A congestion control message must be distinguishable from other packets in a DS network.
   Several alternatives exist for making such a control packet easy to identify:

   (1) Using a special DSCP, with the control information carried in the data field of the packet;

   (2) Using a new IP option that defines a special extension for control messages;

   (3) Using an ICMP packet containing the control information;

   (4) Using a bit in the IP header to indicate that the packet is a congestion control message.

   If control messages are to travel from one DS domain to another, further negotiation between the two domains will be needed to ensure that the messages are effective. If a DS domain does not wish to cooperate with another domain on such messages, it can simply drop them at its edge; the other domain can then adjust its own TC without affecting other domains.

   We strongly recommend that control messages be given a special DSCP when this scheme is implemented in a DS domain. A core router can then distinguish such a message clearly and forward it as a network control message, and an edge router can identify it easily.

3.4 Congestion control message

   A control message may carry various fields. It should include at least the following:

   (1) Version: since there is as yet no common implementation, an identifier is needed to indicate the implementation in use;

   (2) PHB in congestion: indicates which PHB's packets are currently congested;

   (3) TS: a timestamp or sequence number, used to identify a control message from a particular core router when multiple core routers are congested simultaneously;

   (4) Granularity: either aggregate or flow;

   (5) Other identifiers: determined by the granularity;

   (6) Control power: indicates to what extent the DTC should adjust its functions.
       Alternatively, the core router could send its current load information as the control power.

3.5 Dynamic Traffic Conditioning (DTC)

   In the absence of congestion indications, a DTC performs its normal functions. When it receives a congestion control message, it should adjust the TC parameters of the corresponding aggregate or flows according to the message.

   One way to implement this is to use two sets of parameters: a static set containing the original TC profile parameters, and a supplementary set that changes in response to control messages. If no control messages have been received, the supplementary set should be identical to the static one. If no further control messages arrive after the supplementary parameters have been adjusted, the DTC should return them to the static values step by step. DTC parameters include parameters for shaping, dropping and marking.

4. An implementation example

   So far, the IETF has defined only two sets of PHBs: the EF PHB and the AF PHB group. Since EF requires strict policing and shaping, one EF customer cannot significantly affect another as long as the DS network has enough resources to support all EF requirements. AF, however, has no quantitative requirements, so a customer may exceed the subscribed profile with the understanding that the excess traffic is delivered with lower probability than the traffic within the profile. When the network is congested, responsive TCP flows decrease their window sizes and back off, but unresponsive UDP flows keep sending packets, which leads to severe unfairness between TCP and UDP flows.

   A core router need not be concerned about drops of DP1 and DP2 packets, since customers use these precedences to grab excess network resources. Drops of DP0 packets, however, very likely indicate incipient congestion, and the core router should generate congestion control messages when many DP0 packets are being dropped.
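   Section 3.1 relies on RIO or Multi-RED running on a single FIFO queue per AF class, and the paragraph above treats DP0 drops as the sign of incipient congestion. The Python sketch below shows one simple Multi-RED variant under stated assumptions: a single EWMA average queue length shared by all precedences (RIO proper keeps separate averages), RED's linear drop-probability ramp without its inter-drop count correction, and hypothetical thresholds chosen so that DP2 is dropped first and DP0 last. The class, function and parameter names are illustrative only.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RedCurve:
    min_th: float  # average queue length (packets) at which drops start
    max_th: float  # average queue length at which the probability reaches max_p
    max_p: float   # drop probability just below max_th


# Hypothetical per-precedence curves: DP2 is dropped earliest, DP0 latest.
CURVES = {
    0: RedCurve(min_th=40, max_th=70, max_p=0.02),
    1: RedCurve(min_th=25, max_th=50, max_p=0.05),
    2: RedCurve(min_th=10, max_th=30, max_p=0.10),
}


class MultiRedQueue:
    """Single FIFO queue for one AF class with a RED curve per drop precedence."""

    def __init__(self, weight: float = 0.002, rng: Optional[random.Random] = None):
        self.weight = weight  # EWMA weight w_q, as in RED
        self.avg = 0.0        # average queue length shared by all precedences
        self.qlen = 0         # instantaneous queue length (packets)
        self.rng = rng if rng is not None else random.Random()

    def drop_probability(self, dp: int) -> float:
        c = CURVES[dp]
        if self.avg < c.min_th:
            return 0.0
        if self.avg >= c.max_th:
            return 1.0  # forced drop above max_th, as in basic RED
        return c.max_p * (self.avg - c.min_th) / (c.max_th - c.min_th)

    def enqueue(self, dp: int) -> bool:
        """Update the average, then accept (True) or drop (False) a DP`dp` packet."""
        self.avg = (1 - self.weight) * self.avg + self.weight * self.qlen
        if self.rng.random() < self.drop_probability(dp):
            return False
        self.qlen += 1
        return True

    def dequeue(self) -> None:
        if self.qlen > 0:
            self.qlen -= 1


def is_congestion_signal(dp: int, accepted: bool) -> bool:
    """Assumed rule from Section 4: only a dropped DP0 packet signals congestion."""
    return dp == 0 and not accepted
```

   Because the DP0 curve starts highest, a DP0 drop only occurs once the average queue is already long, which is why it can serve as an incipient-congestion indicator. In a full core router, each drop returned by enqueue() would also be recorded in the per-flow drop table that this section goes on to describe.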
   Since a normal TCP responds to packet drops with back-off and slow start, we believe that TCP's end-to-end congestion control is sufficient, so congestion control messages are generated only for non-responsive flows. One issue must be solved first, however: how to identify an unresponsive or non-TCP-friendly aggregate or flow.

   Core router:

   A core router should implement RIO, Multi-RED or a similar mechanism for the different drop precedences. It should also maintain a table that stores information about dropped DP0, DP1 and DP2 packets, which may include the following fields:

   +--------+------+-----------+-------+-------+--------+--------+--------+
   | Source | Dest | Transport | Time  | AF    | DP0    | DP1    | DP2    |
   | IP     | IP   | Protocol  | Stamp | Class | Counts | Counts | Counts |
   +--------+------+-----------+-------+-------+--------+--------+--------+

   The first three fields identify the flow; which of them are used depends on the granularity of the scheme. Time Stamp records when the row was last updated; if a row has not been updated for a specified period (Tupd), it should be cleared. Tupd should be a configurable parameter. The Counts fields record the number of dropped packets at each drop precedence.

   When an AF packet is dropped, the core router looks up the table and updates it with the information of the dropped packet. It should then generate a congestion control message with a probability pcc, which is determined by the Counts and the current load.

   The congestion control message created by the core router uses a special PHB to identify itself. The Source IP and Dest IP of this message are, respectively, the Dest IP and Source IP of the dropped packet.
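   The table maintenance and probabilistic message generation described above could look roughly like the following Python sketch. The draft leaves the pcc function, Tupd, the control-power rule and the message encoding open, so everything here is an assumption: T_UPD, DP0_SATURATION and VERSION are hypothetical constants; pcc is taken as the product of a DP0-drop ratio and the current load (so drops of only DP1/DP2 packets never trigger a message, matching the discussion above); control power starts at 1 and grows in steps with DP0 drops; and the message is modeled as a plain record whose fields follow Section 3.4 rather than as a wire format.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical configuration values (the draft leaves these open).
T_UPD = 2.0          # seconds a row may stay un-updated before it is cleared
DP0_SATURATION = 20  # DP0 drops at which the DP0 factor of pcc reaches 1
VERSION = 0          # placeholder implementation/version identifier

FlowKey = Tuple[str, str, int]  # (Source IP, Dest IP, transport protocol)


@dataclass
class DropRow:
    timestamp: float
    af_class: int
    counts: List[int] = field(default_factory=lambda: [0, 0, 0])  # DP0, DP1, DP2


@dataclass
class ControlMessage:
    src_ip: str             # Dest IP of the dropped packet
    dst_ip: str             # Source IP of the dropped packet
    version: int
    phb_in_congestion: str  # congested AF class, e.g. "AF1"
    ts_sn: int              # per-router sequence number
    granularity: str
    transport_protocol: int
    control_power: int


class DropTable:
    def __init__(self, rng: Optional[random.Random] = None):
        self.rows: Dict[FlowKey, DropRow] = {}
        self.seq = 0
        self.rng = rng if rng is not None else random.Random()

    def expire(self, now: float) -> None:
        """Clear rows that have not been updated within T_UPD."""
        for key in [k for k, r in self.rows.items() if now - r.timestamp > T_UPD]:
            del self.rows[key]

    def on_drop(self, src: str, dst: str, proto: int, af_class: int,
                dp: int, now: float, load: float) -> Optional[ControlMessage]:
        """Record a dropped AF packet; maybe return a congestion control message."""
        self.expire(now)
        row = self.rows.setdefault((src, dst, proto), DropRow(now, af_class))
        row.timestamp, row.af_class = now, af_class
        row.counts[dp] += 1

        dp0 = row.counts[0]
        load = min(1.0, max(0.0, load))              # load as 0..1 utilization
        pcc = min(1.0, dp0 / DP0_SATURATION) * load  # assumed form of pcc
        if self.rng.random() >= pcc:
            return None

        self.seq += 1
        return ControlMessage(
            src_ip=dst, dst_ip=src, version=VERSION,
            phb_in_congestion=f"AF{af_class}", ts_sn=self.seq,
            granularity="flow", transport_protocol=proto,
            control_power=1 + dp0 // DP0_SATURATION,  # assumed step rule
        )
```

   With this choice of pcc, a load of 0.0 suppresses all messages and a flow that loses only DP1/DP2 packets never triggers one; how pcc should really depend on the Counts and the load remains open in the draft.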
   The data field of this message includes the following:

   +---------+------------+-------+-------------+---------------+---------+
   | Version | PHB in     | TS/SN | Granularity | Transport     | Control |
   |         | Congestion |       |             | Protocol Type | Power   |
   +---------+------------+-------+-------------+---------------+---------+

   These fields are explained in Section 3.4. In this example, we use per-flow granularity, with a flow identified by its Source IP, Dest IP and Transport Protocol Type. Control Power is determined by the number of dropped packets of the corresponding flow and the current load; it is normally set to 1 and increases linearly as congestion becomes more severe.

   Edge router:

   An edge router should perform the DTC functions described in Section 3.5. When it receives a control message, it should dynamically adjust the TC parameters of the corresponding flows. When a flow traverses multiple congested core routers, the edge router at which the flow enters the DS domain will receive congestion control messages from several core routers; messages from different routers can be distinguished by the TS/SN field. The edge router should respond only to the most severe congestion indication rather than to every message. This avoids unfairly penalizing flows that traverse many hops in the DS domain.

5. Summary

   In this draft, we propose a Direct Congestion Control Scheme (DCCS) for controlling unresponsive flows in a DS IP network. We present the general requirements of the scheme and an example implementation. The scheme is simple and, we believe, effective. We expect it to enhance fairness between responsive and non-responsive flows, which will benefit traffic control and resource utilization in real networks. We are currently simulating this scheme and will publish the results soon.
   Finally, we list the advantages and disadvantages of our scheme.

   Advantages:

   1) Fairness between responsive TCP flows and unresponsive UDP flows can be achieved;

   2) No changes to transport protocols (such as TCP and UDP) are required at hosts or end systems;

   3) Edge routers can dynamically adjust their TCs according to the current network load and capacity;

   4) End-to-end congestion control for TCP remains usable and compatible; for example, ECN or other mechanisms can be used together with our scheme.

   Disadvantages:

   1) It adds some overhead to core routers and to the network for generating and delivering control messages;

   2) Control messages must be identifiable; we recommend using a new PHB to mark them.

6. References

   [1] Floyd, S. and V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, Vol. 1, No. 4, pp. 397-413, August 1993.

   [2] Clark, D. and W. Fang, "Explicit Allocation of Best Effort Packet Delivery Service", IEEE/ACM Transactions on Networking, August 1998. http://diffserv.lcs.mit.edu/exp-alloc-ddc-wf.ps

   [3] Ibanez, J. and K. Nichols, "Preliminary Simulation Evaluation of an Assured Service", Internet Draft, draft-ibanez-diffserv-assured-eval-00.txt, August 1998.

   [4] Goyal, M., Misra, P. and R. Jain, "Effect of Number of Drop Precedences in Assured Forwarding". Available from http://www.cis.ohio-state.edu/~jain/papers/dpstdy_globecom99.htm

   [5] Nichols, K., Blake, S., Baker, F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

   [6] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.

   [7] Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski, "Assured Forwarding PHB Group", RFC 2597, June 1999.

   [8] Jacobson, V., Nichols, K. and K. Poduri, "An Expedited Forwarding PHB", RFC 2598, June 1999.

   [9] Heinanen, J. and R. Guerin, "A Two Rate Three Color Marker", RFC 2698, September 1999.

   [10] Seddigh, N., Nandy, B. and P. Pieda, "Bandwidth Assurance Issues for TCP Flows in a Differentiated Services Network", GLOBECOM 99, Rio de Janeiro, December 1999.

   [11] Seddigh, N., Nandy, B. and P. Pieda, "Study of TCP and UDP Interaction for the AF PHB", Internet Draft, draft-nsbnpp-diffserv-tcpudpaf-01.pdf, August 1999.

   [12] Chow, H. (Keith) and A. Leon-Garcia, "A Feedback Control Extension to Differentiated Services", Internet Draft, draft-chow-diffserv-fbctrl-00.pdf, March 1999.

7. Authors' Addresses

   Haitao Wu, Keping Long, Shiduan Cheng
   National Key Lab of Switching Technology and Telecommunication Networks
   P.O.Box 206, Beijing University of Posts & Telecommunications
   Beijing 100876, P.R.China
   Tel: +86 10 62283761; Fax: +86 10 62283412
   E-mail: {xiaodan, lkp, chsd}@bupt.edu.cn
   Homepage: http://wuht.topcool.net/

   Jian Ma
   Nokia China R&D Center
   Nokia House 1, No.11, He Ping Li Dong Jie
   Beijing, 1000013, P.R.China
   Tel: +86 10 8422 9922 Ext.2940; Fax: +86 10 8422 2439
   E-mail: jian.j.ma@nokia.com