RTP Media Congestion Avoidance M. Welzl Techniques (rmcat) University of Oslo Internet-Draft January 19, 2013 Intended status: Experimental Expires: July 23, 2013 Coupled congestion control for RTP media draft-welzl-rmcat-coupled-cc-00 Abstract When multiple congestion controlled RTP sessions traverse the same network bottleneck, it can be beneficial to combine their controls such that the total on-the-wire behavior is improved. This document describes such a method for flows that have the same sender, in a way that is as flexible and simple as possible while minimizing the amount of changes needed to existing RTP applications. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on July 23, 2013. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of Welzl Expires July 23, 2013 [Page 1] Internet-Draft Coupled congestion control for RTP media January 2013 the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. 1. Introduction When there is enough data to send, a congestion controller must increase its sending rate until the path's available capacity has been reached; depending on the controller, sometimes the rate is increased further, until packets are ECN-marked or dropped. In the public Internet, this is currently the only way to get any feedback from the network that can be used as an indication of congestion. This process inevitably creates undesirable queuing delay -- an effect that is amplified when multiple congestion controlled connections traverse the same network bottleneck. When such connections originate from the same host, it would therefore be ideal to use only one single sender-side congestion controller which determines the overall allowed sending rate, and then use a local scheduler to assign a proportion of this rate to each RTP session. This way, priorities could also be implemented quite easily, as a function of the scheduler; honoring user-specified priorities is, for example, required by rtcweb [rtcweb-usecases]. The Congestion Manager (CM) [RFC3124] provides a single congestion controller with a scheduling function just as described above. It is, however, hard to implement because it requires an additional congestion controller and removes all per-connection congestion control functionality, which is quite a significant change to existing RTP based applications. This document presents a method that is easier to implement than the CM and also requires less significant changes to existing RTP based applications. It attempts to roughly approximate the CM behavior by sharing information between existing congestion controllers, akin to "Ensemble Sharing" in [RFC2140]. 2. Definitions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Available Bandwidth: The available bandwidth is the nominal link capacity minus the amount of traffic that traversed the link during a certain time interval, divided by that time interval. Welzl Expires July 23, 2013 [Page 2] Internet-Draft Coupled congestion control for RTP media January 2013 Bottleneck: The first link with the smallest available bandwidth along the path between a sender and receiver. Flow: A flow is the entity that congestion control is operating on. It could, for example, be a transport layer connection, an RTP session, or a subsession that is multiplexed onto a single RTP session together with other subsessions. Flow Group Identifier (FGI): A unique identifier for each subset of flows that is limited by a common bottleneck. Flow State Exchange (FSE): The entity which maintains information that is exchanged between flows. Flow Group (FG): A group of flows having the same FGI. Shared Bottleneck Detection (SBD): The entity that determines which flows traverse the same bottleneck in the network, or the process of doing so. 3. Limitations Sender-side only: Coupled congestion control as described here only operates inside a single host on the sender side. This is because, irrespective of where the major decisions for congestion control are taken, the sender of a flow needs to eventually decide the transmission rate. Additionally, the necessary information about how much data an application can currently send on a flow is typically only available at the sender side, making the sender an obvious choice for placement of the elements and mechanisms described here. It is recognized that flows that have different senders but the same receiver, or different senders and different receivers can also share a bottleneck; such scenarios have been omitted for simplicity, and could be incorporated in future versions of this document. Note that limiting the flows on which coupled congestion control operates merely limits the benefits derived from the mechanism. Welzl Expires July 23, 2013 [Page 3] Internet-Draft Coupled congestion control for RTP media January 2013 Shared bottlenecks do not change quickly: As per the definition above, a bottleneck depends on cross traffic, and since such traffic can heavily fluctuate, bottlenecks can change at a high frequency (e.g., there can be oscillation between two or more links). This means that, when flows are partially routed along different paths, they may quickly change between sharing and not sharing a bottleneck. For simplicity, here it is assumed that a shared bottleneck is valid for a time interval that is significantly longer than the interval at which congestion controllers operate. Note that, for the only SBD mechanism defined in this document (multiplexing on the same five-tuple), the notion of a shared bottleneck stays correct even in the presence of fast traffic fluctuations: since all flows that are assumed to share a bottleneck are routed in the same way, if the bottleneck changes, it will still be shared. 4. Architectural overview Figure 1 shows the elements of the architecture for coupled congestion control: the Flow State Exchange (FSE), Shared Bottleneck Detection (SBD) and Flows. The FSE is a storage element. It is passive in that it does not actively initiate communication with flows and the SBD; its only active role is internal state maintenance (e.g., an implementation could use soft state to remove a flow's data after long periods of inactivity). Every time a flow's congestion control mechanism would normally update its sending rate, the flow instead updates information in the FSE and performs a query on the FSE, leading to a sending rate that is often different from what the congestion controller originally determined. Using information about/from the currently active flows, SBD updates the FSE with the correct Flow Group Identifiers (FGIs). ------- <--- Flow 1 | FSE | <--- Flow 2 .. ------- <--- .. Flow N ^ | | ------- | | SBD | <-------| ------- Figure 1: Coupled congestion control architecture Welzl Expires July 23, 2013 [Page 4] Internet-Draft Coupled congestion control for RTP media January 2013 Since everything shown in Figure 1 is assumed to operate on a single host (the sender) only, this document only describes aspects that have an influence on the resulting on-the-wire behavior. It does, for instance, not define how many bits must be used to represent FGIs, or in which way the entities communicate. Implementations can take various forms: for instance, all the elements in the figure could be implemented within a single application, thereby operating on flows generated by that application only. Another alternative could be to implement both the FSE and SBD together in a separate process which different applications communicate with via some form of Inter-Process Communication (IPC). Such an implementation would extend the scope to flows generated by multiple applications. The FSE and SBD could also be included in the Operating System kernel. 5. Roles This section gives an overview of the roles of the elements of coupled congestion control, and provides an example of how coupled congestion control can operate. 5.1. SBD SBD uses knowledge about the flows to determine which flows belong in the same Flow Group (FG), and assigns FGIs accordingly. This knowledge can be derived from measurements, by considering correlations among measured delay and loss as an indication of a shared bottleneck, or it can be based on the simple assumption that packets sharing the same five-tuple (IP source and destination address, protocol, and transport layer port number pair) are typically routed in the same way. The latter method is the only one specified in this document: SBD MUST consider all flows that use the same five-tuple to belong to the same FG. This classification applies to certain tunnels, or RTP flows that are multiplexed over one transport (cf. [transport-multiplex]). In one way or another, such multiplexing will probably be recommended for use with rtcweb [rtcweb-rtp-usage]. Port numbers are needed as part of the classification due to mechanisms like Equal-Cost Multi-Path (ECMP) routing which use different paths for packets towards the same destination, but are typically configured to keep packets from the same transport connection on the same path. 5.2. FSE The FSE contains a list of all flows that have registered with it. For each flow, it stores: Welzl Expires July 23, 2013 [Page 5] Internet-Draft Coupled congestion control for RTP media January 2013 o a unique flow number to identify the flow o the FGI of the FG that it belongs to (based on the definitions in this document, a flow has only one bottleneck, and can therefore be in only one FG) o a priority P, which here is assumed to be represented as a floating point number in the range from 0.1 (unimportant) to 1 (very important). A negative value is used to indicate that a flow has terminated. o The calculated rate CR, i.e. the rate that was most recently calculated by the flow's congestion controller. o The desired rate DR. This can be smaller than the calculated rate if the application feeding into the flow has less data to send than the congestion controller would allow. In case of a greedy flow, DR must be set to CR. A DR value that is larger than CR indicates that the flow has taken leftover bandwidth from a non- greedy flow. o S_CR, the sum of the calculated rates of all flows in the same FG (including the flow itself), as seen by the flow during its last rate update. The information listed here is enough to implement the sample flow algorithm given below. FSE implementations could easily be extended to store, e.g., a flow's current sending rate for statistics gathering or future potential optimizations. 5.3. Flows Flows register themselves with SBD and FSE when they start, deregister from the FSE when they stop, and carry out an UPDATE function call every time their congestion controller calculates a new sending rate. Via UPDATE, they provide the newly calculated rate and the desired rate (less than the calculated rate in case of non-greedy flows, the same otherwise). UPDATE returns a rate that should be used instead of the rate that the congestion controller has determined. Below, an example algorithm is described. While other algorithms could be used instead, the same algorithm must be applied to all flows. The way the algorithm is described here, the operations are carried out by the flows, but they are the same for all flows. This means that the algorithm could, for example, be implemented in a library that provides registration, deregistration functions and the UPDATE function. To minimize the number of changes to existing Welzl Expires July 23, 2013 [Page 6] Internet-Draft Coupled congestion control for RTP media January 2013 applications, one could, however, also embed this functionality in the FSE element. 5.3.1. Example algorithm (1) When a flow starts, it registers itself with SBD and the FSE. CR and DR are initialized with the congestion controller's initial rate. SBD will assign the correct FGI. When a flow is assigned an FGI, its S_CR is initialized to be the sum of the calculated rates of all the flows in its FG. (2) When a flow stops, it sets its DR to 0 and negates P. (3) Every time the flow's congestion controller determines a new sending rate new_CR, assuming the flow's new desired rate new_DR to be "infinity" in case of a greedy flow with an unknown maximum rate, the flow calls UPDATE, which carries out the following tasks: (a) For all the flows in its FG (including itself), it calculates the sum of all the absolute values of all priorities, S_P, the sum of all desired rates, S_DR, and the sum of all the calculated rates, new_S_CR. (b) It updates CR if new_CR is smaller than the already stored CR value, or if new_S_CR is smaller or equal to the flow's stored S_CR value. This restriction on updating CR ensures that only one flow can make S_CR increase at a time. (c) It updates new_S_CR using its own updated CR, and updates S_CR with new_S_CR. (d) It subtracts DR from S_DR, updates DR to min(new_DR, CR), and adds the updated DR to S_DR. (e) It initializes the total leftover rate TLO to 0. Then, for every other flow i in its FG that has DR(i) < CR(i), it calculates the leftover rate as abs(P(i))/S_P * S_CR - DR(i), adds the flow's leftover rate to TLO, and sets DR(i) to CR(i). This makes flow i look like a greedy flow and ensures that the leftover rate can only once be taken from it. Finally, if P(i) is negative, it removes flow i's entry from the FSE. (f) It calculates the new sending rate as min(new_DR, P/S_P * S_CR + TLO). This gives the flow the correct share of the bandwidth based on its priority, applies an upper bound in case of an application-limited flow, and adds any Welzl Expires July 23, 2013 [Page 7] Internet-Draft Coupled congestion control for RTP media January 2013 potentially leftover bandwidth from non-greedy flows. (g) If the flow's new sending rate is greater than DR, then it updates DR with the flow's new sending rate. The goals of the flow algorithm are to achieve prioritization, improve network utilization in the face of non-greedy flows, and impose limits on the increase behavior such that the negative impact of multiple flows trying to increase their rate together is minimized. It does that by assigning a flow a sending rate that may not be what the flow's congestion controller expected. It therefore builds on the assumption that no significant inefficiencies arise from temporary non-greedy behavior or from quickly jumping to a rate that is higher than the congestion controller intended. How problematic these issues really are depends on the controllers in use and requires careful per-controller experimentation. The coupled congestion control mechanism described here also does not require all controllers to be equal; effects of heterogeneous controllers, or homogeneous controllers being in different states, are also subject to experimentation. There are more potential issues with the algorithm described here. Rule 3 b) leads to a conservative behavior: it ensures that only one flow at a time can increase the overall sending rate. This rule is probably appropriate for situations where minimizing delay is the major goal, but it may not fit for all purposes; it also does not incorporate the magnitude by which a flow can increase its rate. Notably, despite this limitation on the overall rate of all flows per FGI, immediate rate jumps of single flows could become problematic when the FSE is used in a highly asynchronous manner, e.g. when flows have very different RTTs. Rule 3 e) gives all the leftover rate of non-greedy flows to the first flow that updates its sending rate, provided that this flow needs it all (otherwise, its own leftover rate can be taken by the next flow that updates its rate). Other policies could be applied, e.g. to divide the leftover rate of a flow equally among all other flows in the FGI. 5.3.2. Example operation In order to illustrate the operation of the coupled congestion control algorithm, this section presents a toy example of two flows that use it. Let us assume that both flows traverse a common 10 Mbit/s bottleneck and use a simplistic congestion controller that starts out with 1 Mbit/s, increases its rate by 1 Mbit/s in the absence of congestion and decreases it by 2 Mbit/s in the presence of congestion. For simplicity, flows are assumed to always operate in a round-robin fashion. Rate numbers below without units are assumed to be in Mbit/s. For illustration purposes, the actual sending rate is Welzl Expires July 23, 2013 [Page 8] Internet-Draft Coupled congestion control for RTP media January 2013 also shown for every flow in FSE diagrams even though it is not really stored in the FSE. Flow #1 begins. It is greedy and considers itself to have top priority. This is the FSE after the flow algorithm's step 1: --------------------------------------------- | # | FGI | P | CR | DR | S_CR | Rate | | | | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | --------------------------------------------- Its congestion controller gradually increases its rate. Eventually, at some point, the FSE should look like this: --------------------------------------------- | # | FGI | P | CR | DR | S_CR | Rate | | | | | | | | | | 1 | 1 | 1 | 10 | 10 | 10 | 10 | --------------------------------------------- Now another flow joins. It is also greedy, and has a lower priority (0.5): ----------------------------------------------- | # | FGI | P | CR | DR | S_CR | Rate | | | | | | | | | | 1 | 1 | 1 | 10 | 10 | 10 | 10 | | 2 | 1 | 0.5 | 1 | 1 | 11 | 1 | ----------------------------------------------- Now assume that the first flow updates its rate to 8, because the total sending rate of 11 exceeds the total capacity. Let us take a closer look at what happens in step 3 of the flow algorithm. Welzl Expires July 23, 2013 [Page 9] Internet-Draft Coupled congestion control for RTP media January 2013 new_CR = 8. new_DR = infinity. 3 a) S_P = 1.5; S_DR = 11; new_S_CR = 11. 3 b) new_CR < CR, hence CR = 8. 3 c) new_S_CR = 9; S_CR = 9. 3 d) DR = CR = 8; S_DR = 9. 3 e) TLO = 0; there are no other flows with DR < CR. 3 f) new sending rate: min(infinity, 1/1.5 * 9 + 0) = 6. 3 g) does not apply. The resulting FSE looks as follows: ----------------------------------------------- | # | FGI | P | CR | DR | S_CR | Rate | | | | | | | | | | 1 | 1 | 1 | 8 | 8 | 9 | 6 | | 2 | 1 | 0.5 | 1 | 1 | 11 | 1 | ----------------------------------------------- The effect is that flow #1 is sending with 6 Mbit/s instead of the 8 Mbit/s that the congestion controller derived. Let us now assume that flow #2 updates its rate. Its congestion controller detects that the network is not fully saturated (the actual total sending rate is 6+1=7) and increases its rate. new_CR=2. new_DR = infinity. 3 a) S_P = 1.5; S_DR = 9; new_S_CR = 9. 3 b) new_CR > CR but new_S_CR < S_CR, hence CR = 2. 3 c) new_S_CR = 10; S_CR = 10. 3 d) DR = CR = 2; S_DR = 10. 3 e) TLO = 0; there are no other flows with DR < CR. 3 f) new sending rate: min(infinity, 0.5/1.5 * 10 + 0) = 3.33. 3 g) new sending rate > DR, hence DR = 3.33. The resulting FSE looks as follows: ----------------------------------------------- | # | FGI | P | CR | DR | S_CR | Rate | | | | | | | | | | 1 | 1 | 1 | 8 | 8 | 9 | 6 | | 2 | 1 | 0.5 | 2 | 3.33 | 10 | 3.33 | ----------------------------------------------- The effect is that flow #2 is now sending with 3.33 Mbit/s, which is close to half of the rate of flow #1 and leads to a total utilization of 6(#1) + 3.33(#2) = 9.33 Mbit/s. Flow #2's congestion controller has increased its rate faster than the controller actually expected. Now, flow #1 updates its rate. Its congestion controller detects Welzl Expires July 23, 2013 [Page 10] Internet-Draft Coupled congestion control for RTP media January 2013 that the network is not fully saturated and increases its rate. Additionally, the application feeding into flow #1 limits the flow's sending rate to at most 2 Mbit/s. new_CR=9. new_DR=2. 3 a) S_P = 1.5; S_DR = 11.33; new_S_CR = 10. 3 b) new_CR > CR and new_S_CR > S_CR, hence CR is not updated (since flow #2 has just increased S_CR, flow #1 cannot also increase it in this iteration). 3 c) new_S_CR = 10; S_CR = 10. 3 d) DR = 2; S_DR = 5.33. 3 e) TLO = 0; there are no other flows with DR < CR. 3 f) new sending rate: min(2, 1/1.5 * 10 + 0) = 2. Note that, without the 2 Mbit/s limitation from the application, the new sending rate for flow #1 would now be 6.66 Mbit/s, leading to perfect network saturation (6.66 + 3.33 = approx. 10). 3 g) does not apply. The resulting FSE looks as follows: ----------------------------------------------- | # | FGI | P | CR | DR | S_CR | Rate | | | | | | | | | | 1 | 1 | 1 | 8 | 2 | 10 | 2 | | 2 | 1 | 0.5 | 2 | 3.33 | 10 | 3.33 | ----------------------------------------------- Now, the total rate of the two flows is 2 + 3.33 = 5.33 Mbit/s, i.e. the network is significantly underutilized due to the limitation of flow #1. Flow #2 updates its rate. Its congestion controller detects that the network is not fully saturated and increases its rate. Welzl Expires July 23, 2013 [Page 11] Internet-Draft Coupled congestion control for RTP media January 2013 new_CR=3. new_DR = infinity. 3 a) S_P = 1.5; S_DR = 5.33; new_S_CR = 10. 3 b) new_CR > CR but new_S_CR = S_CR, hence CR = 3. 3 c) new_S_CR = 11; S_CR = 11. 3 d) DR = 3; S_DR = 5. 3 e) TLO = 0; flow #1 has DR < CR, hence TLO += 1/1.5 * 11 - 2 = 5.33. DR of flow #1 is set to 8. Flow #1 does not have a negative P(i) value, so its entry is not deleted. 3 f) new sending rate: min(infinity, 0.5/1.5*11 + 5.33) = 9. 3 g) new sending rate > DR, hence DR = 9. The resulting FSE looks as follows: ----------------------------------------------- | # | FGI | P | CR | DR | S_CR | Rate | | | | | | | | | | 1 | 1 | 1 | 8 | 8 | 10 | 2 | | 2 | 1 | 0.5 | 3 | 9 | 11 | 9 | ----------------------------------------------- Now, the total rate of the two flows is 2 + 9 = 11 Mbit/s, exceeding the total capacity by the 1 Mbit/s by which the congestion controller of flow #2 has increased its rate. Note that, had flow #1 been greedy, the same total rate would have resulted after this iteration. Finally, flow #1 terminates. It sets P to -1 and DR to 0. Let us assume that it terminated late enough for flow #2 to still experience the network in a congested state, i.e. flow #2 decreases its rate in the next iteration. new_CR = 1. new_DR = infinity. 3 a) S_P = 1.5; S_DR = 9; new_S_CR = 11. 3 b) new_CR < CR hence CR = 1. 3 c) new_S_CR = 9; S_CR = 9. 3 d) DR = 1; S_DR = 1. 3 e) TLO = 0; flow #1 has DR < CR, hence TLO += 1/1.5 * 9 - 0 = 6. DR of flow #1 is set to 8. Flow #1 has a negative P(i) value, so its entry is deleted. 3 f) new sending rate: min(infinity, 0.5/1.5 * 9 + 6) = 9. 3 g) new sending rate > DR, hence DR = 9. The resulting FSE looks as follows: ----------------------------------------------- | # | FGI | P | CR | DR | S_CR | Rate | | | | | | | | | | 1 | 1 | -1 | 8 | 0 | 10 | 2 | (before deletion) | 2 | 1 | 0.5 | 1 | 9 | 9 | 9 | Welzl Expires July 23, 2013 [Page 12] Internet-Draft Coupled congestion control for RTP media January 2013 ----------------------------------------------- Now, the total rate, used only by flow #2, is 9 Mbit/s, which is the rate that it would have had alone upon reacting to congestion after a sending rate of 11 Mbit/s. 6. Acknowledgements This document has benefitted from discussions with and feedback from Stein Gjessing, David Hayes, Safiqul Islam, Naeem Khademi, Andreas Petlund, and David Ros (who also gave the FSE its name). 7. IANA Considerations This memo includes no request to IANA. 8. Security Considerations In scenarios where the architecture described in this document is applied across applications, various cheating possibilities arise: e.g., supporting wrong values for the calculated rate, the desired rate, or the priority of a flow. In the worst case, such cheating could either prevent other flows from sending or make them send at a rate that is unreasonably large. The end result would be unfair behavior at the network bottleneck, akin to what could be achieved with any UDP based application. Hence, since this is no worse than UDP in general, there seems to be no significant harm in using this in the absence of UDP rate limiters. In the case of a single-user system, it should also be in the interest of any application programmer to give the user the best possible experience by using reasonable flow priorities or even letting the user choose them. In a multi-user system, this interest may not be given, and one could imagine the worst case of an "arms race" situation, where applications end up setting their priorities to the maximum value. If all applications do this, the end result is a fair allocation in which the priority mechanism is implicitly eliminated, and no major harm is done. 9. References Welzl Expires July 23, 2013 [Page 13] Internet-Draft Coupled congestion control for RTP media January 2013 9.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC2140] Touch, J., "TCP Control Block Interdependence", RFC 2140, April 1997. [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", RFC 3124, June 2001. 9.2. Informative References [rtcweb-rtp-usage] Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time Communication (WebRTC): Media Transport and Use of RTP", draft-ietf-rtcweb-rtp-usage-05.txt (work in progress), October 2012. [rtcweb-usecases] Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- Time Communication Use-cases and Requirements", draft-ietf-rtcweb-use-cases-and-requirements-10.txt (work in progress), December 2012. [transport-multiplex] Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a Single Lower-Layer Transport", draft-westerlund-avtcore-transport-multiplexing-04.txt (work in progress), October 2012. Author's Address Michael Welzl University of Oslo PO Box 1080 Blindern Oslo, N-0316 Norway Phone: +47 22 85 24 20 Email: michawe@ifi.uio.no Welzl Expires July 23, 2013 [Page 14]