SIPPING Working Group                                            V. Hilt
Internet-Draft                                                I. Widjaja
Expires: January 9, 2008                       Bell Labs/Alcatel-Lucent
                                                                D. Malas
                                                 Level 3 Communications
                                                          H. Schulzrinne
                                                     Columbia University
                                                            July 8, 2007


           Session Initiation Protocol (SIP) Overload Control
                      draft-hilt-sipping-overload-02

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 9, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.  Even though the SIP protocol provides a limited
   overload control mechanism through its 503 (Service Unavailable)
   response code, SIP servers are still vulnerable to overload.  This
   document proposes several new overload control mechanisms for the
   SIP protocol.

Hilt, et al.             Expires January 9, 2008                [Page 1]

Internet-Draft                Overload Control                 July 2007

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . . 4
   3.  Design Considerations  . . . . . . . . . . . . . . . . . . . . 4
     3.1.  System Model . . . . . . . . . . . . . . . . . . . . . . . 4
     3.2.  Hop-by-Hop vs. End-to-End  . . . . . . . . . . . . . . . . 5
     3.3.  Topologies . . . . . . . . . . . . . . . . . . . . . . . . 7
     3.4.  Overload Control Method  . . . . . . . . . . . . . . . . . 10
       3.4.1.  Rate-based Overload Control  . . . . . . . . . . . . . 10
       3.4.2.  Loss-based Overload Control  . . . . . . . . . . . . . 11
       3.4.3.  Window-based Overload Control  . . . . . . . . . . . . 11
     3.5.  Load Status  . . . . . . . . . . . . . . . . . . . . . . . 12
     3.6.  SIP Mechanism  . . . . . . . . . . . . . . . . . . . . . . 13
     3.7.  Backwards Compatibility  . . . . . . . . . . . . . . . . . 14
     3.8.  Interaction with Local Overload Control  . . . . . . . . . 15
   4.  SIP Application Considerations . . . . . . . . . . . . . . . . 15
     4.1.  How to Calculate Load Levels . . . . . . . . . . . . . . . 15
     4.2.  Responding to an Overload Indication . . . . . . . . . . . 15
     4.3.  Emergency Services Requests  . . . . . . . . . . . . . . . 16
     4.4.  Privacy Considerations . . . . . . . . . . . . . . . . . . 16
       4.4.1.  Critical Notify  . . . . . . . . . . . . . . . . . . . 17
       4.4.2.  Overload Suppression . . . . . . . . . . . . . . . . . 17
     4.5.  Operations and Management  . . . . . . . . . . . . . . . . 18
   5.  SIP Load Header Field  . . . . . . . . . . . . . . . . . . . . 18
     5.1.  Generating the Load Header . . . . . . . . . . . . . . . . 18
     5.2.  Determining the Load Header Value  . . . . . . . . . . . . 19
     5.3.  Determining the Throttle Parameter Value . . . . . . . . . 19
     5.4.  Processing the Load Header . . . . . . . . . . . . . . . . 20
     5.5.  Using the Load Header Value  . . . . . . . . . . . . . . . 21
     5.6.  Using the Throttle Parameter Value . . . . . . . . . . . . 21
     5.7.  Rejecting Requests . . . . . . . . . . . . . . . . . . . . 21
   6.  Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 23
   8.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 24
   Appendix A.  Acknowledgements  . . . . . . . . . . . . . . . . . . 24
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 24
     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 24
     9.2.  Informative References . . . . . . . . . . . . . . . . . . 25
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 25
   Intellectual Property and Copyright Statements . . . . . . . . . . 27

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP) [2]
   server can suffer from overload when the number of SIP messages it
   receives exceeds the number of messages it can process.  Overload
   can pose a serious problem for a network of SIP servers.  During
   periods of overload, the throughput of a SIP server can be
   significantly degraded.  In particular, overload may lead to a
   situation in which the throughput drops to a small fraction of the
   original server capacity.  This is often called congestion collapse.

   Overload is said to occur if a SIP server does not have sufficient
   resources to process all incoming SIP messages.  These resources may
   include CPU processing capacity, memory, network bandwidth,
   input/output, or disk resources.  Generally speaking, overload
   occurs if a SIP server can no longer process or respond to all
   incoming SIP messages.

   For overload control, we only consider failure cases where SIP
   servers are unable to process all SIP requests.  There are other
   failure cases where the SIP server can process, but not fulfill,
   requests.  These cases are beyond the scope of this document since
   SIP provides other response codes for them and overload control MUST
   NOT be used to handle these scenarios.
   For example, a PSTN gateway that runs out of trunk lines but still
   has plenty of capacity to process SIP messages should reject
   incoming INVITEs using a 488 (Not Acceptable Here) response [4].
   Similarly, a SIP registrar that has lost connectivity to its
   registration database but is still capable of processing SIP
   messages should reject REGISTER requests with a 500 (Server Error)
   response [2].

   The SIP protocol provides a limited mechanism for overload control
   through its 503 (Service Unavailable) response code.  However, this
   mechanism cannot prevent overload of a SIP server and it cannot
   prevent congestion collapse.  In fact, the use of the 503 (Service
   Unavailable) response code may cause traffic to oscillate and to
   shift between SIP servers and thereby worsen an overload condition.
   A detailed discussion of the SIP overload problem, the problems with
   the 503 (Service Unavailable) response code and the requirements for
   a SIP overload control mechanism can be found in [6].

   This specification is structured as follows: Section 3 discusses
   general design principles of a SIP overload control mechanism.
   Section 4 discusses general considerations for applying SIP overload
   control.  Section 5 defines a SIP protocol extension for overload
   control and Section 6 introduces the syntax of this extension.
   Section 7 and Section 8 discuss security and IANA considerations,
   respectively.

2.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
   RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
   described in BCP 14, RFC 2119 [1] and indicate requirement levels
   for compliant implementations.

3.  Design Considerations

   This section discusses key design considerations for a SIP overload
   control mechanism.
   The design goal for this mechanism is to enable a SIP server to
   control the load it receives from its upstream neighbors.

3.1.  System Model

   The model shown in Figure 1 identifies fundamental components of a
   SIP overload control system:

   o  SIP Processor: processes SIP messages.  It is the component that
      is protected by overload control.

   o  Monitor: monitors the current load of the SIP processor on the
      receiving entity.  It implements the mechanisms needed to measure
      the current usage of resources relevant for the SIP processor and
      reports load samples (S) to the Control Function.

   o  Control Function: implements the overload control mechanism on
      the receiving and sending entity.  The control function uses the
      load samples (S).  It determines if overload has occurred and a
      throttle (T) needs to be set to adjust the load sent to the SIP
      processor on the receiving entity.  The control function on the
      receiving entity sends load feedback (F) to the control function
      on the sending entity.

   o  Actuator: implements the algorithms needed to act on the
      throttles (T) and to adjust the load forwarded to the receiving
      entity.  For example, a throttle may instruct the actuator to
      reduce the load destined to the receiving entity by 10%.  The
      algorithms in the actuator then determine how the load reduction
      is achieved, e.g., by selecting the messages that will be
      affected and determining whether they are rejected or redirected.

   The type of feedback (F) conveyed from the receiving to the sending
   entity depends on the overload control method used (i.e., loss-
   based, rate-based or window-based overload control; see Section 3.4)
   as well as other design parameters (e.g., whether load status
   information is included or not).  In any case, the feedback (F)
   informs the sending entity that overload has occurred and that the
   traffic forwarded to the receiving entity needs to be reduced to a
   lower rate.
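   As an illustration only, the interaction of these components over
   one control interval can be sketched in Python.  The class names,
   the 80% target load and the proportional feedback rule below are
   assumptions made for this sketch; they are not defined by this
   specification.

```python
# Illustrative sketch of the Figure 1 roles (Monitor, Control
# Function, Actuator).  All names and numeric values are assumptions.

class Monitor:
    """Measures resource usage on the receiving entity (load samples S)."""

    def sample(self) -> float:
        # A real implementation would measure CPU, memory, I/O, etc.
        # Here we return a fixed sample: the SIP processor is at 95%.
        return 0.95


class ControlFunction:
    """Turns load samples (S) into feedback (F) for the sending entity."""

    def __init__(self, target_load: float = 0.80):
        self.target_load = target_load  # assumed utilization target

    def feedback(self, s: float) -> float:
        # Ask senders to shed the fraction of traffic above the target.
        return max(0.0, (s - self.target_load) / s)


class Actuator:
    """Applies the throttle (T) on the sending entity."""

    def admit(self, throttle: float, coin: float) -> bool:
        # Forward a request unless it falls into the throttled fraction;
        # `coin` is a uniform random number in [0, 1).
        return coin >= throttle


# One iteration of the control loop:
s = Monitor().sample()             # S: load sample
f = ControlFunction().feedback(s)  # F: feedback sent upstream
print(f"shed fraction requested from senders: {f:.2f}")  # 0.16
```

   A real control function would of course smooth its load samples and
   update the feedback periodically rather than once.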
       Sending                Receiving
       Entity                 Entity
    +----------------+      +----------------+
    |    Server A    |      |    Server B    |
    |  +----------+  |      |  +----------+  |    -+
    |  | Control  |  |   F  |  | Control  |  |     |
    |  | Function |<-+------+--| Function |  |     |
    |  +----------+  |      |  +----------+  |     |
    |     T |        |      |       ^        |     | Overload
    |       v        |      |       | S      |     | Control
    |  +----------+  |      |  +----------+  |     |
    |  | Actuator |  |      |  | Monitor  |  |     |
    |  +----------+  |      |  +----------+  |     |
    |       |        |      |       ^        |    -+
    |       v        |      |       |        |    -+
    |  +----------+  |      |  +----------+  |     |
  <-+--|   SIP    |  |      |  |   SIP    |  |     | SIP
  --+->|Processor |--+------+->|Processor |--+->   | System
    |  +----------+  |      |  +----------+  |     |
    +----------------+      +----------------+    -+

            Figure 1: System Model for Overload Control

3.2.  Hop-by-Hop vs. End-to-End

   A SIP request is often processed by more than one SIP server on its
   path to the destination.  Thus, a design choice for overload control
   involves the level of cooperation between the SIP servers on the
   path of a request.  Overload control can be implemented hop-by-hop,
   i.e., independently between each pair of servers, or end-to-end as a
   single control loop that stretches across the entire path from UAC
   to UAS (see Figure 2).

                  +---+                               +---+
              //=>| C |                           //=>| C |
     +---+   //   +---+                  +---+   //   +---+
     | A |===>| B |                      | A |===>| B |
     +---+   +---+                       +---+   +---+
             \\   +---+                          \\   +---+
              \\=>| D |                           \\=>| D |
                  +---+                               +---+

       (a) hop-by-hop loop               (b) end-to-end loop
           (loops A - B, B - C               (loops A - C and A - D
           and B - D)                        across the whole path)

       ==> SIP request flow
       <-- Load feedback loop

                    Figure 2: Hop-by-Hop vs. End-to-End

   In the hop-by-hop model, a separate control loop is instantiated
   between all neighboring SIP servers that directly exchange traffic.
   This control loop is completely independent of the control loops
   between other servers.  In the example in Figure 2(a), three
   independent overload control loops are instantiated: A - B, B - C
   and B - D.
   Thus, each overload control loop only covers a single hop.

   In the hop-by-hop model, each SIP server provides load feedback to
   its direct upstream neighbors, which then adjust the amount of
   traffic they forward to the SIP server.  These neighbors do not
   forward the received feedback information further upstream.
   Instead, they act on the feedback and resolve the overload condition
   if needed, for example, by re-routing or rejecting traffic.  The
   upstream neighbor of a server instantiates a separate overload
   control loop with its own upstream neighbors.  If this neighbor
   becomes overloaded, it reports the problem to its upstream
   neighbors, which again take action based on the reported feedback.
   Thus, in hop-by-hop overload control, overload is always resolved by
   the direct upstream neighbors of the overloaded server, without the
   need to involve entities that are located multiple SIP hops away.

   Hop-by-hop overload control can effectively reduce the impact of
   overload on a SIP network and, in particular, can avoid congestion
   collapse.  In addition, hop-by-hop overload control is simple and
   scales well to networks with many SIP entities.  It does not require
   a SIP entity to aggregate a large number of load status values or to
   keep track of the load status of SIP servers it is not communicating
   with.

   End-to-end overload control implements an overload control loop
   along the entire path of a SIP request, from UAC to UAS.  An
   end-to-end overload control mechanism needs to consider load
   information from all SIP servers on the path (including all proxies
   and the UAS).  It has to be able to frequently collect the load
   status of all servers on the potential path(s) to a destination and
   combine this data into meaningful load feedback.  A UA or SIP server
   should only throttle requests if it knows that these requests will
   eventually be forwarded to an overloaded server.
   For example, if D is overloaded in Figure 2(b), A should only
   throttle requests it forwards to B when it knows that they will be
   forwarded to D.  It should not throttle requests that will
   eventually be forwarded to C, since server C is doing fine.  In many
   cases, it is hard for A to determine which requests will be routed
   to C and which to D, since this depends on the local routing
   decision made by B.

   The main problem of end-to-end overload control is its inherent
   complexity, since a UAC or SIP server would need to monitor all
   potential paths to a destination in order to determine which
   requests should be throttled and which requests may be sent.
   Therefore, end-to-end overload control is likely to only work in
   simple, well-known topologies (e.g., a server that is known to only
   have one downstream neighbor) or if a UA/server sends many requests
   to the exact same destination.

3.3.  Topologies

   In a simple topology, a SIP server receives traffic from a single
   source (as shown in Figure 3(a)).  A load balancer is a typical
   example of this configuration.  Overload control needs to prevent
   the upstream server from sending too much traffic to its downstream
   neighbors.

   In a more complex topology, a SIP server receives traffic from
   multiple upstream sources, as shown in Figure 3(b).  Here, SIP
   servers A, B and C forward traffic to server D.  It is important to
   note that each of these servers may contribute a different amount of
   load to the overall load of D.  This load mix may vary over time.

   If server D becomes overloaded, it needs to generate feedback to
   reduce the amount of traffic it receives from its upstream neighbors
   (i.e., A, B and C).  The server needs to decide how overload control
   feedback is balanced across these upstream neighbors.  This decision
   needs to account for the actual amount of traffic received from each
   upstream neighbor.  The decision may need to be re-adjusted as the
   load contributed by each upstream neighbor varies over time.
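   To illustrate this balancing decision, the following Python sketch
   converts a desired total load reduction into per-neighbor loss
   percentages.  The function name, the two policies and the traffic
   numbers are assumptions made for this example, not part of this
   document.

```python
# Illustrative sketch: distribute a desired load reduction across
# upstream neighbors.  Names and policies are assumptions.

def per_neighbor_throttles(rates, reduction, policy="equal"):
    """rates: measured messages/second per upstream neighbor.
    reduction: total messages/second the server wants to shed.
    Returns the loss percentage to request from each neighbor."""
    total = sum(rates.values())
    if policy == "equal":
        # Every neighbor sheds the same percentage of its own traffic.
        pct = 100.0 * reduction / total
        return {n: round(pct, 1) for n in rates}
    if policy == "largest_first":
        # Throttle the heaviest sender first, then the next, and so on.
        throttles = {n: 0.0 for n in rates}
        remaining = reduction
        for n, r in sorted(rates.items(), key=lambda kv: -kv[1]):
            shed = min(remaining, r)
            throttles[n] = round(100.0 * shed / r, 1)
            remaining -= shed
            if remaining <= 0:
                break
        return throttles
    raise ValueError(f"unknown policy: {policy}")

# Server D receives 50, 30 and 20 msgs/sec from A, B and C and wants
# to shed 10 msgs/sec (10% of the total offered load of 100 msgs/sec):
print(per_neighbor_throttles({"A": 50, "B": 30, "C": 20}, 10))
# -> {'A': 10.0, 'B': 10.0, 'C': 10.0}
```

   With policy="largest_first", the same call would instead ask A alone
   to shed 20% of its traffic.  Which policy is appropriate is a local
   decision of the receiving entity.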
   A server may use a local policy to decide how much load it wants to
   receive from each upstream neighbor.  For example, a server may
   throttle all upstream sources equally (e.g., all sources need to
   reduce the traffic they forward by 10%) or prefer some servers over
   others.  For example, it may want to throttle a less preferred
   upstream neighbor earlier than a preferred neighbor, or first
   throttle the neighbor that sends the most traffic.  Since this
   decision is made by the receiving entity (i.e., server D), all
   senders for this entity are governed by the same overload control
   algorithm.

   In many network configurations, upstream servers (A, B and C in
   Figure 3(c)) have multiple alternative servers (servers D and E) to
   which they can forward traffic.  In this case, they can choose to
   redirect a message to an alternate server if the primary target is
   overloaded.  This is particularly useful if servers D and E differ
   in their processing capacity and are not load balanced otherwise.
   When redirecting messages, the upstream servers need to ensure that
   these messages do not overload the alternate server.  An overload
   control mechanism needs to enable upstream servers to only choose
   alternative servers that have enough capacity to handle the
   redirected requests.

                   +---+                      +---+
               /-->| D |                      | A |-\
              /    +---+                      +---+  \
             /                                        \
        +---+-/    +---+                      +---+    \->+---+
        | A |----->| E |                      | B |------>| D |
        +---+-\    +---+                      +---+    /->+---+
             \                                        /
              \    +---+                      +---+  /
               \-->| F |                      | C |-/
                   +---+                      +---+

        (a) load balancer w/                  (b) multiple upstream
            alternate servers                     neighbors

        +---+                                 a--\
        | A |---\                                 \
        +---+=\  \---->+---+                  b--\ \--->+---+
               \/----->| D |                      \---->|   |
               /\  /-->+---+                  c-------->| D |
        +---+-/  \/                                     |   |
        | B |    /\                           ...  /--->+---+
        +---+===/  \===>+---+                     /
               /\======>| E |                 z--/
        +---+-/  /=====>+---+
        | C |===/
        +---+

        (c) multiple upstream                 (d) very large number of
            neighbors w/ alternate server         upstream neighbors

                          Figure 3: Topologies

   Overload control that is based on throttling the message rate is not
   suited for servers that receive requests from a very large
   population of senders, which only infrequently send requests, as
   shown in Figure 3(d).  An edge proxy that is connected to many UAs
   is an example of such a configuration.  Since each UA typically only
   contributes a few requests, which are often related to the same
   call, it cannot lower its message rate enough to resolve the
   overload.

   In such a configuration, a SIP server can gradually reduce its load
   by rejecting a percentage of the requests it receives, e.g., with
   503 (Service Unavailable) responses.  Since there are many upstream
   neighbors that contribute to the overall load, sending 503 (Service
   Unavailable) to a fraction of them gradually reduces load without
   entirely stopping the incoming traffic.

3.4.  Overload Control Method

   The method used by an overload control mechanism to limit the amount
   of traffic forwarded to an element is a key aspect of the design.
   We discuss the following three different types of overload control
   methods: rate-based, loss-based and window-based overload control.

3.4.1.  Rate-based Overload Control

   The key idea of rate-based overload control is to limit the message
   rate that an upstream element is allowed to forward to the
   downstream neighbor.  If overload occurs, a SIP server instructs
   each upstream neighbor to send at most X messages per second.  This
   rate cap ensures that the offered load for the SIP server never
   increases beyond the sum of the rate caps granted to all upstream
   neighbors and can protect a SIP server from overload even during
   extreme load spikes.
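   One way a sending entity might enforce such a cap is message
   gapping, i.e., spacing transmissions to the same neighbor at least
   1/X seconds apart (discussed further below).  A minimal Python
   sketch, with illustrative class and method names:

```python
# Illustrative sketch of message gapping on the sending entity.
# Class and method names are assumptions made for this example.

class GappingSender:
    """Caps traffic to one downstream neighbor at x messages/second."""

    def __init__(self, x: float):
        self.gap = 1.0 / x   # minimum spacing between transmissions
        self.next_ok = 0.0   # earliest time the next send is allowed

    def try_send(self, now: float) -> bool:
        """True if a message may be forwarded at time `now`; False
        means it must be rejected, redirected or buffered instead."""
        if now >= self.next_ok:
            self.next_ok = now + self.gap
            return True
        return False


sender = GappingSender(x=10.0)  # rate cap: 10 messages per second
arrivals = [0.00, 0.02, 0.05, 0.10, 0.12, 0.21]  # arrival times (s)
print([sender.try_send(t) for t in arrivals])
# -> [True, False, False, True, False, True]
```

   A production implementation would use a monotonic clock and keep
   one such gap state per downstream neighbor.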
   An algorithm for the sending entity to implement a rate cap of X
   messages per second is message gapping.  After transmitting a
   message to a downstream neighbor, a server waits for 1/X seconds
   before it transmits the next message to the same neighbor.  Messages
   that arrive during the waiting period are not forwarded and are
   either redirected, rejected or buffered.

   The main drawback of this mechanism is that it requires a SIP server
   to assign a certain rate cap to each of its upstream neighbors based
   on its overall capacity.  Effectively, a server assigns a share of
   its capacity to each upstream neighbor.  The server needs to ensure
   that the sum of all rate caps assigned to upstream neighbors is not
   (significantly) higher than its actual processing capacity.  This
   requires a SIP server to continuously evaluate the amount of load it
   receives from an upstream neighbor and to assign a rate cap that is
   suitable for this neighbor.  For example, in a non-overloaded
   situation, it could assign a rate cap that is 10% higher than the
   current load received from this neighbor.  The rate cap needs to be
   adjusted if the load offered by upstream neighbors changes, if new
   upstream neighbors appear or if an existing neighbor stops
   transmitting.  If the cap assigned to an upstream neighbor is too
   high, the server may still experience overload.  However, if the cap
   is too low, the upstream neighbors will reject messages even though
   they could be processed by the server.  Thus, rate-based overload
   control is likely to work well only if the number of upstream
   servers is small and constant; it is ill-suited for configurations
   with a very large number of upstream neighbors, such as the one
   shown in Figure 3(d).

3.4.2.  Loss-based Overload Control

   A loss percentage enables a SIP server to ask its upstream neighbors
   to reduce the amount of traffic they would normally forward to this
   server by a percentage X.
   For example, a SIP server can ask its upstream neighbors to lower
   the traffic they would normally forward to it by 10%.  The upstream
   neighbor then redirects or rejects X percent of the traffic that is
   destined for this server.

   An algorithm for the sending entity to implement a loss percentage
   is to draw a random number between 1 and 100 for each request to be
   forwarded.  The request is not forwarded to the server if the random
   number is less than or equal to X.

   For loss-based overload control, the receiving entity does not need
   to track the message rate it receives from each upstream neighbor.
   To reduce load, a server can ask each upstream neighbor to lower
   traffic by a certain percentage.  This percentage can be determined
   independently of the actual message rate contributed by each server.
   The appropriate loss percentage depends on the loss percentage
   currently used by the upstream servers and the current system load
   of the server.  For example, if the server load approaches 90% and
   the current loss percentage is set to a 50% load reduction, then the
   server may decide to increase the loss percentage to 55% in order to
   get back to a system load of 80%.  Similarly, the server can lower
   the loss percentage if permitted by the system utilization.  This
   requires that system load can be accurately measured and that these
   measurements are reasonably stable.

   The main drawback of percentage throttling is that the throttle
   percentage needs to be adjusted to the offered load, in particular
   if the load fluctuates quickly.  For example, if a SIP server sets a
   throttle value of 10% at time t1 and load increases by 20% between
   time t1 and t2 (t1
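   The random-number algorithm described in this section can be
   sketched in Python as follows; the function name is an illustrative
   assumption.

```python
import random

# Illustrative sketch of percentage throttling (loss-based control):
# draw a random number between 1 and 100 per request and do not
# forward the request if the number is less than or equal to X.

def admit(loss_pct: int, rng=random) -> bool:
    """True if a request may be forwarded under loss percentage X."""
    return rng.randint(1, 100) > loss_pct

# Over many requests, roughly X percent are rejected or redirected:
rng = random.Random(0)                    # seeded for reproducibility
forwarded = sum(admit(10, rng) for _ in range(10_000))
print(f"{forwarded} of 10000 forwarded")  # roughly 9000
```

   Note that the drop decision is per request; a real actuator would
   decide per throttled message whether to reject it (e.g., with a 503
   response) or redirect it to an alternate server.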