Diameter Maintenance and Extensions (DIME)              J. Korhonen, Ed.
Internet-Draft                                                  Broadcom
Intended status: Standards Track                         S. Donovan, Ed.
Expires: August 8, 2015                                      B. Campbell
                                                                  Oracle
                                                               L. Morand
                                                             Orange Labs
                                                        February 4, 2015

                Diameter Overload Indication Conveyance
                      draft-ietf-dime-ovli-08.txt

Abstract

This specification defines a base solution for Diameter overload control, referred to as Diameter Overload Indication Conveyance (DOIC).

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 8, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Abbreviations
   3.  Conventions Used in This Document
   4.  Solution Overview
     4.1.  Piggybacking
     4.2.  DOIC Capability Announcement
     4.3.  DOIC Overload Condition Reporting
     4.4.  DOIC Extensibility
     4.5.  Simplified Example Architecture
   5.  Solution Procedures
     5.1.  Capability Announcement
       5.1.1.  Reacting Node Behavior
       5.1.2.  Reporting Node Behavior
       5.1.3.  Agent Behavior
     5.2.  Overload Report Processing
       5.2.1.  Overload Control State
       5.2.2.  Reacting Node Behavior
       5.2.3.  Reporting Node Behavior
     5.3.  Protocol Extensibility
   6.  Loss Algorithm
     6.1.  Overview
     6.2.  Reporting Node Behavior
     6.3.  Reacting Node Behavior
   7.  Attribute Value Pairs
     7.1.  OC-Supported-Features AVP
     7.2.  OC-Feature-Vector AVP
     7.3.  OC-OLR AVP
     7.4.  OC-Sequence-Number AVP
     7.5.  OC-Validity-Duration AVP
     7.6.  OC-Report-Type AVP
     7.7.  OC-Reduction-Percentage AVP
     7.8.  Attribute Value Pair flag rules
   8.  Error Response Codes
   9.  IANA Considerations
     9.1.  AVP codes
     9.2.  New registries
   10. Security Considerations
     10.1.  Potential Threat Modes
     10.2.  Denial of Service Attacks
     10.3.  Non-Compliant Nodes
     10.4.  End-to-End Security Issues
   11. Contributors
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Issues left for future specifications
     A.1.  Additional traffic abatement algorithms
     A.2.  Agent Overload
     A.3.  New Error Diagnostic AVP
   Appendix B.  Deployment Considerations
   Appendix C.  Requirements Conformance Analysis
     C.1.  Deferred Requirements
     C.2.  Detection of non-supporting Intermediaries
     C.3.  Implicit Application Indication
     C.4.  Stateless Operation
     C.5.  No New Vulnerabilities
     C.6.  Detailed Requirements
       C.6.1.  General
       C.6.2.  Performance
       C.6.3.  Heterogeneous Support for Solution
       C.6.4.  Granular Control
       C.6.5.  Priority and Policy
       C.6.6.  Security
       C.6.7.  Flexibility and Extensibility
   Appendix D.  Considerations for Applications Integrating the DOIC Solution
     D.1.  Application Classification
     D.2.  Application Type Overload Implications
     D.3.  Request Transaction Classification
     D.4.  Request Type Overload Implications
   Authors' Addresses

1.  Introduction

This specification defines a base solution for Diameter overload control, referred to as Diameter Overload Indication Conveyance (DOIC), based on the requirements identified in [RFC7068].

This specification addresses Diameter overload control between Diameter nodes that support the DOIC solution. The solution, which is designed to apply to existing and future Diameter applications, requires no changes to the Diameter base protocol [RFC6733] and is deployable in environments where some Diameter nodes do not implement the Diameter overload control solution defined in this specification.

A new application specification can incorporate the overload control mechanism specified in this document by making it mandatory to implement for the application and referencing this specification normatively. It is the responsibility of the Diameter application designers to define how overload control mechanisms work on that application.

Note that the overload control solution defined in this specification does not address all the requirements listed in [RFC7068]. A number of overload control related features are left for future specifications. See Appendix A for a list of extensions that are currently being considered. See Appendix C for an analysis of conformance to the requirements specified in [RFC7068].

2.  Terminology and Abbreviations

Abatement

   Reaction to receipt of an overload report resulting in a reduction in traffic sent to the reporting node. Abatement actions include diversion and throttling.

Abatement Algorithm

   An extensible method requested by reporting nodes and used by reacting nodes to reduce the amount of traffic sent during an occurrence of overload control.

Diversion

   An overload abatement treatment where the reacting node selects alternate destinations or paths for requests.

Host-Routed Requests

   Requests that a reacting node knows will be served by a particular host, either due to the presence of a Destination-Host AVP, or by some other local knowledge on the part of the reacting node.

Overload Control State (OCS)

   Internal state maintained by a reporting or reacting node describing occurrences of overload control.

Overload Report (OLR)

   Overload control information for a particular overload occurrence sent by a reporting node.

Reacting Node

   A Diameter node that acts upon an overload report.

Realm-Routed Requests

   Requests for which a reacting node does not know which host will service the request.

Reporting Node

   A Diameter node that generates an overload report. (This may or may not be the overloaded node.)

Throttling

   An abatement treatment that limits the number of requests sent by the DOIC reacting node. Throttling can include a Diameter Client choosing to not send requests, or a Diameter Agent or Server rejecting requests with appropriate error responses. In both cases the result of the throttling is a permanent rejection of the transaction.

3.  Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

RFC 2119 [RFC2119] interpretation does not apply for the above listed words when they are not used in all-caps format.

4.  Solution Overview

The Diameter Overload Indication Conveyance (DOIC) solution allows Diameter nodes to request other Diameter nodes to perform overload abatement actions, that is, actions to reduce the load offered to the overloaded node or realm.

A Diameter node that supports DOIC is known as a "DOIC node". Any Diameter node can act as a DOIC node, including Diameter Clients, Diameter Servers, and Diameter Agents. DOIC nodes are further divided into "Reporting Nodes" and "Reacting Nodes." A reporting node requests overload abatement by sending Overload Reports (OLR).

A reacting node acts upon OLRs, and performs whatever actions are needed to fulfill the abatement requests included in the OLRs. A reporting node may report overload on its own behalf, or on behalf of other nodes. Likewise, a reacting node may perform overload abatement on its own behalf, or on behalf of other nodes.

A Diameter node's role as a DOIC node is independent of its Diameter role. For example, Diameter Agents may act as DOIC nodes, even though they are not endpoints in the Diameter sense. Since Diameter enables bi-directional applications, where Diameter Servers can send requests towards Diameter Clients, a given Diameter node can simultaneously act as both a reporting node and a reacting node.

Likewise, a Diameter Agent may act as a reacting node from the perspective of upstream nodes, and a reporting node from the perspective of downstream nodes.

DOIC nodes do not generate new messages to carry DOIC related information. Rather, they "piggyback" DOIC information over existing Diameter messages by inserting new AVPs into existing Diameter requests and responses.

Nodes indicate support for DOIC, and any needed DOIC parameters, by inserting an OC-Supported-Features AVP (Section 7.1) into existing requests and responses. Reporting nodes send OLRs by inserting OC-OLR AVPs (Section 7.3).

A given OLR applies to the Diameter realm and application of the Diameter message that carries it. If a reporting node supports more than one realm and/or application, it reports independently for each combination of realm and application. Similarly, the OC-Supported-Features AVP applies to the realm and application of the enclosing message. This implies that a node may support DOIC for one application and/or realm, but not another, and may indicate different DOIC parameters for each application and realm for which it supports DOIC.

Reacting nodes perform overload abatement according to an agreed-upon abatement algorithm. An abatement algorithm defines the meaning of some of the parameters of an OLR and the procedures required for overload abatement. An overload abatement algorithm separates Diameter requests into two sets. The first set contains the requests that are to undergo overload abatement treatment of either throttling or diversion. The second set contains the requests that are to be given normal routing treatment. This document specifies a single must-support algorithm, namely the "loss" algorithm (Section 6). Future specifications may introduce new algorithms.

Overload conditions may vary in scope. For example, a single Diameter node may be overloaded, in which case reacting nodes may attempt to send requests to other destinations. On the other hand, an entire Diameter realm may be overloaded, in which case such attempts would do harm. DOIC OLRs have a concept of "report type" (Section 7.6), where the type defines such behaviors. Report types are extensible. This document defines report types for overload of a specific host, and for overload of an entire realm.

DOIC works through non-supporting Diameter Agents that properly pass unknown AVPs unchanged.

4.1.  Piggybacking

There is no new Diameter application defined to carry overload related AVPs. The overload control AVPs defined in this specification have been designed to be piggybacked on top of existing application messages. This is made possible by adding the optional overload control AVPs OC-OLR and OC-Supported-Features into existing commands.

Reacting nodes indicate support for DOIC by including the OC-Supported-Features AVP in all request messages originated or relayed by the reacting node.

Reporting nodes indicate support for DOIC by including the OC-Supported-Features AVP in all answer messages originated or relayed by the reporting node that are in response to a request that contained the OC-Supported-Features AVP. Reporting nodes may include overload reports using the OC-OLR AVP in answer messages.

Note that the overload control solution does not have fixed server and client roles. The DOIC node role is determined based on the message type: whether the message is a request (i.e. sent by a "reacting node") or an answer (i.e. sent by a "reporting node"). Therefore, in a typical "client-server" deployment, the Diameter Client may report its overload condition to the Diameter Server for any Diameter Server initiated message exchange. An example of such is the Diameter Server requesting a re-authentication from a Diameter Client.

4.2.  DOIC Capability Announcement

The DOIC solution supports the ability for Diameter nodes to determine if other nodes in the path of a request support the solution. This capability is referred to as DOIC Capability Announcement (DCA) and is separate from Diameter Capability Exchange.

The DCA mechanism uses the OC-Supported-Features AVPs to indicate the Diameter overload features supported.

The first node in the path of a Diameter request that supports the DOIC solution inserts the OC-Supported-Features AVP in the request message.

The individual features supported by the DOIC nodes are indicated in the OC-Feature-Vector AVP. Any semantics associated with the features will be defined in extension specifications that introduce the features.

   Note: As discussed elsewhere in the document, agents in the path of the request can modify the OC-Supported-Features AVP.

   Note: The DOIC solution must support deployments where Diameter Clients and/or Diameter Servers do not support the DOIC solution. In this scenario, Diameter Agents that support the DOIC solution may handle overload abatement for the non-supporting Diameter nodes. In this case the DOIC agent will insert the OC-Supported-Features AVP in requests that do not already contain one, telling the reporting node that there is a DOIC node that will handle overload abatement. For transactions where there was an OC-Supported-Features AVP in the request, the agent will insert the OC-Supported-Features AVP in answers, telling the reacting node that there is a reporting node.

The OC-Feature-Vector AVP will always contain an indication of support for the loss overload abatement algorithm defined in this specification (see Section 6). This ensures that a reporting node always supports at least one of the advertised abatement algorithms received in a request message.

The reporting node inserts the OC-Supported-Features AVP in all answer messages to requests that contained the OC-Supported-Features AVP. The contents of the reporting node's OC-Supported-Features AVP indicate the set of Diameter overload features supported by the reporting node.
This specification defines one exception - the reporting node only includes an indication of support for one overload abatement algorithm, independent of the number of overload abatement algorithms actually supported by the reacting node. The overload abatement algorithm indicated is the algorithm that the reporting node intends to use should it enter an overload condition. Reacting nodes can use the indicated overload abatement algorithm to prepare for possible overload reports and must use the indicated overload abatement algorithm if traffic reduction is actually requested.

Note that the loss algorithm defined in this document is a stateless abatement algorithm. As a result it does not require any actions by reacting nodes prior to the receipt of an overload report. Stateful abatement algorithms that base the abatement logic on a history of request messages sent might require reacting nodes to maintain state in advance of receiving an overload report to ensure that the overload reports can be properly handled.

The DCA mechanism must also allow the scenario where the set of features supported by the sender of a request and by agents in the path of a request differ. In this case, the agent can update the OC-Supported-Features AVP to reflect the mixture of the two sets of supported features.

   Note: The logic to determine if the content of the OC-Supported-Features AVP should be changed is out-of-scope for this document, as is the logic to determine the content of a modified OC-Supported-Features AVP. These are left to implementation decisions. Care must be taken not to introduce interoperability issues for downstream or upstream DOIC nodes.

4.3.  DOIC Overload Condition Reporting

As with DOIC capability announcement, overload condition reporting uses new AVPs (Section 7.3) to indicate an overload condition.

The OC-OLR AVP is referred to as an overload report.
The OC-OLR AVP includes the type of report, a sequence number, the length of time that the report is valid and abatement algorithm specific AVPs.

Two types of overload reports are defined in this document: host reports and realm reports.

A report of type "HOST_REPORT" is sent to indicate the overload of a specific host, identified by the Origin-Host AVP of the message containing the OLR, for the application-id indicated in the transaction. When receiving an OLR of type "HOST_REPORT", a reacting node applies overload abatement treatment to the host-routed requests identified by the overload abatement algorithm (see definition in Section 2) sent for this application to the overloaded host.

A report of type "REALM_REPORT" is sent to indicate the overload of a realm for the application-id indicated in the transaction. The overloaded realm is identified by the Destination-Realm AVP of the message containing the OLR. When receiving an OLR of type "REALM_REPORT", a reacting node applies overload abatement treatment to realm-routed requests identified by the overload abatement algorithm (see definition in Section 2) sent for this application to the overloaded realm.

This document assumes that there is a single source for realm-reports for a given realm, or that if multiple nodes can send realm reports, that each such node has full knowledge of the overload state of the entire realm. A reacting node cannot distinguish between receiving realm-reports from a single node, or from multiple nodes.

   Note: Known issues exist if there are multiple sources for overload reports that apply to the same Diameter entity. Reacting nodes have no way of determining the source and, as such, will treat them as coming from a single source. Variance in sequence numbers between the two sources can then cause incorrect overload abatement treatment to be applied for indeterminate periods of time.

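As an illustration only, the scope of the two report types described above can be sketched as a matching function. The class and field names here are hypothetical, and the sketch simplifies "host-routed" to "carries a Destination-Host AVP"; a real reacting node may also rely on other local knowledge, as noted in Section 2.

```python
from dataclasses import dataclass
from typing import Optional

HOST_REPORT = "HOST_REPORT"
REALM_REPORT = "REALM_REPORT"


@dataclass
class Olr:
    """Hypothetical view of a received overload report."""
    report_type: str
    application_id: int
    origin_host: str  # Origin-Host of the message that carried the OLR
    realm: str        # overloaded realm, as identified by that message


@dataclass
class Request:
    """Hypothetical view of an outgoing request."""
    application_id: int
    destination_realm: str
    destination_host: Optional[str] = None  # None models a realm-routed request


def olr_applies(olr: Olr, req: Request) -> bool:
    """Return True if the request is in scope for the overload report."""
    if olr.application_id != req.application_id:
        return False
    if olr.report_type == HOST_REPORT:
        # Host reports cover host-routed requests to the overloaded host.
        return req.destination_host == olr.origin_host
    if olr.report_type == REALM_REPORT:
        # Realm reports cover realm-routed requests to the overloaded realm.
        return (req.destination_host is None
                and req.destination_realm == olr.realm)
    return False  # unknown report types are out of scope for this sketch
```

Whether an in-scope request is then throttled, diverted, or sent normally is decided by the abatement algorithm, not by this check.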
Reporting nodes are responsible for determining the need for a reduction of traffic. The method for making this determination is implementation specific and depends on the type of overload report being generated. A host-report might be generated by tracking use of resources required by the host to handle transactions for the Diameter application. A realm-report generally impacts the traffic sent to multiple hosts and, as such, requires tracking the capacity of all servers able to handle realm-routed requests for the application and realm. Once a reporting node determines the need for a reduction in traffic, it uses the DOIC defined AVPs to report on the condition. These AVPs are included in answer messages sent or relayed by the reporting node. The reporting node indicates the overload abatement algorithm that is to be used to handle the traffic reduction in the OC- Supported-Features AVP. The OC-OLR AVP is used to communicate information about the requested reduction. Reacting nodes, upon receipt of an overload report, apply the overload abatement algorithm to traffic impacted by the overload report. The method used to determine the requests that are to receive overload abatement treatment is dependent on the abatement algorithm. The loss abatement algorithm is defined in this document (Section 6). Other abatement algorithms can be defined in extensions to the DOIC solution. Two types of overload abatement treatment are defined, diversion and throttling. Reacting nodes are responsible for determining which treatment is appropriate for individual requests. As the conditions that lead to the generation of the overload report change the reporting node can send new overload reports requesting greater reduction if the condition gets worse or less reduction if the condition improves. The reporting node sends an overload report with a duration of zero to indicate that the overload condition has ended and abatement is no longer needed. 
The reacting node also determines when the overload report expires based on the OC-Validity-Duration AVP in the overload report and stops applying the abatement algorithm when the report expires.

4.4.  DOIC Extensibility

The DOIC solution is designed to be extensible. This extensibility is based on existing Diameter based extensibility mechanisms, along with the DOIC capability announcement mechanism.

There are multiple categories of extensions that are expected. This includes the definition of new overload abatement algorithms, the definition of new report types and the definition of new scopes of messages impacted by an overload report.

A DOIC node communicates supported features by including them in the OC-Feature-Vector AVP, as a sub-AVP of OC-Supported-Features. Any non-backwards compatible DOIC extensions define new values for the OC-Feature-Vector AVP. DOIC extensions also have the ability to add new AVPs to the OC-Supported-Features AVP, if additional information about the new feature is required.

Overload reports can also be extended by adding new sub-AVPs to the OC-OLR AVP, allowing reporting nodes to communicate additional information about handling an overload condition.

If necessary, new extensions can also define new AVPs that are not part of the OC-Supported-Features and OC-OLR group AVPs. It is, however, recommended that DOIC extensions use the OC-Supported-Features AVP and OC-OLR AVP to carry all DOIC related AVPs.

4.5.  Simplified Example Architecture

Figure 1 illustrates the simplified architecture for Diameter overload information conveyance.

    Realm X                                  Same or other Realms
   <--------------------------------------> <---------------------->

      +--------+                 : (optional) :
      |Diameter|                 :            :
      |Server A|--+     .--.     : +--------+ :     .--.
      +--------+  |   _(    `.   : |Diameter| :   _(    `.   +--------+
                  +--(        )--:-|  Agent |-:--(        )--|Diameter|
      +--------+  | ( `  .  )  ) : +--------+ : ( `  .  )  ) | Client |
      |Diameter|--+  `--(___.-'  :            :  `--(___.-'  +--------+
      |Server B|                 :            :
      +--------+                 :            :

                       End-to-end Overload Indication
          1)  <----------------------------------------------->
                          Diameter Application Y

               Overload Indication A    Overload Indication A'
          2)  <----------------------> <---------------------->
               Diameter Application Y   Diameter Application Y

   Figure 1: Simplified architecture choices for overload indication delivery

In Figure 1, the Diameter overload indication can be conveyed (1) end-to-end between servers and clients or (2) between servers and Diameter agent inside the realm and then between the Diameter agent and the clients.

5.  Solution Procedures

This section outlines the normative behavior for the DOIC solution.

5.1.  Capability Announcement

This section defines DOIC Capability Announcement (DCA) behavior.

   Note: This specification assumes that changes in DOIC node capabilities are relatively rare events that occur as a result of administrative action. Reacting nodes ought to minimize changes that force the reporting node to change the features being used, especially during active overload conditions. But even if reacting nodes avoid such changes, reporting nodes still have to be prepared for them to occur. For example, differing capabilities between multiple reacting nodes may still force a reporting node to select different features on a per-transaction basis.

5.1.1.  Reacting Node Behavior

A reacting node MUST include the OC-Supported-Features AVP in all requests. It MAY include the OC-Feature-Vector AVP, as a sub-AVP of OC-Supported-Features. If it does so, it MUST indicate support for the "loss" algorithm. If the reacting node is configured to support features (including other algorithms) in addition to the loss algorithm, it MUST indicate such support in an OC-Feature-Vector AVP.

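The reacting node rule above can be sketched as follows. This is a minimal illustration, not a Diameter encoder: requests are modeled as plain dictionaries, the function name is hypothetical, and the bit value used for the loss algorithm is assumed to be the OLR_DEFAULT_ALGO value from the OC-Feature-Vector definition (Section 7.2), which lies outside this part of the document.

```python
# Assumption: bit assigned to the loss algorithm in OC-Feature-Vector.
OLR_DEFAULT_ALGO = 0x0000000000000001


def add_capability_announcement(request: dict, extra_features: int = 0) -> dict:
    """Return a copy of the request carrying OC-Supported-Features.

    extra_features is a bit mask of additional (hypothetical) features the
    reacting node is configured to support; loss is always indicated.
    """
    announced = dict(request)
    announced["OC-Supported-Features"] = {
        "OC-Feature-Vector": OLR_DEFAULT_ALGO | extra_features,
    }
    return announced
```

A node that supports only loss could equally omit OC-Feature-Vector and send an empty OC-Supported-Features AVP; the sketch always includes it to keep the announcement explicit.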
An OC-Supported-Features AVP in answer messages indicates there is a reporting node for the transaction. The reacting node MAY take action, for example creating state for some stateful abatement algorithm, based on the features indicated in the OC-Feature-Vector AVP.

   Note: The loss abatement algorithm does not require stateful behavior when there is no active overload report.

Reacting nodes need to be prepared for the reporting node to change selected algorithms. This can happen at any time, including when the reporting node has sent an active overload report. The reacting node can minimize the potential for changes by modifying the advertised abatement algorithms sent to an overloaded reporting node to the currently selected algorithm and loss (or just loss if it is the currently selected algorithm). This has the effect of limiting the potential change in abatement algorithm from the currently selected algorithm to loss, avoiding changes to more complex abatement algorithms that require state to operate properly.

5.1.2.  Reporting Node Behavior

Upon receipt of a request message, a reporting node determines if there is a reacting node for the transaction based on the presence of the OC-Supported-Features AVP in the request message.

If the request message contains an OC-Supported-Features AVP then a reporting node MUST include the OC-Supported-Features AVP in the answer message for that transaction.

   Note: Capability announcement is done on a per transaction basis. The reporting node cannot assume that the capabilities announced by a reacting node will be the same between transactions.

A reporting node MUST NOT include the OC-Supported-Features AVP, OC-OLR AVP or any other overload control AVPs defined in extension drafts in response messages for transactions where the request message does not include the OC-Supported-Features AVP.
Lack of the OC-Supported-Features AVP in the request message indicates that there is no reacting node for the transaction. A reporting node knows what overload control functionality is supported by the reacting node based on the content or absence of the OC-Feature-Vector AVP within the OC-Supported-Features AVP in the request message. A reporting node MUST indicate support for one and only one abatement algorithm in the OC-Feature-Vector AVP. The abatement algorithm selected MUST indicate the abatement algorithm the reporting node wants the reacting node to use when the reporting node enters an overload condition. The abatement algorithm selected MUST be from the set of abatement algorithms contained in the request message's OC-Feature-Vector AVP. A reporting node that selects the loss algorithm may do so by including the OC-Feature-Vector AVP with an explicit indication of the loss algorithm, or it MAY omit OC-Feature-Vector. If it selects a different algorithm, it MUST include the OC-Feature-Vector AVP with an explicit indication of the selected algorithm. The reporting node SHOULD indicate support for other DOIC features defined in extension drafts that it supports and that apply to the transaction. It does so using the OC-Feature-Vector AVP. Note: Not all DOIC features will apply to all Diameter applications or deployment scenarios. The features included in the OC-Feature-Vector AVP are based on local reporting node policy. 5.1.3. Agent Behavior Diameter Agents that support DOIC can ensure that all messages relayed by the agent contain the OC-Supported-Features AVP. A Diameter Agent MAY take on reacting node behavior for Diameter endpoints that do not support the DOIC solution. A Diameter Agent detects that a Diameter endpoint does not support DOIC reacting node behavior when there is no OC-Supported-Features AVP in a request message. 
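The reporting node rules of Section 5.1.2 can be sketched as a small selection function. This is an illustration under stated assumptions: AVPs are modeled as dictionaries, the loss bit value is assumed from the OC-Feature-Vector definition (Section 7.2), HYPOTHETICAL_ALGO stands in for an algorithm defined by some future extension, and non-algorithm feature bits are ignored.

```python
from typing import Optional, Sequence

LOSS = 0x1               # assumption: loss algorithm bit (Section 7.2)
HYPOTHETICAL_ALGO = 0x2  # hypothetical extension algorithm, for illustration


def select_algorithm(request_features: Optional[dict],
                     preference: Sequence[int]) -> Optional[dict]:
    """Build the OC-Supported-Features content for the answer.

    request_features is the OC-Supported-Features AVP from the request,
    or None when the request did not carry one. preference lists the
    reporting node's algorithms, most preferred first.
    """
    if request_features is None:
        # No reacting node for this transaction: no DOIC AVPs in the answer.
        return None
    # An absent OC-Feature-Vector is treated as "loss only".
    offered = request_features.get("OC-Feature-Vector", LOSS)
    for algo in preference:
        if offered & algo:
            # Exactly one algorithm is indicated, taken from the offered set.
            return {"OC-Feature-Vector": algo}
    # Loss is always offered, so it is a safe fallback.
    return {"OC-Feature-Vector": LOSS}
```

Because capability announcement is per transaction, a function like this would run for every request rather than caching a choice per peer.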
For a Diameter Agent to be a reacting node for a non-supporting Diameter endpoint, the Diameter Agent MUST include the OC-Supported-Features AVP in request messages it relays that do not contain the OC-Supported-Features AVP.

A Diameter Agent MAY take on reporting node behavior for Diameter endpoints that do not support the DOIC solution. The Diameter Agent MUST have visibility to all traffic destined for the non-supporting host in order to become the reporting node for the Diameter endpoint. A Diameter Agent detects that a Diameter endpoint does not support DOIC reporting node behavior when there is no OC-Supported-Features AVP in an answer message for a transaction that contained the OC-Supported-Features AVP in the request message.

If a request already has the OC-Supported-Features AVP, a Diameter agent MAY modify it to reflect the features appropriate for the transaction. Otherwise, the agent relays the OC-Supported-Features AVP without change.

For instance, if the agent supports a superset of the features reported by the reacting node then the agent might choose, based on local policy, to advertise that superset of features to the reporting node.

If the Diameter Agent changes the OC-Supported-Features AVP in a request message then it is likely it will also need to modify the OC-Supported-Features AVP in the answer message for the transaction. A Diameter Agent MAY modify the OC-Supported-Features AVP carried in answer messages.

When making changes to the OC-Supported-Features or OC-OLR AVPs, the Diameter Agent needs to ensure consistency in its behavior with both upstream and downstream DOIC nodes.

5.2.  Overload Report Processing

5.2.1.  Overload Control State

Both reacting and reporting nodes maintain Overload Control State (OCS) for active overload conditions. The following sections define behavior associated with that OCS.

The contents of the OCS in the reporting node and in the reacting node represent logical constructs. The actual internal physical structure of the state included in the OCS is an implementation decision.

5.2.1.1. Overload Control State for Reacting Nodes

A reacting node maintains the following OCS per supported Diameter application:

o  A host-type OCS entry for each Destination-Host to which it sends host-type requests and

o  A realm-type OCS entry for each Destination-Realm to which it sends realm-type requests.

A host-type OCS entry is identified by the pair of application-id and the node's DiameterIdentity.

A realm-type OCS entry is identified by the pair of application-id and realm.

The host-type and realm-type OCS entries include the following information (the actual information stored is an implementation decision):

o  Sequence number (as received in OC-OLR)

o  Time of expiry (derived from OC-Validity-Duration AVP received in the OC-OLR AVP and time of reception of the message carrying OC-OLR AVP)

o  Selected Abatement Algorithm (as received in the OC-Supported-Features AVP)

o  Abatement Algorithm specific input data (as received in the OC-OLR AVP, for example, OC-Reduction-Percentage for the Loss abatement algorithm)

5.2.1.2. Overload Control State for Reporting Nodes

A reporting node maintains OCS entries per supported Diameter application, per supported (and eventually selected) Abatement Algorithm and per report-type.

An OCS entry is identified by the tuple of Application-Id, Report-Type and Abatement Algorithm and includes the following information (the actual information stored is an implementation decision):

o  Sequence number

o  Validity Duration

o  Expiration Time

o  Algorithm specific input data (for example, the Reduction Percentage for the Loss Abatement Algorithm)

5.2.1.3. Reacting Node Maintenance of Overload Control State

When a reacting node receives an OC-OLR AVP, it MUST determine if it is for an existing or new overload condition.

Note: For the remainder of this section the term OLR refers to the combination of the contents of the received OC-OLR AVP and the abatement algorithm indicated in the received OC-Supported-Features AVP.

When receiving an answer message with multiple OLRs of different supported report types, a reacting node MUST process each received OLR.

The OLR is for an existing overload condition if a reacting node has an OCS that matches the received OLR.

For a host-report this means it matches the application-id and the host's DiameterIdentity in an existing host OCS entry.

For a realm-report this means it matches the application-id and the realm in an existing realm OCS entry.

If the OLR is for an existing overload condition then a reacting node MUST determine if the OLR is a retransmission or an update to the existing OLR.

If the sequence number for the received OLR is greater than the sequence number stored in the matching OCS entry then a reacting node MUST update the matching OCS entry.

If the sequence number for the received OLR is less than or equal to the sequence number in the matching OCS entry then a reacting node MUST silently ignore the received OLR. The matching OCS MUST NOT be updated in this case.

If the received OLR is for a new overload condition then a reacting node MUST generate a new OCS entry for the overload condition.

For a host-report this means a reacting node creates an OCS entry with the application-id in the received message and DiameterIdentity of the Origin-Host in the received message.

Note: This solution assumes that the Origin-Host AVP in the answer message included by the reporting node is not changed along the path to the reacting node.
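The existing-versus-new decision and the sequence number comparison described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary-based OCS table, the `OcsEntry` fields, the `process_olr` function and the dictionary layout of the OLR are hypothetical names chosen for the example, and a single table keyed by application-id, report type and host or realm stands in for the separate host-type and realm-type entries.

```python
import time
from dataclasses import dataclass

HOST_REPORT = 0        # OC-Report-Type values defined in Section 7.6
REALM_REPORT = 1
DEFAULT_VALIDITY = 30  # seconds; default OC-Validity-Duration (Section 7.5)


@dataclass
class OcsEntry:
    """Hypothetical in-memory form of one reacting node OCS entry."""
    sequence_number: int
    expiry: float              # time of reception plus validity duration
    algorithm: str             # selected abatement algorithm
    reduction_percentage: int  # algorithm specific input data


def process_olr(ocs, app_id, origin_host, origin_realm, olr, algorithm, now=None):
    """Apply one received OLR; return 'created', 'updated' or 'ignored'."""
    now = time.time() if now is None else now
    # Host reports are matched on Origin-Host, realm reports on Origin-Realm.
    target = origin_host if olr["report_type"] == HOST_REPORT else origin_realm
    key = (app_id, olr["report_type"], target)
    entry = ocs.get(key)
    if entry is not None and olr["sequence_number"] <= entry.sequence_number:
        # Retransmission or stale report: the OCS entry is left untouched.
        return "ignored"
    validity = olr.get("validity_duration", DEFAULT_VALIDITY)
    ocs[key] = OcsEntry(
        sequence_number=olr["sequence_number"],
        expiry=now + validity,
        algorithm=algorithm,
        reduction_percentage=olr.get("reduction_percentage", 0),
    )
    return "created" if entry is None else "updated"
```

A caller would invoke `process_olr` once for each OLR found in an answer, passing the Origin-Host and Origin-Realm of that answer.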
For a realm-report this means a reacting node creates an OCS entry with the application-id in the received message and realm of the Origin-Realm in the received message.

If the received OLR contains a validity duration of zero ("0") then a reacting node MUST update the OCS entry as being expired.

Note: It is not necessarily appropriate to delete the OCS entry, as there is recommended behavior that the reacting node slowly returns to full traffic when ending an overload abatement period.

The reacting node does not delete an OCS when receiving an answer message that does not contain an OC-OLR AVP (i.e. absence of OLR means "no change").

5.2.1.4. Reporting Node Maintenance of Overload Control State

A reporting node SHOULD create a new OCS entry when entering an overload condition.

Note: If a reporting node knows through absence of the OC-Supported-Features AVP in received messages that there are no reacting nodes supporting DOIC then the reporting node can choose to not create OCS entries.

When generating a new OCS entry the sequence number SHOULD be set to zero ("0").

When generating sequence numbers for new overload conditions, the new sequence number MUST be greater than any sequence number in an active (unexpired) overload report for the same application and report-type previously sent by the reporting node. This property MUST hold over a reboot of the reporting node.

Note: One way of addressing this over a reboot of a reporting node is to use a time stamp for the first overload condition that occurs after the reboot and to start using sequences beginning with zero for subsequent overload conditions.

A reporting node MUST update an OCS entry when it needs to adjust the validity duration of the overload condition at reacting nodes.

For instance, a reporting node might wish to instruct reacting nodes to continue overload abatement for a longer period of time than originally communicated. This also applies if the reporting node wishes to shorten the period of time that overload abatement is to continue.

A reporting node MUST update an OCS entry when it wishes to adjust any abatement algorithm specific parameters, including, for example, the reduction percentage used for the Loss abatement algorithm.

For instance, if a reporting node wishes to change the reduction percentage either higher, if the overload condition has worsened, or lower, if the overload condition has improved, then the reporting node would update the appropriate OCS entry.

A reporting node MUST increment the sequence number associated with the OCS entry any time the contents of the OCS entry are changed. This will result in a new sequence number being sent to reacting nodes, instructing reacting nodes to process the OC-OLR AVP.

A reporting node SHOULD update an OCS entry with a validity duration of zero ("0") when the overload condition ends.

Note: If a reporting node knows that the OCS entries in the reacting nodes are near expiration then the reporting node might decide not to send an OLR with a validity duration of zero.

A reporting node MUST keep an OCS entry with a validity duration of zero ("0") for a period of time long enough to ensure that any non-expired reacting node's OCS entry created as a result of the overload condition in the reporting node is deleted.

5.2.2. Reacting Node Behavior

When a reacting node sends a request it MUST determine if that request matches an active OCS.

If the request matches an active OCS then the reacting node MUST use the overload abatement algorithm indicated in the OCS to determine if the request is to receive overload abatement treatment.

For the Loss abatement algorithm defined in this specification, see Section 6 for the overload abatement algorithm logic applied.
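As an illustration of the matching step, the following Python sketch looks up the OCS entry that applies to an outgoing request. It rests on assumptions that are not stated in this section: a request carrying a Destination-Host is treated as a host-type request and any other request as a realm-type request, and the OCS table is a plain dictionary whose entries hold an absolute "expiry" time. The function name `find_active_ocs` and the key layout are hypothetical.

```python
def find_active_ocs(ocs, app_id, dest_host, dest_realm, now):
    """Return the active OCS entry matching an outgoing request, or None.

    `ocs` maps (application-id, kind, name) to a dict with an "expiry" time,
    where kind is "host" or "realm" (a hypothetical layout for this sketch).
    """
    if dest_host is not None:
        # Assumed host-type request: match on the Destination-Host.
        key = (app_id, "host", dest_host)
    else:
        # Assumed realm-type request: match on the Destination-Realm.
        key = (app_id, "realm", dest_realm)
    entry = ocs.get(key)
    if entry is None or entry["expiry"] <= now:
        return None  # no entry, or the overload report has expired
    return entry
```

When an entry is returned, the abatement algorithm recorded in it decides whether this particular request receives abatement treatment.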
If the overload abatement algorithm selects the request for overload abatement treatment then the reacting node MUST apply overload abatement treatment on the request. The abatement treatment applied depends on the context of the request.

If diversion abatement treatment is possible (i.e. a different path for the request can be selected where the overloaded node is not part of the different path), then the reacting node SHOULD apply diversion abatement treatment to the request. The reacting node MUST apply throttling abatement treatment to requests identified for abatement treatment when diversion treatment is not possible or was not applied.

Note: This only addresses the case where there are two defined abatement treatments, diversion and throttling. Any extension that defines a new abatement treatment must also define the interaction of the new abatement treatment with existing treatments.

If the overload abatement treatment results in throttling of the request and if the reacting node is an agent then the agent MUST send an appropriate error as defined in Section 8.

Diameter endpoints that throttle requests need to do so according to the rules of the client application. Those rules will vary by application, and are beyond the scope of this document.

In the case that the OCS entry indicated no traffic was to be sent to the overloaded entity and the validity duration expires then overload abatement associated with the overload report MUST be ended in a controlled fashion.

5.2.3. Reporting Node Behavior

If there is an active OCS entry then a reporting node SHOULD include the OC-OLR AVP in all answers to requests that contain the OC-Supported-Features AVP and that match the active OCS entry.
Note: A request matches if the application-id in the request matches the application-id in any active OCS entry and if the report-type in the OCS entry matches a report-type supported by the reporting node as indicated in the OC-Supported-Features AVP.

The contents of the OC-OLR AVP depend on the selected algorithm.

A reporting node MAY choose to not resend an overload report to a reacting node if it can guarantee that this overload report is already active in the reacting node.

Note: In some cases (e.g. when there are one or more agents in the path between reporting and reacting nodes, or when overload reports are discarded by reacting nodes) a reporting node may not be able to guarantee that the reacting node has received the report.

A reporting node MUST NOT send overload reports of a type that has not been advertised as supported by the reacting node.

Note: A reacting node implicitly advertises support for the host and realm report types by including the OC-Supported-Features AVP in the request. Support for other report types will be explicitly indicated by new feature bits in the OC-Feature-Vector AVP.

A reporting node SHOULD explicitly indicate the end of an overload occurrence by sending a new OLR with OC-Validity-Duration set to a value of zero ("0"). The reporting node SHOULD ensure that all reacting nodes receive the updated overload report.

A reporting node MAY rely on the OC-Validity-Duration AVP values for the implicit overload control state cleanup on the reacting node.

Note: All OLRs sent have an expiration time calculated by adding the validity-duration contained in the OLR to the time the message was sent. Transit time for the OLR can be safely ignored. The reporting node can ensure that all reacting nodes have received the OLR by continuing to send it in answer messages until all OLRs sent for that overload condition have expired.
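The expiration bookkeeping in the note above can be sketched as follows. This is a minimal illustration, assuming the reporting node records the send time of every OLR for one overload condition. The `SentOlrTracker` class and its method names are hypothetical and not part of DOIC.

```python
class SentOlrTracker:
    """Tracks when the last OLR sent for one overload condition expires."""

    def __init__(self):
        self.latest_expiry = 0.0

    def record_sent(self, sent_at, validity_duration):
        # Expiration time is the send time plus the validity duration;
        # transit time is ignored, as the note permits.
        expiry = sent_at + validity_duration
        self.latest_expiry = max(self.latest_expiry, expiry)
        return expiry

    def should_keep_sending(self, now):
        # True while some previously sent OLR may still be active at a
        # reacting node, so an updated OLR is still worth sending.
        return now < self.latest_expiry
```

Recording an OLR whose validity duration is zero never extends `latest_expiry`, so a node that has ended the overload condition stops resending once the last non-zero report has run out.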
When a reporting node sends an OLR, it effectively delegates any necessary throttling to downstream nodes. If the reporting node also locally throttles the same set of messages, the overall number of throttled requests may be higher than intended. Therefore, before applying local message throttling, a reporting node needs to check if these messages match existing OCS entries, indicating that these messages have survived throttling applied by downstream nodes that have received the related OLR.

However, even if the set of messages match existing OCS entries, the reporting node can still apply other abatement methods such as diversion. The reporting node might also need to throttle requests for reasons other than overload. For example, an agent or server might have a configured rate limit for each client, and throttle requests that exceed that limit, even if such requests had already been candidates for throttling by downstream nodes. The reporting node also has the option to send new OLRs requesting greater reductions in traffic, reducing the need for local throttling.

A reporting node SHOULD decrease requested overload abatement treatment in a controlled fashion to avoid oscillations in traffic.

For example, it might wait some period of time after overload ends before terminating the OLR, or it might send a series of OLRs indicating progressively less overload severity.

5.3. Protocol Extensibility

The DOIC solution can be extended. Types of potential extensions include new traffic abatement algorithms, new report types or other new functionality.

When defining a new extension that requires new normative behavior, the specification MUST define a new feature for the OC-Feature-Vector. This feature bit is used to communicate support for the new feature.

The extension MAY define new AVPs for use in DOIC Capability Announcement and for use in DOIC Overload reporting.
These new AVPs SHOULD be defined to be extensions to the OC-Supported-Features or OC-OLR AVPs defined in this document.

The Grouped AVP extension mechanisms defined in [RFC6733] apply. This allows, for example, defining a new feature that is mandatory to be understood even when piggybacked on an existing application.

When defining new report type values, the corresponding specification MUST define the semantics of the new report types and how they affect the OC-OLR AVP handling.

The OC-Supported-Features and OC-OLR AVPs can be expanded with optional sub-AVPs only if a legacy DOIC implementation can safely ignore them without breaking backward compatibility for the given OC-Report-Type AVP value. Any new sub-AVPs MUST NOT require that the M-bit be set.

Documents that introduce new report types MUST describe any limitations on their use across non-supporting agents.

As with any Diameter specification, [RFC6733] requires all new AVPs to be registered with IANA. See Section 9 for the required procedures. New features (feature bits in the OC-Feature-Vector AVP) and report types (in the OC-Report-Type AVP) MUST be registered with IANA.

6. Loss Algorithm

This section documents the Diameter overload loss abatement algorithm.

6.1. Overview

The DOIC specification supports the ability for multiple overload abatement algorithms to be specified. The abatement algorithm used for any instance of overload is determined by the Diameter Overload Capability Announcement process documented in Section 5.1.

The loss algorithm described in this section is the default algorithm that must be supported by all Diameter nodes that support DOIC.

The loss algorithm is designed to be a straightforward and stateless overload abatement algorithm. It is used by reporting nodes to request a percentage reduction in the amount of traffic sent.
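As an illustration of how a reacting node might honor such a percentage without keeping any per-request state, the following Python sketch draws a random integer from 1 to 100 for each request and selects the request for abatement when the draw does not exceed the requested reduction. It shows one possible approach only. The function name `needs_abatement` and its `rng` parameter are assumptions made for the example.

```python
import random


def needs_abatement(reduction_percentage, rng=random):
    """Decide, independently for one request, whether it gets abatement."""
    # randint(1, 100) is inclusive at both ends, so a reduction of 0 never
    # selects a request and a reduction of 100 always does.
    return rng.randint(1, 100) <= reduction_percentage
```

Because every request is decided independently, the achieved reduction only approaches the requested percentage over many requests. Passing a seeded `random.Random` instance as `rng` makes the behavior reproducible in tests.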
The traffic impacted by the requested reduction depends on the type of overload report.

Reporting nodes request the stateless reduction of the number of requests by an indicated percentage. This percentage reduction is in comparison to the number of messages the node otherwise would send, regardless of how many requests the node might have sent in the past.

At a conceptual level, the logic at the reacting node could be outlined as follows.

1.  An overload report is received and the associated OCS is either saved or updated (if required) by the reacting node.

2.  A new Diameter request is generated by the application running on the reacting node.

3.  The reacting node determines that an active overload report applies to the request, as indicated by the corresponding OCS entry.

4.  The reacting node determines if overload abatement treatment should be applied to the request. One approach that could be taken for each request is to select a random number between 1 and 100. If the random number is less than or equal to the indicated reduction percentage then the request is given abatement treatment, otherwise the request is given normal routing treatment.

6.2. Reporting Node Behavior

The method a reporting node uses to determine the amount of traffic reduction required to address an overload condition is an implementation decision.

When a reporting node that has selected the loss abatement algorithm determines the need to request a reduction in traffic, it includes an OC-OLR AVP in answer messages as described in Section 5.2.3.

When sending the OC-OLR AVP, the reporting node MUST indicate a percentage reduction in the OC-Reduction-Percentage AVP.

The reporting node MAY change the reduction percentage in subsequent overload reports. When doing so the reporting node must conform to overload report handling specified in Section 5.2.3.

6.3. Reacting Node Behavior

The method a reacting node uses to determine which request messages are given abatement treatment is an implementation decision.

When receiving an OC-OLR in an answer message where the algorithm indicated in the OC-Supported-Features AVP is the loss algorithm, the reacting node MUST apply abatement treatment to the requested percentage of request messages sent.

Note: The loss algorithm is a stateless algorithm. As a result, the reacting node does not guarantee that there will be an absolute reduction in traffic sent. Rather, it guarantees that the requested percentage of new requests will be given abatement treatment.

If a reacting node comes out of the 100 percent traffic reduction as a result of the overload report timing out, the reacting node sending the traffic SHOULD be conservative and, for example, first send "probe" messages to learn the overload condition of the overloaded node before converging to any traffic amount/rate decided by the sender. Similar concerns apply in all cases when the overload report times out unless the previous overload report stated 0 percent reduction.

The goal of this behavior is to reduce the probability of overload condition thrashing where an immediate transition from 100% reduction to 0% reduction results in the reporting node moving quickly back into an overload condition.

7. Attribute Value Pairs

This section describes the encoding and semantics of the Diameter Overload Indication Attribute Value Pairs (AVPs) defined in this document.

7.1. OC-Supported-Features AVP

The OC-Supported-Features AVP (AVP code TBD1) is of type Grouped and serves two purposes. First, it announces a node's support for the DOIC solution in general. Second, it contains the description of the supported DOIC features of the sending node. The OC-Supported-Features AVP MUST be included in every Diameter request message a DOIC supporting node sends.
   OC-Supported-Features ::= < AVP Header: TBD1 >
                             [ OC-Feature-Vector ]
                           * [ AVP ]

7.2. OC-Feature-Vector AVP

The OC-Feature-Vector AVP (AVP code TBD2) is of type Unsigned64 and contains a 64-bit flags field of announced capabilities of a DOIC node. The value of zero (0) is reserved.

The OC-Feature-Vector sub-AVP is used to announce the DOIC features supported by the DOIC node, in the form of a flag-bits field in which each bit announces one feature or capability supported by the node. The absence of the OC-Feature-Vector AVP in request messages indicates that only the default traffic abatement algorithm described in this specification is supported. The absence of the OC-Feature-Vector AVP in answer messages indicates that the default traffic abatement algorithm described in this specification is selected (while other traffic abatement algorithms may be supported), and no features other than abatement algorithms are supported.

The following capabilities are defined in this document:

OLR_DEFAULT_ALGO (0x0000000000000001)

   When this flag is set by a DOIC reacting node it means that the default traffic abatement (loss) algorithm is supported. When this flag is set by a DOIC reporting node it means that the loss algorithm will be used for requested overload abatement.

7.3. OC-OLR AVP

The OC-OLR AVP (AVP code TBD3) is of type Grouped and contains the information necessary to convey an overload report on an overload condition at the reporting node. The application the OC-OLR AVP applies to is the same as the Application-Id found in the Diameter message header. The host or realm the OC-OLR AVP concerns is determined from the Origin-Host AVP and/or Origin-Realm AVP found in the encapsulating Diameter command. The OC-OLR AVP is intended to be sent only by a reporting node.
   OC-OLR ::= < AVP Header: TBD3 >
              < OC-Sequence-Number >
              < OC-Report-Type >
              [ OC-Reduction-Percentage ]
              [ OC-Validity-Duration ]
            * [ AVP ]

7.4. OC-Sequence-Number AVP

The OC-Sequence-Number AVP (AVP code TBD4) is of type Unsigned64. Its usage in the context of overload control is described in Section 5.2.

From the functionality point of view, the OC-Sequence-Number AVP is used as a non-volatile increasing counter for a sequence of overload reports between two DOIC nodes for the same overload occurrence. Sequence numbers are treated in a uni-directional manner, i.e. two sequence numbers on each direction between two DOIC nodes are not related or correlated.

7.5. OC-Validity-Duration AVP

The OC-Validity-Duration AVP (AVP code TBD5) is of type Unsigned32 and indicates in seconds the validity time of the overload report. The number of seconds is measured after reception of the first OC-OLR AVP with a given value of OC-Sequence-Number AVP. The default value for the OC-Validity-Duration AVP is 30 seconds. When the OC-Validity-Duration AVP is not present in the OC-OLR AVP, the default value applies. The maximum value for the OC-Validity-Duration AVP is 86,400 seconds (24 hours).

7.6. OC-Report-Type AVP

The OC-Report-Type AVP (AVP code TBD6) is of type Enumerated. The value of the AVP describes what the overload report concerns. The following values are initially defined:

HOST_REPORT 0

   The overload report is for a host. Overload abatement treatment applies to host-routed requests.

REALM_REPORT 1

   The overload report is for a realm. Overload abatement treatment applies to realm-routed requests.

7.7. OC-Reduction-Percentage AVP

The OC-Reduction-Percentage AVP (AVP code TBD7) is of type Unsigned32 and describes the percentage of the traffic that the sender is requested to reduce, compared to what it otherwise would send.
The OC-Reduction-Percentage AVP applies to the default (loss) algorithm specified in this specification. However, the AVP can be reused for future abatement algorithms, if its semantics fit into the new algorithm.

The value of the OC-Reduction-Percentage AVP is between zero (0) and one hundred (100). Values greater than 100 are ignored. The value of 100 means that all traffic is to be throttled, i.e. the reporting node is under a severe load and ceases to process any new messages. The value of 0 means that the reporting node is in a stable state and has no need for the reacting node to apply any traffic abatement.

7.8. Attribute Value Pair flag rules

                                                   +---------+
                                                   |AVP flag |
                                                   |rules    |
                                                   +----+----+
                        AVP   Section              |    |MUST|
 Attribute Name         Code  Defined  Value Type  |MUST| NOT|
+--------------------------------------------------+----+----+
|OC-Supported-Features  TBD1  7.1      Grouped     |    | V  |
+--------------------------------------------------+----+----+
|OC-Feature-Vector      TBD2  7.2      Unsigned64  |    | V  |
+--------------------------------------------------+----+----+
|OC-OLR                 TBD3  7.3      Grouped     |    | V  |
+--------------------------------------------------+----+----+
|OC-Sequence-Number     TBD4  7.4      Unsigned64  |    | V  |
+--------------------------------------------------+----+----+
|OC-Validity-Duration   TBD5  7.5      Unsigned32  |    | V  |
+--------------------------------------------------+----+----+
|OC-Report-Type         TBD6  7.6      Enumerated  |    | V  |
+--------------------------------------------------+----+----+
|OC-Reduction                                      |    |    |
| -Percentage           TBD7  7.7      Unsigned32  |    | V  |
+--------------------------------------------------+----+----+

As described in the Diameter base protocol [RFC6733], the M-bit usage for a given AVP in a given command may be defined by the application.

8. Error Response Codes

When a DOIC node rejects a Diameter request due to overload, the DOIC node MUST select an appropriate error response code.
This determination is made based on the probability of the request succeeding if retried on a different path.

Note: This only applies for DOIC nodes that are not the originator of the request.

A reporting node rejecting a Diameter request due to an overload condition SHOULD send a DIAMETER_TOO_BUSY error response, if it can assume that the same request may succeed on a different path.

If a reporting node knows or assumes that the same request will not succeed on a different path, DIAMETER_UNABLE_TO_COMPLY error response SHOULD be used. Retrying would consume valuable resources during an occurrence of overload.

For instance, if the request arrived at the reporting node without a Destination-Host AVP then the reporting node might determine that there is an alternative Diameter node that could successfully process the request and that retrying the transaction would not negatively impact the reporting node. DIAMETER_TOO_BUSY would be sent in this case.

If the request arrived at the reporting node with a Destination-Host AVP populated with its own Diameter identity then the reporting node can assume that retrying the request would result in it coming to the same reporting node. DIAMETER_UNABLE_TO_COMPLY would be sent in this case.

A second example is when an agent that supports the DOIC solution is performing the role of a reacting node for a non-supporting client. Requests that are rejected as a result of DOIC throttling by the agent in this scenario would generally be rejected with a DIAMETER_UNABLE_TO_COMPLY response code.

9. IANA Considerations

9.1. AVP codes

New AVPs defined by this specification are listed in Section 7. All AVP codes are allocated from the 'Authentication, Authorization, and Accounting (AAA) Parameters' AVP Codes registry.

9.2. New registries

Two new registries are needed under the 'Authentication, Authorization, and Accounting (AAA) Parameters' registry.
A new "Overload Control Feature Vector" registry is required. The registry must contain the following:

   Feature Vector Value

   Specification - the specification that defines the new value.

See Section 7.2 for the initial Feature Vector Value in the registry. This specification is the specification defining the value. New values can be added into the registry using the Specification Required policy [RFC5226].

A new "Overload Report Type" registry is required. The registry must contain the following:

   Report Type Value

   Specification - the specification that defines the new value.

See Section 7.6 for the initial assignment in the registry. New types can be added using the Specification Required policy [RFC5226].

10. Security Considerations

DOIC gives Diameter nodes the ability to request that downstream nodes send fewer Diameter requests. Nodes do this by exchanging overload reports that directly effect this reduction. This exchange is potentially subject to multiple methods of attack, and has the potential to be used as a Denial-of-Service (DoS) attack vector.

Overload reports may contain information about the topology and current status of a Diameter network. This information is potentially sensitive. Network operators may wish to control disclosure of overload reports to unauthorized parties to avoid its use for competitive intelligence or to target attacks.

Diameter does not include features to provide end-to-end authentication, integrity protection, or confidentiality. This may cause complications when sending overload reports between non-adjacent nodes.

10.1. Potential Threat Modes

The Diameter protocol involves transactions in the form of requests and answers exchanged between clients and servers. These clients and servers may be peers, that is, they may share a direct transport (e.g. TCP or SCTP) connection, or the messages may traverse one or more intermediaries, known as Diameter Agents. Diameter nodes use TLS, DTLS, or IPsec to authenticate peers, and to provide confidentiality and integrity protection of traffic between peers. Nodes can make authorization decisions based on the peer identities authenticated at the transport layer.

When agents are involved, this presents an effectively transitive trust model. That is, a Diameter client or server can authorize an agent for certain actions, but it must trust that agent to make appropriate authorization decisions about its peers, and so on. Since confidentiality and integrity protection occur at the transport layer, agents can read, and perhaps modify, any part of a Diameter message, including an overload report.

There are several ways an attacker might attempt to exploit the overload control mechanism. An unauthorized third party might inject an overload report into the network. If this third party is upstream of an agent, and that agent fails to apply proper authorization policies, downstream nodes may mistakenly trust the report. This attack is at least partially mitigated by the assumption that nodes include overload reports in Diameter answers but not in requests. This requires an attacker to have knowledge of the original request in order to construct an answer. Such an answer would also need to arrive at a Diameter node via a protected transport connection. Therefore, implementations MUST validate that an answer containing an overload report is a properly constructed response to a pending request prior to acting on the overload report, and that the answer was received via an appropriate transport connection.

A similar attack involves a compromised but otherwise authorized node that sends an inappropriate overload report. For example, a server for the realm "example.com" might send an overload report indicating that a competitor's realm "example.net" is overloaded.
If other nodes act on the report, they may falsely believe that "example.net" is overloaded, effectively reducing that realm's capacity. Therefore, it's critical that nodes validate that an overload report received from a peer actually falls within that peer's responsibility before acting on the report or forwarding the report to other peers. For example, an overload report from a peer that applies to a realm not handled by that peer is suspect.

This attack is partially mitigated by the fact that the application, as well as host and realm, for a given OLR is determined implicitly by respective AVPs in the enclosing answer. If a reporting node modifies any of those AVPs, the enclosing transaction will also be affected.

10.2. Denial of Service Attacks

Diameter overload reports, especially realm-reports, can cause a node to cease sending some or all Diameter requests for an extended period. This makes them a tempting vector for DoS attacks. Furthermore, since Diameter is almost always used in support of other protocols, a DoS attack on Diameter is likely to impact those protocols as well. Therefore, Diameter nodes MUST NOT honor or forward OLRs received from peers that are not trusted to send them.

An attacker might use the information in an OLR to assist in DoS attacks. For example, an attacker could use information about current overload conditions to time an attack for maximum effect, or use subsequent overload reports as a feedback mechanism to learn the results of a previous or ongoing attack. Operators need the ability to ensure that OLRs are not leaked to untrusted parties.

10.3. Non-Compliant Nodes

In the absence of an overload control mechanism, Diameter nodes need to implement strategies to protect themselves from floods of requests, and to make sure that a disproportionate load from one source does not prevent other sources from receiving service.
For example, a Diameter server might throttle a certain percentage of requests from sources that exceed certain limits. Overload control can be thought of as an optimization for such strategies, where downstream nodes never send the excess requests in the first place. However, the presence of an overload control mechanism does not remove the need for these other protection strategies.

When a Diameter node sends an overload report, it cannot assume that all nodes will comply, even if they indicate support for DOIC. A non-compliant node might continue to send requests with no reduction in load. Such non-compliance could occur accidentally, or maliciously to gain an unfair advantage over compliant nodes. Requirement 28 [RFC7068] indicates that the overload control solution cannot assume that all Diameter nodes in a network are trusted, and that malicious nodes must not be allowed to take advantage of the overload control mechanism to get more than their fair share of service.

10.4. End-to-End Security Issues

The lack of end-to-end integrity features makes it difficult to establish trust in overload reports received from non-adjacent nodes. Any agents in the message path may insert or modify overload reports. Nodes must trust that their adjacent peers perform proper checks on overload reports from their peers, and so on, creating a transitive-trust requirement extending for potentially long chains of nodes. Network operators must determine if this transitive trust requirement is acceptable for their deployments. Nodes supporting Diameter overload control MUST give operators the ability to select which peers are trusted to deliver overload reports, and whether they are trusted to forward overload reports from non-adjacent nodes. DOIC nodes MUST strip DOIC AVPs from messages received from peers that are not trusted for DOIC purposes.
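
The per-peer trust selection and AVP stripping described above could be sketched as follows. This is only an illustration under stated assumptions: the message model (a list of name/value tuples), the PeerPolicy class, and the filter_doic_avps function are hypothetical and are not defined by this specification. The caller is assumed to have already decided whether the message originated at the adjacent peer itself.

```python
from dataclasses import dataclass

# AVP names as used in this document; the message model is illustrative only.
DOIC_AVPS = frozenset({"OC-Supported-Features", "OC-OLR"})


@dataclass(frozen=True)
class PeerPolicy:
    """Hypothetical operator-configured trust settings for one peer."""
    trusted_to_send: bool = False      # peer may deliver overload reports
    trusted_to_forward: bool = False   # peer may relay reports from non-adjacent nodes


def filter_doic_avps(avps, policy, adjacent_origin):
    """Return the AVP list, with DOIC AVPs stripped if the peer is not trusted.

    avps: list of (name, value) tuples from a received message.
    adjacent_origin: True if the message originated at the peer itself
    (an assumption supplied by the caller, not something DOIC can detect).
    """
    trusted = policy.trusted_to_send and (
        adjacent_origin or policy.trusted_to_forward
    )
    if trusted:
        return list(avps)
    # Untrusted for DOIC purposes: drop only the DOIC AVPs, keep the rest.
    return [(name, value) for name, value in avps if name not in DOIC_AVPS]
```

In this sketch the default policy trusts nothing, so a newly configured peer has its DOIC AVPs stripped until an operator explicitly enables them.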
The lack of end-to-end confidentiality protection means that any Diameter agent in the path of an overload report can view the contents of that report. In addition to the requirement to select which peers are trusted to send overload reports, operators MUST be able to select which peers are authorized to receive reports. A node MUST NOT send an overload report to a peer not authorized to receive it. Furthermore, an agent MUST remove any overload reports that might have been inserted by other nodes before forwarding a Diameter message to a peer that is not authorized to receive overload reports.

A DOIC node cannot always automatically detect that a peer also supports DOIC. For example, a node might have a peer that is a non-supporting agent. If nodes on the other side of that agent send OC-Supported-Features AVPs, the agent is likely to forward them as unknown AVPs. Messages received across the non-supporting agent may be indistinguishable from messages received across a DOIC supporting agent, giving the false impression that the non-supporting agent actually supports DOIC. This complicates the transitive-trust nature of DOIC. Operators need to be careful to avoid situations where a non-supporting agent is mistakenly trusted to enforce DOIC related authorization policies.

At the time of this writing, the DIME working group is studying requirements for adding end-to-end security features [I-D.ietf-dime-e2e-sec-req] to Diameter. These features, when they become available, might make it easier to establish trust in non-adjacent nodes for overload control purposes. Readers should be reminded, however, that the overload control mechanism encourages Diameter agents to modify AVPs in, or insert additional AVPs into, existing messages that are originated by other nodes. If end-to-end security is enabled, there is a risk that such modification could violate integrity protection. The details of using any future Diameter end-to-end security mechanism with overload control will require careful consideration, and are beyond the scope of this document.

11. Contributors

The following people contributed substantial ideas, feedback, and discussion to this document:

o Eric McMurry

o Hannes Tschofenig

o Ulrich Wiehe

o Jean-Jacques Trottin

o Maria Cruz Bartolome

o Martin Dolly

o Nirav Salot

o Susan Shishufeng

12. References

12.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.

[RFC6733] Fajardo, V., Arkko, J., Loughney, J., and G. Zorn, "Diameter Base Protocol", RFC 6733, October 2012.

12.2. Informative References

[Cx] 3GPP, "ETSI TS 129 229 V11.4.0", August 2013.

[I-D.ietf-dime-e2e-sec-req] Tschofenig, H., Korhonen, J., Zorn, G., and K. Pillay, "Diameter AVP Level Security: Scenarios and Requirements", draft-ietf-dime-e2e-sec-req-01 (work in progress), October 2013.

[PCC] 3GPP, "ETSI TS 123 203 V11.12.0", December 2013.

[RFC4006] Hakala, H., Mattila, L., Koskinen, J-P., Stura, M., and J. Loughney, "Diameter Credit-Control Application", RFC 4006, August 2005.

[RFC7068] McMurry, E. and B. Campbell, "Diameter Overload Control Requirements", RFC 7068, November 2013.

[S13] 3GPP, "ETSI TS 129 272 V11.9.0", December 2012.

Appendix A. Issues left for future specifications

The base solution for the overload control does not cover all possible use cases. A number of solution aspects were intentionally left for future specification and protocol work. The following sub-sections define some of the potential extensions to the DOIC solution.

A.1.
Additional traffic abatement algorithms

This specification describes only a simple loss-based algorithm. Future algorithms can be added using the extension mechanism designed into the solution. The new algorithms need to be registered with IANA. See Sections 7.1 and 9 for the required IANA steps.

A.2. Agent Overload

This specification focuses on Diameter endpoint (server or client) overload. A separate extension will be required to outline the handling of the case of agent overload.

A.3. New Error Diagnostic AVP

This specification indicates the use of existing error messages when nodes reject requests due to overload. The DIME working group is considering defining additional error codes or AVPs to indicate that overload was the reason for the rejection of the message.

Appendix B. Deployment Considerations

Non-Supporting Agents

Due to the way that realm-routed requests are handled in Diameter networks with the server selection for the request done by an agent, network operators should enable DOIC at agents that perform server selection first.

Topology Hiding Interactions

There exist proxies that implement what is referred to as Topology Hiding. This can include cases where the agent modifies the Origin-Host in answer messages. The behavior of the DOIC solution is not well understood when this happens. As such, the DOIC solution does not address this scenario.

Appendix C. Requirements Conformance Analysis

This section contains the result of an analysis of the DOIC solution's conformance to the requirements defined in [RFC7068].

C.1. Deferred Requirements

The 3GPP has adopted an early version of this document as a normative reference in various Diameter-related specifications to support the overload control mechanism in their release 12 framework.
The DIME working group has therefore decided to defer certain requirements in order to complete the design of an extensible, generic solution before the deadline scheduled by the 3GPP for the completion of the release 12 protocol work by the end of 2014. The deferred work includes the following:

o Agent Overload - The ability for an agent to report an overload condition of the agent itself.

o Load Information - The ability for a node to report its load level when not overloaded.

At the time of this writing, DIME has begun separate work efforts for these requirements.

C.2. Detection of non-supporting Intermediaries

The DOIC mechanism as currently defined does not allow supporting nodes to automatically determine whether OC-Supported-Features or OC-OLR AVPs are originated by a peer node, or by a non-peer node and sent across a non-supporting peer. This makes it impossible to detect the presence of non-supporting nodes between supporting nodes, except by configuration. The working group determined that such a configuration requirement is acceptable.

This limits full compliance with certain requirements related to the limitation of new configuration, deployment in environments with mixed support, operating across non-supporting agents, and authorization.

C.3. Implicit Application Indication

The working group elected to determine the application for an overload report from that of the enclosing message. This prevents sending an OLR for an application when there are no transactions for that application. As a consequence, DOIC does not comply with the requirement to be able to report overload information across quiescent connections. DOIC does not fully comply with requirements to operate on up-to-date information, since if an OLR causes all transactions to stop for an application, the only way traffic will resume is for the OLR to expire.

C.4.
Stateless Operation

[RFC7068] explicitly discourages the sending of OLRs in every answer message, as part of the requirement to avoid additional work for overloaded nodes. DOIC recommends exactly that behavior during active overload conditions. The working group determined that doing otherwise would reduce reliability and increase statefulness. (Note that DOIC does allow nodes to avoid sending OLRs in every answer if they have some other method of ensuring that OLRs get to all relevant reacting nodes.)

C.5. No New Vulnerabilities

The working group believes that DOIC is compliant with the requirement to avoid introducing new vulnerabilities. However, this requirement may warrant an early security expert review.

C.6. Detailed Requirements

[RFC Editor: Please remove this section and subsections prior to publication as an RFC.]

C.6.1. General

REQ 1: The solution MUST provide a communication method for Diameter nodes to exchange load and overload information.

*Partially Compliant*. The mechanism uses new AVPs piggybacked on existing Diameter messages to exchange overload information. It does not currently support "load" information or the ability to report overload of an agent. These have been left for future extensions.

REQ 2: The solution MUST allow Diameter nodes to support overload control regardless of which Diameter applications they support. Diameter clients and agents must be able to use the received load and overload information to support graceful behavior during an overload condition. Graceful behavior under overload conditions is best described by REQ 3.

*Partially Compliant*. The DOIC AVPs can be used in any application that allows the extension of AVPs. However, "load" information is not currently supported.
REQ 3: The solution MUST limit the impact of overload on the overall useful throughput of a Diameter server, even when the incoming load on the network is far in excess of its capacity. The overall useful throughput under load is the ultimate measure of the value of a solution.

*Compliant*. DOIC provides information that nodes can use to reduce the impact of overload.

REQ 4: Diameter allows requests to be sent from either side of a connection, and either side of a connection may have need to provide its overload status. The solution MUST allow each side of a connection to independently inform the other of its overload status.

*Compliant*. DOIC AVPs can be included regardless of transaction "direction".

REQ 5: Diameter allows nodes to determine their peers via dynamic discovery or manual configuration. The solution MUST work consistently without regard to how peers are determined.

*Compliant*. DOIC contains no assumptions about how peers are discovered.

REQ 6: The solution designers SHOULD seek to minimize the amount of new configuration required in order to work. For example, it is better to allow peers to advertise or negotiate support for the solution, rather than to require that this knowledge to be configured at each node.

*Partially Compliant*. Most DOIC parameters are advertised using the DOIC capability announcement mechanism. However, there are some situations where configuration is required. For example, without configuration, a DOIC node cannot detect that a peer does not support DOIC when nodes on the other side of the non-supporting node do support DOIC.

C.6.2. Performance

REQ 7: The solution and any associated default algorithm(s) MUST ensure that the system remains stable. At some point after an overload condition has ended, the solution MUST enable capacity to stabilize and become equal to what it would be in the absence of an overload condition.
Note that this also requires that the solution MUST allow nodes to shed load without introducing non-converging oscillations during or after an overload condition.

*Compliant*. The specification offers guidance that implementations should apply hysteresis when recovering from overload, and avoid sudden ramp ups in offered load when recovering.

REQ 8: Supporting nodes MUST be able to distinguish current overload information from stale information.

*Partially Compliant*. DOIC overload reports are "soft state", that is they expire after an indicated period. DOIC nodes may also send reports that end existing overload conditions. DOIC requires reporting nodes to ensure that all relevant reacting nodes receive overload reports. However, since DOIC does not allow reporting nodes to send OLRs in watchdog messages, if an overload condition results in zero offered load, the reporting node cannot update the condition until the expiration of the original OLR.

REQ 9: The solution MUST function across fully loaded as well as quiescent transport connections. This is partially derived from the requirement for stability in REQ 7.

*Not Compliant*. DOIC does not allow OLRs to be sent over quiescent transport connections. This is due to the fact that OLRs cannot be sent outside of the application to which they apply.

REQ 10: Consumers of overload information MUST be able to determine when the overload condition improves or ends.

*Partially Compliant*. (See response to previous two requirements.)

REQ 11: The solution MUST be able to operate in networks of different sizes.

*Compliant*. DOIC makes no assumptions about the size of the network. DOIC can operate purely between clients and servers, or across agents.
REQ 12: When a single network node fails, goes into overload, or suffers from reduced processing capacity, the solution MUST make it possible to limit the impact of the affected node on other nodes in the network. This helps to prevent a small-scale failure from becoming a widespread outage.

*Partially Compliant*. DOIC allows overload reports for an entire realm, where abated traffic will not be redirected towards another server. But in situations where nodes choose to divert traffic to other nodes, DOIC offers no way of knowing whether the new recipients can handle the traffic if they have not already indicated overload. This may be mitigated with the use of a future "load" extension, or with the use of proprietary dynamic load-balancing mechanisms.

REQ 13: The solution MUST NOT introduce substantial additional work for a node in an overloaded state. For example, a requirement for an overloaded node to send overload information every time it received a new request would introduce substantial work.

*Not Compliant*. DOIC does in fact encourage an overloaded node to send an OLR in every response. The working group determined that other mechanisms to ensure that every relevant node receives an OLR would create even more work. [Note: This needs discussion.]

REQ 14: Some scenarios that result in overload involve a rapid increase of traffic with little time between normal levels and levels that induce overload. The solution SHOULD provide for rapid feedback when traffic levels increase.

*Compliant*. The piggyback mechanism allows OLRs to be sent at the same rate as application traffic.

REQ 15: The solution MUST NOT interfere with the congestion control mechanisms of underlying transport protocols. For example, a solution that opened additional TCP connections when the network is congested would reduce the effectiveness of the underlying congestion control mechanisms.

*Compliant*.
DOIC does not require or recommend changes in the handling of transport protocols or connections.

C.6.3. Heterogeneous Support for Solution

REQ 16: The solution is likely to be deployed incrementally. The solution MUST support a mixed environment where some, but not all, nodes implement it.

*Partially Compliant*. DOIC works with most mixed-deployment scenarios. However, it cannot work across a non-supporting proxy that modifies Origin-Host AVPs in answer messages. DOIC will have limited impact in networks where the nodes that perform server selections do not support the mechanism.

REQ 17: In a mixed environment with nodes that support the solution and nodes that do not, the solution MUST NOT result in materially less useful throughput during overload as would have resulted if the solution were not present. It SHOULD result in less severe overload in this environment.

*Compliant*. In most mixed-support deployments, DOIC will offer at least some value, and will not make things worse.

REQ 18: In a mixed environment of nodes that support the solution and nodes that do not, the solution MUST NOT preclude elements that support overload control from treating elements that do not support overload control in an equitable fashion relative to those that do. Users and operators of nodes that do not support the solution MUST NOT unfairly benefit from the solution. The solution specification SHOULD provide guidance to implementers for dealing with elements not supporting overload control.

*Compliant*. DOIC provides mechanisms to abate load from non-supporting sources. Furthermore, it recommends that reporting nodes remain able to apply whatever protections they would ordinarily apply if DOIC were not in use.

REQ 19: It MUST be possible to use the solution between nodes in different realms and in different administrative domains.

*Partially Compliant*.
DOIC allows sending OLRs across administrative domains, and potentially to nodes in other realms. However, an OLR cannot indicate overload for realms other than the one in the Origin-Realm AVP of the containing answer.

REQ 20: Any explicit overload indication MUST be clearly distinguishable from other errors reported via Diameter.

*Compliant*. DOIC sends explicit overload indication in overload reports. It does not depend on error result codes.

REQ 21: In cases where a network node fails, is so overloaded that it cannot process messages, or cannot communicate due to a network failure, it may not be able to provide explicit indications of the nature of the failure or its levels of overload. The solution MUST result in at least as much useful throughput as would have resulted if the solution were not in place.

*Compliant*. DOIC overload reports have the primary effect of suppressing message retries in overload conditions. DOIC recommends that messages never be silently dropped if at all possible.

C.6.4. Granular Control

REQ 22: The solution MUST provide a way for a node to throttle the amount of traffic it receives from a peer node. This throttling SHOULD be graded so that it can be applied gradually as offered load increases. Overload is not a binary state; there may be degrees of overload.

*Compliant*. The "loss" algorithm expresses a percentage reduction.

REQ 23: The solution MUST provide sufficient information to enable a load-balancing node to divert messages that are rejected or otherwise throttled by an overloaded upstream node to other upstream nodes that are the most likely to have sufficient capacity to process them.

*Not Compliant*. DOIC provides no built-in mechanism to determine the best place to divert messages that would otherwise be throttled. This can be accomplished with a future "load" extension, or with proprietary load balancing mechanisms.
REQ 24: The solution MUST provide a mechanism for indicating load levels, even when not in an overload condition, to assist nodes in making decisions to prevent overload conditions from occurring.

*Not Compliant*. "Load" information has been left for a future extension.

C.6.5. Priority and Policy

REQ 25: The base specification for the solution SHOULD offer general guidance on which message types might be desirable to send or process over others during times of overload, based on application-specific considerations. For example, it may be more beneficial to process messages for existing sessions ahead of new sessions. Some networks may have a requirement to give priority to requests associated with emergency sessions. Any normative or otherwise detailed definition of the relative priorities of message types during an overload condition will be the responsibility of the application specification.

*Compliant*. The specification offers guidance on how requests might be prioritized for different types of applications.

REQ 26: The solution MUST NOT prevent a node from prioritizing requests based on any local policy, so that certain requests are given preferential treatment, given additional retransmission, not throttled, or processed ahead of others.

*Compliant*. Nothing in the specification prevents application-specific, implementation-specific, or local policies.

C.6.6. Security

REQ 27: The solution MUST NOT provide new vulnerabilities to malicious attack or increase the severity of any existing vulnerabilities. This includes vulnerabilities to DoS and DDoS attacks as well as replay and man-in-the-middle attacks. Note that the Diameter base specification [RFC6733] lacks end-to-end security and this must be considered (see the Security Considerations in [RFC7068]). Note that this requirement was expressed at a high level so as to not preclude any particular solution.
It is expected that the solution will address this in more detail.

*Compliant*. The working group is not aware of any such vulnerabilities. [This may need further analysis.]

REQ 28: The solution MUST NOT depend on being deployed in environments where all Diameter nodes are completely trusted. It SHOULD operate as effectively as possible in environments where other nodes are malicious; this includes preventing malicious nodes from obtaining more than a fair share of service. Note that this does not imply any responsibility on the solution to detect, or take countermeasures against, malicious nodes.

*Partially Compliant*. Since all Diameter security is currently at the transport layer, nodes must trust immediate peers to enforce trust policies. However, there are situations where a DOIC node cannot determine if an immediate peer supports DOIC. The authors recommend an expert security review.

REQ 29: It MUST be possible for a supporting node to make authorization decisions about what information will be sent to peer nodes based on the identity of those nodes. This allows a domain administrator who considers the load of their nodes to be sensitive information to restrict access to that information. Of course, in such cases, there is no expectation that the solution itself will help prevent overload from that peer node.

*Partially Compliant*. (See response to previous requirement.)

REQ 30: The solution MUST NOT interfere with any Diameter-compliant method that a node may use to protect itself from overload from non-supporting nodes or from denial-of-service attacks.

*Compliant*. The specification recommends that any such protection mechanism needed without DOIC should continue to be employed with DOIC.

C.6.7. Flexibility and Extensibility

REQ 31: There are multiple situations where a Diameter node may be overloaded for some purposes but not others.
For example, this can happen to an agent or server that supports multiple applications, or when a server depends on multiple external resources, some of which may become overloaded while others are fully available. The solution MUST allow Diameter nodes to indicate overload with sufficient granularity to allow clients to take action based on the overloaded resources without unreasonably forcing available capacity to go unused. The solution MUST support specification of overload information with granularities of at least "Diameter node", "realm", and "Diameter application" and MUST allow extensibility for others to be added in the future.

*Partially Compliant*. All DOIC overload reports are scoped to the specific application and realm. Inside that scope, overload can be reported at the specific server or whole realm scope. As currently specified, DOIC cannot indicate local overload for an agent. At the time of this writing, the DIME working group has plans to work on an agent-overload extension. DOIC allows new "scopes" through the use of extended report types.

REQ 32: The solution MUST provide a method for extending the information communicated and the algorithms used for overload control.

*Compliant*. DOIC allows new report types and abatement algorithms to be created. These may be indicated using the OC-Supported-Features AVP.

REQ 33: The solution MUST provide a default algorithm that is mandatory to implement.

*Compliant*. The "loss" algorithm is mandatory to implement.

REQ 34: The solution SHOULD provide a method for exchanging overload and load information between elements that are connected by intermediaries that do not support the solution.

*Partially Compliant*. DOIC information can traverse non-supporting agents, as long as those agents do not modify certain AVPs (e.g., Origin-Host). DOIC does not provide a way for supporting nodes to detect such modification.
Appendix D. Considerations for Applications Integrating the DOIC Solution

This section outlines considerations to be taken into account when integrating the DOIC solution into Diameter applications.

D.1. Application Classification

The following is a classification of Diameter applications and request types. This discussion is meant to document factors that play into decisions made by the Diameter identity responsible for handling overload reports.

Section 8.1 of [RFC6733] defines two state machines that imply two types of applications, session-less and session-based applications. The primary difference between these types of applications is the lifetime of Session-Ids.

For session-based applications, the Session-Id is used to tie multiple requests into a single session. The Credit-Control application defined in [RFC4006] is an example of a Diameter session-based application.

In session-less applications, the lifetime of the Session-Id is a single Diameter transaction, i.e. the session is implicitly terminated after a single Diameter transaction and a new Session-Id is generated for each Diameter request.

For the purposes of this discussion, session-less applications are further divided into two types of applications:

Stateless Applications:

Requests within a stateless application have no relationship to each other. The 3GPP defined S13 application is an example of a stateless application [S13], where only a Diameter command is defined between a client and a server and no state is maintained between two consecutive transactions.

Pseudo-Session Applications:

Applications that do not rely on the Session-Id AVP for correlation of application messages related to the same session but use other session-related information in the Diameter requests for this purpose. The 3GPP defined Cx application [Cx] is an example of a pseudo-session application.

The handling of overload reports must take the type of application into consideration, as discussed in Appendix D.2.

D.2. Application Type Overload Implications

This section discusses considerations for mitigating overload reported by a Diameter entity. This discussion focuses on the type of application. Appendix D.3 discusses considerations for handling various request types when the target server is known to be in an overloaded state.

These discussions assume that the strategy for mitigating the reported overload is to reduce the overall workload sent to the overloaded entity. The concept of applying overload treatment to requests targeted for an overloaded Diameter entity is inherent to this discussion. The method used to reduce offered load is not specified here but could include routing requests to another Diameter entity known to be able to handle them, or it could mean rejecting certain requests. For a Diameter agent, rejecting requests will usually mean generating appropriate Diameter error responses. For a Diameter client, rejecting requests will depend upon the application. For example, it could mean giving an indication to the entity requesting the Diameter service that the network is busy and to try again later.

Stateless Applications:

By definition there is no relationship between individual requests in a stateless application. As a result, when a request is sent or relayed to an overloaded Diameter entity - either a Diameter Server or a Diameter Agent - the sending or relaying entity can choose to apply the overload treatment to any request targeted for the overloaded entity.

Pseudo-Session Applications:

For pseudo-session applications, there is an implied ordering of requests. As a result, decisions about which requests towards an overloaded entity to reject could take the command code of the request into consideration.
This generally means that transactions later in the sequence of transactions should be given more favorable treatment than messages earlier in the sequence. This is because more work has already been done by the Diameter network for those transactions that occur later in the sequence. Rejecting them could result in increasing the load on the network as the transactions earlier in the sequence might also need to be repeated.

Session-Based Applications:

Overload handling for session-based applications must take into consideration the workload associated with setting up and maintaining a session. As such, the entity sending requests towards an overloaded Diameter entity for a session-based application might tend to reject new session requests prior to rejecting intra-session requests. In addition, session-ending requests might be given a lower probability of being rejected, as rejecting session-ending requests could result in session status being out of sync between the Diameter clients and servers. Application designers who decide to reject mid-session requests will need to consider whether the rejection invalidates the session and any resulting session cleanup procedures.

D.3. Request Transaction Classification

Independent Request:

An independent request is not correlated to any other requests and, as such, the lifetime of the Session-Id is constrained to an individual transaction.

Session-Initiating Request:

A session-initiating request is the initial message that establishes a Diameter session. The ACR message defined in [RFC6733] is an example of a session-initiating request.

Correlated Session-Initiating Request:

There are cases when multiple session-initiating requests must be correlated and managed by the same Diameter server.
      This is notably the case in the 3GPP PCC architecture [PCC],
      where multiple apparently independent Diameter application
      sessions are actually correlated and must be handled by the same
      Diameter server.

   Intra-Session Request:

      An intra-session request is a request that uses the same
      Session-Id as the one used in a previous request.  An
      intra-session request generally needs to be delivered to the
      server that handled the session-creating request for the session.
      The STR message defined in [RFC6733] is an example of an
      intra-session request.

   Pseudo-Session Requests:

      Pseudo-session requests are independent requests that do not use
      the same Session-Id but are correlated by other session-related
      information contained in the request.  There exist Diameter
      applications that define an expected ordering of transactions.
      This sequencing of independent transactions results in a pseudo
      session.  The AIR, MAR, and SAR requests in the 3GPP-defined Cx
      [Cx] application are examples of pseudo-session requests.

D.4.  Request Type Overload Implications

   The request classes identified in Appendix D.3 have implications on
   decisions about which requests should be throttled first.  The
   following list of request treatments regarding throttling is
   provided as guidelines for application designers when implementing
   the Diameter overload control mechanism described in this document.
   The exact behavior regarding throttling is a matter of local
   policy, unless specifically defined for the application.

   Independent Requests:

      Independent requests can generally be given equal treatment when
      making throttling decisions, unless otherwise indicated by
      application requirements or local policy.

   Session-Initiating Requests:

      Session-initiating requests often represent more work than
      independent or intra-session requests.  Moreover,
      session-initiating requests are typically followed by other
      session-related requests.
      Since the main objective of the overload control is to reduce
      the total number of requests sent to the overloaded entity,
      throttling decisions might favor allowing intra-session requests
      over session-initiating requests.  In the absence of local
      policies or application-specific requirements to the contrary,
      individual session-initiating requests can be given equal
      treatment when making throttling decisions.

   Correlated Session-Initiating Requests:

      A request that results in a new binding, where the binding is
      used for routing of subsequent session-initiating requests to the
      same server, represents more workload than other requests.  As
      such, these requests might be throttled more frequently than
      other request types.

   Pseudo-Session Requests:

      Throttling decisions for pseudo-session requests can take into
      consideration where individual requests fit into the overall
      sequence of requests within the pseudo session.  Requests that
      are earlier in the sequence might be throttled more aggressively
      than requests that occur later in the sequence.

   Intra-Session Requests:

      There are two types of intra-session requests: requests that
      terminate a session and all other intra-session requests.
      Implementers and operators may choose to throttle
      session-terminating requests less aggressively in order to
      gracefully terminate sessions, allow cleanup of the related
      resources (e.g., session state), and avoid the need for
      additional intra-session requests.  Favoring session-terminating
      requests may reduce the session management impact on the
      overloaded entity.  The default handling of other intra-session
      requests might be to treat them equally when making throttling
      decisions.  There might also be application-level considerations
      as to whether some request types are favored over others.
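
   The guidelines above can be illustrated with a minimal sketch, given
   here only as one possible reading and not as normative behavior.  It
   maps each request class to a relative throttling weight and scales a
   requested reduction fraction by that weight.  The names
   RequestClass, THROTTLE_WEIGHT, drop_probability, and should_throttle,
   the numeric weights, and the "position" parameter are all
   assumptions made for illustration; none of them is defined by this
   document.  Note that scaling by class in this way does not by itself
   guarantee that the aggregate reduction matches the requested value.

```python
import random
from enum import Enum


class RequestClass(Enum):
    """Request classes from Appendix D.3; session-terminating requests
    are split out from other intra-session requests."""
    INDEPENDENT = "independent"
    SESSION_INITIATING = "session-initiating"
    CORRELATED_SESSION_INITIATING = "correlated-session-initiating"
    PSEUDO_SESSION = "pseudo-session"
    INTRA_SESSION = "intra-session"
    SESSION_TERMINATING = "session-terminating"


# ASSUMPTION: illustrative weights only. A higher weight means the class
# is throttled more aggressively. Real values are a matter of local
# policy or application-specific requirements.
THROTTLE_WEIGHT = {
    RequestClass.INDEPENDENT: 1.0,
    RequestClass.SESSION_INITIATING: 1.5,
    RequestClass.CORRELATED_SESSION_INITIATING: 2.0,
    RequestClass.PSEUDO_SESSION: 1.0,
    RequestClass.INTRA_SESSION: 0.5,
    RequestClass.SESSION_TERMINATING: 0.25,
}


def drop_probability(request_class, reduction, position=0.0):
    """Return the probability of throttling one request.

    reduction: requested load reduction as a fraction in [0, 1].
    position:  hypothetical place of a pseudo-session request in its
               sequence, from 0.0 (first) to 1.0 (last); it is ignored
               for every other request class.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be within [0, 1]")
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be within [0, 1]")
    weight = THROTTLE_WEIGHT[request_class]
    if request_class is RequestClass.PSEUDO_SESSION:
        # Earlier requests in the sequence are throttled more
        # aggressively than later ones.
        weight *= 1.5 - position
    return min(1.0, reduction * weight)


def should_throttle(request_class, reduction, position=0.0,
                    rng=random.random):
    """Randomly decide whether to throttle a single request."""
    return rng() < drop_probability(request_class, reduction, position)
```

   With a requested reduction of 0.2, this sketch throttles
   session-terminating requests least often and correlated
   session-initiating requests most often.  A deployment would replace
   the weights with values derived from its own policy.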

Authors' Addresses

   Jouni Korhonen (editor)
   Broadcom
   Porkkalankatu 24
   Helsinki  FIN-00180
   Finland

   Email: jouni.nospam@gmail.com


   Steve Donovan (editor)
   Oracle
   7460 Warren Parkway
   Frisco, Texas  75034
   United States

   Email: srdonovan@usdonovans.com


   Ben Campbell
   Oracle
   7460 Warren Parkway
   Frisco, Texas  75034
   United States

   Email: ben@nostrum.com


   Lionel Morand
   Orange Labs
   38/40 rue du General Leclerc
   Issy-Les-Moulineaux Cedex 9  92794
   France

   Phone: +33145296257
   Email: lionel.morand@orange.com