PCN L. Westberg Internet-Draft A. Bhargava Intended status: Standards Track A. Bader Expires: August 28, 2008 Ericsson G. Karagiannis University of Twente February 25, 2008 LC-PCN: The Load Control PCN Solution draft-westberg-pcn-load-control-03 Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on August 28, 2008. Copyright Notice Copyright (C) The IETF Trust (2008). Abstract There is an increased interest of simple and scalable resource provisioning solution for Diffserv network. The Load Control PCN (LC-PCN) addresses the following issues: Westberg, et al. Expires August 28, 2008 [Page 1] Internet-Draft LC-PCN February 2008 o Admission Control for real time data flows in stateless Diffserv Domains o Flow Termination: Termination of flows in case of exceptional events, such as severe congestion after re-routing. Admission control in a Diffserv stateless domain can be performed using two methods: o Admission Control based on data marking, whereby in congestion situations the data packets are marked to notify the PCN-egress- node that a congestion occurred on a particular PCN-ingress-node to PCN-egress-node path. o Probing, whereby a probe packet is sent along the forwarding path in a network to determine whether a flow can be admitted based upon the current congestion state of the network The scheme provides the capability of controlling the traffic load in the network without requiring signaling or any per-flow processing in the PCN-interior-nodes. The complexity of Load Control is kept to a minimum to make implementation simple. Westberg, et al. Expires August 28, 2008 [Page 2] Internet-Draft LC-PCN February 2008 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 3. LC-PCN Overview . . . . . . . . . . . . . . . . . . . . . . . 6 3.1. Admission control . . . . . . . . . . . . . . . . . . . . 6 3.1.1. Admission control based on data marking . . . . . . . 7 3.1.1.1. PCN-interior-node features . . . . . . . . . . . . 7 3.1.1.2. PCN-egress-node features . . . . . . . . . . . . . 8 3.1.1.3. PCN-ingress-node features . . . . . . . . . . . . 9 3.1.2. Admission control based on probing . . . . . . . . . . 9 3.1.2.1. PCN-interior-node features . . . . . . . . . . . . 9 3.1.2.2. PCN-egress-node features . . . . . . . . . . . . . 10 3.1.2.3. PCN-ingress-node features . . . . . . . . . . . . 10 3.1.3. ECMP solution . . . . . . . . . . . . . . . . . . . . 10 3.2. Flow Termination . . . . . . . . . . . . . . . . . . . . . 11 3.2.1. PCN-interior-node . . . . . . . . . . . . . . . . . . 11 3.2.2. PCN-egress-node . . . . . . . . . . . . . . . . . . . 11 3.2.3. PCN-ingress-node . . . . . . . . . . . . . . . . . . . 12 3.2.4. ECMP solution . . . . . . . . . . . . . . . . . . . . 13 3.3. Operational states in LC-PCN . . . . . . . . . . . . . . . 13 3.3.1. Operational states in PCN-interior-nodes . . . . . . . 13 3.3.2. Operational states in PCN-egress-nodes . . . . . . . . 15 3.3.3. Operational states in PCN-ingress-nodes . . . . . . . 17 3.4. Encoding of PCN traffic . . . . . . . . . . . . . . . . . 18 4. LC-PCN detailed description . . . . . . . . . . . . . . . . . 19 4.1. Admission control based on data marking for unidirectional flows . . . . . . . . . . . . . . . . . . . 19 4.1.1. Operation in PCN-interior-nodes . . . . . . . . . . . 19 4.1.2. Operation in PCN-egress-nodes . . . . . . . . . . . . 21 4.1.3. Operation in PCN-ingress-nodes . . . . . . . . . . . . 23 4.2. Admission control based on probing for unidirectional flows . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.2.1. Operation in PCN-interior-nodes . . . . . . . . . . . 24 4.2.2. Operation in PCN-egress-nodes . . . . . . . . . . . . 24 4.2.3. Operation in PCN-ingress-nodes . . . . . . . . . . . . 25 4.3. Flow Termination for unidirectional flows . . . . . . . . 25 4.3.1. Operation in the PCN-interior-node . . . . . . . . . . 25 4.3.1.1. Optimization mode features for Flow termination . 26 4.3.1.2. ECMP solutions . . . . . . . . . . . . . . . . . . 28 4.3.2. Operation in PCN-egress-nodes . . . . . . . . . . . . 28 4.3.2.1. ECMP solutions . . . . . . . . . . . . . . . . . . 31 4.3.3. Operation in PCN-ingress-nodes . . . . . . . . . . . . 31 4.4. Admission control based on data marking and probing for bi-directional flows . . . . . . . . . . . . . . . . . 32 4.5. Flow Termination handling for bi-directional flows . . . . 33 5. Security Considerations . . . . . . . . . . . . . . . . . . . 38 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 38 Westberg, et al. Expires August 28, 2008 [Page 3] Internet-Draft LC-PCN February 2008 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 38 8. Informative References . . . . . . . . . . . . . . . . . . . . 38 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 40 Intellectual Property and Copyright Statements . . . . . . . . . . 41 Westberg, et al. Expires August 28, 2008 [Page 4] Internet-Draft LC-PCN February 2008 1. Introduction The amount of traffic carried on the Internet is now greater than the traffic on the world's telephony network. Still, Internet-based communication services generate less income than plain old telephony services. Enabling value-added services over the Internet is therefore crucial for service providers. One significant class of such value-added services requires real-time packet transportation. It can be expected that these real-time services will be popular as they replicate or are natural extensions of existing communication services like telephony. Exact and reliable resource management (e.g., admission control) is essential for achieving high utilization in networks with real-time transportation capabilities. The problem is difficult mainly due to scalability issues. With the introduction of differentiated services (DS) [RFC2475], it is now possible to provide large scale, real-time services. The basic idea of DiffServ is that, rather than classifying packets at each router, packets are only classified at the edge devices. The result - the required packet treatment - is stored and carried in the packet headers, and core routers can carry out appropriate scheduling. The current definition of DiffServ, however, does not contain any simple, scalable solution to the problem of resource provisioning and control. A number of approaches to solving the problem already exist [RFC3175], [Berson97], [Stoica99], [Bernet99]. The scheme presented in this document does not require any state aggregation in the core and aims at extreme simplicity and low cost of implementation along with good scaling properties. Load control operates edge-to-edge in a DS domain, or between two RSVP or NSIS capable routers, where only the edge devices keep flow state and do per-flow processing. The main purpose of Load Control is to provide a simple and scalable solution to the resource provisioning problem. The original Load Control concept, submitted in April 2000, [Westberg00], has been developed further to a signaling concept named Resource Management in Diffserv. RMD was incorporated by NSIS working group, where the protocol details were worked out for using NSIS as external protocol [RMD]. Recently new drafts have been submitted aiming to standardize new Diffserv PHB that provides controlled load services in Diffserv domains [CL-PHB], [CL-ARCH], [Babi07], [Char07]. These concepts are very similar to the original two-bit marking scheme of Load Control. We believe that the LC-PCN features supported by, at least, PCN- interior-nodes can be combined with features supported by the above listed concepts. Westberg, et al. Expires August 28, 2008 [Page 5] Internet-Draft LC-PCN February 2008 This document aims to develop a common framework that could be used both with RSVP and NSIS external protocols. The remainder of this draft is structured as follows. After the terminology in Section 2, we give an overview of the LC-PCN in Section 3. In Section 4 we give a detailed description of the LC- PCN. Section 5 discusses security issues. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. The terms specified in [Eard08] are used. 3. LC-PCN Overview Load Control PCN (LC-PCN) is achieved by two actions: Admission Control and/or Flow Termination. The LC-PCN can be applied within either a single PCN domain, see Figure 1, or multiple neighboring PCN domains, when a trust relationship exists between these multiple PCN domains. PCN-Ingress-Node PCN-Egress-Node (PCN-Interior-Nodes; I-Nodes) | | | | | | V V V +-------+ Data +------+ +------+ +------+ +------+ |-------|--------|------|------|------|-------|------|---->|------| | | Flow | | | | | | | | |Ingress| |I-Node| |I-Node| |I-Node| |Egress| | | | | | | | | | | +-------+ +------+ +------+ +------+ +------+ =================================================> <================================================= Signaling Figure 1: Actors in the LC-PCN 3.1. Admission control Admission control can be accomplished in LC-PCN in two ways: Westberg, et al. Expires August 28, 2008 [Page 6] Internet-Draft LC-PCN February 2008 o Admission control based on data marking: whereby in congestion situations, the admission control is accomplished using excess rate marking and metering to detect and to decide either a new flow request should be accepted or denied. o Admission control based on probing: where probing is required to accomplish the admission control procedure. Note that the two admission control features can be used either independently or combined. The Admission Control features can be applied to flows that are aggregated between PCN-ingress-nodes and PCN-egress-nodes. In this way edge-to-edge (i.e., ingress-egress) pair PCN aggregates can be maintained by PCN-ingress-nodes and PCN- egress-nodes. Two PCN-domain-wide constraints are used. One of them is denoted as "N", used to indicate the proportionality between the measured out of profile packets (or bytes) and the remarked packets (or bytes). If "N" is used in the algorithm, then it must have the same value in all Diffserv nodes that use this mechanism. The parameter N is higher or equal to 1 (N >= 1). Another PCN-domain-wide constraint, see [Char07], has to be used on the ratio U between the configured-admissible-rate on a link and the level of PCN load on the link that should trigger the Flow Termination. This level represents the configured-termination-rate, which is not explicitly configured on the PCN_interior node. The value is typically set to U = 1,2, see [Char07]. Furthermore, it is important to note that in this draft we denote the not congested PCN packets (or bytes) as PCN unmarked packets (or bytes). 3.1.1. Admission control based on data marking The admission control based on data marking is using features located at the PCN_ingress_edge, PCN-interior-node and PCN-egress-node. 3.1.1.1. PCN-interior-node features The PCN-interior-node performs measurements on the PHB aggregated PCN traffic. When the PCN-interior-node detects that the measured PHB aggregated PCN traffic is higher than a preconfigured threshold, say configured-admissible-rate, then it is considered that the PCN- interior-node changes operational state from Normal state to Admission Control state, see Section 3.3.1. Furthermore, the measured PHB aggregated PCN traffic rate that is above the configured-admissible-rate is considered to be excess rate, which is marked using PCN_marking. Westberg, et al. Expires August 28, 2008 [Page 7] Internet-Draft LC-PCN February 2008 This can be accomplished using different metering and marking features. It is important to note that the excess rate measurements SHOULD be done before a queuing mechanism used by a PCN-interior- node, drop packets before/during buffer overflow. The constant N should be used such that the marked excess rate can represent also high levels of excess rate. This means that before marking the excess rate, the measured excess rate should be divided by N (when N >= 1). This can be e.g., implemented by marking every N-th packet (or byte) instead of marking each packet (or byte). The PCN_marking SHOULD be done after the queuing mechanism drops the packets before/during buffer overflow. Several implementation alternatives of this algorithm are possible. One implementation alternative can be based on the algorithms discussed in [Char07]. In particular, a token bucket can be used with the rate configured with the rate equal to configured-admissible-rate. Other implementation alternatives can e.g., be based on rate measurements and marking. In particular, the PCN-interior-nodes packets are using the PCN_marking, whenever the measured PHB aggregated PCN traffic rate exceeds a pre- configured rate threshold denoted as configurable-admissible-rate. It is important to note that, for optimization purposes, the PCN_marking encoded packets SHOULD NOT be preferentially dropped by queuing mechanisms in PCN-interior-nodes. This can be accomplished by for example using two virtual queues. One virtual queue can be used for the PCN unmarked traffic (and PCN_Affected_Marking encoded, when the affected marking solution is supported) and another one for PCN_marking encoded packets. The virtual queues can for example, be implemented by using one Drop Tail physical queue and by maintaining queuing information and also one queuing threshold for each of the virtual queues. The physical queue uses the same scheduling algorithm, but the length of each of the virtual queue defines the packet dropping probability of a virtual queue. 3.1.1.2. PCN-egress-node features The PCN-egress-node measures the rate of the received PHB aggregated PCN unmarked and PCN_marking encoded packets. Based on these measurements, the PCN-egress-node can use a similar functionality as the one specified in [Char07] and [CL-ARCH] to calculate the Congestion Level Estimate (CLE), which is the fraction of the marked traffic received from one PCN-ingress-node. If the value of CLE is higher than a certain value, e.g., 1%, then the PCN-egress-node is changing its operational state from Normal state to Admission Control state. By using an external to PCN, signaling protocol the admission control procedure is accomplished by using a combination of the PCN operational state of the PCN-egress- Westberg, et al. Expires August 28, 2008 [Page 8] Internet-Draft LC-PCN February 2008 node and an admission control request provided by the external to PCN, signaling protocol. When the admission control request arrives at a PCN-egress-node that operates in Admission Control state then the request is rejected. If it operates in Normal state it is accepted. 3.1.1.3. PCN-ingress-node features If the external to PCN signaling protocol is also used by the PCN- ingress-node, then the PCN-ingress-node SHOULD be informed that an admission control request has been admitted or rejected by the PCN- egress-node. If the PCN-ingress-node is notified that the admission request is rejected, then the PCN-ingress-node rejects the admission control request. Otherwise it is accepted. 3.1.2. Admission control based on probing The admission control function based on probing can be used to implement a simple measurement-based admission control within a PCN domain. The main reason of why this admission control feature should be used is to solve the possible ECMP (Equal Cost Multi-Path) issue. Furthermore, this feature can provide admission control support even when the edge-to-edge pair PCN aggregate is not yet initiated at one of the edges. 3.1.2.1. PCN-interior-node features The PCN-interior-node features that are used to detect the PCN operational states are the same as the ones described in Section 3.1.1.1. In this scenario an IP packet is used as a probe packet, meaning that the DSCP (and/or ECN) field, see Section 3.4, in the header of the IP packet is re-marked when the measured PHB aggregated PCN traffic rate exceeds a predefined congestion threshold, i.e, configured-admissible-rate. Note that a message used by an external, to PCN, on path signaling protocol, e.g., RSVP, can be used as a probe packet. The PCN-interior-nodes SHOULD detect a probe packet by observing the Router Alert option, which is carried by the IP packet data packet. Note that a PCN-ingress-node sets the Router Alert option of all packets that are used as probe packets. This also holds for signaling protocol messages (e.g. RSVP PATH message) that are used by LC-PCN as probe packets. Thus if a PCN-interior-node receives a probe packet then, due to the Router Alert option it has to handle it differently than the user packets. The PCN-interior-node has to PCN_marking encode the probe packet if it is operating in Admission Control state (or Flow Termination Westberg, et al. Expires August 28, 2008 [Page 9] Internet-Draft LC-PCN February 2008 state). Otherwise the probe packet does not change its encoding state. 3.1.2.2. PCN-egress-node features The PCN-egress-node measures the rate of the received PHB aggregated PCN unmarked and PCN_marking encoded packets. When the probe packet arrives at the PCN-egress-node that is belonging to a certain edge- to-edge pair PCN aggregate, and it is PCN_marking encoded then the request is rejected. Otherwise it is accepted. Note that if an edge-to-edge pair aggregated state is not available at the PCN-egress-node, then the PCN-egress-node cannot determine whether a PCN-egress-node associated with the edge-to-edge pair PCN aggregate operates in Normal state, Admission Control state or Flow Termination state. However, even in this case, when a probe packet arrives at the PCN-egress-node, then this request should be rejected if the probe packet is PCN_marking encoded. Otherwise, i.e., if the probe packet is not PCN_marking encoded, it should be accepted. 3.1.2.3. PCN-ingress-node features The PCN-ingress-node MUST set the Router_Alert option to all packets that have to be used as probe packets. Furthermore, if the external to PCN signaling protocol is also used by the PCN-ingress-node, then the PCN-ingress-node SHOULD be informed that an admission control request has been admitted or rejected by the PCN-egress-node. If the PCN-ingress-node is notified that the admission request is rejected, then the PCN-ingress-node rejects the admission control request. Otherwise it is accepted. 3.1.3. ECMP solution By using probing, the ECMP (Equal Cost Multi Path) problem that is associated with the admission control feature can be, to a certain degree, solved by being able to identify which flows are passing through the congested node. This is because a probe packet can be PCN_marking encoded only by congested PCN-interior-nodes. Note that the ECMP problem is related to the fact that flows that are not passing through a congested PCN-interior-node can belong to an aggregate that detects a congestion. Any measures that are taken on such flows will not solve the congestion problem, since such flows are not contributing and causing the congestion in the PCN-interior-node. Westberg, et al. Expires August 28, 2008 [Page 10] Internet-Draft LC-PCN February 2008 3.2. Flow Termination The Flow Termination function is able to terminate flows in case of exceptional events, such as severe congestion after re-routing. The exceptional event, or severe congestion can be detected using a remarking approach where the PCN_marking is proportional to the excess rate. The Flow Termination features, similar to the Admission Control features, can be applied to flows that are aggregated between PCN-ingress-nodes and PCN-egress-nodes. In this way edge-to-edge aggregates can be maintained by PCN-ingress-nodes and PCN-egress- nodes. Furthermore, the "N" and "U" PCN-domain-wide constraints, specified in 3.1 are also used during Flow Termination. Moreover, it is important to note that in this draft we denote the not congested PCN packets (or bytes) as PCN unmarked packets (or bytes). 3.2.1. PCN-interior-node The PCN-interior-nodes can support two types of Flow Termination modes, a base mode and an optimization mode. The Flow Termination base mode that is supported by the PCN-interior-nodes can be accomplished using the admission control features described in Section 3.1.1. However, inaccuracies in excess rate measurements might occur due to the delay between the metering and marking events that occur at the PCN-interior-nodes, the decisions that are made at PCN-egress-nodes, and the termination of flows that are performed by PCN-ingress-nodes, see Section 6 of [CsTa05]. In order to reduce these excess rate inaccuracies an optimization feature can be added to the Flow Termination base mode, which is denoted as optimization mode. The main addition that this optimization mode requires is that an additional operational state has to be maintained by the PCN-interior-node, i.e., a Flow Termination state, see Section 3.3.1. In particular, when the measured PHB aggregated PCN traffic is higher than the threshold equal to (U * configured-admissible-rate), then the PCN-interior-node changes from the Admission Control state to the Flow Termination state. 3.2.2. PCN-egress-node The PCN-egress-node measures the rate of the received PHB aggregated PCN unmarked and PCN_marking encoded packets (or bytes). Westberg, et al. Expires August 28, 2008 [Page 11] Internet-Draft LC-PCN February 2008 The PCN-egress-node uses a similar functionality as discussed in [Char07] to activate/trigger the Flow Termination with a trigger computed from the ratio of the PCN_marking encoded and PCN unmarked packets (or bytes). It is important that the format of the equation of the above described ratio between the PCN_marking encoded and PCN unmarked traffic, depends on the values of the U and the configurable-admissible-rate at the PCN-interior-nodes. In particular if the product U * configurable-admissible-rate is smaller than the packet dropping threshold used by queuing mechanisms at the PCN_interior nodes, then the above ratio is calculated by dividing the product N * PCN_marking encoded rate of packets (or bytes) and the rate of PCN unmarked packets (or bytes). If the product (U * configurable-admissible-rate) is equal or higher than the packet dropping threshold, then the above ratio (i.e., ratio of the PCN_marking encoded and PCN unmarked packets (or bytes)) is calculated by dividing the product N * PCN_marking encoded rate of packets (or bytes) and the sum of the rates of N * PCN_marking encoded and PCN unmarked packets (or bytes). Note that the rate of PCN_marking packets (or bytes) has to be multiplied with N to match the division by N performed by the PCN-interior-node. The trigger is detected when the above given ratio between the PCN_marking encoded and PCN unmarked encoded packets (or bytes) is higher than the value of (U-1), see [Char07]. When this trigger is detected then the PCN-egress-node has to store the value of the absolute rate of the PCN_marking encoded traffic, say: configured- termination-rate-egress, see also Section 3.3.2. The N * (measured excess rate) that is above this threshold, is used to calculate the number of flows to be terminated, such that the excess rate is severely reduced until it drops below the Flow Termination trigger. Note that the excess rate and the flows to be terminated are associated with the same edge-to-edge (i.e., ingress-egress) pair PCN aggregate. For the flows that should be terminated the PCN-egress- node informs the associated PCN-ingress-node to terminate them. 3.2.3. PCN-ingress-node The flows that are Flow Termination notified by the PCN_egress-node have to be terminated by the PCN_ingress-node. Furthermore, the PCN- ingress-node rejects all new flow admission requests that are associated with the same edge-to-edge pair PCN aggregate. In addition, the PCN_ingress-node informs the associated flow sender about the occurred exceptional/severe congestion. Westberg, et al. Expires August 28, 2008 [Page 12] Internet-Draft LC-PCN February 2008 3.2.4. ECMP solution In order to solve the ECMP issue that may occur during Flow Termination operational state, the LC-PCN solution could use an additional PCN marking encoding approach, denoted as: PCN_Affected_Marking. This means that the descriptions of Section 3.2.1 and 3.2.2 have to be slightly modified. Regarding the description provided in section 3.2.1, the PCN_Affected_Marking is used in the PCN-interior-node in the following way. When the measured PHB aggregated PCN traffic is higher than the threshold equal to (U*configured-admissible-rate), then the PCN-interior-node changes from the Admission Control state to Flow Termination state, see Section 3.3.2. In Flow Termination state, the PCN-interior-node encodes all PCN unmarked (i.e., not congested PCN encoded) packets that are passing through the PCN- interior-node by using the PCN_Affected_Marking. Regarding the PCN-egress-node description provided in section 3.2.2, the rate of the PCN unmarked (or bytes), has to be replaced, by the rate of PCN_Affected_Marking encoded packets (or bytes). This rate is used to calculate the Flow Termination trigger at the PCN-egress- node. Furthermore, the PCN-egress-node uses the PCN_Affected_Marking to identify which flows were affected by the exceptional/severe congestion. In this way the PCN-egress-node, when operating in Flow Termination state, is able to terminate only the flows that received one or more PCN_Affected_Marking packets. 3.3. Operational states in LC-PCN This section describes the LC-PCN operational states that are used to identify when and how a PCN node is triggered to either remain or change into an operational state, i.e., Normal, Admission Control and Flow Termination. 3.3.1. Operational states in PCN-interior-nodes Per each PHB supported with the PCN domain, the PCN-interior-node supports the operational states diagram depicted in Figure 2. Westberg, et al. Expires August 28, 2008 [Page 13] Internet-Draft LC-PCN February 2008 --------------------------------------------- | event B | | V ---------- ------------- ---------- | Normal | event A | Admission | event B | Flow | | state |---------->| Control |-------->|Termination| | | | state | | state | ---------- ------------- ---------- ^ ^ | | | | event C | | | ----------------------- | | event D | ------------------------------------------------ Figure 2: States of Operation The terms used in Figure 2 and applied for PCN-interior-nodes are: * Normal state: represents the normal operation conditions of the node, i.e. no congestion * Flow Termination state: this state is only applied either when the optimization mode solution is applied to solve the excess rate measurement inaccuracies, see Section 3.2.1, or when the ECMP solution described in Section 3.2.4 is used. This state represents the state related to a certain PHB when the PCN-interior-node is severely congested. * Admission Control state: state where the load is relatively high, close to the level when pre-congestion can occur * event A: this event occurs when the measured PHB aggregated PCN traffic is higher than the configured-admissible-rate. The measured PHB aggregated PCN traffic rate that is above the configured- admissible-rate is considered to be excess rate, which is encoded using PCN_marking. * event B: this event is only applied either when the optimization solution is applied to solve the excess rate measurement inaccuracies or when the ECMP solution is used. This event occurs when the measured PHB aggregated PCN traffic is higher than the threshold equal to (U * configured-admissible-rate). * event C: this event occurs when the measured PHB aggregated PCN traffic is equal or lower than the configured-admissible-rate. * event D: this event is only applied either when the optimization solution is applied to solve the excess rate measurement inaccuracies Westberg, et al. Expires August 28, 2008 [Page 14] Internet-Draft LC-PCN February 2008 or when the ECMP solution is used. This event occurs when the measured PHB aggregated PCN traffic is equal or lower than the threshold equal to (U * configured-admissible-rate). 3.3.2. Operational states in PCN-egress-nodes Per edge-to-edge (ingress-egress) PCN aggregate, the PCN-egress-node supports the same operational states diagram as depicted in Figure 2. The terms used in Figure 2 and applied for PCN-egress-nodes are: * Normal state: represents the normal operation conditions of the node, i.e. no congestion. * Flow Termination state: it represents the state related to a certain edge-to-edge (ingress-egress) pair PCN aggregate to identify the situation that a severe/exceptional event occurred and ongoing flows need to be terminated in order to solve this severe congestion. * Admission Control state: state where the load is relatively high, close to the level when pre-congestion can occur. * event A: this event is activated when the ratio between the incoming_PCN_marked_rate and the total received PHB aggregated PCN traffic is higher than a predefined value, e.g., 1%, see [Char07]. Where: o incoming_PCN_marking_rate: the rate of the PCN_marking encoded packets (or bytes) and it is equal to the product of N and the measured incoming PCN_marking rate The measured rate of PCN_marking encoded packets (or bytes) has to be multiplied with N to match the division by N performed by the PCN-interior-node. o Total received PCN traffic is the sum of the received incoming_PCN_marking_rate and the PCN unmarked_rate. The PCN unmarked_rate is the rate of the not congested PCN packets (or bytes). When PCN_Affected_Marking is used, see Section 3.2.4, then this rate represents the measured PCN_Affected_Marking encoded packets (or bytes). * event B: is activated comparing the ratio of the PCN_marked and PCN unmarked packets (or bytes) to a certain preconfigured threshold. An example of such a threshold is given in [Char07], with a value of (U-1). The format of the equation of the above described ratio between the PCN marked and PCN unmarked traffic may depend on the values of the U and the configurable-admissible-rate at the PCN- interior-nodes. In order to describe the conditions that influence the event B trigger we use a pseudo code description: Westberg, et al. Expires August 28, 2008 [Page 15] Internet-Draft LC-PCN February 2008 If ((U * configured-admissible-rate) < Dropping_threshold)) THEN { IF incoming_PCN_marking_rate/unmarked_rate > (U - 1) THEN { activate event B } } ELSE { IF incoming_PCN_marking_rate/(unmarked_rate + incoming_PCN_marking_rate) >(U - 1) THEN { activate event B } } Dropping_threshold = packet dropping threshold used by queuing mechanisms at the PCN_interior nodes unmarked_rate is the rate of the not congested PCN packets (or bytes). When PCN_Affected_Marking is used, see Section 3.2.4, then this rate represents the measured PCN_Affected_Marking encoded packets (or bytes). When this trigger is detected then the egress has to store the value of the absolute rate of the PCN_marking encoded traffic, say: configured-termination-rate-egress * event C: this event occurs when the ratio between the incoming_PCN_marked_rate and the total received PCN traffic is lower or equal than the predefined value used to trigger event A. * event D: this event is activated by comparing the ratio of the PCN_marked and PCN unmarked packets (or bytes) to a certain preconfigured threshold, see event B. In order to describe the conditions that influence the event D trigger, we use a pseudo code description, see also event B. Westberg, et al. Expires August 28, 2008 [Page 16] Internet-Draft LC-PCN February 2008 If ((U * configured-admissible-rate) < Dropping_threshold)) THEN { IF incoming_PCN_marking_rate/unmarked_rate =< (U - 1) THEN { activate event D } } ELSE { IF incoming_PCN_marking_rate/(unmarked_rate + incoming_PCN_marking_rate) =< (U - 1) THEN { activate event D } } 3.3.3. Operational states in PCN-ingress-nodes Per each edge-to-edge pair of PCN aggregates the PCN-ingress-nodes support the same operational states diagram as depicted in Figure 2. The terms used in Figure 2 and applied for PCN-ingress-nodes are: * Normal state: represents the normal operation conditions of the node, i.e. no congestion. * Flow Termination state: it represents the state related to a certain edge-to-edge (ingress-egress) pair PCN aggregate to identify the situation that a severe/exceptional event occurred and ongoing flows need to be terminated in order to solve this severe congestion. In Flow Termination, the PCN-ingress-node SHOULD block all new admission flow requests that are associated with the same edge-to- edge pair of PCN aggregates. * Admission Control state: state where the load is relatively high, close to the level when pre-congestion can occur. The PCN-ingress- node rejects a flow that is requesting admission to the PCN domain. * event A: this event occurs when the PCN-ingress-node receives a response from the PCN-ingress-node that a flow that is requesting admission to the PCN domain is rejected. * event B: this event occurs when the PCN-ingress-node receives one response from the PCN-ingress-node that a flow has to be terminated due to the fact that the PCN-ingress-node operates in the Flow Termination operational state. * event C: this event occurs after the PCN-ingress-node rejected the Westberg, et al. Expires August 28, 2008 [Page 17] Internet-Draft LC-PCN February 2008 flow that was requesting admission and informed the flow sender about it. * event D: this event occurs when the PCN-ingress-node does not receive anymore responses from the PCN-ingress-node that flows have to be terminated. 3.4. Encoding of PCN traffic The encoding that can be used for LC-PCN can be based either on DSCP or on a combination between ECN and DSCP IP fields. In the current version of the draft it is assumed that the encoding is based on only the DSCP IP field. In particular, the encoding can be accomplished in the following way. The PCN traffic can be distinguished from the non PCN traffic by using a first additional DSCP, say not_congested_PCN_DSCP, to identify the not congested PCN traffic. The single marking state used during PCN_marking encoding can use a second additional DSCP, say PCN_marking_DSCP. When the PCN_Affected_Marking is used then an additional third DSCP is needed, say PCN_Affected_Marking_DSCP. The first, second and third additional DSCP values are representing DSCP values that are assigned by IANA as DSCP experimental values. It is important to note that when the LC-PCN is applied in multiple neighboring PCN domains where a trust relationship exists between these multiple PCN domains and a packet is received by the edge router of another trusted domain (new PCN domain, that might be managed by another operator), remarking of the not_congested_PCN_DSCP, PCN_marking DSCP and PCN_Affected_Marking DSCP to other DSCPs, say not_congested_PCN_new_DSCP, PCN_marking_new_DSCP and PCN_Affected_Marking_new_DSCP, respectively, might be necessary. This is because the neighbor PCN operator may use different Diffserv mapping schemes. When DSCP is used for PCN encoding and no trust relationships exist between the PCN-domains, then for packets that are forwarded outside the PCN-domain, the PCN-egress-nodes and PCN-ingress-nodes SHOULD restore the original DSCP values of the PCN remarked packets, otherwise multiple actions for the same event might occur. This value MAY be left in its remarking form if there is an SLA agreement between domains that a downstream domain handles the remarking problem. When no trust relationship exists between multiple neighboring PCN domains then the PCN-ingress-nodes SHOULD PCN encode the incoming traffic that is used as incoming PCN traffic using the not congested PCN DSCP. Westberg, et al. Expires August 28, 2008 [Page 18] Internet-Draft LC-PCN February 2008 4. LC-PCN detailed description This section describes the details of the used LC-PCN algorithms. Section 4.1, 4.2 and 4.3 describe the "Admission control based on data marking", "Admission control based on probing" and "Flow Termination" scenario, respectively, for the situation that the end- to-end sessions are using unidirectional reservations. Section 4.4 describes the two admission control procedures and Section 4.5 describes the flow termination scenario for the situation that the end-to-end sessions are using bi-directional reservations. 4.1. Admission control based on data marking for unidirectional flows This type of admission control uses excess rate marking and metering to provide admission control for unidirectional flows. In pre- congestion situations the data packets are marked to notify the PCN- egress-node that a congestion occurred on a particular PCN-ingress- node to PCN-egress-node path. 4.1.1. Operation in PCN-interior-nodes The PCN-interior-node performs measurements on the PHB aggregated PCN traffic, see Figure 3, and changes operational state from Normal to Admission Control state when the event A trigger occurs, see Section 3.3.1. As mentioned in Section 3.1.1.1, the measured aggregated PCN traffic rate that is above the configured-admissible-rate is considered to be excess rate, which is marked using PCN_marking. When the PCN- interior-node operates in Admission Control state and the configured- admissible-rate is exceeded then PCN unmarked packets are proportionally to the excess rate re-marked, using the PCN_marking encoding, see event A, in Section 3.3.1. The above described functionalities can be accomplished using different metering and marking features. Several implementation alternatives of this algorithms are possible. One implementation alternative can be based on the algorithms discussed in [Char07]. In particular, a token bucket can be used with the rate configured with the rate equal to configured-admissible-rate. Another implementation alternative can for example be based on rate measurements and marking. In particular, the PCN-interior-nodes packets using the PCN_marking, whenever the measured PHB aggregated PCN traffic rate exceeds a pre-configured rate threshold denoted as configurable-admissible-rate. An example of the detailed operation of this later procedure is described below. The predefined configured-admissible-rate, see Section 3.1.1.1 is set according to, Westberg, et al. Expires August 28, 2008 [Page 19] Internet-Draft LC-PCN February 2008 and usually less than, an engineered bandwidth limitation, i.e., real admission threshold, based on e.g. agreed Service Level Agreement or a capacity limitation of specific links. The difference between the configured-admissible-rate and the engineered bandwidth limitation, i.e., real admission threshold, provides an interval where the signaling information on resource limitation is already sent by a node but the actual resource limitation is not reached. During admission control the PCN-interior-node calculates, per traffic class (PHB), the incoming rate that is above configured- admissible-rate, denoted as signaled_overload_rate, in the following way: * before queuing and eventually dropping the packets, at the end of each measurement interval of T seconds, the PCN-interior-node should count the total number of PCN unmarked, PCN_marking (and PCN_Affected_Marking bytes, when the ECMP solution is used, see Section 3.2.4) received. Denote this number as total_received_bytes. Note that there are situations when more than one PCN-interior-nodes in the same communication path become admission control congested and operate in Admission Control state. Therefore, any PCN-interior-node located behind a PCN- interior-node that operates in Admission Control state may receive PCN_marking (and PCN_Affected_Marking, when the ECMP solution is used, see Section 3.2.4) bytes. Then the PCN-interior-node calculates the current estimated excess rate (i.e., overloaded rate), say signaled_overload_rate, by using the following equation: signaled_overload_rate = ((total_received_bytes) / T) - configured-admissible-rate) To provide reliable estimation of the encoded information several techniques can be used, see [AtLi01], [AdCa03], [ThCo04], [AnHa06]. The bytes that have to be remarked to satisfy the signaled overload rate, e.g., signaled_remarked_bytes, are calculated as follows: Westberg, et al. Expires August 28, 2008 [Page 20] Internet-Draft LC-PCN February 2008 IF (measured PHB rate > configured-admissible-rate THEN { IF (incoming_PCN_marking_rate <> 0) THEN { signaled_remarked_bytes = ((signaled_overload_rate - incoming_PCN_marking_rate) * T) / N } ELSE signaled_remarked_bytes = signaled_overload_rate * T / N } Where the "incoming_PCN_marking_rate" is calculated as follows: incoming_PCN_marking_rate = N * (input_PCN_marking_bytes) / T where input_PCN_marking_bytes represents the measured number of bytes carried by PCN_marking encoded packets. When incoming PCN_marking encoded packets (or bytes) are dropped, the operation of the admission control algorithm may be affected, e.g., the algorithm may become in certain situations slower. An implementation of the algorithm may assure as much as possible that the incoming PCN_marking encoded packets (or bytes) are not dropped. This could for example be accomplished by using different dropping rate thresholds for PCN_marking encoded and PCN unmarked (and PCN_Affected_Marking encoded, when ECMP solution is used) bytes, see Section 3.1.1.1. 4.1.2. Operation in PCN-egress-nodes The PCN-egress-node measures the rate of the received PHB aggregated PCN unmarked and PCN_marking marked packets. The measurements on the PCN unmarked and unmarked traffic can be implemented using several alternatives. One alternative is to use a similar functionality as the one specified in [Char07] and [CL-ARCH] to calculate the Congestion Level Estimate (CLE), which is the fraction of the marked traffic received from one PCN-ingress-node. If the value of CLE is higher than a certain value, e.g., 1%, then the PCN-egress-node is changing its operational state from Normal state to Admission Control state, see Section 3.3.2. Another alternative is to use other type of rate measurements, than the Exponential Weighted Moving Average (EWMA) measurements used to calculate the CLE. The detailed description of this procedure is given below. In this case, during a measurement interval T, the PCN- Westberg, et al. Expires August 28, 2008 [Page 21] Internet-Draft LC-PCN February 2008 egress-node measures the input_PCN_marking_bytes by counting, during the interval T, the bytes that are carried in PCN_marking encoded packets. The incoming_PCN_marking_rate can be then calculated as follows: incoming_PCN_marking_rate = N * input_PCN_marking_bytes / T where input_PCN_marking_bytes represents the measured number of bytes carried by the PCN_marking encoded packets To provide reliable estimation of the encoded information several techniques can be used, see [AtLi01], [AdCa03], [ThCo04], [AnHa06]. If the ratio: incoming_PCN_marking_rate/(incoming_PCN_marking_rate + PCN unmarked rate) is higher than a predefined value, say 1% then the communication path between PCN-ingress-node and PCN-egress-node is considered to be pre-congested. PCN-ingress-node PCN-interior-node PCN-interior-node PCN-egress-node user | | | | data | user data | | | ------>|----------------->| user data | | | |---------------->| user data | | | |----------------->| user | | | | data | user data | | | ------>|----------------->| user data | user data | | |---------------->S(# marked bytes) | | | S----------------->| | | S(# unmarked bytes)| | | S----------------->| | | S | request for reservation | S | ------->| probe packet S | |----------------------------------->S | | | S probe packet | | | S----------------->| | |response | |<------------------------------------------------------| response | | | <------| | | | Figure: 3 Admission control based on data marking and probing The admission control procedure is accomplished by using a Westberg, et al. Expires August 28, 2008 [Page 22] Internet-Draft LC-PCN February 2008 combination of the PCN operational state of the PCN-egress-node and an admission control request provided by an external to PCN, signaling protocol. When the admission control request arrives at a PCN-egress-node that operates in admission control state then the request SHOULD be rejected. If it operates in Normal state it SHOULD be accepted. When DSCP is used for PCN encoding and no trust relationships exist between the PCN-domains, then for packets that are forwarded outside the PCN-domain, the PCN-egress-node SHOULD restore the original DSCP values of the PCN remarked packets, otherwise multiple actions for the same event might occur, see Section 3.4. 4.1.3. Operation in PCN-ingress-nodes The PCN-ingress-node can receive a reservation request message belonging to an external to PCN, signaling protocol, e.g., RSVP. This reservation request message can be used during the admission control process. If the PCN-ingress-node receives a response, from the PCN-egress-node, which notifies that the reservation request message belonging to the external signaling protocol was successfully processed, then the reservation request SHOULD be admitted. Otherwise it SHOULD be rejected, see Section 3.3.3. Both situations SHOULD be notified to the sender of the flow. When DSCP is used for PCN encoding and no trust relationships exist between the PCN-domains, then for packets that are forwarded outside the PCN-domain, the PCN-ingress-node SHOULD restore the original DSCP values of the PCN remarked packets, otherwise multiple actions for the same event might occur, see Section 3.4. Furthermore, when the DSCP encoding is used to encode the not congested PCN state, see Section 3.4, then the PCN- ingress-node SHOULD remark to not congested PCN encoding state, all incoming to PCN domain, packets associated to flows that need to use the LC-PCN features. 4.2. Admission control based on probing for unidirectional flows This type of admission control uses probing, whereby a probe packet is sent along the forwarding path in a network to determine whether a unidirectional flow can be admitted based upon the current congestion state of the network. In pre-congestion situations the probe packets are PCN_marking encoded to notify the PCN-egress-node that a congestion occurred on a particular PCN-ingress-node to PCN-egress- node path. The Admission control based on probing feature is used to solve the ECMP issue that might occur during the process of admission control, see Section 3.1.3. Westberg, et al. Expires August 28, 2008 [Page 23] Internet-Draft LC-PCN February 2008 4.2.1. Operation in PCN-interior-nodes The PCN-interior-node features that are used to detect the PCN operational states, are the same as the ones described in section 4.1.1. In this scenario an IP packet is used as a probe packet, see Figure 3. A probe packet that passes through a PCN-interior-node that operates in Admission Control state (or in Flow Termination state, when either the Flow Termination optimization mode or the ECMP solution described in Section 3.2.4 are used) MUST remark the PCN unmarked encoded probe packet to PCN_marking encoded probe packet. The PCN-interior-nodes SHOULD detect a probe packet by observing the Router Alert option, which is carried by the probe packet. Note that a PCN-ingress-node sets the Router Alert option of all packets that are used as probe packets. This also holds for signaling protocol messages that are used by LC-PCN as probe packets. Thus if a PCN- interior-node receives a probe packet then, due to the Router Alert option it has to handle it differently then the user packets. If the PCN-interior-node operates in Admission Control state (or in Flow Termination state, when either the Flow Termination optimization mode or the ECMP solution described in Section 3.2.4 are supported) then PCN-interior-node SHOULD PCN_marking encode the probe packet. Otherwise, the encoding state of the probe packet SHOULD NOT change. 4.2.2. Operation in PCN-egress-nodes The PCN-egress-node measures the rate of the received aggregated PCN unmarked and PCN_marking encoded packets. When the probe packet arrives at the PCN-egress-node that is belonging to a certain ingress-egress PCN aggregate, and it is PCN_marking encoded then the request SHOULD be rejected. In this way it is ensured that the probe packet passed through the node that it is congested and therefore, it can be used to solve the associated ECMP issue, see Section 3.4. This feature is very useful when ECMP based routing is used to detect only flows that are passing through the pre- congested router. Note that even when no edge-to-edge pair PCN aggregate state is available at the PCN-egress-node and when a probe packet arrives at the PCN- egress-node, then this request SHOULD be rejected if the probe packet is PCN_marking encoded. Otherwise, i.e., if the probe packet is not PCN_marking encoded, it SHOULD be accepted. When DSCP is used for PCN encoding and no trust relationships exist between the PCN- domains, then for packets that are forwarded outside the PCN-domain, the PCN-egress-node SHOULD restore the original DSCP values of the PCN remarked packets, otherwise multiple actions for the same event might occur, see Section 3.4. Westberg, et al. Expires August 28, 2008 [Page 24] Internet-Draft LC-PCN February 2008 4.2.3. Operation in PCN-ingress-nodes Similar, to Section 4.1.3, the PCN-ingress-node can receive a reservation request message belonging to an external to PCN, signaling protocol, e.g., RSVP PATH message. Subsequently, the PCN- ingress-node sends a probe packet, see Figure 3, towards the PCN- egress-node. When RSVP is used, the RSVP PATH message is the probe packet. Note that the probe packet should use the same flow ID information and encoding state (ECN and/or DSCP) as the data packets associated with the received reservation request message. The PCN- ingress-node SHOULD set the Router Alert option carried by the probe packet. Note that probe packets can be either user data packets or messages used by an external, to PCN, on path signaling protocol, e.g., RSVP PATH. If the PCN-ingress-node receives a response that notifies that the probe was successfully processed, then the reservation request is admitted. In case of RSVP, the response is RSVP RESV message. Otherwise it is rejected, see Section 3.3.3. Both situations have to be notified to the sender of the flow. When DSCP is used for PCN encoding and no trust relationships exist between the PCN-domains, then for packets that are forwarded outside the PCN-domain, the PCN-ingress-node SHOULD restore the original DSCP values of the PCN remarked packets, otherwise multiple actions for the same event might occur, see Section 3.4. Furthermore, when the DSCP encoding is used to encode the not congested PCN state, see Section 3.4, then the PCN- ingress-node SHOULD remark to not congested PCN encoding state, all incoming to PCN domain, packets associated to flows that need to use the LC-PCN features. 4.3. Flow Termination for unidirectional flows The Flow Termination handling method requires the following functionalities. 4.3.1. Operation in the PCN-interior-node The PCN-interior-nodes can measure the PHB aggregated PCN traffic that exceeds a configured-admissible-rate and mark this excess PCN traffic, see Figure 4. This can be accomplished using different metering and marking features, see Section 4.1.1. The admission control features described in Section 4.1.1 can be applied also for the situation that the PCN-interior-node operates in the base mode of the Flow Termination state. However, in the Flow termination base mode, inaccuracies in excess Westberg, et al. Expires August 28, 2008 [Page 25] Internet-Draft LC-PCN February 2008 rate measurements might occur due to the delay between the metering and marking event that occurs at the PCN-interior-nodes, the decisions that are made at PCN-egress-nodes, and the termination of flows that are performed by PCN-ingress-nodes, see section 6 in [CsTa05]. Furthermore, until the overload decreases at the PCN-interior-node that operates in Flow Termination state, an additional trip time from the PCN-ingress-node to this PCN-interior-node must expire. This is because immediately before receiving the flow termination notification, the PCN-ingress-node may have sent out packets in the flows that were selected for termination. That is, a terminated flow may contribute to congestion for a time longer that is taken from the PCN-ingress-node to the PCN-interior-node. Without considering the above, PCN-interior-nodes would continue marking the packets until the measured utilization falls below the flow termination threshold. In this way, at the end more flows will be terminated than necessary, i.e., an over-reaction takes place. In order to reduce these inaccuracies an optimization mode can be added to the base mode of Flow Termination. PCN-ingress-node PCN-interior-node PCN-interior-node PCN-egress-node user | | | | data | user data | | | ------>|----------------->| user data | user data | | |---------------->S(# marked bytes) | | | S----------------->| | | S(# unmarked bytes)| | | S----------------->|Term. | notification for termination |flow? |<-----------------|-----------------S------------------|YES release | S | | -----------------|----------------------------------->| | | | | Figure: 4 LC-PCN Flow Termination handling 4.3.1.1. Optimization mode features for Flow termination In order to solve the excess rate inaccuracies, an additional optimization mode feature can be used, see also [CsTa05]. The main addition that this optimization mode solution requires is that an additional operational state has to be maintained by the PCN- interior-node, i.e., a Flow Termination state, see Section 3.4.1. In particular, when the measured PHB aggregated PCN traffic is higher than the threshold equal to (U * configured-admissible-rate), then the PCN-interior-node changes from the Admission Control state to the Westberg, et al. Expires August 28, 2008 [Page 26] Internet-Draft LC-PCN February 2008 Flow Termination state. Furthermore, when operating in Flow Termination state, the PCN- interior-nodes use a sliding window memory to keep track of the signaling overload in a couple of previous measurement intervals. At the end of a measurement intervals, T, before encoding and signaling the excess rate as PCN_marking encoded packets, the actual overload is decreased with the sum of already signaled overload stored in the sliding window memory, since that overload is already being handled in the flow termination handling control loop. The sliding window memory consists of an integer number of cells, i.e, n = maximum number of cells. Guidelines for configuring the sliding window parameters are given in [CsTa05]. At the end of each measurement interval, the newest calculated overload is pushed into the memory, and the oldest cell is dropped. If Mi is the overload_rate stored in ith memory cell (i = [1..n]), then at the end of every measurement interval, the overload rate that is signaled to the PCN-egress-node, i.e., signaled_overload_rate is calculated as follows: Sum_Mi =0 For i =1 to n { Sum_Mi = Sum_Mi + Mi } signaled_overload_rate = measured_overload_rate - Sum_Mi, where Sum_Mi is calculated as above. Next, the sliding memory is updated as follows: for i = 1..(n-1): Mi < - Mi+1 Mn < - signaled_overload_rate The bytes that have to be remarked to satisfy the signaled overload rate: signaled_remarked_bytes, are calculated as follows: Westberg, et al. Expires August 28, 2008 [Page 27] Internet-Draft LC-PCN February 2008 IF (measured PHB rate > U * configured-admissible-rate) THEN { IF (incoming_PCN_marking_rate <> 0) THEN { signaled_remarked_bytes = ((signaled_overload_rate - incoming_PCN_marking_rate) * T) / N } ELSE signaled_remarked_bytes = signaled_overload_rate * T / N } The incoming_PCN_marking_rate can be calculated as follows: incoming_PCN_marking_rate = N * input_PCN_marking_bytes / T where input_PCN_marking_bytes represents the measured number of bytes carried by PCN_marking encoded packets. The signal_remarked_bytes represents also the number of the outgoing packets (after the dropping stage) that SHOULD be remarked, during each measurement interval T, by a node when operates in Flow Termination state. 4.3.1.2. ECMP solutions As discussed in Section 3.2.4, the ECMP issue that may occur during Flow Termination operational state, could be solved by using an additional PCN marking encoding approach, denoted as: PCN_Affected_Marking. In this case both the Flow Termination base and optimization modes have to be slightly modified. When the measured PHB aggregated PCN traffic is higher than the threshold equal to (U * configured- admissible-rate), then the PCN-interior-node changes from the Admission Control state to Flow Termination state, see Section 3.3.1. Furthermore, in Flow Termination state, the PCN-interior-node marks all PCN unmarked (i.e., not congested PCN encoded) packets that are passing through the PCN-interior-node. 4.3.2. Operation in PCN-egress-nodes The PCN-egress-node measures the rate of the received PHB aggregated PCN unmarked and PCN_marking encoded packets, see Figure 4. To activate/trigger the Flow Termination a similar functionality can be used as discussed in [Char07], see Section 3.2.2 and 3.3.2. The Westberg, et al. Expires August 28, 2008 [Page 28] Internet-Draft LC-PCN February 2008 implementation of the Flow Termination algorithm can be accomplished in the following way. The PCN-egress-node node applies a predefined policy to solve the flow termination situation, by selecting a number of inter-domain (end-to-end) flows that should be terminated, or forwarded in a lower priority queue. Some flows, belonging to the same PHB traffic class might get other priority than other flows belonging to the same PHB traffic class. It is considered that this difference in priority can be notified by a signaling protocol and that the PCN-edge-nodes can store and maintain the priority information releted to each of the end-to-end flows. The terminated flows are selected from the flows belonging to the same edge-to-edge pair PCN aggregate and having the same PHB traffic class as the PHB of the PCN_marking encoded packets (and PCN_Affected_Marking encoded packets, when the ECMP solution is used). For flows associated with the same PHB traffic class the priority of the flow plays a significant role. An example of calculating the number of flows associated with each priority class that have to be terminated is described below. An example of the algorithm for the calculation of the number of flows, belonging to the same edge-to-edge pair PCN aggregate and associated with each priority class that have to be terminated is described using the pseudocode below. First, when the PCN-egress- node operates in the Flow Termination state, see Section 3.4.2, then the total amount of PCN_marking rate, per edge-to-edge pair PCN aggregate, associated with the PHB traffic class, say incoming_PCN_marking_rate, is calculated. This rate represents per edge-to-edge pair PCN aggregate, the flow termination bandwidth, that should be terminated. The incoming_PCN_marking_rate can be calculated as follows: incoming_PCN_marking_rate = N * input_PCN_marking_bytes / T where input_PCN_marking_bytes represents the measured number of bytes carried by PCN_marking encoded packets. To provide reliable estimation of the encoded information several techniques can be used, see [AtLi01], [AdCa03], [ThCo04], [AnHa06]. The term denoted as terminated_bandwidth in the below pseudocode is a temporal variable representing the total bandwidth that have to be terminated, belonging to the same PHB traffic class. The Westberg, et al. Expires August 28, 2008 [Page 29] Internet-Draft LC-PCN February 2008 terminate_flow_bandwidth(priority_class) is the total of bandwidth associated with flows of priority class equal to priority_class. The parameter priority_class is an integer fulfilling 0 < priority_class =< Maximum_priority. Note that if the PCN domain does not support priority differentiation then the variable Maximum_priority SHOULD be equal to 0. The calculate_terminate_flows(priority_class) function determines the flows for a given priority class and per PHB that has to be terminated. This function also calculates the term sum_bandwidth_terminate(priority_class), which is the sum of the bandwith associated with the flows that will be terminated. The constraint of finding the total number of flows that have to be terminated is that sum_bandwidth_terminate(priority_class), should be smaller or approximatelly equal to the variable terminate_bandwidth(priority_class). terminated_bandwidth = 0; priority_class = 0; while terminated_bandwidth < termination_PCN_marking_rate { terminate_bandwidth(priority_class) = termination_PCN_marking_rate - terminated_bandwidth calculate_terminate_flows(priority_class); terminated_bandwidth = sum_bandwidth_terminate(priority_class) + terminated_bandwidth; priority_class = priority_class + 1; } where: * termination_PCN_marking_rate = incoming_PCN_marking_rate - configured-termination-rate-egress * configured-termination-rate-egress is defined in Section 3.2.2 and in the description of event B in Section 3.3.2. For the end-to-end flows (sessions) that have to be terminated, the PCN-egress-node SHOULD generate and send notification message to the PCN-ingress-node to indicate the flow termination in the communication path. Furthermore, for the aggregated sessions that are affected, the PCN-egress-node SHOULD send within a notify message the to be released bandwidth, associated with the edge-to-edge pair PCN aggregated state. When DSCP is used for PCN encoding and no trust relationships exist between the PCN-domains, then for packets Westberg, et al. Expires August 28, 2008 [Page 30] Internet-Draft LC-PCN February 2008 that are forwarded outside the PCN-domain, the PCN-egress-node SHOULD restore the original DSCP values of the PCN remarked packets, otherwise multiple actions for the same event might occur, see Section 3.4. 4.3.2.1. ECMP solutions When the ECMP solution is used by the PCN-egress-node then the following modifications are required. The rate of the PCN unmarked (or bytes), used on the calculations of the event that triggers the Flow Termination state, see Section 3.3.2 has to be replaced, by the rate of PCN_Affected_Marking encoded packets (or bytes). Furthermore, the PCN-egress-node uses the PCN_Affected_Marking to identify which flows were affected by the exceptional/severe congestion. In this way the PCN-egress-node, when operating in Flow Termination state, see Section 4.3.2, is able to terminate only the flows that received one or more PCN_Affected_Marking packets. 4.3.3. Operation in PCN-ingress-nodes Upon receiving the notification message sent by the PCN-egress-node, the PCN-ingress-node resolves the flow termination congestion by a predefined policy, e.g., by refusing new incoming flows (sessions), terminating the affected and notified flows (sessions), and blocking their packets or shifting them to an alternative LC-PCN traffic class (PHB). This operation is depicted in Figure 4, where the PCN- ingress- node, for each flow (session) to be terminated, receives a notification message. When the PCN-ingress-node receives the notification message, it starts the termination of the flows within the LC-PCN domain by e.g., sending external to PCN, release signaling messages. Furthermore, the PCN-ingress-node SHOULD reject all new flow admission requests that are associated with the same edge-to-edge pair PCN aggregate, see Section 3.3.3. When the PCN-ingress-node receives the notification message that contains the to be released aggregation bandwidth, it can use it to resize the size of the aggregation size accordingly. The functionality required to resize the edge-to-edge pair PCN aggregated state is out of the scope of PCN. When DSCP is used for PCN encoding and no trust relationships exist between the PCN-domains, then for packets that are forwarded outside the PCN-domain, the PCN-ingress-node SHOULD restore the original DSCP values of the PCN remarked packets, otherwise multiple actions for the same event might occur, see Section 3.4. Furthermore, when the DSCP encoding is used to encode the not congested PCN state, see Westberg, et al. Expires August 28, 2008 [Page 31] Internet-Draft LC-PCN February 2008 Section 3.4, then the PCN- ingress-node SHOULD remark to not congested PCN encoding state, all incoming to PCN domain, packets associated to flows that need to use the LC-PCN features. 4.4. Admission control based on data marking and probing for bi- directional flows This section describes the admission control scheme that uses the admission control function based on datamarking and probing when bi- directional reservations are supported. PCN-ingress-node PCN-interior-node PCN-interior-node PCN-egress-node user| | | | | data| | | | | --->| | user data | |user data | |-------------------------------------------->S (#marked bytes) | | | S-------------->| | | | S(#unmarked bytes) | | | S-------------->| | | | S | | | probe(re-marked DSCP) | | | | S | |-------------------------------------------->S | | | | S-------------->| | | | S | | | response(unsuccessful) | |<------------------------------------------------------------| | | | S | Figure 5: Admission control based on data marking and probing for bi-directional admission control (pre-congestion on path from PCN-ingress-node towards PCN-egress-node) This procedure is similar to the admission control procedure described in Section 4.1 and 4.2 for the situations that the admission control with data marking and admission control with probing are used, respectively. The main difference is related to the location of the PCN-interior-node that operates in admission control state, i.e., "forward" path (i.e., path between PCN-ingress- node towards PCN- egress-node) or "reverse" path (i.e., path between PCN- egress-node towards PCN-ingress-node). Figure 5 shows the scenario where the pre-congested PCN-interior-node is located in the "forward" path. The functionality of providing admission control is the same as the one described in Section 4.1 and 4.2, Figure 3. Figure 6 shows the scenario where the pre-congested PCN-interior-node is located in the "reverse" path. The probe packet sent in the Westberg, et al. Expires August 28, 2008 [Page 32] Internet-Draft LC-PCN February 2008 "forward" direction will not be affected by the pre-congested PCN- interior-node, while the probe packet and any packet of the "reverse" direction flows will be PCN_marking encoded. The PCN-ingress-node is in this way notified that a pre-congestion situation occurred in the network and therefore it will able to reject the new initiation of the reservation. PCN-ingress-node PCN-interior-node PCN-interior-node PCN-egress-node user| | | | | data| | | | | --->| | user data | | | |-------------------------------------------->|user data |user | | | |-------------->|data | | | | |---> | | | | |user | | | | |data | | | | |<--- | S | user data | | | S user data |<--------------------------| | user data S<---------------| | | |<---------------S | | | | user data S | | | | (#marked bytes)S | | | |<---------------S | | | | S probe(unmarked DSCP) | | S | | | |----------------S------------------------------------------->| | S probe(re-marked DSCP) | | S<-------------------------------------------| |<---------------S | | | Figure 6: Admission control based on data marking and probing for bi-directional admission control (pre-congestion on path PCN-egress-node towards PCN-ingress-node) 4.5. Flow Termination handling for bi-directional flows This section describes the flow termination handling operation for bi-directional flows. This flow termination handling operation is similar to the one described in Section 4.3. Westberg, et al. Expires August 28, 2008 [Page 33] Internet-Draft LC-PCN February 2008 PCN-ingress-node PCN-interior-node PCN-interior-node PCN-egress-node user| | | | | data| user | | | | --->| data | user data | |user data | |--------------->| | S | | |--------------------------->S (#marked bytes) | | | S-------------->| | | | S(#unmarked bytes) | | | S-------------->|Term | | | S |flow? | | notification (terminate) |YES |<------------------------------------------------------------| |release (forward) | S | |------------------------------------------------------------>| | release (reverese) | S | |<------------------------------------------------------------| | | | S | Figure 7: Flow termination handling for bi-directional reservation (congestion on path PCN-ingress-node towards PCN-egress-node) This procedure is similar to the flow termination handling procedure described in Section 4.3. The main difference is related to the location of the the PCN-interior-node that operates in Flow Termination state, , i.e. "forward" or "reverse" path. Figure 7 shows the scenario where the severe congested node is located in the "forward" path. This scenario is very similar to the flow termination handling scenario described in Section 4.3. The difference is related to the release procedure, which is accomplished in both directions "forward" and "reverse". Figure 8 shows the scenario where the severe congested node is located in the "reverse" path. The main difference between this scenario and the scenario shown in Figure 7 is that no notification messages have to be generated by the PCN-egress-node. This is because the (#marked and #unmarked) user data is arriving at the PCN-ingress-node. The PCN- ingress-node will be able to calculate the number of flows that have to be terminated or forwarded in a lower priority queue. When a flow termination congestion occurs on e.g., in the forward path, and when the algorithm terminates flows to solve the flow termination in the forward path (see Figure 7), then the reserved bandwidth associated with the terminated bidirectional flows is also released. Therefore, a careful selection of the flows that have to be terminated should take place. A possible method of selecting the flows belonging to the same priority type passing through the flow termination congestion point on a unidirectional path can be the following: Westberg, et al. Expires August 28, 2008 [Page 34] Internet-Draft LC-PCN February 2008 o the PCN-egress-node should select, if possible, first unidirectional flows instead of bidirectional flows o the PCN-egress-node should select, if possible, bidirectional flows that reserved a relatively small amount of resources on the path reversed to the path of congestion. PCN-ingress-node PCN-interior-node PCN-interior-node PCN-egress-node user| | | | | data| user | | | | --->| data | user data | |user data | |--------------->| | | | | |--------------------------->|user data |user | | | |-------------->|data | | | | |---> | | | user | |<--- | user data | | data |<--------------| | (#marked bytes)| S<----------| | |<--------------------------------S | | | (#unmarked bytes) S | | Term|<--------------------------------S | | Flow? | S | | YES | | S | | |release (forward) S | | |------------------------------------------------------------>| | release (reverse) S | | |<------------------------------------------------------------| | | S | | Figure 8: Flow termination handling for bi-directional reservation (flow termination congestion on path PCN-egress-node towards PCN-ingress-node) Furthermore, a special case of this operation is associated to the Flow Termination situation occurring simultaneously on the forward and reverse paths. An example of this operation is given below (see Figure 9). Consider that the PCN-egress-node selects a number of bi- directional flows to be terminated, see Figure 9. In this case the PCN-egress- node will send for each bi-directional flows a notification message to PCN-ingress-node. If the PCN-ingress-node receives these notification messages and its operational state (associated with reverse path) is in the Flow Termination state (see Section 3.3.3), then the PCN-ingress-node operates in the following way: Westberg, et al. Expires August 28, 2008 [Page 35] Internet-Draft LC-PCN February 2008 PCN-ingress-node PCN-interior-node PCN-interior-node PCN-egress-node user| | | | | data| user | | | | --->| data | #unmarked bytes| | | |--------------->S #marked bytes | | | | S--------------------------->| | | | | |-------------->|data | | | | |---> | | | | Term.? | NOTIFY | | |Yes |<------------------------------------------------------------| | | | | |data | | | user | |<--- | user data | | data |<--------------| | (#marked bytes)| S<----------| | |<--------------------------------S | | | (#unmarked bytes) S | | Term|<--------------------------------S | | Flow? | S | | YES | | S | | |release (forward) S | | |------------------------------------------------------------>| | release (reverse) S | | |<------------------------------------------------------------| Figure 9: Flow termination handling for bi-directional reservation (flow termination congestion on both forward and reverse direction) o For each notification message, the PCN-ingress-node should identify the bidirectional flows that have to be terminated. o The PCN-ingress-node then calculates the total bandwidth that should be released in the reverse direction (thus not in forward direction) if the bidirectional flows will be terminated (preempted), say "notify_reverse_bandwidth". This bandwidth can be calculated by the sum of the bandwidth values associated with all the end-to-end flows that received a (flow termination) notification message. o Furthermore, using the received marked packets (from the reverse path) the PCN-ingress-node will calculate, using the algorithm used by an PCN-egress-node and described in Section 4.3.2, the total bandwidth that has to be terminated in order to solve the flow termination congestion in the reverse path direction, say "marked_reverse_bandwidth". Westberg, et al. Expires August 28, 2008 [Page 36] Internet-Draft LC-PCN February 2008 o The PCN-ingress-node then calculates the bandwidth of the additional flows that have to be terminated, say "additional_reverse_bandwidth", in order to solve the flow termination congestion in the reverse direction, by taking into account: * the bandwidth in the reverse direction of the bidirectional flows that were appointed by the PCN-egress-node (the ones that received a notification message) to be preempted, i.e., "notify_reverse_bandwidth" * the total amount of bandwidth in the reverse direction that has been calculated by using the received marked packets, i.e., "marked_reverse_bandwidth". This additional bandwidth can be calculated using the following algorithm: IF ("marked_reverse_bandwidth" > "notify_reverse_bandwidth") THEN "additional_reverse_bandwidth" = "marked_reverse_bandwidth"- "notify_reverse_bandwidth"; ELSE "additional_reverse_bandwidth" = 0 o PCN-ingress-node terminates the flows that experienced a severe congestion in the "forward" path and received a (flow termination) notification message o If possible the PCN-ingress-node should terminate unidirectional flows that are using the same egress-ingress reverse direction communication path to satisfy the release of a total bandiwtdh up equal to the: "additional_reverse_bandwidth". o If the number of required uni-directional flows (to satisfy the above issue) is not available, then a number of bi-directional flows that are using the same egress-ingress reverse direction communication path may be selected for flow termination in order to satisfy the release of a total bandiwtdh equal up to the: "additional_reverse_bandwidth". Note that using the guidelines given in above, first the bidirectional flows that reserved a relatively small amount of resources on the path reversed to the path of congestion should be selected for termination. o Furthermore, the PCN-egress-node includes the to be released aggregated bandwidth value in one of the notification messages. o The PCN-ingress-node receives this notification message and reads the value of the carried to be released aggregated bandwidth. Westberg, et al. Expires August 28, 2008 [Page 37] Internet-Draft LC-PCN February 2008 The size of the aggregated reservation state can be reduced in the "forward" and "reverse" by using the received to be reduced values the aggregated bandwidth in "forward" and "reverese" directions. 5. Security Considerations The security considerations associated with this document are similar to the one described in [Eard08]. 6. IANA Considerations To be Added 7. Acknowledgements The authors express their acknowledgement to the following colleagues: Andras Csaszar, Attila Takacs, David Partain, Zoltan Turanyi, Geert Heijenk, Anna Charny, Philip Eardley, Kwok Chan, Bob Briscoe, Joe Babiarz, Michael Menth. 8. Informative References [AdCa03] Adler, M., Cai, J., Shapiro, J., and D. Towsley, "Estimation of congestion price using probabilistic packet marking", Proc. IEEE INFOCOM, pp. 2068-2078, 2003. [AnHa06] Lachlan, A. and S. Hanly, "The Estimation Error of Adaptive Deterministic Packet Marking", 44th Annual Allerton Conference on Communication, Control and Computing, , 2006. [AtLi01] Athuraliya, S., Li, V., Low, S., and Q. Yin, "REM: active queue management", IEEE Network, vol. 15, pp. 48-53, May/ June 2001. [Babi07] Babiarz, J. and et. al., "Three State PCN Marking", draft-babiarz-pcn-3sm-01 (work in progress), , November 2007. [Bernet99] Bernett, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L., Speer, M., and R. Braden, "Interoperation of RSVP/Intserv and Diffserv Networks", Work in Progress , March 1999. Westberg, et al. Expires August 28, 2008 [Page 38] Internet-Draft LC-PCN February 2008 [Berson97] Berson, S. and R. Vincent, "Aggregation of Internet Integrated Services State", Work in Progress, , December 1997. [CL-ARCH] Briscoe, B. and et. al., "An edge-to-edge Deployment model for pre-congestion notification: Admission control over a Diffserv region", , October 2006. [CL-PHB] Briscoe, B. and et. al., "Pre-congestion notification marking", , October 2006. [Char07] Charny, A. and et. al., "Pre-Congestion Notification Using Single Marking for Admission and Termination", draft-charny-pcn-single-marking-03 (work in progress), , November 2007. [CsTa05] Csaszar, A., Takacs, A., Szabo, R., and T. Henk, "Resilient Reduced-State Resource Reservation", Journal of Communication and Networks Vol. 7, Num. 4, December 2005. [Eard08] Eardley, P., "Pre-Congestion Notification Architecture", draft-ietf-pcn-architecture-03 (work in progress), , February 2008. [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998. [RFC3175] Baker, F., Iturralde, C., Le Faucheur, F., and B. Davie, "Aggregation of RSVP for IPv4 and IPv6 Reservations", RFC 3175, September 2001. [RMD] Bader, A., "RMD-QOSM: The resource management in Diffserv QoS Model", draft-ietf-nsis-rmd-12.txt (work in progress), , November 2007. [Stoica99] Stoica, I. and et. al., "Per Hop Behaviors Based on Dynamic Packet States", Work in Progress , February 1999. [ThCo04] Thommes, R. and M. Coates, "Deterministic packet marking for congestion packet estimation", Proc. IEEE Infocom , 2004. [Westberg00] Westberg, L. and et. al., "Load Control of Real-Time Traffic", IETF Work in Progress , April 2000. Westberg, et al. Expires August 28, 2008 [Page 39] Internet-Draft LC-PCN February 2008 Authors' Addresses Lars Westberg Ericsson Torshamnsgatan 23 SE-164 80 Stockholm Sweden Email: Lars.westberg@ericsson.com Anurag Bhargava Ericsson 920 Main Campus Dr., Suite 500 Raleigh, NC 27606 USA Phone: +1 919 472 9964 Email: anurag.bhargava@ericsson.com Attila Bader Ericsson Laborc 1 Budapest Hungary Email: Attila.Bader@ericsson.com Georgios Karagiannis University of Twente P.O. Box 217 7500 AE Enscede Netherlands Email: g.karagiannis@ewi.utwente.nl Westberg, et al. Expires August 28, 2008 [Page 40] Internet-Draft LC-PCN February 2008 Full Copyright Statement Copyright (C) The IETF Trust (2008). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Intellectual Property The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. Acknowledgment Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA). Westberg, et al. Expires August 28, 2008 [Page 41]