Internet Draft              RMD On DemAnd PHR                April 2001


Internet Engineering Task Force                          L. Westberg
INTERNET-DRAFT                                           M. Jacobsson
Expires October 2001                                     G. Karagiannis
                                                         S. Oosthoek
                                                         D. Partain
                                                         V. Rexhepi
                                                         P. Wallentin
                                                         Ericsson
                                                         April 2001


         Resource Management in Diffserv On DemAnd (RODA) PHR
                   draft-westberg-rmd-od-phr-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

   The purpose of this draft is to present the Resource Management in
   Diffserv (RMD) On DemAnd (RODA) Per Hop Reservation (PHR) protocol.
   The RODA PHR protocol is used on a per-hop basis in a Differentiated
   Services (Diffserv) domain and extends the Diffserv Per Hop Behavior
   (PHB) with resource provisioning and control.

1. Introduction

   The current definition of Diffserv [RFC2475] does not contain a
   simple and scalable solution to the problem of resource provisioning
   and control.

   The Resource Management in Diffserv (RMD) On DemAnd (RODA) Per Hop
   Reservation (PHR) protocol presented in this document operates in an
   edge-to-edge Diffserv domain, extending the Per Hop Behavior (PHB)
   functionality with resource provisioning and control.

   The RODA PHR is a unicast edge-to-edge protocol that is applied in a
   Diffserv domain and aims at extreme simplicity and low cost of
   implementation along with good scaling properties.

   The RODA PHR protocol operates on a hop-by-hop basis on all nodes,
   both edge and interior, located in an edge-to-edge Diffserv domain.
   This PHR protocol can be applied in Diffserv domains that use either
   IPv4 [RFC791] or IPv6 [RFC2460].

   The Resource Management in Diffserv (RMD) Framework document
   [RMD-frame] specifies how a PHR can interoperate with a Per Domain
   Reservation (PDR) protocol. A PDR scheme represents the resource
   reservation in the Diffserv domain, and it is implemented only at
   the boundary of the domain (in the edge nodes).

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Furthermore, all new terms used in this draft are defined in
   [RMD-frame].

3. RODA PHR functionality

   The RODA PHR protocol performs the following functions:

   *  Installation and maintenance of one reservation soft state per
      Diffserv PHB. An efficient algorithm should be used to establish,
      maintain and release this resource reservation state. An example
      of such an algorithm can be found in Section 4.

   *  Detection of severe congestion. All nodes MUST be able to
      identify a severe congestion situation. Severe congestion might
      occur due to a route change, a link failure or a long period of
      congestion.
      The RODA PHR protocol provides the means of informing other nodes
      of the congestion situation on a hop-by-hop basis.

   *  Storage of a pre-configured threshold value for the maximum
      allowable resource units per PHB.

   *  Transport of transparent PDR messages. The PHR protocol may
      encapsulate and transport PDR messages sent from an ingress node
      to an egress node.

4. RODA PHR protocol operation

4.1. Normal operation

   Two PHR protocol messages are specified: the "PHR_Resource_Request"
   and "PHR_Refresh_Update". Both messages traverse the same nodes as
   the actual traffic.

   1. The PHR_Resource_Request is used to initiate or update the PHB
      reservation state on all nodes located on the communication path
      between the ingress and egress nodes according to an external QoS
      Request. Furthermore, the PHR_Resource_Request message does not
      refresh any existing soft state reservation.

   2. The PHR_Refresh_Update is used to refresh the PHB reservation
      soft state on all nodes located on the communication path between
      the ingress and egress nodes according to a resource reservation
      request that was successfully processed by the PHR functionality
      during a previous refresh period.

   All nodes SHOULD process the "PHR_Refresh_Update" message with a
   higher priority than the "PHR_Resource_Request" message. The
   detailed RODA PHR message format is described in Section 5 below.

   Any node that receives a RODA PHR message (a "PHR_Resource_Request"
   or a "PHR_Refresh_Update" message) MUST identify the DSCP of these
   signaling messages and, if possible, reserve the requested units of
   resources contained in the "Requested Resources" field of these
   signaling messages. If this can be accomplished then the node
   reserves the requested resources by adding the requested on-demand
   units of resources to the total amount of reserved units associated
   with that DSCP.
   Otherwise, these messages are marked, which means setting the "M"
   bit to "1".

   Any "M" marked (the "M" bit is 1) "PHR_Resource_Request" message
   that arrives in an interior node is not processed and is forwarded
   untouched. Any "PHR_Refresh_Update" message, whether it is marked or
   not, is always processed, but marked bits are not changed.

4.2. Fault handling operation

   When a node detects a severe congestion situation it MUST inform the
   egress node by setting the "S" field of any received PHR message to
   "1" and sending this message towards the egress node. In the
   situation that this cannot be done, operational management
   solutions, such as Simple Network Management Protocol (SNMP)
   notifications, SHOULD be used.

   Any "S" marked (the "S" bit is 1) "PHR_Resource_Request" message
   that arrives in an interior node is not processed and is forwarded
   untouched. Any "PHR_Refresh_Update" message, whether it is marked or
   not, is always processed, but marked bits are not changed.

4.3. Implementation Example using Pseudo Code

   In this section, we describe an implementation example of the PHR
   functionality using pseudo code. In the example, we disregard the
   DSCP detection requirement.

   The variables used in this pseudo code are:

   rsv_state:  the level of reserved units of the previous period
   count:      counts the current number of reserved units
   threshold:  upper bound of the amount of units that can be reserved
               for this service class
   p_phr:      the incoming PHR message

   ON INITIALIZE
      rsv_state = 0
      count = 0
   END EVENT

   ON arrival of PHRsignal packet p_phr

      /* don't process reservation request marked packets */
      IF p_phr is a PHR_Resource_Request AND is marked (S or M) THEN
         forward p_phr message to next hop
         wait for next phr message
      ENDIF

      IF p_phr is a PHR_Resource_Request AND
         there is severe congestion for some reason THEN
         mark p_phr (S-bit = 1)
         forward p_phr message to next hop
         wait for next phr message
      ENDIF

      /* process refresh requests, whether marked or not */
      IF p_phr is a PHR_Refresh_Update AND
         count + requested units < threshold THEN
         count = count + requested units
      ELSEIF p_phr is a PHR_Refresh_Update THEN
         count = count + requested units
         mark packet (M-bit = 1)
      ENDIF

      /* process new requests */
      IF p_phr is a PHR_Resource_Request AND
         rsv_state + requested units < threshold AND
         count + requested units < threshold THEN
         count = count + requested units
         rsv_state = rsv_state + requested units
      ELSEIF p_phr is a PHR_Resource_Request THEN
         mark p_phr (M-bit = 1)
      ENDIF

      /* check and mark for severe congestion */
      IF there is severe congestion for some reason THEN
         mark p_phr (S-bit = 1)
      ENDIF

      forward p_phr message to next hop
   END EVENT

   ON end of refreshperiod
      /* at the end of each refreshperiod, */
      rsv_state = count    /* update reservation state */
      count = 0            /* and reset count */
   END EVENT

   Another implementation example of the PHR functionality using pseudo
   code is given in Section 12 (Appendix 1).

5. PHR message formats

   The PHR protocol information is carried in:

   *  an IP header Options field, as defined in [RFC791], when IPv4 is
      used

   *  an option field encoded into the Hop-by-Hop Options Extended
      Header, as defined in [RFC2460], when IPv6 is used

   We denote this IP Option field as the RODA PHR option.

5.1. Message Format in IPv4

   The RODA PHR protocol messages used in IPv4 Diffserv domains are
   represented by the combination of the DSCP field and the contents of
   an IPv4 option header field [RFC791]. This IPv4 option header field
   has the following format. Note that the contents of the PDR
   (per-domain reservation) encapsulated data are simply opaque data to
   the PHR and are processed in no way by the PHR. Please see
   [RMD-frame] for a description of PDR functionality.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Option Type  | Option Length |P-LEN| P-ID  |S|M|  C  |Unused |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Requested Resources      |            Unused             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   .                     PDR encapsulated data                     .
   .                 Variable length field used to                 .
   .                   encapsulate PDR messages                    .
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 1: PHR Option field in the IPv4 Option header field

   Option Type     8-bit identifier of the type of option. The
                   semantics of this field are specified in [RFC791].

   Option Length   8-bit field. This is specified in [RFC791] and
                   represents the length of the Option-Data field of
                   this option, in octets. The option data field
                   consists of all fields that are included in the
                   option field of the IPv4 header and placed after
                   the "Option Length" field.

   P-LEN           3-bit field. This specifies the length in octets
   (PHR length)    of the specific PHR information data included in
                   the "Option-Data" field. This information does not
                   include the encapsulated PDR information. The
                   value 0 specifies that this IP option field
                   contains only PDR data and no PHR data. The PDR
                   data MUST begin on the next 32-bit word boundary
                   after the P-LEN field (after the first "unused"
                   field). In this case, the sender MUST set the "S",
                   "M", "C", and "unused" fields to 0. The P-ID MUST
                   have the value 1.
                   If a receiver receives a packet with a P-LEN value
                   of 0, it MUST ignore the values in the "S", "M",
                   "C", and "unused" fields.

   P-ID            4-bit field. This specifies the PHR type. For the
   (PHR type)      RODA PHR, the value MUST be 1.

   S               1-bit field. The sender MUST set the "S" field to
   (Severe         0. This field is set to 1 by an interior or edge
   Congestion)     node when a severe congestion situation occurs.

   M               1-bit field. The sender MUST set the "M" field to
   (Marked)        0. This field is set to 1 by an interior or edge
                   node when the node cannot satisfy the "Requested
                   Resources" value.

   C               3-bit field. This field specifies the type of the
   (Message type)  PHR message.

                   C      Description
                   -------------------------------
                   0      Reserved
                   1      "PHR_Resource_Request"
                   2      "PHR_Refresh_Update"
                   3-7    Unused

   UNUSED          A 4-bit and a 16-bit field that are currently
                   unused. Reserved for future PHR extensions.

   Requested       16-bit field. This field specifies the requested
   Resources       number of units of resources to be reserved by a
                   node. The unit is not necessarily a simple
                   bandwidth value. It may be defined in terms of any
                   resource unit (e.g., effective bandwidth) to
                   support statistical multiplexing at message level.

   PDR             PDR encapsulated information data. This field is
   encapsulated    only processed by the edge nodes.
   data

5.2. Message Format in IPv6

   The PHR protocol messages used in IPv6 Diffserv domains are
   represented by the combination of the DSCP field and the contents of
   an option field of an IPv6 Hop-by-Hop header option [RFC2460]. This
   IPv6 option field has the following format.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Next Header  |  Hdr Ext Len  |  Option Type  | Opt Data Len  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |P-LEN| P-ID  |S|M|  C  |Unused |      Requested Resources      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Unused                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   .                     PDR encapsulated data                     .
   .                 Variable length field used to                 .
   .                   encapsulate PDR messages                    .
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 2: PHR Option field in the IPv6 Hop-by-Hop Header Option

   Next Header     8-bit selector. This is specified in [RFC2460] and
                   identifies the type of header immediately
                   following the Hop-by-Hop Options header.

   Hdr Ext Len     8-bit field. This is specified in [RFC2460] and
                   represents the length of the Hop-by-Hop Options
                   header in 8-octet units, not including the first 8
                   octets.

   Option Type     8-bit identifier of the type of option. The
                   semantics of this field are specified in
                   [RFC2460].

   Opt Data Len    8-bit field. This is specified in [RFC2460] and
                   represents the length in octets of the Option Data
                   field of this option. The option data field
                   consists of all fields included in the Hop-by-Hop
                   header option and placed after the "Opt Data Len"
                   field.

   P-LEN           3-bit field. The semantics of this field are
   (PHR length)    identical to the field in the IPv4 option. Just as
                   for IPv4, the value 0 specifies that this IP
                   option field contains only PDR data and no PHR
                   data. The PDR data MUST begin on the next 32-bit
                   word boundary after the P-LEN field (after the
                   first "Requested Resources" field). In this case,
                   the sender MUST set the "S", "M", "C", "unused",
                   and "Requested Resources" fields to 0. The P-ID
                   MUST have the value 1. If a receiver receives a
                   packet with a P-LEN value of 0, it MUST ignore the
                   values in the "S", "M", "C", and "unused" fields.

   UNUSED          A 4-bit and a 32-bit field that are currently
                   unused. Reserved for future PHR extensions.

   PDR             A variable length field that contains PDR
   encapsulated    encapsulated information data. This field is only
   data            processed by the edge nodes.

   The "Requested Resources", "P-LEN", "P-ID", "S", "M" and "C" fields
   in Figure 2 are identical to those shown in Figure 1.

6. Adaptation for load sharing

   Due to load sharing (see e.g., [RFC2676]), a node may cycle between
   different routes in order to balance the load. This implies that the
   traffic (user) data will not follow exactly the same paths as the
   PHR messages used to reserve or refresh the transport resources used
   by this traffic (user) data. As such, interior and edge nodes MUST
   be able to observe when a load sharing situation occurs.

   Interior and edge nodes SHOULD forward the PHR messages in such a
   way that they will follow the same forwarding path as the traffic
   (user) data associated with these PHR messages.

   When this cannot be done, we propose use of the same solutions as
   the multi-path route solutions proposed in Section 1.4.6 of
   [BaIt01]. These are:

   *  The data may be tunneled from the ingress to egress node using
      technologies such as IP-in-IP, GRE (Generic Routing
      Encapsulation), MPLS (Multiprotocol Label Switching)
      label-switched paths, and so on.

   *  Measurement could be used to determine what proportion of traffic
      for a given reservation travels along each of the load sharing
      paths, thereby verifying that there is sufficient bandwidth for
      the reservation.

   *  The total capacity of the route may be reserved down each load
      sharing path.

7. Tunneling

   When PHR messages are tunneled within the RMD Diffserv domain, the
   tunneling messages MUST include the PHR option field.

8. Security considerations

   The general security and tunneling considerations stated in Section
   6 of [RFC2475] and [RMD-frame] also apply to this PHR.

   In addition, unlike Differentiated Services PHBs, the RODA PHR
   allows the edge nodes to reserve bandwidth or other QoS parameters
   dynamically. This flexibility makes it more vulnerable to erroneous
   reservations and sabotage. In order to keep functioning properly,
   the edge nodes MUST be certain that any flow reserving bandwidth in
   the network is authorized to do this and only up to that flow's
   agreed upon limit. If the edge node detects erroneous or malicious
   behavior, it MUST police that flow to the agreed upon limits or
   reject it entirely.

   Because of the soft state principle used, the PHR can recover
   relatively easily from incorrect reservations. Thus it is quite safe
   to deploy the RODA PHR in a well-controlled network with trustworthy
   edge nodes.

   In order to prevent abuse of the QoS capabilities of the core
   network, the ingress nodes SHOULD filter any PHR or PDR related
   header information coming from the outside before sending it through
   the core network. Whether this information needs to be preserved and
   later re-inserted, discarded from the packet, or whether the entire
   packet should be discarded is an open issue.

9. References

   [BaIt01]    Baker, F., Iturralde, C., Le Faucheur, F., Davie, B.,
               "Aggregation of RSVP for IPv4 and IPv6 Reservations",
               Internet draft, Work in progress.

   [RMD-frame] Karagiannis, G., Rexhepi, V., Westberg, L., Partain,
               D., Oosthoek, S., Jacobsson, M., Szabo, R., Wallentin,
               P., "Resource Management in Diffserv Framework",
               Internet draft, April 2001 (work in progress).

   [RFC791]    DARPA INTERNET PROGRAM PROTOCOL SPECIFICATION,
               "Internet Protocol", IETF RFC 791, September 1981.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC2119, March 1997.

   [RFC2205]   Braden, R., Zhang, L., Berson, S., Herzog, S., Jamin,
               S., "Resource ReSerVation Protocol (RSVP) -- Version 1
               Functional Specification", IETF RFC 2205, 1997.

   [RFC2460]   Deering, S., Hinden, R., "Internet Protocol, Version 6
               (IPv6) Specification", IETF RFC 2460, December 1998.

   [RFC2475]   Blake, S., Black, D., Carlson, M., Davies, E., Wang,
               Z., Weiss, W., "An Architecture for Differentiated
               Services", IETF RFC 2475, December 1998.

   [RFC2676]   Apostolopoulos, G., Williams, D., Kamat, S., Guerin,
               R., Orda, A., Przygienda, T., "QoS Routing Mechanisms
               and OSPF Extensions", IETF Experimental RFC 2676,
               August 1999.

   [RFC2859]   Fang, W., Seddigh, N., Nandy, B., "A Time Sliding
               Window Three Colour Marker (TSWTCM)", IETF
               Experimental RFC 2859, June 2000.

10. Acknowledgments

   Thanks to Robert Szabo and Geert Heijenk for reviewing this draft
   and providing useful input.

11. Authors' Addresses

   Lars Westberg
   Ericsson Research
   Torshamnsgatan 23
   SE-164 80 Stockholm
   Sweden
   EMail: Lars.Westberg@era.ericsson.se

   Martin Jacobsson
   Ericsson EuroLab Netherlands B.V.
   Institutenweg 25
   P.O.Box 645
   7500 AP Enschede
   The Netherlands
   EMail: Martin.Jacobsson@eln.ericsson.se

   Georgios Karagiannis
   Ericsson EuroLab Netherlands B.V.
   Institutenweg 25
   P.O.Box 645
   7500 AP Enschede
   The Netherlands
   EMail: Georgios.Karagiannis@eln.ericsson.se

   Simon Oosthoek
   Ericsson EuroLab Netherlands B.V.
   Institutenweg 25
   P.O.Box 645
   7500 AP Enschede
   The Netherlands
   EMail: Simon.Oosthoek@eln.ericsson.se

   David Partain
   Ericsson Radio Systems AB
   P.O. Box 1248
   SE-581 12 Linkoping
   Sweden
   EMail: David.Partain@ericsson.com

   Vlora Rexhepi
   Ericsson EuroLab Netherlands B.V.
   Institutenweg 25
   P.O.Box 645
   7500 AP Enschede
   The Netherlands
   EMail: Vlora.Rexhepi@eln.ericsson.se

   Pontus Wallentin
   Ericsson Radio Systems AB
   P.O. Box 1248
   SE-581 12 Linkoping
   Sweden
   EMail: Pontus.Wallentin@era.ericsson.se

12. Appendix 1

   In this appendix, we describe a second implementation example of the
   PHR functionality using pseudo code.

   This algorithm is described in an event-driven way. When a packet
   arrives or a timer expires, an event is generated and the algorithm
   acts on it.

   nrofsubwindows = 15     // number of subwindows in the periodlength
   countarray[0..nrofsubwindows-1] = 0
                           // exactly nrofsubwindows
                           // subwindows in countarray
   rfcount = 0             // count of PHR_Resource_Request
                           // and PHR_Refresh_Update messages
                           // in current subwindow

   // "invariants" at the subwindow boundary:

   lastsum = sum from i=0 to nrofsubwindows-1 of countarray[i]
   // lastsum represents the total amount of reserved bandwidth up
   // to the current subwindow for a complete refresh period-length

   newsum = lastsum - countarray[0] + rfcount
   // see this as a macro that is constantly up to date!
   // newsum is going to be the "lastsum" for the next subwindow.
   // So the oldest subwindow is left out and the new rfcount
   // (PHR_Resource_Request/PHR_Refresh_Update) value is included.

   ON arrival of PHRsignal packet p_phr

      // do not process marked PHR_Resource_Request messages
      IF p_phr is a PHR_Resource_Request AND
         severe-bit or marked-bit is set to 1 in p_phr THEN
         forward p_phr to next hop
         wait for next phr message
      ENDIF

      // mark packets (severe) if router is in a state of
      // severe congestion
      IF p_phr is a PHR_Resource_Request AND
         router is in a state of severe congestion THEN
         mark p_phr (S = 1)
         forward p_phr to next hop
         wait for next phr message
      ENDIF

      // Process unmarked PHR_Resource_Request packets and
      // PHR_Refresh_Update packets (marked or unmarked)
      IF p_phr is a PHR_Resource_Request AND
         units + lastsum <= threshold THEN
         rfcount = rfcount + units
         lastsum = lastsum + units
      ELSEIF p_phr is a PHR_Resource_Request THEN
         mark p_phr (M = 1)
      ENDIF

      IF p_phr is a PHR_Refresh_Update AND
         units + newsum <= threshold THEN
         rfcount = rfcount + units
         // don't update lastsum if it's a refresh update
      ELSEIF p_phr is a PHR_Refresh_Update THEN
         mark p_phr (M = 1)
      ENDIF

      forward p_phr to next hop
   END EVENT

   ON next subwindow
      // advance the window to the next subwindow
      slide_window(countarray)
      // after this operation, a[0] contains what was previously
      // in a[1]. The same goes for all the other values in a,
      // except for the last, which is set to 0
      countarray[nrofsubwindows-1] = rfcount
      // sum all the subwindows in countarray
      lastsum = sum(countarray)
      newsum = lastsum - countarray[0]
      rfcount = 0
   END EVENT

Table of Contents

   1 Introduction
   2 Terminology
   3 RODA PHR functionality
   4 RODA PHR protocol operation
   4.1 Normal operation
   4.2 Fault handling operation
   4.3 Implementation Example using Pseudo Code
   5 PHR message formats
   5.1 Message Format in IPv4
   5.2 Message Format in IPv6
   6 Adaptation for load sharing
   7 Tunneling
   8 Security considerations
   9 References
   10 Acknowledgments
   11 Authors' Addresses
   12 Appendix 1
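
   As an illustration of the field layout given in Section 5.1 (the
   PHR-specific fields that follow "Option Length" in Figure 1), the
   following Python fragment is a minimal sketch of how those six
   octets might be packed and parsed. It is not part of the protocol
   definition. The names PhrFields, pack_phr and unpack_phr are
   hypothetical, and the sketch assumes that the fields are laid out
   most significant bit first in network byte order, as the figure
   suggests. The "Option Type" and "Option Length" octets are left out
   because this draft assigns no Option Type value.

```python
import struct
from dataclasses import dataclass

# Message type ("C" field) values from the table in Section 5.1.
PHR_RESOURCE_REQUEST = 1
PHR_REFRESH_UPDATE = 2
# P-ID value that Section 5.1 requires for the RODA PHR.
RODA_P_ID = 1


@dataclass
class PhrFields:
    """Hypothetical container for the PHR fields of Figure 1."""
    p_len: int      # 3 bits, PHR length in octets
    p_id: int       # 4 bits, PHR type
    s: bool         # 1 bit, severe congestion
    m: bool         # 1 bit, marked
    c: int          # 3 bits, message type
    requested: int  # 16 bits, requested resource units


def pack_phr(f: PhrFields) -> bytes:
    """Pack the fields into 6 octets (assumed MSB-first bit order)."""
    if not (0 <= f.p_len < 8 and 0 <= f.p_id < 16 and 0 <= f.c < 8):
        raise ValueError("P-LEN, P-ID or C out of range")
    if not 0 <= f.requested <= 0xFFFF:
        raise ValueError("Requested Resources out of range")
    # 16-bit word: P-LEN(3) | P-ID(4) | S(1) | M(1) | C(3) | Unused(4)
    word = ((f.p_len << 13) | (f.p_id << 9) | (int(f.s) << 8)
            | (int(f.m) << 7) | (f.c << 4))
    # The trailing 16-bit "Unused" field is sent as zero.
    return struct.pack("!HHH", word, f.requested, 0)


def unpack_phr(data: bytes) -> PhrFields:
    """Parse the first 6 octets back into their fields."""
    word, requested, _unused = struct.unpack("!HHH", data[:6])
    return PhrFields(
        p_len=word >> 13,
        p_id=(word >> 9) & 0xF,
        s=bool((word >> 8) & 1),
        m=bool((word >> 7) & 1),
        c=(word >> 4) & 0x7,
        requested=requested,
    )


# Example: a marked PHR_Resource_Request for 100 units. The P-LEN value
# of 6 is an illustrative assumption, not a value stated in the draft.
msg = PhrFields(p_len=6, p_id=RODA_P_ID, s=False, m=True,
                c=PHR_RESOURCE_REQUEST, requested=100)
raw = pack_phr(msg)
```

   In this sketch an interior node would call unpack_phr on the option
   data, update the "S" or "M" flag as described in Section 4, and
   re-pack the fields before forwarding. Any PDR encapsulated data
   following these octets is passed through untouched.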