Internet Engineering Task Force Ed Ellesson INTERNET DRAFT Steven Blake draft-ellesson-tos-00.txt IBM November 1997 Expires May 1998 A Proposal for the Format and Semantics of the TOS Byte and Traffic Class Byte in IPv4 and IPv6 Headers Status of this Memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Abstract This draft proposes an arrangement of fields in the IPv4 TOS byte [RFC795, RFC1349], and in the IPv6 Traffic Class byte [IPv6], and proposes a definition for their associated semantics. The intention is to enable the preservation of currently useful differential class- based queueing behavior on existing network devices, while simultaneously enabling new methods of bandwidth allocation and policing via drop preference, all within the context of flows which may be encrypted at the IP layer using IPSEC. (Note: the IPv6 Class field has recently been expanded to eight bits, but this in not yet available in any version of the specification). 1. Introduction Many intranets have deployed class-based queuing and/or class-based buffer management for a number of years. Routers and/or other network edge devices in these networks typically classify each packet received based on IP and TCP/UDP header fields which offer clues as to the application type or traffic type, and which therefore Ellesson/Blake Expires 5/98 [Page 1] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 reflect the relative business importance of the communicating application. These clues include the source/destination port numbers, protocol id, and source/destination addresses. These classification criteria are then used to direct packets to statically configured queues, or to identify the packets for drop eligibility within the network device. With the advent of IPSEC encryption of the IP payload, it is no longer possible for routers and other network devices to look inside the payload of an encrypted IP packet to extract the protocol id or source/destination port numbers for the purpose of learning the traffic type. As a result, the functionality of packet classification and differential queueing/dropping based on traffic type is lost when migrating applications to a network that supports IPSEC, unless a different mechanism is enabled for packet classification. (The other circumstance when the classification information is lost is IP fragmentation). Commercial enterprises which are considering migrating their private corporate network traffic to use the Internet, must ensure the security and privacy of their traffic, while, at the same time, preserving the precedence of certain traffic types over others in order to support service level agreements with their users. If no solution is standardized, either the deployment of VPN's and extranets based on IPSEC will be impeded, or the current ability to prioritize by traffic type will be lost. 2. Requirements The following requirements drove the proposed solution: -ability to prioritize packets by traffic class, for both delay and drop preference, in a standardized, multivendor-interoperable way. -ability to support explicit congestion notification, as proposed by [ECN94, ECN97, Clark97]. -ability to isolate certain transport protocols (especially those using non-TCP congestion control algorithms) and/or traffic types into separate traffic classes for individualized queueing treatment in a network device, according to the needs of individual users and network operators and network administrators in the context of intranets, extranets, and VPN's [Blake97]. -ability to provision minimum bandwidth available to specific traffic classes (for starvation avoidance), either via marking at the network edge for dropping priority (new capability), or via setting dropping thresholds, leaky bucket parameters, and/or minimum drain rate in each queue that is dedicated to a specific traffic class (existing capability extended to work within the IPSEC context). Ellesson/Blake Expires 5/98 [Page 2] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 -ability to mark packets for differential service treatment at the packet source, or at an intervening network device, such as a router or traffic shaper/policer, for example, at the edge of the access to an ISP's wide area network. -avoid introduction of a migration penalty for current best effort users of a network which has partially deployed profile-meter marking. -ability to provide all the above functionality in the context of a network that supports IPSEC encryption. (Encryption may be implemented either at the packet stream source, or in a downstream device. Packet marking, via the TOS byte or the Traffic Class byte, must be introduced at, or upstream of, the encrypting device). -same semantics for IPv4 and IPv6, to enable coexistence/migration. 3. Proposed Solution IPv4 TOS Byte (and IPv6 Traffic Class Byte) Definition: 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ |CE |ECT|DP | Service Class |MBZ| +---+---+---+---+---+---+---+---+ Explicit Congestion Notification (ECN): CE: Congestion Experienced (1 bit) (router-, or traffic shaper/policer-settable) - 0: no congestion experienced - 1: congestion experienced ECT: ECN-capable transport (1 bit): (source host-settable) - 0: transport protocol not ECN-capable - 1: transport protocol ECN-capable Notes on Explicit Congestion Notification: -There are two proposals for the use of the CE bit. In [ECN94] and [ECN97], the CE bit is set stochastically based on random early detection of congestion (when ECT is set) [RED]. In [Clark97], the CE bit is set deterministically for every packet under impending congestion, and then the CE bit is filtered downstream by a receiver profile meter. The proposed mapping supports both approaches. Routers shall support the behavior specified in [ECN97] by default. Routers shall support the behavior specified in [Clark97] as a configuration option (to allow the deployment of receiver-based service allocation). Ellesson/Blake Expires 5/98 [Page 3] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 Drop Preference (DP) (1 bit): - 0: not discard-eligible/in-profile/loss-sensitive - 1: discard eligible/out-of-profile/loss-insensitive Notes on Drop Preference: -[Clark97]- and [Feng97]-style service-allocation implementations should use the DP bit to signal in- or out-of- profile packets. Integrated Services policers shall use the DP bit to signal non-conformant packets. Routers shall not re- order packets within a flow when the DP bit is toggled. Explicit Service Class (4 bits): - 0001: Delay Insensitive (minimize cost/worse than best effort) - 0000: Normal (best effort) - 1000: Interactive Delay (low average delay) - 1001: Low Maximum Delay (ie: low delay/jitter, eg. real time) - 0010: Network Control (maximize reliability) - 0100: Maximize Throughput - 0011: Network-specific 1 - 0101: Network-specific 2 - 0110: Network-specific 3 - 0111: Network-specific 4 - 1010: Network-specific 5 - 1011: IntServ (low priority), Network-specific 6 - 1100: IntServ (medium priority), Network-specific 7 - 1101: IntServ (high priority), Network-specific 8 - 1110: Reserved 1 - 1111: Reserved 2 Notes on Explicit Service Class: -The specific value assignments noted above were chosen to preserve backwards compatibility with [RFC1349]. -The value assignments noted above are shown out of numerical order to highlight proximity of interpretation. -The first four service classes noted above are ranked in order of increasing delay priority. -Any of the above service classes may have specific minimum bandwidth allocations, delay priorities, and/or drop thresholds configured within the routers. The means of configuring these parameters are beyond the scope of this specification. (One possible implementation method is to direct packets with a specific service class to an associated class-based queue [CBQ]. This permits the service class to act as an index Ellesson/Blake Expires 5/98 [Page 4] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 into a queue that has been configured for a particular traffic type, in the same way that the port numbers are used today on commercial intranets. Note that the combination of source/ destination address, protocol id, and service class can be used in the same way, that is, as an index into a queue.) -The network-specific code points are available to be used with any application/traffic type, or set of application types, agreed to by the network administrator/network operator and the end user/service subscriber. The network administrator may provision resources for each network-specific service class as appropriate to provide the level of performance required for traffic mapped to that class. The network administrator may associate any network-specific service class with a particular drop or delay priority. By specifying minimum bandwidths per- class (mechanism for doing this is outside the scope of this draft), the network administrator can avoid starvation of lower priority flows. -The network-specific code points are not ranked in any implied order of loss, delay, or throughput priority. -Routing and other network control protocols using this mapping which require prioritized handling or reliable delivery by the network shall be marked with service class '0010' (Network Control). -IntServ code values '1011', '1100', '1101', are intended for use by packets of a reserved Integrated Services flow where RSVP aggregation is deployed [GBH97]. Packets within an aggregated reservation should be mapped to one of the three service classes (depending on the IntServ traffic class) to facilitate packet classification. The router implementation must isolate traffic in these service classes from traffic which has not been policed at an RSVP aggregation point. The IntServ service class implementations must prevent non- conformant packets (marked by DP) from degrading the QoS of other flows within the same service class. If RSVP aggregation is not deployed, then these code values are available as network-specific service classes. If additional IntServ service classes are desired, they may be allocated from the network-specific code points at the discretion of the network administrator. -Service class-specific routing (i.e., TOS routing) may be implemented at the option of the network administrator. Specification of the routing metrics to be associated with each service class is beyond the scope of this draft. Ellesson/Blake Expires 5/98 [Page 5] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 -Interaction of network-specific code points with DP and ECN field values: If supported by network policy, an edge device may instead use the DP marking of out-of-profile traffic to provide minimum bandwidth guarantees, rather than using minimum bandwidths configured in each router. In addition, drop and congestion control may be provided individually within each service class. That is, packets may be marked for drop eligibility or for explicit congestion notification within a specific allocated bandwidth, configured for a specific service class of traffic, rather than relative to the entire bandwidth available on an outgoing interface. The service class implementation may choose to ignore the CE, ECT, or DP bits. -Reserved codepoints are available for future standardization or experimentation. MBZ (1bit): Must be zero. Reserved for use on experimental networks with TOS Byte or Traffic Class Byte definition other than above. 4. Network Interoperability Service providers which exchange traffic and support differentiated services via service-class-value-marked packets, should either agree to compatible definitions for Network-specific values, or they should agree to map the Network-specific values into one of the standardized values at their interconnection point. A scaleable administrative mechanism for managing the mapping of traffic type to service class, and from service class to service class (across a domain boundary) is key to the manageable deployment of this solution on a wide scale. Scaleable administrative mechanisms are beyond the scope of this draft. 5. Backwards Compatibility With RFC 795 [Ferg97] suggests the usage of the IPv4 Precedence field to signify the drop preference of in-profile or out-of-profile packets, as defined for example by [Clark97]. Out-of-profile packets would be marked with lower precedence than in-profile packets. Routers which implemented preferential discard based on the semantics of [RFC795] would preferentially discard out-of-profile packets in times of impending congestion. Ellesson/Blake Expires 5/98 [Page 6] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 A potential problem that may occur during the phase of partial deployment of traffic profile meters is that the bulk of existing best-effort traffic is marked with the "Routine" precedence value '000'. This un-metered traffic which enters a network implementing precedence dropping would be treated as out-of-profile. It is not clear that this is always the correct choice, which motivated our decision to abandon the existing semantics of the Precedence field and explicitly allocate a drop preference bit. However, it is also the case that many routing protocol implementations transmit their packets with "Internetwork Control" precedence '110', as specified in [RFC1812]. During a transition period where not all routers have been upgraded to use the proposed service class mapping for network control (service class '0010'), it may be valuable to provide backwards compatibility with RFC 795 Precedence semantics. We know of two approaches to achieve this: -utilize the MBZ bit as an indicator of the version of the TOS/Class mapping semantics - 0: compliant with [RFC795] (Precedence) and RFC 1349 (TOS) - 1: compliant with the proposed mapping -utilize the service class '0000' to indicate that the mapping is compliant with [RFC795] (Precedence) and RFC 1349 (TOS). In the first approach, the MBZ bit is consumed as a version indicator. In the second approach, the service class '0000' is consumed, but the MBZ bit remains reserved for future specification. The authors believe it to be the case that existing routing protocols typically use a zero TOS value, and further, that most best- effort traffic utilizes the zero TOS value. Existing routing traffic transmitted with a non-zero precedence and zero TOS would continue to receive preferential queueing by routers which implemented either of the above approaches. Best-effort traffic which was transmitted with zero precedence and zero TOS (which we believe to include the bulk of Internet data traffic) and which was not metered would not receive degraded service from routers which implemented either of the above TOS approaches (this could be configurable in routers implementing the proposed mapping). Traffic generated by hosts or routers which have not implemented the proposed TOS semantics and which utilize a non-zero TOS value would be mapped into the corresponding service class by routers implementing the proposed semantics (unless the traffic was remapped upstream by some other device). Since the proposed semantics are Ellesson/Blake Expires 5/98 [Page 7] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 compatible with the TOS classes defined in [RFC1349], this is no more of a potential problem than in the case where hosts or routers which have implemented the proposed TOS mapping are able to send traffic mapped into a service class without network authorization or monitoring. (A scaleable mechanism for network devices to remotely acquire network authorization policy are beyond the scope of this draft.) One effect of this change is that the "Normal" service class can no longer be utilized by hosts, routers, and profile-meters implementing the new proposed TOS semantics (since "Normal" equals "old" semantics). Best-effort traffic which does not require service differentiation, but wishes to take advantage of ECN, for example, would need to specify an alternative service class, such as "Delay Insensitive" ('0001'), or one of the reserved classes (one of these options would be standardized). It should be noted that backwards compatibility is proposed *FOR IPV4 ONLY*. The [RFC795] Precedence semantics would never be utilized by IPv6 routers, and the "Normal" service class ('0000') would be available for best-effort traffic not requiring service differentiation. 6. Security Considerations Security considerations are not discussed in this memo. 7. References [Blake97] S. Blake, "Some Issues and Applications of Packet Marking for Differentiated Services", Internet Draft , November 1997. [Clark97] D. Clark and J. Wroclawski, "An Approach to Service Allocation in the Internet", Internet Draft , July 1997. [CBQ] S. Floyd and V. Jacobson, "Link-sharing and Resource Management Models for Packet Networks", IEEE/ACM Transactions on Networking, Vol. 3 no. 4, pp. 365-386, August 1995. [ECN94] S. Floyd, "TCP and Explicit Congestion Notification", ACM Computer Communications Review, Vol. 24 no. 5, pp. 10- 23, October 1994. [ECN97] K. Ramakrishnan and S. Floyd, "A Proposal to Add Explicit Congestion Notification (ECN) to IPv6 and to TCP", Internet Draft , November 1997. Ellesson/Blake Expires 5/98 [Page 8] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 [Feng97] W. Feng, D. Kandlur, D. Saha, and K. Shin, "Adaptive Packet Marking for Providing Differentiated Services in the Internet", Univ. Michigan Technical Report CSE-TR-347-97, October 1997, http://www.eecs.umich.edu/~wuchang/work/pmg.ps.Z. [Ferg97] P. Ferguson, "Simple Differential Services: IP TOS and Precedence, Delay Indication, and Drop Preference, Internet Draft , November 1997. [GBH97] R. Guerin, S. Blake, and S. Herzog, "Aggregating RSVP- based QoS Requests", Internet Draft , November 1997. [IPv6] S. Deering and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", Internet Draft , July 1997. [RED] S. Floyd and V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance", IEEE/ACM Transactions on Networking, August 1993. [RFC795] J. Postel, "Service Mappings", Internet RFC 795, September 1981. [RFC1349] P. Almquist, "Type of Service in the Internet Protocol Suite", Internet RFC 1349, July 1992. [RFC1812] F. Baker, editor, "Requirements for IP Version 4 Routers", Internet RFC 1812, June 1995. [SIMA] K. Kilkki, "Simple Integrated Media Access (SIMA)", Internet Draft , June 1997. Ellesson/Blake Expires 5/98 [Page 9] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 Appendix A: Specifying Multiple Drop Preference Levels Because the concept of service class is completely general, it is possible to utilize different service classes to represent different drop preference levels, as would be needed for example by [SIMA]. Shown below is a possible mapping implementing eight separate drop preference levels (where higher drop preference results in higher probability of loss): DP Service Class (Value) (Name) Drop Preference -- --------------------- ------------------ --------------- 0 0011 Network-specific 1 0 1 0011 Network-specific 1 1 0 0101 Network-specific 2 2 1 0101 Network-specific 2 3 0 0110 Network-specific 3 4 1 0110 Network-specific 3 5 0 0111 Network-specific 4 6 1 0111 Network-specific 4 7 Appendix B: Specifying Multiple Delay Priorities As was mentioned in Sec. 3, there are four service classes defined which specify relative delay priority (three for IPv4 if backwards compatibility with [RFC795] is required): - 0001: Delay Insensitive (minimize cost/worse than best effort) - 0000: Normal (best effort) - 1000: Interactive Delay (low average delay) - 1001: Low Maximum Delay (ie: low delay/jitter, eg. real time) If additional provisioned levels of delay priority are required, they can be implemented using the Network-specific service classes. Shown below is a possible mapping implementing eight separate delay priorities (where a higher priority results in lower average/maximum delay): Service Class (Value) (Name) Delay Priority --------------------- ------------------ -------------- 0001 Delay Insensitive 0 0011 Network-specific 1 1 0101 Network-specific 2 2 1000 Interactive Delay 3 0110 Network-specific 3 4 0111 Network-specific 4 5 1001 Low Maximum Delay 6 1010 Network-specific 5 7 Ellesson/Blake Expires 5/98 [Page 10] INTERNET-DRAFT draft-ellesson-tos-00.txt November, 1997 Authors' Addresses Ed Ellesson JDGA/501 IBM Corporation 4205 S. Miami Blvd. Research Triangle Park, NC 27709 Phone: +1-919-254-4115 Fax: +1-919-254-6243 E-mail: ellesson@raleigh.ibm.com Steven Blake E95/664 IBM Corporation 800 Park Offices Drive Research Triangle Park, NC 27709 Phone: +1-919-254-2030 Fax: +1-919-254-5483 E-mail: slblake@raleigh.ibm.com Ellesson/Blake Expires 5/98 [Page 11]