INTERNET-DRAFT                                             Michael Welzl
draft-welzl-ptp-02.txt                                University of Linz
Expiration Date: 7. May 2000                  Telecooperation Department
Experimental                                               November 1999


              The Performance Transparency Protocol (PTP)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Abstract

More and more modern Internet applications need to know certain network
performance parameters in order to adapt. Recent efforts have shown that
it is indeed possible for end-systems to determine quite a few of these
parameters by probing the network, but the methods used are often
time-consuming, mostly show network-unfriendly behaviour and in some
cases the results are just not precise enough. We describe a protocol
that allows end-systems to efficiently retrieve this information without
putting much load onto the network or complexity in routers.

Author's Note

While this memo has an acknowledgements section, it was decided to
mention Alfred Cihal beforehand because he contributed so much.

Michael Welzl             Expires: 7. May 2000                  [Page 1]
Internet Draft           draft-welzl-ptp-02.txt            November 1999

1. Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

When referring to network performance parameters, "GOOD", "BAD",
"BETTER", "WORSE", "BEST" and "WORST" refer to values yielding
transmission results that are generally conceived in this manner. For
instance, much bandwidth is good whereas much delay is bad.

IP considers any and all protocols it runs over as a single "network
interface" layer. A similar view is taken by PTP.

Unless otherwise noted, only outgoing traffic on an interface is of
interest. The term "incoming traffic" refers to incoming traffic at the
router's interface where the current PTP-packet has arrived.

"Senders" and "Receivers" are applications that send or receive
PTP-packets, respectively. In some cases, an application will act as
both a sender and receiver.

"Applications" can be actual Internet applications or any higher-level
protocol that is making use of PTP.

TTL is the IP header's "Time to Live" field.

2. Introduction

Traditional Internet applications regard the network as a black-box
that provides them with some services while maintaining certain
characteristics [Jacobson88]. Based on this point of view, bandwidth
adaptation involves relying on packet loss as a sign of congestion.
Clearly, this is neither an efficient nor a network-friendly way to
detect the currently available bandwidth or any other network
performance parameter.

Among others, [Bolot], [Paxson] and [Jacobson97] have demonstrated how
certain router-specific network performance parameters can be estimated
without having to change the software in the routers involved. [Carter]
and [Sisalem] make use of these rather bandwidth consuming approaches to
enhance the performance of Internet applications, indicating that the
information obtained from the black box model is not sufficient for this
purpose. RFC 2481 [1] also states that the black-box model is
inappropriate for delay- or loss-sensitive applications.

Using SNMP, network administrators can remotely monitor routers; various
tools (probably the most popular one being the "Multi Router Traffic
Grapher" [MRTG]) help them keep track of the data so they can estimate
less evident factors such as bandwidth utilization. Concerning
performance parameters, it would be desirable to make this kind of
functionality available to all Internet applications, but MIB access is
restricted.

Unlike network administrators, the everyday Internet user is interested
in the overall network performance, which is the worst performance
packets encounter along the path in use (several paths for multicast).
Since this path may change at any time, router-specific data should be
retrieved as quickly as possible. PTP provides this functionality in an
efficient yet simple way designed to avoid flooding the network.

3. Protocol description

3.1. Overview

This memo describes a technique to retrieve performance-specific
information from routers. The PTP model merely consists of a sender
that sends any number of PTP packets and a receiver receiving them. As
stated earlier, the sender and receiver may be the same. There are two
basic methods to get the information:

(1) PTP packets are examined by PTP-compliant routers along the path
    between the sender and the receiver. They will add their own
    information and check if the packet has encountered a
    non-PTP-compliant router. If this is the case and applicable
    information is being asked for, routers add data for incoming
    traffic, too. This way, it is possible to compensate for single
    intermediate routers that do not support PTP.

(2) An application may set the Compare Bit and literally ask if certain
    traffic requirements will be met by the network. Routers will then
    compare the sender's values with their own and make sure the data is
    updated with the worst values found.
    This mechanism was designed for use in conjunction with the Echo
    Bit, which makes a router directly return the packet to the sender
    if the requirements cannot be met. Implementation in routers is
    optional.

There are various ways in which PTP could be used; it is not meant to be
a substitute for traditional flow control mechanisms but rather an
additional tool to enhance congestion avoidance. For instance,
implementing Slow-Start might not be feasible for a distributed
real-time multimedia application which uses a certain number of
compression ratios for adaptation. Using PTP, such an application can
simply ask which ratio will be appropriate. This will be particularly
useful to notice when the currently available bandwidth is rising.

A receiver can tell if the information obtained for a path is complete.
If the information is incomplete, the worst case is an upper limit of
the path's performance; for instance, a path's available bandwidth will
certainly not exceed the minimum available (or even nominal) bandwidth
obtained via PTP.

3.2. End-to-end operation

Most applications that use PTP will need to do a reasonable amount of
communication not described within this memo, e.g., to send
computational results back to a sender, for connection set-up or to
request a certain Content Type. If a receiver needs to calculate the
maximum number of non-PTP-compliant routers a packet has encountered, it
must be informed about the sender's initial TTL value.

In the author's opinion, the way this end-to-end communication is
carried out (how often feedback is sent back to the sender, over which
period an average bandwidth must be calculated, and so on) will highly
depend on the application's requirements. Therefore, this memo makes as
few prescriptions about its usage as possible. A PTP Content Type's
description may contain some specifics about end-to-end treatment.
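
The receiver-side bookkeeping described in Sections 3.1 and 3.2 can be
sketched as follows. This is only an illustration under stated
assumptions, not part of the protocol: the function names and the plain
list of ifSpeed values are hypothetical, and the second function simply
applies the Dataset Count rule given in Section 4.2 for packets without
the Compare Bit.

```python
def path_bandwidth_upper_bound(if_speeds):
    """Return the smallest ifSpeed (bits per second) reported on a path.

    Routers that do not implement PTP may be missing from the list, so
    the result is only an upper limit on the path's bandwidth.
    """
    if not if_speeds:
        raise ValueError("no datasets were collected")
    return min(if_speeds)


def max_non_ptp_routers(initial_ttl, received_ttl, dataset_count):
    """Maximum number of non-PTP-compliant routers (Compare Bit not set).

    Implements (initial TTL - TTL) - Dataset Count from Section 4.2.
    The sender's initial TTL must be known to the receiver.
    """
    hops = initial_ttl - received_ttl
    return hops - dataset_count


# Hypothetical example: five hops were taken, three datasets were added.
speeds = [100_000_000, 10_000_000, 155_000_000]
print(path_bandwidth_upper_bound(speeds))  # 10000000
print(max_non_ptp_routers(64, 59, 3))      # 2
```

In this example the path cannot offer more than 10 Mbit/s, and at most
two routers on the path did not process the packet.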

While PTP itself does not provide any multicast functionality, it is
possible to send PTP packets to various receivers. It should be noted
that results derived from PTP multicasts with set Echo Bits must be
interpreted carefully.

4. Specification

4.1. IP requirements and recommendations

Hosts and routers that implement PTP need to follow a set of rules with
regard to the IP header:

o Protocol

  The "Protocol" field must be set to 123.

o Fragmentation

  The "Don't Fragment" bit must be set by the sender. Generally, PTP
  packets will not be very large and are not likely to exceed the size
  of 576 octets (RFC 791 [6] states that a host must be prepared to
  accept datagrams of that size). In any case, the use of Path MTU
  Discovery - preferably based on the approach mentioned in this memo
  rather than RFC 1191 [2] - is highly recommended.

o TTL

  RFC 791 [6] defines TTL as measured in units of seconds, yet RFC 1812
  [8] states that "many current implementations treat TTL as a pure hop
  count, and in parts of the Internet community there is a strong
  sentiment that the time limit function should instead be performed by
  the transport protocols that need it". TTL has been replaced by a
  "Hop Limit" without any time function in IPv6 [9]. For these reasons,
  PTP-compliant routers are required to treat TTL like the "Hop Limit"
  described in RFC 1883 [9].

o Router Alert Option

  The "Router Alert Option" described in RFC 2113 [5] MUST be used with
  a value of 0 by a sender and SHOULD be supported by routers.

4.2. PTP Header Format

A packet has a fixed length header:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   TTL-Check   |C|E| DS Length | Dataset Count | Content Type  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Destination Port        |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TTL-Check: 8 bits

If information about incoming traffic is desired from the first-hop
router and TTL < 255, TTL-Check can initially be set to TTL + 1;
otherwise, it must be set to TTL.

If a router encounters a packet in which TTL-Check <> TTL, the Compare
and Echo Bits are 0, the Content Type indicates a request for
interface-specific information, and at least part of this data is at
hand for incoming traffic, it must add a dataset containing:

o as much of the requested interface-specific information as possible
  for incoming traffic

o the address of the interface where the packet arrived in the
  dataset's address field (if such a field is provided).

If the Compare Bit is set and TTL-Check - TTL >= 1, the incoming traffic
data must be used for the comparison process described for the Compare
Bit instead of adding a new dataset and Dataset Count must be increased
by (TTL-Check - TTL) - 1.

On forwarding, a router must copy TTL to the TTL-Check field.

If the Compare Bit is set and TTL-Check - TTL > 1 or no available
incoming traffic information can be used for comparison, the packet must
be dropped.

Bit 8 - the Compare Bit

If the Compare Bit is 1, there is only one dataset. It contains values
that have been set by the sender. A router that receives such a packet
compares all these values with its own (including incoming traffic data
if available and TTL-Check - TTL = 1) and updates the dataset's
corresponding fields with the worst values found.
If a comparison is impossible, no change should be made. If provided, an
address field must always contain the relevant interface's address of
the router that did the last update.

Implementation of this mechanism is optional; a router may forward
packets with a set Compare Bit without doing any PTP processing.

It is not advisable for a sender to set the Compare Bit but not the Echo
Bit unless the packet size is expected to exceed the Path MTU.

Bit 9 - the Echo Bit

The Echo Bit may only be set in conjunction with the Compare Bit by a
sender. Packets where only the Echo Bit is set must be forwarded without
doing any PTP processing.

A router with Echo Bit support acts only if the Echo Bit is set AND the
Compare Bit mechanism resulted in a change of the dataset. In that case
it will:

o copy the IP "Source Address" field to the IP "Destination Address"
  field

o write the forwarding interface's IP address into the IP "Source
  Address" field

o set TTL and TTL-Check to the same initial value (usually 64 [8])

o clear the Compare Bit

o forward the packet accordingly

Implementation of this mechanism is optional; a router may forward
packets with a set Echo Bit without doing any PTP processing.

Dataset Length: 6 bits

Half the length of a dataset in octets (depending on the Content Type in
use).

Dataset Count: 8 bits

This field must initially be set to 0. At the receiver, (initial TTL -
TTL) - Dataset Count represents the maximum number of non-PTP-compliant
routers the packet has encountered if the Compare Bit is not set. If the
Compare or Echo Bit is set, this field directly represents the maximum
number of non-PTP-compliant routers the packet has encountered.

Content Type: 8 bits

This field is used to describe the dataset structure. It will be subject
to future standardizations; for now, only 3 values are defined. Data
types beginning with "MIB2-" are defined as Object-types in RFC 1213
[4].
Some of the relevant definitions have been updated in RFC 2233 [10];
since the definitions made in MIB RFCs are generally downwards
compatible but may require special treatment, it is highly recommended
that PTP implementations follow the most recent definitions, regardless
of the document's status. Interface-specific data and addresses always
refer to the interface where the PTP packet will be forwarded ("Out") or
incoming traffic ("In") in special cases (see TTL-Check), so for all
objects from the RFC 1213 [4] Interfaces group, the words "In" and "Out"
will be omitted. The sum of the sizes of all fields within a dataset is
a multiple of two octets.

Depending on the Compare Bit, routers will either read and possibly
update an existing dataset or add one or two new ones. In either case,
the dataset(s) will have the structure defined by the Content Type. If
such a field is provided, a router that writes to a dataset must write
the forwarding interface's IP address into the dataset's address field.

Defined values:

(0)
+------------+
|            |
| MIB2-ifMTU |
|            |
+------------+

Using this Content Type, Path MTU discovery as described in RFC 1191 [2]
can be replaced by PTP, provided that enough routers implement it.
Senders expecting packet size related problems should make use of this
Content Type.

The Compare Bit MAY be used in conjunction with this Content Type; a
greater ifMTU value will be regarded as BETTER.

(1)
+--------------+
|              |
| MIB2-ifSpeed |
|              |
+--------------+

In most cases, ifSpeed will contain the nominal bandwidth. This Content
Type, which can be particularly useful in conjunction with the Echo Bit,
can be used to determine an upper limit to a path's available bandwidth
very simply and quickly.

The Compare Bit MAY be used in conjunction with this Content Type; a
greater ifSpeed value will be regarded as BETTER.

(2)
+---------+--------------+
|         |              |
| Address | MIB2-ifSpeed |
|         |              |
+---------+--------------+

Address is the forwarding interface's IP address. Using this Content
Type, it is possible to build a table of routers with ifSpeed values for
later available bandwidth determination using Content Type 4.

The Compare Bit MAY be used in conjunction with this Content Type; a
greater ifSpeed value will be regarded as BETTER.

(3)
+---------+------------------+
|         |                  |
| Address | MIB2-ifHighSpeed |
|         |                  |
+---------+------------------+

Address is the forwarding interface's IP address. RFC 2233 [10] defines
the ifHighSpeed object that could be used if the value for ifSpeed
exceeds the size of the ifSpeed object. This Content Type should only be
used if ifSpeed as reported by any other Content Type shows its maximum
value (4,294,967,295).

The Compare Bit MAY be used in conjunction with this Content Type; a
greater ifHighSpeed value will be regarded as BETTER.

(4)
+---------+-----------+---------------+
|         |           |               |
| Address | Timestamp | MIB2-ifOctets |
|         |           |               |
+---------+-----------+---------------+

Address is the forwarding interface's IP address. The Timestamp format
is a right-justified, 32-bit timestamp in milliseconds since midnight
UT. Using two packets of this Content Type, it is possible to determine
the amount of traffic over a certain period of time. It was designed to
determine the currently available bandwidth by subtracting the
calculated traffic from the nominal bandwidth obtained via Content Type
1. This is sensitive to path changes.

ifOctets is a 32-bit value which cannot be expected to yield unambiguous
results if the time between two packets of this Content Type is greater
than (4,294,967,296 * 8 / nominal bandwidth in bits per second) seconds.
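
The bound above is easy to evaluate for typical link speeds. The
following sketch is illustrative only; the function name and the sample
bandwidths are assumptions made for this example and are not part of
PTP.

```python
COUNTER_RANGE = 4_294_967_296  # distinct values of a 32-bit octet counter


def ifoctets_wrap_seconds(nominal_bps):
    """Seconds until a 32-bit ifOctets counter can wrap at line rate.

    Evaluates 4,294,967,296 * 8 / nominal bandwidth (bits per second).
    """
    if nominal_bps <= 0:
        raise ValueError("nominal bandwidth must be positive")
    return COUNTER_RANGE * 8 / nominal_bps


# Hypothetical sample links: 10 Mbit/s, 100 Mbit/s and 1 Gbit/s.
for bps in (10_000_000, 100_000_000, 1_000_000_000):
    print(bps, round(ifoctets_wrap_seconds(bps), 1))
```

At 10 Mbit/s the counter can wrap after roughly 57 minutes, whereas at
1 Gbit/s it can wrap after roughly 34 seconds.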
Therefore, the period between two packets carrying MIB2-ifOctets should
be significantly less than the time calculated for the maximum nominal
bandwidth obtained for the path in order to obtain useful results.

The Compare Bit MUST NOT be used in conjunction with this Content Type.

(5)
+---------+-----------+--------------+---------------+
|         |           |              |               |
| Address | Timestamp | MIB2-ifSpeed | MIB2-ifOctets |
|         |           |              |               |
+---------+-----------+--------------+---------------+

This Content Type yields all the information which can be obtained with
Content Types 2 and 4, thereby reducing the number of packets necessary
for available bandwidth determination to a minimum of two. It has the
disadvantage of being bigger, thus increasing the chance of exceeding a
link's MTU.

The Compare Bit MUST NOT be used in conjunction with this Content Type.

Destination Port: 16 bits

The Destination Port is used to identify the receiver.

Checksum: 16 bits

Since the checksum must be recomputed by every router that writes to a
PTP packet, its calculation follows the simple algorithm used for both
the IP and UDP checksum [6] [7]. Unlike the IP checksum, it covers the
complete PTP packet. The algorithm is:

The checksum field is the 16 bit one's complement of the one's
complement sum of all 16 bit words in the PTP packet. For purposes of
computing the checksum, the value of the checksum field is zero.

4.3. PDU format

The protocol data unit consists of datasets. A dataset's size (Dataset
Length * 2) is given by its Content Type and is a multiple of 2 octets,
so padding is not necessary. All datasets are structured equally. The
number of datasets is denoted by Dataset Count or is 1 if the Compare or
Echo Bit is set.

5. IPv6 usage

Although all other IP references in this memo refer to IPv4, PTP is
IPv6-compliant. There are a few differences in the way it is used,
though:

o IPv6 addresses are 128-bit instead of 32-bit, so a dataset's address
  field will be 128-bit, too.

o TTL is called and treated as a "Hop Limit" in IPv6. The only
  difference in its usage with respect to PTP is that all calculations
  based on TTL will yield a precise number of routers rather than a
  maximum or minimum.

o There is no "Don't Fragment" flag - IPv6 packets can only be
  fragmented by the sender itself, which MUST NOT be done in the case of
  PTP. Path MTU discovery using PTP is still possible; an alternate
  method for use with IPv6 is described in RFC 1981 [3].

o The "Protocol" field is replaced by the "Next Header" field. Its
  function remains the same.

o RFC 2711 [11] explains how the "Router Alert Option" is used with
  IPv6.

6. Security Considerations

In RFCs and IETF Working Groups, there is a very strict separation
between network management and transport protocols. PTP is not meant to
be a network management protocol, yet it allows access to data specified
in RFC 1213 [4] somewhat similar to the SNMP "read" command. At first
sight, network administrators will have their concerns about that. They
must accept that certain performance-specific parameters will have to be
available to the public in order to make the best utilization of the
network. Information about delay and bandwidth should not be regarded as
confidential.

Care has been taken to restrict the Content Types presented in this
document to information that is partially available to end-nodes via the
use of more bandwidth- or time-consuming methods anyway. Content Types
1, 2 and 3 contain the currently available nominal bandwidth; pathchar
can be used to determine this information [Jacobson97]. Content Type 4
helps to estimate the currently available bandwidth; for the bottleneck
bandwidth router, this has been explained in [Carter].
The timestamp and address alone could just as well be determined via IP
options; a similar functionality is provided by traceroute. RFC 1191 [2]
and RFC 1981 [3] elaborate on how to determine the Path MTU (Content
Type 0).

While it is impossible to retrieve any information not described as part
of a Content Type with PTP, it is insecure in that the throughput of
applications relying on PTP data could be restricted if the packets
include false information. Programmers using PTP should be aware of that
and consider occasionally probing the network with traditional
mechanisms in case of very unexpected or particularly bad results.

7. IANA Considerations

Future values for the PTP header's 8-bit Content Type field (in this
memo, only the values 0, 1 and 2 are defined) require IANA assignment.
Following the policies outlined in [IANA-CONSIDERATIONS], numbers in the
range 3-199 are allocated via a required specification and numbers in
the range 200-255 are allocated through an IETF Consensus action. It
would be desirable to have IANA administrate the specifications for
numbers in the range 3-199 and make assignments without requiring an
outside review.

In order for a number to be assigned, the following issues concerning
the dataset structure that will be identified through the Content Type
field must be addressed in the specification:

o The name (the names already used in this memo may be used but not
  redefined), size (the sum of the sizes of all fields within a dataset
  must be a multiple of two octets) and number of fields

o A description of each field (a field can be a reference to a certain
  field in an RFC)

o What kind of new information this Content Type provides

o Why the information that can be obtained via this field should not be
  regarded as confidential

o Whether use of the Compare Bit will be appropriate, and if so, a
  definition of "BETTER" or "WORSE"

o If necessary, specifics about its usage or end-to-end requirements

References

[1]  Ramakrishnan, K., and Floyd, S., "A Proposal to add Explicit
     Congestion Notification (ECN) to IP", RFC 2481, January 1999.

[2]  Mogul, J.C., and Deering, S.E., "Path MTU discovery", RFC 1191,
     November 1990.

[3]  McCann, J., Deering, S. and Mogul, J., "Path MTU Discovery for IP
     version 6", RFC 1981, August 1996.

[4]  McCloghrie, K., and Rose, M.T., "Management Information Base for
     Network Management of TCP/IP-based internets: MIB-II", STD 17,
     RFC 1213, March 1991.

[5]  Katz, D., "IP Router Alert Option", RFC 2113, February 1997.

[6]  Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

[7]  Postel, J., "User Datagram Protocol", STD 6, RFC 768, August 1980.

[8]  Baker, F., "Requirements for IP Version 4 Routers", RFC 1812,
     June 1995.

[9]  Deering, S., and Hinden, R., "Internet Protocol, Version 6 (IPv6)
     Specification", RFC 1883, December 1995.

[10] McCloghrie, K., and Kastenholz, F., "The Interfaces Group MIB using
     SMIv2", RFC 2233, November 1997.

[11] Partridge, C., and Jackson, A., "IPv6 Router Alert Option",
     RFC 2711, October 1999.

[Bolot] Jean-Chrysostome Bolot, "End-to-End Packet Delay and Loss
     Behavior in the Internet", Proceedings of SIGCOMM 1993,
     pp. 289-298. Also in Computer Communication Review 23 (4),
     Oct. 1993.

[Carter] Robert L. Carter and Mark E. Crovella, "Measuring Bottleneck
     Link Speed in Packet-Switched Networks", Technical Report
     BU-CS-96-006, Boston University, 1996.

[IANA-CONSIDERATIONS] Alvestrand, H. and T. Narten, "Guidelines for
     Writing an IANA Considerations Section in RFCs", BCP 26, RFC 2434,
     October 1998.

[Jacobson88] Van Jacobson, "Congestion Avoidance and Control",
     Proceedings of SIGCOMM 1988, pp. 314-329.

[Jacobson97] Van Jacobson, "pathchar - a tool to infer characteristics
     of Internet paths", presented at the Mathematical Sciences Research
     Institute (MSRI); slides available from
     ftp://ftp.ee.lbl.gov/pathchar/, April 1997.

[MRTG] The "Multi Router Traffic Grapher", available from
     http://www.mrtg.org/

[Paxson] Vern Paxson, "End-to-End Internet Packet Dynamics", IEEE/ACM
     Transactions on Networking 7(4), pp. 1-16.

[Sisalem] Dorgham Sisalem, Henning Schulzrinne, "The Loss-Delay Based
     Adjustment Algorithm: A TCP-friendly Adaptation Scheme",
     Proceedings of NOSSDAV, July 1998.

Acknowledgements

Without the help of these people, this memo would look quite different
and, without a doubt, much worse (in alphabetical order):

Alfred Cihal
Jon Crowcroft
Max Muehlhaeuser

Author's Address

Michael Welzl
University of Linz
Institute for Technical Computer Science and Telematics
Telecooperation Department
Altenberger Str. 69
4040 Linz, Austria

Phone: +43 (732) 2468-9264
Fax:   +43 (732) 2468-9829
Email: michael@tk.uni-linz.ac.at

Full Copyright Statement

Copyright (C) The Internet Society (1997).
All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are included
on all such copies and derivative works. However, this document itself
may not be modified in any way, such as by removing the copyright notice
or references to the Internet Society or other Internet organizations,
except as needed for the purpose of developing Internet standards in
which case the procedures for copyrights defined in the Internet
Standards process must be followed, or as required to translate it into
languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS
IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT
INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.