NVO3 Working Group                                               H. Chen
INTERNET-DRAFT                                                     Y. Li
Intended Status: Informational                       Huawei Technologies
Expires: January 7, 2016                                    July 6, 2015


         Using IPID for Performance Monitoring in VxLAN Network
                       draft-chen-nvo3-ipid-pm-00

Abstract

IP Identification (IPID) is a field in the IP header primarily used to uniquely identify the group of fragments of a single IP packet. The IPID value in packets from a specific traffic flow or source IP address keeps increasing until it wraps around. This document specifies a method that monitors the performance of a VXLAN network by examining IPID values. In this memo, packet loss measurement is mainly considered. The method requires no extra hardware support, which means it is compatible with most deployed routers and switches. The mechanism is applicable to IPv4 networks and potentially useful in overlay networks with different data encapsulations.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Chen & Li, et al                                                [Page 1]
INTERNET DRAFT        IPID based Performance Monitoring        July 2015

Copyright and License Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Terminology
3. IPID Overview
4. Packet Loss Measurement
5. Security Considerations
6. IANA Considerations
7. References
   7.1 Normative References
   7.2 Informative References

1. Introduction

Performance Monitoring (PM) is a crucial part of network OAM and mainly comprises packet loss and delay measurement. PM methods are usually classified into two categories: active (involving the addition of test traffic) or passive (no interference with normal traffic). Both active and passive methods have their own strengths. An active method injects test traffic from one measurement point to another, and where Equal Cost Multiple Paths (ECMP) exist this test traffic cannot be guaranteed to follow the same path as the data traffic. ECMP is common in overlay networks such as VXLAN, which means a passive method is more appropriate there.
IP Identification (IPID) is a field in the IP header that can be used to implement a passive PM method. An example IPv4 header is shown in Figure 1. IPID is primarily used to uniquely identify the group of fragments of a single IP packet. The IPID value in packets from a specific traffic flow or source IP address keeps increasing until it wraps around.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 1: Example IPv4 Header

IPID is required to be unique within the maximum lifetime for all packets with a given source address/destination address/protocol tuple. Hence, within that lifetime, each packet in a specific flow has a unique IPID. The IPID value increases from packet to packet within a flow until it reaches the maximum value; it then wraps around and increases again.

An example Controller-based VXLAN network is shown in Figure 2. A controller connects to NVE A and NVE B. Assume that a flow is transmitted from VM1 to VM3 (VM1->NVE A->SW M->SW N->NVE B->VM3); packet loss measurement then needs to be implemented at NVE A and NVE B. This document specifies a method that monitors the performance of a Controller-based VXLAN network by examining IPID values.
In this memo, packet loss measurement is mainly considered. The Controller specifies which flow is to be monitored. Before monitoring starts, it sends the flow information to the relevant NVEs. During the monitoring period, the Controller collects statistics from those NVEs to measure packet loss and delay.

                ***************************
                *     +--------------+    *
                *     |  Controller  |    *
                *     +-|---------|--+    *
                *    /  |         |   \   *
                *   /   |         |    \  *
+---------+     *  /    |         |     \ *     +---------+
|+---+    |     * /     |         |      \*     |    +---+|
||VM1|    |   +--/+   +-|-+     +-|-+   +-\-+   |    |VM3||
|+---+    +---+NVE+---+SW +-----+SW +---+NVE+---+    +---+|
|    +---+|   +-A-+   +-M-+     +-N-+   +-B-+   |+---+    |
|    |VM2||     *                         *     ||VM4|    |
|    +---+|     *      VXLAN Overlay      *     |+---+    |
+---------+     *         Network         *     +---------+
  Tenant        *                         *       Tenant
  System        *                         *       System
                ***************************

      Figure 2: Example Controller-based VXLAN Network

This method requires no extra hardware support, which means it is compatible with most deployed routers and switches. The mechanism is applicable to IPv4 networks and potentially useful in overlay networks with different data encapsulations.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

This document uses the following terms. Additional terms are defined in [RFC7348].

o  ECMP - Equal Cost Multiple Paths

o  IPID - IP Identification

o  MSB - Most Significant Bit

o  OG - Observation Group

o  PM - Performance Monitoring

3. IPID Overview

This document mainly considers the IPID in the IPv4 header. As defined in [RFC791], the IPID field holds 16 bits. It is used together with the source address, destination address, and protocol fields to identify datagram fragments for reassembly.
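As an illustration of where the 16-bit field sits, the following is a minimal sketch that reads the Identification value from a raw IPv4 header. The function name ipv4_identification and the sample header values are assumptions made for this example; they are not defined by this document.

```python
import struct


def ipv4_identification(header: bytes) -> int:
    """Return the 16-bit Identification (IPID) field of a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    if header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Identification occupies octets 4 and 5, in network byte order.
    (ipid,) = struct.unpack_from("!H", header, 4)
    return ipid


# Hypothetical 20-byte header: version 4, IHL 5, IPID 0x00FC, TTL 64, TCP.
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0x00FC, 0, 64, 6, 0,
    bytes([10, 0, 0, 1]), bytes([10, 0, 0, 3]),
)
print(hex(ipv4_identification(sample)))
```

In a VXLAN deployment the same read would be applied to the inner IPv4 header after decapsulation; locating that header is outside the scope of this sketch.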
There have been experimental works that used the IPID field for other purposes, such as adding packet-tracing information to help trace packets with spoofed source addresses [Savage_2000]. However, [RFC6864] prohibits this kind of use: it states that the IPv4 ID field MUST NOT be used for purposes other than fragmentation and reassembly.

In addition, [Chen_2004] describes that the 16-bit IPID field carries a copy of the current value of a counter in a host's IP stack. Current versions of Windows implement this counter as a global counter; that is, the IPID value increases continuously per source IP address. In contrast, current versions of Linux implement this counter as a per-flow counter; that is, the IPID value increases continuously per flow. The authors also ran extensive experiments to confirm the incremental behavior of the IPID value.

To sum up, the IPID field is set only by the Tenant System and can be used as a sequence number for a packet flow. Given the incremental behavior of IPID, one bit of the IPID field can be taken as the Criterion bit (C bit) to divide a packet flow into several Observation Groups (OGs). By collecting the observed packet count and the starting time of each OG from the relevant NVEs, the Controller can measure the packet loss and delay of the flow.

The VXLAN encapsulation [RFC7348] includes an outer IP header and an inner IP header, both of which have an IPID field, i.e. the outer IPID and the inner IPID respectively. Because the inner header reflects the real flow information, this memo uses only the inner IPID for performance monitoring.

Theoretically, any bit of the IPID field can be used as the C bit. Selecting it is somewhat tricky, however, because a high-order bit varies slowly while a low-order bit varies quickly. The selection of the C bit should therefore take the flow rate into account.
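The trade-off can be made concrete with a small sketch. It assumes that the IPID increases by exactly one per packet, that bits are numbered from the least significant bit (bit 0) to the most significant bit (bit 15), and that the flow has a constant, known packet rate. The function names and the packet rate used in the loop are hypothetical.

```python
def c_bit(ipid: int, bit: int) -> int:
    """Value of the chosen Criterion bit (bit 0 = least significant)."""
    if not 0 <= bit <= 15:
        raise ValueError("the IPID field has 16 bits, numbered 0..15")
    return (ipid >> bit) & 1


def og_size(bit: int) -> int:
    """Packets per OG, assuming the IPID increases by one per packet."""
    if not 0 <= bit <= 15:
        raise ValueError("the IPID field has 16 bits, numbered 0..15")
    # Bit k of a counter keeps the same value for 2^k consecutive counts.
    return 1 << bit


def og_duration(bit: int, packets_per_second: float) -> float:
    """Approximate duration of one OG, in seconds, at a constant packet rate."""
    return og_size(bit) / packets_per_second


# Assumed rate of 80,000 packets per second, chosen only for illustration.
for bit in (15, 8, 0):
    print(bit, og_size(bit), og_duration(bit, 80_000.0))
```

A higher bit index gives longer OGs and therefore fewer reports, while a lower index gives shorter OGs that are harder to align between two measurement points.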
To illustrate, as Figure 3 shows, taking the IPID's most significant bit (MSB) as the C bit means that each OG contains up to 2^15 = 32,768 packets. In real data center deployments, most user traffic runs at less than 1 Gbps. At 1 Gbps, the IPID wraps around in approximately 0.8 s, assuming packets of about 1500 bytes (65,536 such packets amount to roughly 0.79 Gbit). When user traffic reaches 10 Gbps, the IPID wraps around more quickly, possibly in less than 80 ms.

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | | | | | | | | | | | | | | |C|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  Figure 3: Example Criterion Bit

Figure 4 is a simple example of how the C bit is used to divide a packet flow into sequential OGs. Suppose the first packet observed has the IPID value 0x00FC (bit 8 = 0, counting the least significant bit as bit 0). The first 4 packets have the same C bit (C = 0), while the last 4 packets have the same C bit (C = 1).

   Index  H               C                   L
                         +-+
     1    0 0 0 0 | 0 0 0|0|| 1 1 1 1 | 1 1 0 0  <-+
     2    0 0 0 0 | 0 0 0|0|| 1 1 1 1 | 1 1 0 1    | Group 1
     3    0 0 0 0 | 0 0 0|0|| 1 1 1 1 | 1 1 1 0    | (C = 0)
     4    0 0 0 0 | 0 0 0|0|| 1 1 1 1 | 1 1 1 1  <-+
     5    0 0 0 0 | 0 0 0|1|| 0 0 0 0 | 0 0 0 0  <-+
     6    0 0 0 0 | 0 0 0|1|| 0 0 0 0 | 0 0 0 1    | Group 2
     7    0 0 0 0 | 0 0 0|1|| 0 0 0 0 | 0 0 1 0    | (C = 1)
     8    0 0 0 0 | 0 0 0|1|| 0 0 0 0 | 0 0 1 1  <-+
    ...         ...      +-+        ...    ...     Group k

        Figure 4: Example C bit based OG division

For example, as shown in Figure 2, VM1 initiates communication with VM3. The packet flow from VM1 to VM3 goes through NVE A, NVE B, and the underlay switches M and N. The Controller sends a PM command to NVE A and NVE B simultaneously. The PM command specifies the following information:

1. which bit in the IPID field is taken as the C bit;

2. flow information, including the IP addresses of VM1 and VM3 and the protocol type (e.g. TCP or UDP).
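One possible in-memory representation of the PM command, and of the per-packet check an NVE could make with it, is sketched below. The class PMCommand, its field names, the function criterion_bit, and the sample addresses are all hypothetical; this document does not define a message format. Bits are numbered from the least significant bit (bit 0).

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class PMCommand:
    c_bit_index: int   # which IPID bit serves as the C bit (0 = LSB)
    src: IPv4Address   # inner source address, e.g. VM1
    dst: IPv4Address   # inner destination address, e.g. VM3
    protocol: int      # IP protocol number, e.g. 6 for TCP, 17 for UDP


def criterion_bit(cmd: PMCommand, src: IPv4Address, dst: IPv4Address,
                  protocol: int, inner_ipid: int) -> Optional[int]:
    """Return the C bit of a packet in the monitored flow, or None if the
    packet belongs to a different flow and must not be counted."""
    if (src, dst, protocol) != (cmd.src, cmd.dst, cmd.protocol):
        return None
    return (inner_ipid >> cmd.c_bit_index) & 1


# Assumed command: C bit is bit 8, TCP flow between two documentation addresses.
cmd = PMCommand(8, IPv4Address("192.0.2.1"), IPv4Address("192.0.2.3"), 6)
bits = [criterion_bit(cmd, cmd.src, cmd.dst, 6, ipid)
        for ipid in range(0x00FC, 0x0104)]
print(bits)
```

For the eight consecutive IPID values starting at 0x00FC, the list holds four zeros followed by four ones, which is the C column of Figure 4.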
On receipt of this command, NVE A and NVE B count the transmitted and received packets, respectively, in each OG. The OGs are divided based on the value of the C bit. An integrated (complete) OG is delimited by two adjacent transitions of the C bit. To illustrate, as shown in Figure 4, the transition from 0 to 1 marks the start of group 2, and the next transition from 1 to 0 marks its end. When NVE A and NVE B start to count, they first have to identify the integrated OGs. NVE A and NVE B then report the counting results to the Controller. Example counting results of NVE A are shown below.

+-------------+-------+---------+
| Group index | C bit | pkt num |
+-------------+-------+---------+
|      1      |   1   |    a    |
|      2      |   0   |    b    |
|      3      |   1   |    c    |
|      4      |   0   |    d    |
+-------------+-------+---------+

  Table 1: Example counting results of NVE A

Each time an integrated OG has been counted, NVE A reports the result to the Controller. The Controller records the time at which it receives the result as t_A. Example counting results of NVE B are shown below.

+-------------+-------+---------+
| Group index | C bit | pkt num |
+-------------+-------+---------+
|      1      |   0   |    k'   |
|      2      |   1   |    a'   |
|      3      |   0   |    b'   |
|      4      |   1   |    c'   |
+-------------+-------+---------+

  Table 2: Example counting results of NVE B

NVE B reports its counting results to the Controller in the same way as NVE A. The Controller records the time at which it receives each result as t_B. To determine whether two OGs, one from each NVE, are matched, the Controller goes through the following two steps:

1. compare the C bit values of the two OGs;

2. compare |t_A - t_B| with the value of T, where T is the time duration of one single OG. T is determined by the configuration of the C bit and the flow rate.
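The counting and matching steps described above can be sketched as follows. This is an illustration under simplifying assumptions, not a definitive implementation: packets are assumed to arrive in order, bits are numbered from the least significant bit, and each OG from NVE A is paired with the earliest unused OG from NVE B that passes both checks. The names Report, count_integrated_groups, and match_groups, as well as all sample numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Report:
    c: int      # C bit value of the OG
    pkts: int   # packets counted in the OG
    t: float    # time the Controller received the report, in seconds


def count_integrated_groups(ipids: Iterable[int], bit: int) -> List[Tuple[int, int]]:
    """Return (C bit, packet count) for every OG bounded by two C-bit
    transitions; the leading and trailing partial groups are dropped."""
    groups: List[Tuple[int, int]] = []
    current = None            # C bit of the group being counted
    count = 0
    seen_transition = False   # False while still in the leading partial group
    for ipid in ipids:
        c = (ipid >> bit) & 1
        if current is None:
            current = c
        elif c != current:
            if seen_transition:
                groups.append((current, count))
            seen_transition = True
            current, count = c, 0
        count += 1
    return groups


def match_groups(a: List[Report], b: List[Report], T: float) -> List[Tuple[Report, Report]]:
    """Pair reports that have the same C bit and |t_A - t_B| < T."""
    pairs: List[Tuple[Report, Report]] = []
    next_b = 0
    for ra in a:
        for k in range(next_b, len(b)):
            if b[k].c == ra.c and abs(ra.t - b[k].t) < T:
                pairs.append((ra, b[k]))
                next_b = k + 1
                break
    return pairs


# Hypothetical reports shaped like Table 1 (NVE A) and Table 2 (NVE B).
a_side = [Report(1, 100, 1.00), Report(0, 100, 2.00),
          Report(1, 100, 3.00), Report(0, 100, 4.00)]
b_side = [Report(0, 50, 0.05), Report(1, 98, 1.05),
          Report(0, 100, 2.05), Report(1, 97, 3.05)]
pairs = match_groups(a_side, b_side, T=1.0)
for ra, rb in pairs:
    print(ra.c, ra.pkts, rb.pkts)
```

With these sample reports, the first report of NVE B is skipped because its C bit differs, and the next three reports pair up with the first three reports of NVE A. The packet counts of each matched pair can then be compared.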
For example, OG(1) in Table 1 has C = 1 while OG(1) in Table 2 has C = 0. These two OGs do not have the same C bit value, so the Controller does not consider them matched. On the other hand, OG(2) in Table 2, the next OG, has C = 1. It has the same C bit value as OG(1) in Table 1, so the Controller goes on to the next step and compares |t_A - t_B| with T. If |t_A - t_B| < T, the Controller considers the two OGs matched. Otherwise, the Controller considers them not matched and simply ignores them. When two OGs are matched, the packet counts of the two OGs can be used to determine whether packet loss has taken place between NVE A and NVE B.

4. Packet Loss Measurement

Packet loss is measured by comparing the packet counts of the matched OGs. In the example of Section 3, packet loss can be computed as follows:

Pkt_Loss = |a - a'| + |b - b'| + |c - c'|.

5. Security Considerations

Security considerations are not addressed in this document.

6. IANA Considerations

No IANA action is needed for this document.

7. References

7.1 Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC791] Postel, J., "Internet Protocol", RFC 791, September 1981.

7.2 Informative References

[Chen_2004] Chen, W., Huang, Y., Ribeiro, B., Suh, K., Zhang, H., Silva, E., Kurose, J. and D. Towsley, "Exploiting the IPID field to infer network path and end-system characteristics", 2004.

[RFC6864] Touch, J., "Updated Specification of the IPv4 ID Field", RFC 6864, February 2013.

[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M. and C. Wright, "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", RFC 7348, August 2014.
[Savage_2000] Savage, S., Wetherall, D., Karlin, A. and T. Anderson, "Practical Network Support for IP Traceback", October 2000.

Authors' Addresses

Hao Chen
Huawei Technologies
101 Software Ave., Yuhuatai Dist.
Nanjing, Jiangsu 210012
China

Phone: +86-25-56624440
EMail: philips.chenhao@huawei.com

Yizhou Li
Huawei Technologies
101 Software Ave., Yuhuatai Dist.
Nanjing, Jiangsu 210012
China

Phone: +86-25-56624629
EMail: liyizhou@huawei.com