IPPM                                                        H. Song, Ed.
Internet-Draft                                                 Futurewei
Intended status: Informational                                   T. Zhou
Expires: October 15, 2020                                          Z. Li
                                                                  Huawei
                                                                 J. Shin
                                                              SK Telecom
                                                                  K. Lee
                                                                   LG U+
                                                          April 13, 2020


               Postcard-based On-Path Flow Data Telemetry
              draft-song-ippm-postcard-based-telemetry-07

Abstract

   This document describes a variation of the Postcard-Based Telemetry
   (PBT), the marking-based PBT. Unlike the instruction-based PBT, as
   embodied in [I-D.ietf-ippm-ioam-direct-export], the marking-based PBT
   does not require the encapsulation of a telemetry instruction header,
   so it avoids some of the implementation challenges of the
   instruction-based PBT. This document discusses the issues and
   solutions of the marking-based PBT.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 15, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

Song, et al.            Expires October 15, 2020                [Page 1]

Internet-Draft          Postcard-Based Telemetry              April 2020

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Motivation
   2.  PBT-M: Marking-based PBT
   3.  New Challenges
   4.  Considerations on PBT-M Design
     4.1.  Packet Marking
     4.2.  Flow Path Discovery
     4.3.  Packet Identity for Export Data Correlation
     4.4.  Avoid Packet Marking through Node Configuration
   5.  Postcard Format
   6.  Security Considerations
   7.  IANA Considerations
   8.  Contributors
   9.  Acknowledgments
   10. Informative References
   Authors' Addresses

1.  Motivation

   In order to gain detailed data plane visibility to support effective
   network OAM, it is important to be able to examine the trace of user
   packets along their forwarding paths. Such on-path flow data reflect
   the state and status of each user packet's real-time experience and
   provide valuable information for network monitoring, measurement, and
   diagnosis.

   The telemetry data include, but are not limited to, the detailed
   forwarding path, the timestamp/latency at each network node, and, in
   case of packet drop, the drop location and reason. The emerging
   programmable data plane devices allow user-defined data collection or
   conditional data collection based on trigger events.
   Such on-path flow data are from and about the live user traffic,
   which complement the data acquired through other passive and active
   OAM mechanisms such as IPFIX [RFC7011] and ICMP [RFC2925].

   On-path telemetry was developed to cater to the need for collecting
   on-path flow data. There are two basic modes for on-path telemetry:
   the passport mode and the postcard mode. In the passport mode, each
   node on the path adds the telemetry data to the user packets (i.e.,
   stamps the passport). The accumulated data trace carried by user
   packets is exported at a configured end node. In the postcard mode,
   each node directly exports the telemetry data using an independent
   packet (i.e., sends a postcard), which avoids the need to carry the
   data with user packets.

   In-situ OAM trace option (IOAM) [I-D.ietf-ippm-ioam-data] is a
   representative of the passport mode on-path telemetry. A prominent
   advantage of the passport mode is that it naturally retains the
   telemetry data correlation along the entire path. The passport mode
   also reduces the number of data export packets. Both properties help
   to simplify the work of the data collector and analyzer. On the
   other hand, the passport mode faces the following challenges.

   o  Issue 1: Since the telemetry instruction header and data
      processing must be done in the data-plane fast-path, it may
      interfere with the normal traffic forwarding (e.g., leading to
      forwarding performance degradation) and lead to inaccurate
      measurements (e.g., resulting in longer latency measurements than
      usual). This undesirable "observer effect" is problematic for
      carrier networks where stringent SLAs must be observed.

   o  Issue 2: The passport mode may significantly increase the user
      packet's original size by adding data at each on-path node. The
      size may exceed the path MTU, so either the technique cannot be
      applied or the packet needs to be fragmented.
      This is especially troubling when some other network service
      headers (e.g., segment routing or service function chaining) are
      also present. Limiting the data size or path length reduces the
      effectiveness of the passport mode telemetry.

   o  Issue 3: The instruction header needs to be encapsulated into user
      packets for transport. [I-D.brockners-inband-oam-transport] has
      discussed several encapsulation approaches for different transport
      protocols. However, there are no feasible solutions so far to
      encapsulate the instruction header in MPLS and IPv4 networks,
      which are still the most widely deployed. It is also challenging
      to encapsulate the instruction header in IPv6
      [I-D.song-ippm-ioam-ipv6-support].

   o  Issue 4: Transported in plain text along the network paths, the
      instruction header and data are vulnerable to eavesdropping and
      tampering as well as DoS attacks. Extra protective measures are
      difficult to apply on the data-plane fast-path.

   o  Issue 5: Since the passport mode only exports the telemetry data
      at the designated end node, if the packet is dropped in the
      network, the data will be lost as well. The passport mode
      therefore cannot pinpoint the packet drop location, which is what
      fault diagnosis needs. Even worse, the end node may be unaware of
      the packet and data loss at all.

   The postcard mode complements the passport mode. In postcard-based
   telemetry (PBT), the postcards that carry telemetry data can be
   generated by a node's slow path and transported in band or out of
   band, independent of the original user packets. IOAM direct export
   option (DEX) [I-D.ietf-ippm-ioam-direct-export] is a representative
   of PBT. This type of instruction-based PBT addresses Issues 2 and 5
   and partially addresses Issues 1 and 4, but because it still needs an
   instruction header, it cannot address Issue 3.
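
   As a rough illustration of Issue 2 above, the following sketch
   estimates how a passport-mode packet grows hop by hop and how many
   hops fit within the path MTU. The instruction header size, the
   per-hop data size, and the MTU are hypothetical values chosen only
   for illustration; they are not taken from any IOAM specification.

```python
def passport_packet_size(original_len, hops, per_hop_bytes, header_bytes):
    """Packet size after `hops` nodes have each appended their data.

    All sizes are in bytes. `header_bytes` stands for the telemetry
    instruction header and `per_hop_bytes` for the data one node adds;
    both are illustrative assumptions.
    """
    return original_len + header_bytes + hops * per_hop_bytes


def max_hops_within_mtu(original_len, per_hop_bytes, header_bytes, mtu):
    """Largest hop count whose accumulated data still fits in the MTU."""
    room = mtu - original_len - header_bytes
    if room < 0:
        return 0  # even the instruction header alone does not fit
    return room // per_hop_bytes


# Hypothetical numbers: a 1400-byte user packet, an 8-byte instruction
# header, 16 bytes of data per node, and a 1500-byte path MTU.
print(passport_packet_size(1400, 5, 16, 8))    # 1488, still fits
print(passport_packet_size(1400, 6, 16, 8))    # 1504, exceeds the MTU
print(max_hops_within_mtu(1400, 16, 8, 1500))  # 5
```

   In the postcard mode the user packet keeps its original size on
   every hop, because the collected data travel in separate packets.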
   This document describes another variation of the postcard mode
   on-path telemetry, the marking-based PBT (PBT-M). Unlike the
   instruction-based PBT, the marking-based PBT does not require the
   encapsulation of a telemetry instruction header, so it avoids some of
   the implementation challenges of the instruction-based PBT. This
   document discusses the issues and solutions of the marking-based PBT.

2.  PBT-M: Marking-based PBT

   As the name suggests, PBT-M only needs a marking bit in the existing
   headers of user packets to trigger the telemetry data collection and
   export. The sketch of PBT-M is as follows. The user packet, if its
   on-path data need to be collected, is marked at the path head node.
   At each PBT-aware node, if the mark is detected, a postcard (i.e.,
   the dedicated OAM packet triggered by a marked user packet) is
   generated and sent to a collector. The postcard contains the data
   requested by the management plane. The requested data are configured
   by the management plane through data set templates (as in IPFIX
   [RFC7011]). Once the collector receives all the postcards for a
   single user packet, it can infer the packet's forwarding path and
   analyze the data set. The path end node is configured to unmark the
   packets and restore their original format if necessary.

   The overall architecture of PBT-M is depicted in Figure 1.

                      +------------+        +-----------+
                      | Network    |        | Telemetry |
                      | Management |<-------| Data      |
                      |            |        | Collector |
                      +-----:------+        +-----------+
                            :                     ^
                            :configurations       |postcards (OAM pkts)
                            :                     |
             ...............:.....................|........
             :             :               :      |       :
             :   +---------:---+-----------:---+--+-------:---+
             :   |         :   |           :   |          :   |
             V   |         V   |           V   |          V   |
          +------+-+     +-----+--+     +------+-+     +------+-+
 usr pkts |  Head  |     |  Path  |     |  Path  |     |  End   |
     ====>|  Node  |====>|  Node  |====>|  Node  |====>|  Node  |====>
          |        |     |   A    |     |   B    |     |        |
          +--------+     +--------+     +--------+     +--------+
         gen postcards  gen postcards  gen postcards  gen postcards
         mark usr pkts                                unmark usr pkts

                    Figure 1: Architecture of PBT-M

   PBT-M aims to fully address the issues listed above. It also
   introduces some new benefits. The advantages of PBT-M are as
   follows.

   o  1: PBT-M avoids augmenting user packets with new headers and
      introducing new data plane protocols. The signaling that triggers
      telemetry data collection remains in the data plane.

   o  2: PBT-M is extensible for collecting arbitrary new data to
      support possible future use cases. The data set to be collected
      can be configured through the management plane or control plane.
      Since there is no limitation on the types of data, data other than
      those defined in [I-D.ietf-ippm-ioam-data] can also be collected.
      Since there are no size constraints any more, the more flexible
      data set template can be used for data type definition.

   o  3: PBT-M avoids interfering with the normal forwarding and
      affecting the forwarding performance. The collected data are
      transported independently through in-band or out-of-band channels.
      The data collecting, processing, assembly, encapsulation, and
      transport are therefore decoupled from the forwarding of the
      corresponding user packets and can be performed in the data-plane
      slow-path if necessary.

   o  4: For PBT-M, the types of data collected from each node can vary
      depending on application requirements and node capability. This
      is either impossible or very difficult to support in the passport
      mode, in which the data types collected per node are conveyed by
      the instruction header.
   o  5: PBT-M makes it easy to secure the collected data without
      exposing it to unnecessary entities. For example, both the
      configuration and the telemetry data can be encrypted before being
      transported, so passive eavesdropping and man-in-the-middle
      attacks can both be deterred.

   o  6: Even if a user packet under inspection is dropped at some node
      in the network, the postcards collected from the preceding nodes
      are still valid and can be used to diagnose the packet drop
      location and reason.

3.  New Challenges

   Although PBT-M addresses the issues of the passport mode telemetry
   and the instruction-based PBT, it introduces a few new challenges.

   o  Challenge 1: A user packet needs to be marked in order to trigger
      the path-associated data collection. Since we do not want to
      augment user packets with any new header fields, we must reuse
      some bit from existing header fields.

   o  Challenge 2: Since the packet header no longer carries OAM
      instructions, the data plane devices need to be configured to know
      what data to collect. However, in general, the forwarding path of
      a flow packet is unknown beforehand due to ECMP or dynamic routing
      (segment routing is a notable exception). Configuring the data
      set for each flow at all data plane devices is expensive in terms
      of configuration load and data plane resources.

   o  Challenge 3: Due to the variable transport latency, the dedicated
      postcard packets for a single user packet may arrive at the
      collector out of order or may be dropped in the network. In order
      to infer the packet forwarding path, the collector needs some
      information from the postcard packets to identify the user packet
      they belong to and the order of path node traversal.

4.  Considerations on PBT-M Design

   To address the above challenges, we propose several design details of
   PBT-M.

4.1.  Packet Marking

   To trigger the path-associated data collection, a single bit from
   some header field is usually sufficient. When no such bit is
   available, other packet marking techniques are needed. We discuss
   four possible application scenarios.

   o  IPv4. IPFPM [I-D.ietf-ippm-alt-mark] is an IP flow performance
      measurement framework which also requires a single bit for packet
      coloring. The difference is that IPFPM does in-network
      measurement while PBT-M only collects and exports data at network
      nodes (i.e., the data analysis is done at the collector rather
      than in the network nodes). IPFPM suggests using some reserved
      bit of the Flags field or some unused bit of the TOS field.
      Actually, IPFPM can be considered a subcase of PBT-M, so the same
      bit can be used for PBT-M. The management plane is responsible
      for configuring the actual operation mode.

   o  SFC NSH. The OAM bit in the NSH header can be used to trigger the
      on-path data collection [I-D.ietf-sfc-nsh]. PBT does not add any
      other metadata to NSH.

   o  MPLS. Instead of choosing a header bit, we take advantage of the
      synonymous flow label [I-D.bryant-mpls-synonymous-flow-labels]
      approach to mark the packets. A synonymous flow label indicates
      that the on-path data should be collected and forwarded through a
      postcard.

   o  SRv6. A flag bit in the SRH can be reserved to trigger the on-path
      data collection.

4.2.  Flow Path Discovery

   In case the path a flow traverses is unknown in advance, all
   PBT-aware nodes are configured to react to the marked packets by
   exporting some basic data, such as node ID and TTL, before a data set
   template for that flow is configured. This way, the management plane
   can learn the flow path dynamically.

   If the management plane wants to collect the on-path data for some
   flow, it configures the head node(s) with a probability or time
   interval for the flow packet marking. When the first marked packet
   is forwarded in the network, the PBT-aware nodes will export the
   basic data to the collector.
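
   The discovery step can be sketched as follows, under simplifying
   assumptions that are not part of this document: every node on the
   path is PBT-aware, each hop decrements the TTL by exactly one, and
   packets and postcards are modeled as plain dictionaries. The
   function and field names are hypothetical.

```python
def export_basic_postcard(node_id, packet):
    """Basic data a PBT-aware node exports for a marked packet.

    Models the case where no data set template is configured yet, so
    only the node ID and the TTL seen at this node are reported.
    Returns None for unmarked packets, which trigger no postcard.
    """
    if not packet["marked"]:
        return None
    return {"flow_id": packet["flow_id"],
            "node_id": node_id,
            "ttl": packet["ttl"]}


def learn_path(postcards):
    """Order the reporting nodes of one marked packet by decreasing TTL."""
    ordered = sorted(postcards, key=lambda pc: pc["ttl"], reverse=True)
    return [pc["node_id"] for pc in ordered]


# One marked packet traverses four nodes (names are illustrative).
packet = {"flow_id": "flow-1", "marked": True, "ttl": 64}
postcards = []
for node in ["head", "A", "B", "end"]:
    postcards.append(export_basic_postcard(node, packet))
    packet = {**packet, "ttl": packet["ttl"] - 1}  # hop decrements TTL

postcards.reverse()  # postcards may reach the collector out of order
path = learn_path(postcards)
print(path)  # ['head', 'A', 'B', 'end']
```

   Ordering by TTL is the inference described in Section 4.3; a real
   collector would first have to decide which postcards belong to the
   same user packet.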
   Hence, the flow path is identified. If other types of data need to
   be collected, the management plane can further configure the data set
   template on the target nodes on the flow's path. The PBT-aware nodes
   then collect and export data accordingly if the packet is marked and
   a data set template is present.

   If for any reason the flow path changes, the new path nodes can be
   learned by the collector as soon as a marked packet traverses them,
   so the management plane controller can be informed to configure the
   new path nodes. The outdated configuration can be automatically
   timed out or explicitly revoked by the management plane controller.

4.3.  Packet Identity for Export Data Correlation

   The collector needs to correlate all the postcard packets for a
   single user packet. Once this is done, the TTL (or the timestamp, if
   the network time is synchronized) can be used to infer the flow
   forwarding path. The key issue here is to correlate all the
   postcards for the same user packet.

   The first possible approach is to include the flow ID plus the user
   packet ID in the OAM packets. For example, the flow ID can be the
   5-tuple from the IP header of the user traffic, and the user packet
   ID can be some unique information pertaining to a user packet (e.g.,
   the sequence number of a TCP packet).

   If the packet marking interval is large enough, then the flow ID
   itself is enough to identify the user packet. That is, we can assume
   all the exported postcard packets for the same flow during a short
   period of time belong to the same user packet.

   Alternatively, if the network is synchronized, then the flow ID plus
   the timestamp at each node can also be used to infer the postcard
   affiliation. However, some errors may occur under some
   circumstances.
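
   The two correlation approaches can be sketched as follows. This is
   a minimal illustration, not a prescribed collector design: the
   postcard fields (`flow_id`, `packet_id`, `node_id`, `time`), the
   addresses, and the 0.1-second window are all hypothetical, and
   `time` stands for whichever timestamp the collector has available.

```python
from collections import defaultdict


def correlate(postcards):
    """Group postcards by (flow ID, user packet ID)."""
    groups = defaultdict(list)
    for pc in postcards:
        groups[(pc["flow_id"], pc["packet_id"])].append(pc)
    return dict(groups)


def correlate_by_window(postcards, window):
    """Group postcards by flow ID alone, using a time window.

    Postcards of one flow that fall within `window` seconds of the
    first postcard of a group are assumed to belong to the same user
    packet. This only holds if the marking interval is much larger
    than `window`.
    """
    groups = []
    open_group = {}  # flow_id -> (start time, postcards of the group)
    for pc in sorted(postcards, key=lambda p: p["time"]):
        current = open_group.get(pc["flow_id"])
        if current is None or pc["time"] - current[0] > window:
            current = (pc["time"], [])
            open_group[pc["flow_id"]] = current
            groups.append(current[1])
        current[1].append(pc)
    return groups


# Flow ID as a 5-tuple, packet ID as a TCP sequence number.
flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
postcards = [
    {"flow_id": flow, "packet_id": 2000, "node_id": "A", "time": 1.000},
    {"flow_id": flow, "packet_id": 1000, "node_id": "B", "time": 0.002},
    {"flow_id": flow, "packet_id": 1000, "node_id": "A", "time": 0.000},
    {"flow_id": flow, "packet_id": 2000, "node_id": "B", "time": 1.003},
]
by_id = correlate(postcards)
by_window = correlate_by_window(postcards, window=0.1)
print(len(by_id), len(by_window))  # 2 2
```

   The window-based variant has no per-packet key, which is where the
   errors noted above come from.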
   For example, if two consecutive user packets from the same flow are
   both marked but one exported postcard from a node is lost, then it is
   difficult for the collector to decide which user packet the remaining
   postcard belongs to. In many cases, such a rare error has no
   catastrophic consequence and is therefore tolerable.

4.4.  Avoid Packet Marking through Node Configuration

   It is possible to avoid marking user packets while still allowing
   in-band flow data collection. We could simply configure Access
   Control Lists (ACLs) to select the set of target flows. This
   approach has two potential issues: (1) Since the packet forwarding
   path is unknown in advance, one needs to configure all the nodes in a
   network to filter the flows and capture the complete data set. This
   wastes precious ACL resources and is not scalable. (2) If a node
   cannot collect data for all the filtered packets of a flow, it needs
   to determine which packets to sample independently, so the collector
   may not be able to receive the full set of postcards for the same
   user packet.

   Nevertheless, since this approach does not require touching the user
   packets at all, it has its unique merits: (1) Users can freely choose
   any nodes as vantage points for data collection; (2) There is no need
   to worry about "modified" user packets leaking out of the PBT domain;
   (3) It has the minimum impact on the forwarding of the user traffic.

   No data plane standard is required to support this mode, except the
   postcard format.

5.  Postcard Format

   Postcards can use the same data export format as that used by IOAM.
   [I-D.spiegel-ippm-ioam-rawexport] proposes a raw format that can be
   interpreted by IPFIX.

6.  Security Considerations

   Several security issues need to be considered.

   o  Eavesdropping and tampering: the postcards can be encrypted and
      authenticated to avoid such security threats.
   o  DoS attack: PBT can be limited to a single administrative domain.
      The mark must be removed at the egress domain edge. The node can
      rate limit the extra traffic incurred by postcards.

7.  IANA Considerations

   No requirement for IANA is identified.

8.  Contributors

   TBD.

9.  Acknowledgments

   TBD.

10.  Informative References

   [I-D.brockners-inband-oam-transport]
              Brockners, F., Bhandari, S., Govindan, V., Pignataro, C.,
              Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes,
              D., Lapukhov, P., and R. Chang, "Encapsulations for
              In-situ OAM Data", draft-brockners-inband-oam-transport-05
              (work in progress), July 2017.

   [I-D.bryant-mpls-synonymous-flow-labels]
              Bryant, S., Swallow, G., Sivabalan, S., Mirsky, G., Chen,
              M., and Z. Li, "RFC6374 Synonymous Flow Labels",
              draft-bryant-mpls-synonymous-flow-labels-01 (work in
              progress), July 2015.

   [I-D.ietf-ippm-alt-mark]
              Fioccola, G., Capello, A., Cociglio, M., Castaldelli, L.,
              Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
              "Alternate Marking method for passive and hybrid
              performance monitoring", draft-ietf-ippm-alt-mark-14 (work
              in progress), December 2017.

   [I-D.ietf-ippm-ioam-data]
              Brockners, F., Bhandari, S., Pignataro, C., Gredler, H.,
              Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov,
              P., remy@barefootnetworks.com, r.,
              daniel.bernier@bell.ca, d., and J. Lemon, "Data Fields for
              In-situ OAM", draft-ietf-ippm-ioam-data-09 (work in
              progress), March 2020.

   [I-D.ietf-ippm-ioam-direct-export]
              Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F.,
              Bhandari, S., Sivakolundu, R., and T. Mizrahi, "In-situ
              OAM Direct Exporting",
              draft-ietf-ippm-ioam-direct-export-00 (work in progress),
              February 2020.

   [I-D.ietf-sfc-nsh]
              Quinn, P., Elzur, U., and C. Pignataro, "Network Service
              Header (NSH)", draft-ietf-sfc-nsh-28 (work in progress),
              November 2017.

   [I-D.song-ippm-ioam-ipv6-support]
              Song, H., Li, Z., and S. Peng, "Approaches on Supporting
              IOAM in IPv6", draft-song-ippm-ioam-ipv6-support-00 (work
              in progress), March 2020.

   [I-D.spiegel-ippm-ioam-rawexport]
              Spiegel, M., Brockners, F., Bhandari, S., and R.
              Sivakolundu, "In-situ OAM raw data export with IPFIX",
              draft-spiegel-ippm-ioam-rawexport-01 (work in progress),
              October 2018.

   [RFC2925]  White, K., "Definitions of Managed Objects for Remote
              Ping, Traceroute, and Lookup Operations", RFC 2925,
              DOI 10.17487/RFC2925, September 2000.

   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
              "Specification of the IP Flow Information Export (IPFIX)
              Protocol for the Exchange of Flow Information", STD 77,
              RFC 7011, DOI 10.17487/RFC7011, September 2013.

Authors' Addresses

   Haoyu Song (editor)
   Futurewei
   2330 Central Expressway
   Santa Clara, 95050
   USA

   Email: hsong@futurewei.com


   Tianran Zhou
   Huawei
   156 Beiqing Road
   Beijing, 100095
   P.R. China

   Email: zhoutianran@huawei.com


   Zhenbin Li
   Huawei
   156 Beiqing Road
   Beijing, 100095
   P.R. China

   Email: lizhenbin@huawei.com


   Jongyoon Shin
   SK Telecom
   South Korea

   Email: jongyoon.shin@sk.com


   Kyungtae Lee
   LG U+
   South Korea

   Email: coolee@lguplus.co.kr