Network Working Group                                       A. Choudhary
Internet-Draft                                             Cisco Systems
Intended status: Standards Track                            May 12, 2018
Expires: November 13, 2018


                       QoS Telemetry Requirements
               draft-asechoud-rtgwg-qos-telemetry-req-00

Abstract

   This document discusses QoS requirements for data model based network
   telemetry.  QoS configuration and operational models have been
   defined as part of [I-D.asechoud-rtgwg-qos-model] and
   [I-D.asechoud-rtgwg-qos-oper-model] respectively.  This document
   describes the requirement to extend the models to support QoS
   Telemetry.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 13, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of



Choudhary               Expires November 13, 2018               [Page 1]

Internet-Draft         QoS Telemetry Requirements               May 2018


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.
       Motivation
   2.  Terminology
   3.  Requirements
     3.1.  Granularity and Completeness
     3.2.  Scale
     3.3.  Cadence
     3.4.  Time-stamping
     3.5.  Grouping
     3.6.  Filtering
     3.7.  Aggregation
     3.8.  Threshold
   4.  Security Considerations
   5.  Acknowledgement
   6.  Normative References
   Author's Address

1.  Motivation

   Network visibility is an important aspect of network availability.
   QoS counters provide good insight into network device performance,
   congestion, and security.  Continuous monitoring of every QoS
   resource may not always be desired, so a mechanism to monitor a data
   set of QoS resources is needed.  The motivation of this document is
   to define the set of requirements for such a mechanism.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Requirements

3.1.  Granularity and Completeness

   Statistics passed for a QoS resource should be complete and, at the
   same time, granular enough to avoid sending undesired information.
   A particular counter in isolation may not provide sufficient
   information about a QoS resource.
   For example, the tail-drop counter of a queue is not sufficient to
   define the state of the queue unless other counters, such as the
   maximum queue size and the average queue size, are also known.
   Hence, it is important to get the complete context of a resource
   based on device capability.

3.2.  Scale

   It is important to have visibility into the vast number of QoS
   resources in a network device on a regular basis.  Some devices, for
   example, may support millions of queues in a single device.  To be
   able to scale, it is desirable to define data sets of important
   resources and to monitor them based on need.

3.3.  Cadence

   Cadence defines the frequency of data collection from the forwarding
   path.  Cadence can be limited by device capability as well as by the
   amount of data requested.  It may also be desirable to have a higher
   cadence for a resource when it is in a critical condition than when
   it is not.

3.4.  Time-stamping

   Time-stamping defines the time when the data was collected from the
   data path.  Time-stamping helps in calculating various traffic rates
   and in identifying the right patterns.

3.5.  Grouping

   There may be multiple collectors of the same telemetry data.  The
   purpose and focus of each collector may be different.  By defining
   the right set of groupings, a collector may be able to easily fetch
   the desired data.  For example, a network slice may define a set of
   QoS resources on each interface.  A collector that is interested in
   a particular network slice may request the data accordingly.
   Similarly, queue data on an interface or on a set of interfaces can
   be defined as a group.

3.6.  Filtering

   Many times a collector is interested in specific data, e.g., a
   real-time queue on an egress interface or metering ([RFC2697] and
   [RFC2698]) data for best-effort traffic on an ingress interface.
   Effective filtering can be done in the network device or by the
   collector.

3.7.
Aggregation

   Sometimes aggregation of data becomes important to give the data
   meaning.  For example, consider a QoS policy applied on various
   ingress interfaces.  An ongoing DDoS attack can be better understood
   when all the traffic to a particular destination coming through
   various interfaces is summed up.  Aggregation can also be done for
   multiple QoS resources within a policy to save important hardware
   counter resources.

3.8.  Threshold

   A collector may not be interested in the data of a QoS resource
   until the resource is in a critical condition, e.g., when a tail-drop
   is seen on a particular best-effort queue or a queue has built up
   for critical data under WFQ.  Many times this follows a pattern: for
   example, at 9 am, when trading starts, drops are seen on a
   particular queue, but otherwise there are no drops.  So, it becomes
   important to observe a resource while it is in a critical condition
   and to avoid doing so otherwise.  Defining a threshold helps the
   collector and the device alike.  It is also important to define how
   long a resource will continue to be monitored once it is out of the
   critical condition.

4.  Security Considerations

5.  Acknowledgement

6.  Normative References

   [I-D.asechoud-rtgwg-qos-model]
              Choudhary, A., Jethanandani, M., Strahle, N., Aries, E.,
              and I. Chen, "YANG Model for QoS", draft-asechoud-rtgwg-
              qos-model-05 (work in progress), March 2018.

   [I-D.asechoud-rtgwg-qos-oper-model]
              Choudhary, A. and I. Chen, "YANG Model for QoS Operational
              Parameters", draft-asechoud-rtgwg-qos-oper-model-01 (work
              in progress), May 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2697]  Heinanen, J. and R. Guerin, "A Single Rate Three Color
              Marker", RFC 2697, DOI 10.17487/RFC2697, September 1999.

   [RFC2698]  Heinanen, J. and R. Guerin, "A Two Rate Three Color
              Marker", RFC 2698, DOI 10.17487/RFC2698, September 1999.
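
   As an illustration of the threshold requirement in Section 3.8, the
   following Python sketch shows one way a device or a collector might
   decide whether a sample of a QoS resource should be exported.  It is
   a sketch under stated assumptions, not part of any QoS model: the
   names ThresholdMonitor, threshold, hold_down, and should_export are
   hypothetical, each sample is assumed to carry a timestamp in seconds
   and the number of tail-drops seen since the previous sample, and the
   hold-down period stands in for "how long a resource will be
   monitored once it is out of critical condition".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdMonitor:
    """Hypothetical helper: export samples only around critical periods."""

    threshold: int    # tail-drops per sample at or above which the resource is critical
    hold_down: float  # seconds to keep exporting after the last critical sample
    last_critical: Optional[float] = None  # timestamp of the last critical sample

    def should_export(self, timestamp: float, tail_drops: int) -> bool:
        """Return True if the sample taken at `timestamp` should be exported."""
        if tail_drops >= self.threshold:
            # The resource is in a critical condition: remember when, and export.
            self.last_critical = timestamp
            return True
        if self.last_critical is None:
            # The resource has never been critical: nothing to report.
            return False
        # Out of the critical condition: keep exporting during the hold-down period.
        return (timestamp - self.last_critical) <= self.hold_down
```

   With threshold=1 and hold_down=30.0, a sample with five drops at
   t=10 is exported, a drop-free sample at t=35 is still exported
   because it falls within the hold-down period, and a drop-free sample
   at t=41 is suppressed.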

Author's Address

   Aseem Choudhary
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   US

   Email: asechoud@cisco.com