NETCONF Working Group                                              X. He
Internet-Draft                                                    X. Mao
Intended status: Standards Track                           China Telecom
Expires: 23 March 2023                                             Q. Ma
                                                                 T. Zhou
                                                                  Huawei
                                                       19 September 2022


  Problem Statement and Use Cases of Adaptive Traffic Data Collection
            draft-he-netconf-adaptive-collection-usecases-01

Abstract

   IP carrier networks need to provide real-time traffic visibility to
   help network operators quickly and accurately locate network
   congestion and packet loss, and to make timely path adjustments for
   deterministic services in order to avoid congestion.  It is essential
   to explore an adaptive traffic data collection mechanism that
   captures real-time network state at minimum resource consumption.

   This document summarizes the problems currently faced by network
   operators when attempting to provide timely traffic data collection
   for scenarios that require real-time network state and traffic
   visibility, and aggregates the requirements for an adaptive traffic
   data collection mechanism from a variety of deployment scenarios.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 23 March 2023.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.



He, et al.                Expires 23 March 2023                 [Page 1]

Internet-Draft      Adaptive Traffic Data Collection      September 2022


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Abbreviations
   2.  Terminology
   3.  Problem Statement
   4.  Scenarios of Adaptive Traffic data collection
     4.1.  Multi-dimensional real-time portrait of interface traffic
           characteristic
     4.2.  Microburst traffic detecting
     4.3.  Congestion avoidance for deterministic services
     4.4.  On-path telemetry based on adaptive traffic sampling
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   With the advent of cloud computing, big data and Artificial
   Intelligence (AI), as well as the large-scale deployment of 5G
   mobile communication technology, a large number of ultra Reliable &
   Low Latency Communication (uRLLC) services such as Augmented Reality
   (AR)/Virtual Reality (VR), Industrial Internet and Computing Power
   Network (CPN) have emerged, which place higher demands on the
   service quality of IP carrier networks.  IP carrier networks need
   to provide real-time traffic visibility to help network operators
   quickly and accurately locate network congestion and packet loss, and
   to make timely path adjustments for deterministic-delay services so
   that they avoid congested nodes and links.  For such business
   scenarios, the network needs to support traffic sampling at
   sub-second or even millisecond intervals so as to obtain real-time
   network state.

   For decades, SNMP [RFC3416] and the Management Information Bases
   (MIBs) have been widely deployed and have been the de facto choice
   for many monitoring solutions, especially for collecting interface
   traffic.  Arguably the biggest shortcoming of SNMP for those
   applications is its reliance on periodic polling, which introduces
   additional load on the network and devices and is brittle if
   polling cycles are missed.  As a result, SNMP is unable to provide
   real-time traffic sampling at sub-second or even millisecond
   intervals.  Telemetry, a data acquisition technique based on a push
   mechanism that is able to deliver object changes as they happen,
   overcomes the limitations of SNMP such as low sampling rate,
   inefficiency and high processing resource consumption.
   Nevertheless, for the sake of capturing real-time network state,
   persistent sampling of interface traffic at millisecond intervals
   will generate a considerable amount of data, which may claim too
   much transport bandwidth and overload the servers used for data
   collection, storage and analysis.  Increasing the data handling
   capacity is technically feasible but expensive, and difficult to
   deploy at large scale in operators' networks.  It is essential to
   explore an adaptive traffic data collection mechanism that captures
   real-time network state at minimum resource consumption.

   This document summarizes the problems currently faced by network
   operators when attempting to provide timely traffic data collection
   for the aforementioned new services and applications that require
   real-time network state and traffic visibility.  It also aggregates
   the requirements for an adaptive traffic data collection mechanism
   from a variety of deployment scenarios.

1.1.  Abbreviations

   AI:  Artificial Intelligence


   AR:  Augmented Reality


   VR:  Virtual Reality


   CPN:  Computing Power Network


   gNMI:  gRPC Network Management Interface


   IP RAN:  IP Radio Access Network


   DetNet:  Deterministic Networking


   QoE:  Quality of Experience


   SLA:  Service Level Agreement


   uRLLC:  ultra Reliable & Low Latency Communication


   NMS:  Network Management System


   IDC:  Internet Data Center


   SNMP:  Simple Network Management Protocol


   MIB:  Management Information Base


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are defined in this document:

   adaptive traffic data collection:  Allows servers to switch
      automatically between telemetry sampling periods when collecting
      traffic data, in response to threshold crossings.


3.  Problem Statement

   As is well known, IP networks, being based on a statistical
   multiplexing model, exhibit bursty traffic.  In order to avoid
   congestion, network operators have been keeping network utilization
   at a rather low level.  For a long time, operators have obtained
   traffic visibility from the Network Management System (NMS), and
   have been satisfied with the 30~40% bandwidth utilization shown by
   traffic statistics curves.  In spite of such low link usage, many
   complaints are still received about poor Quality of Experience
   (QoE) for applications that are sensitive to delay and packet
   loss.  The fundamental cause is that the average traffic observed
   over each sampling cycle masks traffic bursts, given that SNMP is
   widely employed in operators' networks to collect interface traffic
   at 5-minute intervals.  Because of this low sampling rate, SNMP is
   unable to capture the burst characteristics of traffic.
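
   As a rough illustration of this masking effect, consider a link
   that carries a steady 30% load and is saturated for 200
   milliseconds within one 5-minute polling interval.  All figures in
   the following Python sketch are assumptions chosen for the example,
   not measurements; the function name is likewise hypothetical.  The
   sketch computes the time-weighted utilization that a 5-minute
   average would report.

```python
# Illustrative only: every number below is an assumption for the example.
POLL_INTERVAL_S = 300.0      # 5-minute SNMP polling cycle
BASELINE_UTIL = 0.30         # steady load outside the burst
BURST_UTIL = 1.00            # link saturated during the burst
BURST_DURATION_S = 0.2       # a 200 ms microburst


def average_utilization(interval_s, baseline, burst, burst_s):
    """Time-weighted mean utilization over one polling interval."""
    quiet_s = interval_s - burst_s
    return (baseline * quiet_s + burst * burst_s) / interval_s


avg = average_utilization(POLL_INTERVAL_S, BASELINE_UTIL,
                          BURST_UTIL, BURST_DURATION_S)
print(f"reported average: {avg:.4%}")
```

   Under these assumptions the reported average stays within a
   fraction of a percentage point of the 30% baseline, so the
   saturation event is invisible on a 5-minute statistics curve.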

   A large quantity of laboratory data and network operational data
   indicates that microbursts occur frequently in operators' carrier
   networks, such as the IP Radio Access Network (IP RAN), IP
   metropolitan network, IP backbone network and Internet Data Center
   (IDC).  The typical duration of such a microburst is tens to hundreds
   of milliseconds, which easily causes instantaneous congestion of the
   interface output queue.  Network congestion amplifies queuing delay
   and jitter, and in severe cases, it may even cause packet loss.
   Thus, the congestion caused by microbursts is harmful to
   deterministic-delay applications.  Congestion is a major challenge
   for IP networks, and the congestion caused by microbursts is
   difficult to eliminate, but must be avoided.

   Although the mechanism behind microbursts is not well understood,
   this does not prevent us from detecting them.  Fortunately,
   telemetry (e.g., YANG Push [RFC8639] [RFC8641], gNMI [gNMI]) is
   able to collect interface traffic at a higher frequency, i.e., at
   millisecond intervals.  By means of telemetry, we can therefore
   capture a microburst in full.  However, it is impractical to obtain
   real-time traffic visibility at the cost of persistent sampling at
   millisecond intervals.  For example, capturing a microburst on an
   interface requires a sampling cycle of 10 milliseconds or shorter,
   and as a result, the resources required for data storage and
   analysis increase by a factor of 30000 compared with today's widely
   employed 5-minute SNMP sampling cycle.
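
   The factor of 30000 follows directly from the ratio of the two
   sampling cycles.  The short Python sketch below reproduces the
   arithmetic and, as an additional illustration, the number of
   samples that a single counter produces per day at each cycle; the
   function name is hypothetical.

```python
SNMP_CYCLE_MS = 5 * 60 * 1000   # 5-minute polling cycle, in milliseconds
FAST_CYCLE_MS = 10              # 10-millisecond sampling cycle


def samples_per_day(cycle_ms):
    """Number of samples one counter produces in 24 hours."""
    return 24 * 60 * 60 * 1000 // cycle_ms


ratio = SNMP_CYCLE_MS // FAST_CYCLE_MS
print(ratio)                            # 30000
print(samples_per_day(SNMP_CYCLE_MS))   # 288
print(samples_per_day(FAST_CYCLE_MS))   # 8640000
```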

   It is essential to investigate an adaptive traffic data collection
   mechanism that captures real-time network state at minimum
   resource consumption.  Generally speaking, under normal
   non-congested network conditions, which prevail more than 95% of
   the time, a minutes-level sampling cycle is enough because
   forwarding delay is almost constant and interface jitter is low.
   However, when a congestion state or congestion trend is detected,
   the sampling period must promptly be tuned to milliseconds to
   capture the microburst on the interface.  A congestion state or
   congestion trend on an interface manifests itself as packet loss
   due to queue overflow, queue depth beyond a threshold, or
   excessively high link utilization, each of which can be defined as
   event-triggered data.  Such event data can be actively pushed
   through subscription or passively polled through query.  Although
   microbursts occur frequently, they are transient, and a real-time
   detection tool is preferable to pinpoint them in time.  The
   traditional method, in which the CPU on the main control board
   queries for such events, consumes considerable processing
   resources; the network device therefore needs built-in hardware
   designed specifically to monitor them.
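
   The switching rule described above can be sketched in Python as
   follows.  This is a minimal illustration rather than a specified
   algorithm: the class and function names, the two period values and
   the default thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass

NORMAL_PERIOD_MS = 60_000   # assumed minutes-level sampling period
FAST_PERIOD_MS = 10         # assumed millisecond-level sampling period


@dataclass
class InterfaceState:
    """Hypothetical snapshot of the congestion indicators named above."""
    dropped_packets: int      # packets lost to queue overflow since last check
    queue_depth: int          # current output-queue depth, in packets
    link_utilization: float   # fraction of link capacity, 0.0 to 1.0


def select_period_ms(state, queue_threshold=1000, util_threshold=0.9):
    """Return the sampling period to use for the next interval."""
    congested = (
        state.dropped_packets > 0
        or state.queue_depth > queue_threshold
        or state.link_utilization > util_threshold
    )
    return FAST_PERIOD_MS if congested else NORMAL_PERIOD_MS


print(select_period_ms(InterfaceState(0, 120, 0.35)))    # 60000
print(select_period_ms(InterfaceState(0, 4000, 0.35)))   # 10
```

   Any one of the three indicators is sufficient to trigger the short
   period in this sketch; a deployment might instead combine them or
   add hysteresis to avoid rapid toggling.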

   In order to reduce the excessive resource consumption caused by
   sending each millisecond-level sample individually, a batch of
   data, such as hundreds of traffic samples from an interface, can be
   packaged into a single telemetry packet and sent to the collector.
   A timestamp is required for every traffic sample so that the
   collector can plot the interface traffic trend as a curve.  The
   collector must also visualize traffic in real time so that
   operators can observe it immediately.
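
   A minimal Python sketch of such batching is shown below.  The JSON
   encoding, the field names, the interface name and the octet values
   are assumptions made for illustration; this document does not
   define a message format.

```python
import json
import time


def build_batch(interface, samples):
    """Package (timestamp_ns, octets) pairs as one telemetry message."""
    return json.dumps({
        "interface": interface,
        "samples": [{"ts_ns": ts, "octets": octets} for ts, octets in samples],
    })


# Hypothetical data: 200 samples taken 10 ms apart on one interface.
start_ns = time.time_ns()
samples = [(start_ns + i * 10_000_000, 1500 * i) for i in range(200)]
message = build_batch("eth0", samples)

# The collector recovers every sample together with its own timestamp.
decoded = json.loads(message)
print(len(decoded["samples"]))   # 200
```

   Because each sample carries its own timestamp, the collector can
   reconstruct the traffic curve at 10-millisecond resolution even
   though only one message was transmitted.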

4.  Scenarios of Adaptive Traffic data collection

   This section presents several typical scenarios which require
   adaptive traffic data collection to gain real-time network state and
   traffic visibility at minimum resource consumption.

4.1.  Multi-dimensional real-time portrait of interface traffic
      characteristic

   Interface traffic data collection is one of the most important
   functions of an NMS.  Today, more and more applications are
   latency-sensitive and loss-sensitive, and real-time traffic
   visibility can help operators better understand network performance
   so as to achieve SLA guarantees.  On the other hand, obtaining the
   holistic and genuine characteristics of interface traffic is also a
   basic requirement of the statistical multiplexing model of IP
   networks, and is of great significance for traffic prediction,
   network planning, network capacity expansion, network optimization,
   etc.  For example, a higher long-term average utilization indicates
   a need for capacity expansion; a higher ratio between the peak and
   the average, as well as frequently detected microbursts, implies
   intensely bursty traffic and suggests timely path adjustment for
   key deterministic-delay traffic flows.  However, a traditional NMS
   based on SNMP is unable to depict the genuine characteristics of
   interface traffic, so interface traffic data collection based on
   telemetry techniques is preferable.

   It is essential to exploit adaptive traffic data collection
   techniques to depict a multi-dimensional real-time portrait of
   interface traffic characteristics at minimum resource consumption.
   That is to say, in normal non-congested network conditions, which
   prevail more than 95% of the time, a minutes-level sampling cycle
   is sufficient.  But when a congestion state or congestion trend is
   detected, the sampling cycle must promptly be tuned to milliseconds
   to capture the microburst on the interface.  Such an adaptive
   traffic data collection technique not only reflects coarse-grained
   interface traffic characteristics, but also captures the congestion
   state of the interface at finer time granularity.  Because traffic
   data collection at a very high rate is infrequent (i.e., it is only
   triggered by detected microbursts), the portrait can be obtained
   at minimum resource consumption.  Because of the lower cost, the
   technique can be deployed at large scale in operators' networks.
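
   The saving can be estimated with a short calculation.  The Python
   sketch below assumes, for illustration only, that fine-grained
   10-millisecond sampling is active for 5% of the day (an upper bound
   suggested by the 95% figure above) and that a 1-minute cycle is
   used for the rest; the function name is hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000


def adaptive_samples(day_ms, fast_percent, slow_period_ms, fast_period_ms):
    """Samples per day when only fast_percent of the day is sampled fast."""
    fast_ms = day_ms * fast_percent // 100
    slow_ms = day_ms - fast_ms
    return slow_ms // slow_period_ms + fast_ms // fast_period_ms


adaptive = adaptive_samples(DAY_MS, 5, 60_000, 10)
persistent = DAY_MS // 10
print(adaptive, persistent)   # 433368 8640000
```

   Under these assumptions, adaptive collection produces roughly one
   twentieth of the samples of persistent 10-millisecond collection,
   and the gap widens as congestion becomes rarer.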

4.2.  Microburst traffic detecting

   Microburst traffic, an instantaneous congestion phenomenon that
   occurs frequently in IP carrier networks, causes severe delay
   jitter and even packet loss, which seriously affects the QoE of
   latency-sensitive and loss-sensitive applications.  The ability to
   detect microburst traffic on an interface helps network operators
   quickly and accurately locate network congestion and packet loss,
   and make timely path adjustments for deterministic-delay services
   so that they avoid congested nodes and links.  In order to gain a
   comprehensive understanding of a microburst, we must collect
   interface traffic as soon as it occurs.  For example, how often
   does it occur, and how long does it last?  Event data that merely
   signals a microburst, such as packet loss or queue length beyond a
   threshold, is not enough to describe its characteristics.

   Because the typical duration of a microburst is tens to hundreds
   of milliseconds, a sampling cycle of 10 milliseconds or shorter is
   necessary.  Although microbursts occur frequently, together they
   account for very little of the 24 hours in a day, so observing them
   through persistent millisecond-level sampling is not a good
   approach.  Preferably, traffic is captured as soon as a microburst
   occurs to ensure that important diagnostic data is not missed.
   Because a microburst is transient, an online detection tool based
   on dedicated hardware is required to pinpoint it in time.  When
   triggered by promptly detected events such as packet loss or queue
   depth beyond a threshold, the sampling period must promptly be
   tuned to milliseconds to capture the microburst on the interface.
   In short, it is of practical significance to explore microburst
   detection techniques that minimize resource consumption.
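
   Once fine-grained samples are available, questions such as how
   often microbursts occur and how long they last can be answered from
   the samples themselves.  The following Python sketch is one
   possible way to do so; the function name, the 0.9 utilization
   threshold and the sample values are assumptions made for the
   example.

```python
def find_bursts(samples, threshold):
    """Return (start_ms, end_ms) spans where utilization exceeds threshold.

    samples is a list of (timestamp_ms, utilization) pairs in time order.
    """
    bursts = []
    start = None
    last_ts = None
    for ts, util in samples:
        if util > threshold:
            if start is None:
                start = ts          # first sample of a new burst
            last_ts = ts
        elif start is not None:
            bursts.append((start, last_ts))
            start = None
    if start is not None:           # burst still open at end of data
        bursts.append((start, last_ts))
    return bursts


# Hypothetical 10 ms samples: one burst of four samples, one of two.
PERIOD_MS = 10
utils = [0.3, 0.4, 0.95, 0.99, 1.0, 0.97, 0.4, 0.3, 0.96, 0.98, 0.5]
samples = [(i * PERIOD_MS, u) for i, u in enumerate(utils)]
bursts = find_bursts(samples, threshold=0.9)
durations = [end - start + PERIOD_MS for start, end in bursts]
print(bursts)      # [(20, 50), (80, 90)]
print(durations)   # [40, 20]
```

   The number of entries in the result gives the burst frequency over
   the observed window, and each duration is accurate to within one
   sampling period.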

4.3.  Congestion avoidance for deterministic services

   Network congestion will rapidly increase queuing delay and jitter,
   and may even give rise to packet loss, which will seriously affect
   the QoE of delay-sensitive and packet loss-sensitive applications.
   The goal of network optimization is to reduce the occurrence of
   network congestion as much as possible.

   It is difficult for network operators to accurately predict the
   trend of network congestion and adjust the network in advance.
   Real-time traffic visibility based on adaptive traffic data
   collection techniques helps to predict long-term congestion
   accurately and to capture instantaneous congestion (i.e.,
   microbursts) on an interface quickly.  By means of real-time
   traffic visibility, an automatic optimization tool (e.g., one based
   on AI) can make timely path adjustments for key traffic flows.  For
   example, based on the real-time traffic visibility and microburst
   events (e.g., packet loss, queue depth) collected, the controller
   can predict the congestion trend of an interface and redirect
   traffic of deterministic-delay applications to a non-congested
   interface in time.
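
   As a deliberately simple illustration of the redirection step, the
   Python sketch below picks a target interface from the collected
   data.  The selection rule (least utilized interface with no recent
   congestion events), the function name, the 0.7 utilization limit
   and the interface names are all assumptions; a real controller
   would apply its own prediction and path computation logic.

```python
def pick_interface(candidates, util_limit=0.7):
    """Choose the least-utilized candidate with no recent congestion event.

    candidates maps an interface name to a (utilization, recent_events)
    pair; both fields are hypothetical inputs supplied by the collector.
    Returns None when every candidate is considered congested.
    """
    eligible = {
        name: util
        for name, (util, events) in candidates.items()
        if events == 0 and util < util_limit
    }
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


links = {"ge-0/0/1": (0.82, 3), "ge-0/0/2": (0.41, 0), "ge-0/0/3": (0.55, 0)}
print(pick_interface(links))   # ge-0/0/2
```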

4.4.  On-path telemetry based on adaptive traffic sampling

   On-path telemetry (e.g., IOAM [RFC9197]) is useful for
   application-aware networking operations.  For example, it is
   critical for operators who offer high-bandwidth, latency-sensitive
   and loss-sensitive services such as video streaming and online
   gaming to closely monitor the relevant flows in real time as the
   basis for any further optimization.  Applying on-path telemetry to
   all packets of the selected flows is resource consuming.  A
   sampling rate should be set for these flows, and telemetry enabled
   only on the sampled packets.  However, too high a rate would
   exhaust network resources and even cause packet drops; an overly
   low rate, on the contrary, would result in loss of information and
   inaccurate measurements.

   An adaptive approach can be used to adjust the sampling rate
   dynamically based on network conditions.  In the normal network
   state, a low sampling rate is enough to reflect network performance
   (i.e., almost constant forwarding delay and low interface jitter).
   In the case of network congestion, however, the controller becomes
   aware of it from the real-time traffic visibility and event data
   collected (e.g., packet loss, queue depth), and promptly raises the
   packet sampling rate to a very high level.  All packets of the
   selected flows may even be sampled so as to acquire actual
   measurement data such as latency, jitter and packet loss.
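
   The following Python sketch illustrates one way to realize such
   behavior with deterministic one-in-N packet selection.  The class,
   its method names and the interval values are assumptions made for
   the example and are not taken from IOAM or any other
   specification.

```python
class PacketSampler:
    """Select one packet in every `interval` packets of a flow."""

    def __init__(self, normal_interval=1000, congested_interval=1):
        self.normal_interval = normal_interval
        self.congested_interval = congested_interval
        self.interval = normal_interval
        self._count = 0

    def set_congested(self, congested):
        """Switch interval when the controller reports a state change."""
        self.interval = (self.congested_interval if congested
                         else self.normal_interval)
        self._count = 0

    def should_sample(self):
        """Return True if telemetry should be enabled on this packet."""
        self._count += 1
        if self._count >= self.interval:
            self._count = 0
            return True
        return False


sampler = PacketSampler(normal_interval=100)
normal = sum(sampler.should_sample() for _ in range(1000))
sampler.set_congested(True)
congested = sum(sampler.should_sample() for _ in range(1000))
print(normal, congested)   # 10 1000
```

   With a congested interval of 1, every packet of the selected flow
   is sampled, which corresponds to the extreme case mentioned above.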

   Similarly, such an adaptive approach is applicable to traditional
   active measurement methods (e.g., the Two-Way Active Measurement
   Protocol (TWAMP) [RFC5357]), so as to improve measurement accuracy
   at minimal resource consumption.  Under normal non-congested
   conditions, the probing packets are sent at longer intervals.  In
   the case of network congestion caused by a microburst, however, the
   controller becomes aware of it from the real-time traffic
   visibility and promptly shortens the probing interval, which makes
   it possible to capture the microburst traffic and therefore obtain
   real measurements of the congestion state.

5.  IANA Considerations

   This document does not include an IANA request.

6.  Security Considerations

   This document discusses an adaptive telemetry mechanism to minimize
   resource consumption.  The increased complexity of network
   telemetry may give rise to some security concerns.  For example,
   persistent traffic collection at a very high rate (e.g., at
   millisecond intervals) induced by misconfiguration or spurious
   triggering might exhaust the resources of the network device as
   well as the collector.  An inappropriate threshold setting that
   triggers a high sampling rate should also be avoided.  Therefore,
   access control for enabling and disabling adaptive telemetry is
   required, and rate control for collecting telemetry data is
   recommended so as to avoid degradation of network performance.
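
   Rate control of this kind could, for instance, be realized with a
   token bucket.  The Python sketch below is a minimal example of that
   general technique; the class name and the rate and burst values are
   assumptions, and this document does not mandate any particular
   algorithm.  Time is passed in explicitly so that the behavior is
   easy to verify.

```python
class TokenBucket:
    """Limit telemetry messages to `rate` per second with a burst cap."""

    def __init__(self, rate, burst):
        self.rate = rate              # tokens added per second
        self.burst = burst            # maximum number of stored tokens
        self.tokens = float(burst)    # bucket starts full
        self.last = 0.0               # time of the previous call, seconds

    def allow(self, now):
        """Return True if a message may be sent at time `now` (seconds)."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=10, burst=5)
# 100 send attempts spaced 1 ms apart: only the initial burst allowance
# is admitted, because less than one token is refilled in that window.
sent = sum(bucket.allow(now=i * 0.001) for i in range(100))
print(sent)   # 5
```

   Even if a spurious trigger switches an interface to millisecond
   sampling, a limiter of this kind bounds the load that reaches the
   collector.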

   On the other hand, a telemetry management interface such as NETCONF
   or gNMI must provide authentication, data integrity,
   confidentiality, and replay protection.  The lowest NETCONF layer
   is the secure transport layer, and the mandatory-to-implement
   secure transport is Secure Shell (SSH) [RFC6242].  gNMI is carried
   over gRPC, which runs on HTTP/2, and its secure transport is TLS
   [RFC5246].  Further study of the security issues will be required.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246,
              DOI 10.17487/RFC5246, August 2008,
              <https://www.rfc-editor.org/info/rfc5246>.

   [RFC6242]  Wasserman, M., "Using the NETCONF Protocol over Secure
              Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011,
              <https://www.rfc-editor.org/info/rfc6242>.

   [RFC8639]  Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard,
              E., and A. Tripathy, "Subscription to YANG Notifications",
              RFC 8639, DOI 10.17487/RFC8639, September 2019,
              <https://www.rfc-editor.org/info/rfc8639>.

   [RFC8641]  Clemm, A. and E. Voit, "Subscription to YANG Notifications
              for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
              September 2019, <https://www.rfc-editor.org/info/rfc8641>.

7.2.  Informative References

   [gNMI]     OpenConfig, "gRPC Network Management Interface (gNMI)",
              <https://github.com/openconfig/gnmi>.

   [RFC3416]  Presuhn, R., Ed., "Version 2 of the Protocol Operations
              for the Simple Network Management Protocol (SNMP)",
              STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002,
              <https://www.rfc-editor.org/info/rfc3416>.

   [RFC5357]  Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, DOI 10.17487/RFC5357, October 2008,
              <https://www.rfc-editor.org/info/rfc5357>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC9197]  Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi,
              Ed., "Data Fields for In Situ Operations, Administration,
              and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197,
              May 2022, <https://www.rfc-editor.org/info/rfc9197>.

Authors' Addresses

   Xiaoming He
   China Telecom
   Email: hexm4@chinatelecom.cn


   Dongfeng Mao
   China Telecom
   Email: maodf@chinatelecom.cn


   Qiufang Ma
   Huawei
   101 Software Avenue, Yuhua District
   Nanjing
   Jiangsu, 210012
   China
   Email: maqiufang1@huawei.com


   Tianran Zhou
   Huawei
   Email: zhoutianran@huawei.com