Network Working Group                                            H. Chen
Internet-Draft                                             China Telecom
Intended status: Informational                                     Z. Li
Expires: September 12, 2019                                 China Mobile
                                                                   F. Xu
                                                                 Tencent
                                                                   Y. Gu
                                                                   Z. Li
                                                                  Huawei
                                                          March 11, 2019


           Network-wide Protocol Monitoring (NPM): Use Cases
                      draft-chen-npm-use-cases-00

Abstract

   As networks continue to scale, we need a coordinated effort for
   diagnosing control plane health issues in heterogeneous environments.
   Traditionally, operators developed internal solutions to address the
   identification and remediation of control plane health issues, but as
   networks increase in size, speed and dynamicity, new methods and
   techniques will be required.

   This document highlights key network health issues, as well as
   network planning requirements, identified by leading network
   operators.  It also provides an overview of current art and
   techniques that are used, but highlights key deficiencies and areas
   for improvement.

   This document proposes a unified management framework for
   coordinating diagnostics of control plane problems and optimization
   of network design.  Furthermore, it outlines requirements for
   collecting, storing and analyzing control plane data, to minimise or
   negate control plane problems that may significantly affect overall
   network performance and to optimize path/peering/policy planning for
   meeting application-specific demands.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].








Chen, et al.           Expires September 12, 2019               [Page 1]

Internet-Draft                NPM Use Cases                   March 2019


Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 12, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Role of Telemetry
     1.2.  Role of Control Plane Telemetry
   2.  Terminology
   3.  Problem Statement
     3.1.  Network Troubleshooting Challenges
     3.2.  Network Planning Challenges
   4.  Network-wide Protocol Monitoring (NPM)
   5.  NPM Use Cases
     5.1.  Network Troubleshooting Use Cases
       5.1.1.  IS-IS Route Flapping
       5.1.2.  LSDB Synchronization Failure
       5.1.3.  Route Loop
       5.1.4.  Tunnel Set Up Failure
     5.2.  Network Planning Use Cases
       5.2.1.  Route Policy Validation
   6.  Security Considerations
   7.  Contributors
   8.  Acknowledgments
   9.  References
   Authors' Addresses

1.  Introduction

   Recently, significant effort has been made to evolve the control of
   network resources, using management plane enhancements and control
   of network state via centralized and distributed control plane
   methods.  There is also ongoing work on diagnosing forwarding plane
   performance degradation, using telemetry-based solutions and in-band
   data plane OAM.  However, less emphasis has been placed on
   diagnosing and remediating control plane health issues, which relate
   to the optimal control of network resources.

   The document outlines the existing set of standards-based tools and
   highlights the lack of capability for addressing control plane
   monitoring.

1.1.  Role of Telemetry

   The concept of network telemetry has been proposed to meet current
   and future OAM demands, supporting real-time data collection,
   processing, export, and analysis.  An architectural framework of
   existing telemetry approaches is introduced in [I-D.song-ntf].
   Network telemetry provides visibility into network health
   conditions, and is beneficial for faster network troubleshooting,
   network OpEx (operating expenditure) reduction, and network
   optimization.  Telemetry can be applied to the data plane, control
   plane and management plane.  Various methods have been proposed for
   each plane:

   o  Management Plane Telemetry: The management plane telemetry focuses
      on network operational state retrieval and configuration
      management.  SNMP (Simple Network Management Protocol) [RFC1157],
      NETCONF (Network Configuration Protocol) [RFC6241] and gNMI (gRPC
      Network Management Interface) [I-D.openconfig-rtgwg-gnmi-spec] are
      three widely adopted management plane Telemetry approaches.  Data
      consumers can subscribe to specific data stores through SNMP/gRPC/
      NETCONF.

   o  Control Plane Telemetry: The control plane telemetry works on
      routing protocol monitoring and routing related data retrieval,
      e.g., topology, route policy, RIB and so on.  The BGP Monitoring
      Protocol (BMP) [RFC7854] was designed to monitor BGP sessions and
      is intended to provide a convenient interface for obtaining BGP
      route views.  Data collected using BMP can be further analyzed
      with big data platforms for network health visualization,
      diagnosis and prediction applications.

   o  Data Plane Telemetry: The data plane telemetry works on traffic
      performance measurement and traffic related data retrieval, e.g.,
      latency, jitter, buffer size and so on.  For example, In-situ OAM
      (iOAM) [I-D.brockners-inband-oam-requirements] embeds an
      instruction header in user data packets, and at each network node
      along the forwarding path collects the requested data and adds it
      to the packet.  Applications such as path verification and SLA
      (service-level agreement) assurance can be enabled with iOAM.

1.2.  Role of Control Plane Telemetry

   The above-mentioned telemetry approaches vary in data type and form,
   including encapsulation, serialization, transportation,
   subscription, and data analysis, and thus enable different
   applications.  As network operations and maintenance evolve towards
   automation and intent-driven operation, higher requirements are set
   for each plane.  A healthy management plane and control plane are
   essential for high-quality data service provisioning.  Visibility
   into the health of the management and control planes provides
   insights into changes in the data plane.

   First of all, control protocols exist to provide and guarantee
   network connectivity and reachability, which is the foundation of
   any data service running above them.  Monitoring the control plane
   detects health issues in real time so that immediate troubleshooting
   actions can be taken, thus mitigating the effect on data services as
   much as possible.

   Secondly, without route analytics, the dynamic nature of IP
   networking makes it virtually impossible to know at any point in
   time how traffic is traversing the network.  For example, by
   collecting real-time BGP routes through BMP and correlating them
   with traffic data retrieved through data plane telemetry, the
   operator is able to perform both inter-domain and intra-domain
   traffic optimization.

   Finally, the validation and evaluation of route policies is another
   common request from both carriers and OTTs.  The difficulty here
   lies mainly in precisely defining the correctness of policies.  In
   other words, policy validation depends largely on the operator's
   understanding and manual judgement of the current network status
   rather than on formalized and quantitative commands executed at
   devices.  Thus, it demands visualized presentations of how the
   policies impact route changes, obtained through control plane
   telemetry, so that operators can directly judge policy correctness.
   The conventional separate collection of route policy and route
   information is not sufficient for validating the correctness of
   route policy.

   Based on discussions with leading operators, this document identifies
   the challenges and problems that current control plane telemetry
   faces and suggests data collection requirements.  The necessity for
   a Network-wide Protocol Monitoring (NPM) framework is illustrated
   through the discussion of specific use cases.

2.  Terminology

   IGP: Interior Gateway Protocol

   IS-IS: Intermediate System to Intermediate System

   BGP: Border Gateway Protocol

   BGP-LS: Border Gateway Protocol-Link State

   MPLS: Multi-Protocol Label Switching

   RSVP-TE: Resource Reservation Protocol-Traffic Engineering

   LDP: Label Distribution Protocol

   NPM: Network-wide Protocol Monitoring

   NPMS: Network Protocol Monitoring System

   BMP: BGP Monitoring Protocol

   LSP: Link State Packet

   SDN: Software Defined Network

   IPFIX: Internet Protocol Flow Information Export

3.  Problem Statement

3.1.  Network Troubleshooting Challenges

   According to Huawei's 2016 network issue statistics, about 48% of
   all issues are routing protocol-related, including protocol
   adjacency/peer setup failure, adjacency/peer flapping, and
   protocol-related table errors.  Moreover, routing protocol issues
   are not standalone: they come with anomalies in the data plane at
   the same time, and are ultimately reflected in poor service quality
   and user experience.

   Existing methods for protocol troubleshooting include CLI, SNMP,
   NETCONF-YANG/gRPC-YANG and vendor-specific/third-party tools.

   Using the CLI for per-device checks provides adequate per-device
   information, but lacks a network-wide view, leading either to
   massive labor and time spent checking all devices or to failure to
   localize the source.  Besides, complex CLI usage (combined and
   repeated command patterns) requires experience from NOC staff.

   Management protocols, such as SNMP and NETCONF/gRPC, provide
   information already gathered, or to be gathered, from the network.
   This reduces operational complexity, but sacrifices data adequacy
   compared with the CLI.  Since these protocols were not designed
   specifically for routing troubleshooting, not all of the required
   data sources are currently supported for export, and the lack of
   certain data becomes the troubleshooting bottleneck.  For example,
   in an abnormal LSP purge case caused by continuously corrupted LSPs,
   it is useful to collect the corrupted LSP PDUs for root cause
   analysis.  In addition, for data sources that are currently
   supported or will be supported, data synchronization can be a
   concern for data correlation, because the export performance of the
   various approaches differs.  The data collection requirements depend
   largely on the use cases, and more details are discussed in
   Section 5.

   Some third-party OAM tools provide information collection and
   analysis customized for troubleshooting.  For example, Packet Design
   uses passive listening to collect IS-IS/OSPF/BGP messages for route
   analysis, troubleshooting and path optimization.  Such passive
   listening lacks per-device information collection.  For example,
   detecting a route loop and analyzing its root cause requires not
   only network-wide RIB/FIB collection, but also the route policy
   information that is responsible for generating the loop.

   To summarize, the current protocols and tools do not provide
   sufficient data sources for routing troubleshooting.  New methods,
   or augmentations to existing methods, are required to enhance
   control plane data collection and to support more efficient data
   correlation.

3.2.  Network Planning Challenges

   The dynamic nature of IP networks, e.g., peer up/down, prefix
   advertisement, route change, and so on, has great influence on
   service provisioning.  With the emergence of new network services,
   such as automated driving systems, AR (Augmented Reality), and so on,
   network planning is facing new requirements in order to meet the
   latency, bandwidth and security demands.  The requirements generally
   fall into two categories: 1. sufficient and up-to-date routing data
   collection as the input for network simulation; 2. accurate what-if
   simulation to evaluate new network planning actions.

   Most existing control plane and data plane simulation tools, e.g.,
   Batfish [Batfish], use device configurations to generate a
   control/data plane.  There are some concerns with this simulation
   method: 1. in a multi-vendor network, understanding and translating
   the configuration files is a non-trivial task for the simulator; 2.
   the generated control/data plane is not an exact mirror of the
   actual network, which results in less accurate simulation results.
   Thus, real-time routing data collection from the live network is
   required.  Currently, BGP routes and peering states can be monitored
   in real time using BMP.  However, IS-IS/OSPF/MPLS routing data still
   lacks proper and comprehensive monitoring.  Here, not only the data
   coverage, including RIB/FIB, network topology, peering states and so
   on, but also the data synchronization across devices should be
   considered in order to reconstruct a faithful data/control plane
   within the simulator.

4.  Network-wide Protocol Monitoring (NPM)

   With the above mentioned challenges facing the control plane
   telemetry, it is of great value to identify the requirements from
   typical use cases, and the gaps between the requirements and existing
   methods.  It is thus necessary to propose a comprehensive control
   plane telemetry framework, as shown in Figure 1.

                              +-------------+
                       +----->+  NPM Server +<-----+
                       |      +------+------+      |
                       |             ^             |
                       |             +             |
                       |   BMP,gRPC,Netconf,       |
                       |   BGP+LS,new|protocol?:   |
                       |   topology,protocol PDU,  |
                       |   RIB,route policy,       |
                       |   statistics...           |
                   ****|*************|*************|*****
                   *   |             |             | AS0*
                   *   |          +--+--+          |    *
                   *   |  +...+-->+ R 3 +<--+...+  |    *
                   *   |  |       +-----+       |  |    *
                   *   |  |                     |  |    *
                   *   |  +                     |  |    *
                   *   | ISIS/OSPF/BGP/         |  |    *
                   *   | MPLS/SR...             |  |    *
                   *   |  +                     +  |    *
                   *   |  |          ISIS/OSPF/BGP/|    *
                   *   |  |          MPLS/SR... +  |    *
                   *   |  |                     |  |    *
           BGP/MPLS*   |  v    ISIS/OSPF/BGP/   v  |    *BGP/MPLS
   +-----+ /SR     *  ++--+-+  MPLS/SR...     +-+--++   */SR     +-----+
   | AS1 +<---------->+ R 1 +<-----+...+----->+ R 2 +<---------->+ AS2 |
   +-----+         *  +-----+                 +-----+   *        +-----+
                   *                                    *
                   **************************************

                          Figure 1: NPM framework

   Under the NPM framework, the challenges, use cases, requirements,
   gaps, and solution options are to be identified and discussed.  The
   NPM problem space is depicted in Figure 2.  Two general requirements
   are derived from the challenges discussed above.

   o  The requirement of a "tunnel" for the control plane data export:
      There should be a way (or ways) of exporting the required control
      plane data, and the export performance (e.g., data modeling,
      encoding and transmission) should be able to meet per application
      requirements;

   o  The requirement of adequate data collection: In order to support
      specific troubleshooting and planning use cases, the collected
      data coverage, including the data type coverage and the network
      coverage, should be adequate.  The data type coverage refers to
      data such as protocol PDUs, RIBs, policy and so on, and the
      network coverage refers to the devices providing such
      information.

   More specific requirements may vary case by case, but a valid tunnel
   and adequate data collection are common needs.

          Data Source:
        Topology,protocol PDU,    NPM problem space:
        RIB,route policy,         sufficient data type coverage,
        statistics...             sufficient device coverage
        +--------+----------+
                 |
                 v
   +-------------+-------------+
   |     Data Generation:      |  NPM problem space:
   |    data encapsulation,    |  data model definition,
   |    data serialization,    |  data process efficiency
   |    data subscription      |
   +-------------+-------------+
                 |
                 v
   +-------------+-------------+
   |    Data Transportation:   |  NPM problem space:
   |                           |  Transportation protocol
   |    BMP, gRPC, Netconf,    |  selection,
   |    BGP-LS, new protocol?  |  exportation efficiency
   +-------------+-------------+
                 |
                 v
   +-------------+-------------+
   |       Data Analysis:      |
   | Protocol troubleshooting, |  NPM problem space:
   |     Policy validation,    |  data synchronization,
   |   Traffic optimization,   |  data parse efficiency
   |     What-if simulation    |
   +---------------------------+

             Figure 2: NPM problem space

5.  NPM Use Cases

5.1.  Network Troubleshooting Use Cases

   We have identified several typical routing issues that occur
   frequently in the network, and are typically hard to localize.

5.1.1.  IS-IS Route Flapping

   IS-IS route flapping refers to the situation where one or more
   routes repeatedly appear in and disappear from the routing table.
   Route flapping usually comes with massive PDU exchanges (e.g., LSP,
   LSP purge...), which consume excessive network bandwidth and CPU
   processing.  In addition, the impact is often network-wide.
   Localizing the flapping source and identifying the root causes is
   not easy, for various reasons.
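
   As an illustration of the definition above, the following is a
   minimal sketch of how a collector might flag flapping prefixes from
   a stream of route events.  The event layout, the function name
   detect_flapping, and the window and threshold values are assumptions
   made for this example; they are not defined by this document.

```python
from collections import defaultdict, deque


def detect_flapping(events, window=60.0, threshold=4):
    """Return prefixes whose state toggled at least `threshold` times
    within any `window` seconds.

    `events` is an iterable of (timestamp, prefix, action) tuples sorted
    by timestamp, where action is "add" or "withdraw" (hypothetical
    record layout, assumed for illustration).
    """
    last_action = {}
    recent = defaultdict(deque)  # prefix -> timestamps of recent transitions
    flapping = set()
    for ts, prefix, action in events:
        if last_action.get(prefix) == action:
            continue  # repeated state, not a transition
        last_action[prefix] = action
        q = recent[prefix]
        q.append(ts)
        # Drop transitions that fell out of the sliding window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            flapping.add(prefix)
    return flapping
```

   A prefix that is added and withdrawn twice within the window would
   be reported, while a prefix that changes state once would not.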

   The flapping can be caused by system ID conflict, IS-IS neighborship
   flapping, route source flapping (caused by import route policy
   misconfiguration), device clock malfunction with abnormal LSP purges
   (e.g., a clock running 100 times faster) and so on.

   o  The system ID conflict check is a network-wide task.  If such
      information is collected centrally at a controller/server, the
      issues can be identified in seconds and, more importantly, in
      advance of the actual flapping event.

   o  IS-IS neighborship flapping is typically caused by interface
      flapping, BFD flapping, high CPU utilization and so on.
      Conventionally, to locate the issue, operators identify the target
      device(s), and then log in to the devices to check related
      statistics, parsed protocol PDU data and configurations.  The
      manual check often requires a combination of multiple CLI
      commands (check cost/next hop/exit interface/LSP age...) in a
      repeated manner, which is time-consuming and requires rich OAM
      experience.  If such statistics and configuration data were
      collected at the server in real time, the server could analyze
      them automatically or semi-automatically with troubleshooting
      algorithms implemented at the server.

   o  When misconfigured route policies cause the route flapping, it is
      typically difficult to identify the responsible policy in a short
      time.  If the route change history is recorded in correlation
      with the route policy and collected at the server, the server can
      directly identify the responsible policy using the one-to-one
      mapping between policy processing and route attribute change.

   o  When flapping comes with abnormal LSP purges, it may be due to
      continuous LSP corruption with a falsified shorter Remaining
      Lifetime, or to a clock running 100 times faster and generating
      100 times more purge LSPs.  In order to identify the purge
      originator, RFC 6232 [RFC6232] proposes to carry the Purge
      Originator Identification (POI) TLV in IS-IS.  However, to analyze
      the root cause of such abnormal purges, the collection and
      analysis of LSP PDUs are needed.
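
   For the first cause in the list above, the centralized system ID
   conflict check can be sketched as follows.  The (hostname,
   system ID) inventory layout and the function name
   find_system_id_conflicts are assumptions made for this example; how
   the inventory is exported to the server is left open.

```python
from collections import defaultdict


def find_system_id_conflicts(devices):
    """Map each duplicated IS-IS system ID to the devices that use it.

    `devices` is an iterable of (hostname, system_id) pairs as a central
    collector might hold them (hypothetical record layout).
    """
    owners = defaultdict(set)
    for hostname, system_id in devices:
        # Normalize case so "00AA..." and "00aa..." compare equal.
        owners[system_id.lower()].add(hostname)
    return {sid: sorted(hosts) for sid, hosts in owners.items()
            if len(hosts) > 1}
```

   Because the check only needs the configured identifiers, it can run
   before any flapping is observed.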

5.1.2.  LSDB Synchronization Failure

   During IS-IS flooding, LSP synchronization failures sometimes
   happen.  The causes of synchronization failure can generally be
   classified into three cases:

   o  Case 1, the LSP is not correctly advertised.  For example, an LSP
      sent by Router A fails to be synchronized at Router B.  This can
      be due to an incorrect route export policy, too many prefixes
      being advertised such that the LSP/MTU threshold is exceeded, and
      so on at Router A.

   o  Case 2, LSP transmission error, which is typically caused by IS-IS
      adjacency failure, e.g., link down/BFD down/authentication
      failure.

   o  Case 3, the LSP is received but not correctly processed.  The
      problem at Router B can be a faulty route import policy, Router B
      being in Overload mode, or hardware/software bugs.

   With sufficient IS-IS PDU related statistics and parsed PDU
   information recorded at the device, the neighborship failure in Case
   2 can typically be diagnosed at Router A or Router B independently.
   With such diagnostic information collected in real time (e.g., in the
   format of a reason code), the server can identify the root
   synchronization issue with much less time and labor than
   conventional methods.  In Cases 1 and 3, the failure is mostly
   caused by incorrect route policy or a software/hardware issue.  By
   comparing the LSDB with the sent/received LSPs, differences can be
   recognized, and these differences can further guide the localization
   of the root cause.  Thus, by collecting the LSDBs and sent/received
   LSPs from the two affected neighbors, the server can gain more
   insight into the synchronization failure.
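
   The comparison step can be sketched as follows, assuming each
   collected LSDB snapshot is reduced to a mapping from LSP ID to
   sequence number.  This simplified view, and the name diff_lsdb, are
   assumptions for illustration; a real LSDB entry also carries the
   remaining lifetime, checksum and TLVs.

```python
def diff_lsdb(lsdb_a, lsdb_b):
    """Compare two LSDB snapshots taken from neighboring routers.

    Each snapshot maps an LSP ID to its sequence number (hypothetical,
    simplified view of an LSDB).
    """
    ids_a, ids_b = set(lsdb_a), set(lsdb_b)
    # LSPs known to both routers but with different sequence numbers.
    mismatch = {
        lsp_id: (lsdb_a[lsp_id], lsdb_b[lsp_id])
        for lsp_id in ids_a & ids_b
        if lsdb_a[lsp_id] != lsdb_b[lsp_id]
    }
    return {
        "only_in_a": sorted(ids_a - ids_b),
        "only_in_b": sorted(ids_b - ids_a),
        "seq_mismatch": mismatch,
    }
```

   An LSP that appears only in Router A's snapshot points towards
   Case 1 or Case 2, whereas a sequence number mismatch suggests that
   an update was sent but not installed at the peer.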

5.1.3.  Route Loop

   Incorrect import policy, such as incorrect protocol priority
   (distance) or improper default route configuration, may result in a
   route loop.  A TTL anomaly report or a packet loss complaint
   typically triggers the loop alarm.  However, locating the exact
   device(s) and, more importantly, the responsible configuration or
   policy is non-trivial work.  The generation of the routing
   information base/forwarding information base (RIB/FIB) involves
   various protocols and a large number of route policies, which often
   makes it hard to locate the loop source in a timely manner.

   If the network-wide RIB/FIB data can be collected in real time, the
   server is able to run loop detection algorithms to detect and locate
   the loop.  More importantly, with the real-time RIB/FIB collected as
   the input for a network simulator, loops can be predicted with
   what-if simulations of network changes, such as a new policy or a
   link failure.
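
   One possible loop detection algorithm is sketched below.  It
   assumes the collected FIBs have been reduced, for a single
   destination prefix, to one next-hop device per device; the function
   name find_forwarding_loops and this single-path view are
   assumptions for illustration and ignore ECMP.

```python
def find_forwarding_loops(next_hops):
    """Find forwarding loops for one destination prefix.

    `next_hops` maps each device to the device it forwards to for that
    prefix, or to None when it delivers locally or has no route
    (hypothetical, simplified single-path FIB view).
    Returns a list of loops, each a list of devices in forwarding order.
    """
    loops = []
    state = {}  # device -> "visiting" (on current walk) or "done"
    for start in next_hops:
        path = []
        node = start
        # Follow next hops until the walk ends or reaches a seen device.
        while node is not None and node not in state:
            state[node] = "visiting"
            path.append(node)
            node = next_hops.get(node)
        # Reaching a device from the current walk means a cycle.
        if node is not None and state.get(node) == "visiting":
            loops.append(path[path.index(node):])
        for n in path:
            state[n] = "done"
    return loops
```

   Each device is visited once, so the check stays linear in the number
   of devices per prefix.  The same function can be applied to a
   simulated FIB to predict loops before a change is deployed.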

5.1.4.  Tunnel Set Up Failure

   The MPLS label switched path setup, using either RSVP-TE or LDP, may
   fail for various reasons.  The typical troubleshooting procedure is
   to log in to the device, and then check whether the failure lies in
   the configuration, a path computation error, or a link failure.
   Sometimes, multiple devices along the tunnel have to be checked.
   Certain reason codes can be carried in the PathErr/ResvErr messages
   of RSVP-TE, while other data, such as an authentication failure,
   currently cannot be transmitted to the path ingress/egress node.  In
   this case, if the tunnel configurations of devices along the tunnel,
   as well as the link states and other reasons diagnosed by each
   device, can be collected centrally, the server is able to do a
   thorough analysis and find the root cause.
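
   A minimal sketch of the central analysis step follows, assuming the
   server already knows the ordered hops of the tunnel and has
   received a per-device diagnosis.  The names locate_tunnel_failure,
   path and reports, and the free-text reason strings, are
   hypothetical and chosen only for this example.

```python
def locate_tunnel_failure(path, reports):
    """Return (device, reason) for the first hop along `path` (ordered
    from ingress to egress) that reported a failure, or None if no hop
    reported one.

    `reports` maps a device name to a reason string, or to None when
    the device diagnosed no problem (hypothetical record layout).
    """
    for device in path:
        reason = reports.get(device)
        if reason:
            return device, reason
    return None
```

   This replaces the hop-by-hop login procedure with a single lookup
   over data the devices have already exported.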

5.2.  Network Planning Use Cases

   Monitoring and analyzing network routing events not only helps
   identify the root causes of network issues, but also provides
   visibility into how routing changes affect network traffic.  With
   the benefit of data plane telemetry, such as iOAM and IPFIX, network
   traffic matrices can be generated to give an overview of current
   network performance.  More specifically, traffic matrices visualize
   current and historical network changes, such as link utilization,
   link delay, jitter, and so on.  While traffic matrices show "what"
   the network changes are, control plane event monitoring, covering
   events such as adjacency/peering failure, route flapping, and prefix
   advertisement/withdrawal, explains "why".
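
   The "what"/"why" pairing can be sketched as a time-window join
   between traffic anomalies and control plane events.  The record
   layouts, the name explain_anomalies and the lookback value are
   assumptions for illustration, and the sketch presumes that device
   clocks are synchronized well enough for timestamps to be compared.

```python
from bisect import bisect_left, bisect_right


def explain_anomalies(anomalies, events, lookback=30.0):
    """For each traffic anomaly, list the control plane events that
    occurred in the `lookback` seconds before it.

    `anomalies` is a list of (timestamp, description) tuples; `events`
    is a list of (timestamp, description) tuples sorted by timestamp.
    Both layouts are hypothetical.
    """
    times = [ts for ts, _ in events]
    result = []
    for ts, what in anomalies:
        # Events with timestamps in [ts - lookback, ts].
        lo = bisect_left(times, ts - lookback)
        hi = bisect_right(times, ts)
        result.append((what, [why for _, why in events[lo:hi]]))
    return result
```

   An anomaly with an empty candidate list indicates that the cause
   may lie outside the monitored control plane data.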

5.2.1.  Route Policy Validation

   Route policy validation has been a great concern for operators when
   implementing new policies as well as optimizing existing policies.
   Validation has two aspects:

   o  Firstly, effective monitoring of implemented policies, correlated
      with network changes, is required to understand how a policy
      impacts routing in both single-device and network-wide views.
      Conventionally, policy/configuration data collection (e.g.,
      through NETCONF/YANG) is separate from route information
      collection (e.g., BMP), so there is no correlation between policy
      and routes.  Thus, even with both kinds of information at hand, it
      is still difficult for the operator to figure out how a policy
      impacts route changes.  If route changes are recorded in
      correlation with policy processing, the server can directly
      identify the impact through correlation analysis of such data
      collected from all devices.

   o  Secondly, a pre-check of policy impact using simulation tools is
      required.  Most existing simulation tools use device
      configurations to generate a control plane/data plane, and then
      run what-if simulations to evaluate a new policy.  However, there
      are differences between the live network and the generated
      control/data plane, which make the simulation results less
      reliable.  If a control/data plane snapshot (e.g., topology,
      protocol neighbor state, RIB...) of the live network is captured
      and taken as the input of the simulation, the reliability of the
      evaluation can be greatly improved.
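
   For the first aspect above, the correlation analysis can be
   sketched as grouping route changes by the policy that produced
   them.  The record keys and the name changes_by_policy are
   hypothetical; they stand in for whatever correlated export format
   is eventually defined.

```python
from collections import defaultdict


def changes_by_policy(records):
    """Group route attribute changes by the policy that produced them.

    Each record is a dict with keys "device", "prefix", "policy",
    "attribute", "before" and "after" (hypothetical layout for a route
    change exported together with the policy that caused it).
    """
    grouped = defaultdict(list)
    for r in records:
        # Skip records where the policy left the attribute unchanged.
        if r["before"] != r["after"]:
            grouped[r["policy"]].append(
                (r["device"], r["prefix"], r["attribute"],
                 r["before"], r["after"]))
    return dict(grouped)
```

   With records from all devices, the result gives a network-wide view
   of what each policy changed, which is the input an operator needs
   to judge its correctness.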

6.  Security Considerations

   TBD

7.  Contributors

   TBD

8.  Acknowledgments

   TBD

9.  References

   [Batfish]  Fogel, A., et al., "A General Approach to Network
              Configuration Analysis", May 2015.

   [I-D.brockners-inband-oam-requirements]
              Brockners, F., Bhandari, S., Dara, S., Pignataro, C.,
              Gredler, H., Leddy, J., Youell, S., Mozes, D., Mizrahi,
              T., Lapukhov, P., and r. remy@barefootnetworks.com,
              "Requirements for In-situ OAM", draft-brockners-inband-
              oam-requirements-03 (work in progress), March 2017.

   [I-D.ietf-grow-bmp-adj-rib-out]
              Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S.
              Zhuang, "Support for Adj-RIB-Out in BGP Monitoring
              Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-03 (work
              in progress), December 2018.

   [I-D.ietf-grow-bmp-local-rib]
              Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
              "Support for Local RIB in BGP Monitoring Protocol (BMP)",
              draft-ietf-grow-bmp-local-rib-02 (work in progress),
              September 2018.

   [I-D.ietf-netconf-yang-push]
              Clemm, A., Voit, E., Prieto, A., Tripathy, A., Nilsen-
              Nygaard, E., Bierman, A., and B. Lengyel, "Subscription to
              YANG Datastores", draft-ietf-netconf-yang-push-22 (work in
              progress), February 2019.

   [I-D.openconfig-rtgwg-gnmi-spec]
              Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack,
              C., and C. Morrow, "gRPC Network Management Interface
              (gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in
              progress), March 2018.

   [I-D.song-ntf]
              Song, H., Zhou, T., Li, Z., Fioccola, G., Li, Z.,
              Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Toward a
              Network Telemetry Framework", draft-song-ntf-02 (work in
              progress), July 2018.

   [RFC1157]  Case, J., Fedor, M., Schoffstall, M., and J. Davin,
              "Simple Network Management Protocol (SNMP)", RFC 1157,
              DOI 10.17487/RFC1157, May 1990,
              <https://www.rfc-editor.org/info/rfc1157>.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990,
              <https://www.rfc-editor.org/info/rfc1191>.

   [RFC1195]  Callon, R., "Use of OSI IS-IS for routing in TCP/IP and
              dual environments", RFC 1195, DOI 10.17487/RFC1195,
              December 1990, <https://www.rfc-editor.org/info/rfc1195>.

   [RFC1213]  McCloghrie, K. and M. Rose, "Management Information Base
              for Network Management of TCP/IP-based internets: MIB-II",
              STD 17, RFC 1213, DOI 10.17487/RFC1213, March 1991,
              <https://www.rfc-editor.org/info/rfc1213>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
              and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
              Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001,
              <https://www.rfc-editor.org/info/rfc3209>.

   [RFC3719]  Parker, J., Ed., "Recommendations for Interoperable
              Networks using Intermediate System to Intermediate System
              (IS-IS)", RFC 3719, DOI 10.17487/RFC3719, February 2004,
              <https://www.rfc-editor.org/info/rfc3719>.

   [RFC3988]  Black, B. and K. Kompella, "Maximum Transmission Unit
              Signalling Extensions for the Label Distribution
              Protocol", RFC 3988, DOI 10.17487/RFC3988, January 2005,
              <https://www.rfc-editor.org/info/rfc3988>.

   [RFC6232]  Wei, F., Qin, Y., Li, Z., Li, T., and J. Dong, "Purge
              Originator Identification TLV for IS-IS", RFC 6232,
              DOI 10.17487/RFC6232, May 2011,
              <https://www.rfc-editor.org/info/rfc6232>.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
              and A. Bierman, Ed., "Network Configuration Protocol
              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011,
              <https://www.rfc-editor.org/info/rfc6241>.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016,
              <https://www.rfc-editor.org/info/rfc7752>.

   [RFC7854]  Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP
              Monitoring Protocol (BMP)", RFC 7854,
              DOI 10.17487/RFC7854, June 2016,
              <https://www.rfc-editor.org/info/rfc7854>.

   [RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for Stateful PCE", RFC 8231,
              DOI 10.17487/RFC8231, September 2017,
              <https://www.rfc-editor.org/info/rfc8231>.

Authors' Addresses

   Huanan Chen
   China Telecom
   109 West Zhongshan Ave
   Guangzhou
   China

   Email: chenhuanan@gsta.com


   Zhenqiang Li
   China Mobile
   No. 32 Xuanwumenxi Ave., Xicheng District
   Beijing
   China

   Email: lizhenqiang@chinamobile.com


   Feng Xu
   Tencent
   Guangzhou
   China

   Email: oliverxu@tencent.com


   Yunan Gu
   Huawei
   156 Beiqing Rd
   Beijing
   China

   Email: guyunan@huawei.com


   Zhenbin Li
   Huawei
   156 Beiqing Rd
   Beijing
   China

   Email: lizhenbin@huawei.com
