Network Working Group                                            R. Even
Internet-Draft                                                  H. Zheng
Intended status: Informational                                    Huawei
Expires: March 16, 2018                                          L. Geng
                                                             ChinaMobile
                                                                R. Huang
                                                                  Huawei
                                                      September 12, 2017


   Passive Measurements in Network for troubleshooting Video Delivery
                                Problems
           draft-even-quic-troubleshooting-video-delivery-00

Abstract

   This document provides a detailed description of the passive
   measurements that operators are using to troubleshoot network
   problems when delivering streaming video and multimedia services.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 16, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must



Even, et al.             Expires March 16, 2018                 [Page 1]

Internet-Draft       passive-measurements-in-network      September 2017


   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Passive Measurements for troubleshooting Video Delivery
       Problems  . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.1.  Passive Measurements for TCP  . . . . . . . . . . . . . .   5
       2.1.1.  RTT Measurements  . . . . . . . . . . . . . . . . . .   5
       2.1.2.  Loss Measurements . . . . . . . . . . . . . . . . . .   6
     2.2.  Video Delivery Problems Troubleshooting . . . . . . . . .   7
       2.2.1.  Locating WIFI Problems in Home Network  . . . . . . .   7
       2.2.2.  Locating Network Devices Problems . . . . . . . . . .   8
       2.2.3.  Locating Server Side Problems . . . . . . . . . . . .   9
   3.  Conclusion  . . . . . . . . . . . . . . . . . . . . . . . . .   9
   4.  Security Considerations . . . . . . . . . . . . . . . . . . .  10
   5.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  10
   6.  Informative References  . . . . . . . . . . . . . . . . . . .  10
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction

   Privacy protection has been a growing concern in the IETF.  [RFC7258]
   says that Pervasive Monitoring (PM) is a technical attack that should
   be mitigated where possible through the design of protocols.  This
   call of [RFC7258] is answered by emerging protocols, for example,
   QUIC [I-D.ietf-quic-transport].  QUIC is a new transport protocol
   designed to be secure.  Once a connection is set up, packets
   exchanged by QUIC are largely encrypted during transmission; only a
   minimal piece of information in the protocol header is exposed.  The
   encryption protects QUIC packets from being tampered with by network
   middleboxes.

   Driven by concerns about privacy, the Internet has been accelerating
   the shift from plaintext traffic towards encrypted traffic
   [I-D.mm-wg-effect-encrypt].  Google started to offer end-to-end
   encryption for Gmail in 2010 and for searches in 2013.  YouTube
   traffic has been carried via HTTPS (or QUIC) since 2014.  In
   addition, the Snowden revelations [RFC7624] seem to have caused an
   upward surge in encrypted traffic.

   However, it is also documented in [RFC7258] that making networks
   unmanageable to mitigate PM is not an acceptable outcome.  The
   prevalence of encryption prevents operators from obtaining certain
   traffic information that they need to estimate application service
   quality.  An example use case is that operators need ways to locate
   faults and perform diagnosis when users report a degradation in
   quality of service.

   For traditional transport protocols such as TCP, passive measurements
   are easy to perform, because TCP exposes protocol state information
   in the protocol header, even when HTTPS or TCPinc is used.  Passive
   measurements on TCP traffic enable operators to manage and diagnose
   TCP traffic in many ways, depending on the particular needs of a
   specific application.

   This draft aims to provide information on how operators use passive
   measurements derived from the TCP header for video applications, for
   example, how to monitor video service quality degradation and how to
   troubleshoot and locate problematic devices in the networks.  This
   information can serve as a reference for future transport protocols
   on how passive measurements can be useful.

2.  Passive Measurements for troubleshooting Video Delivery Problems

   Below is a blueprint of video streaming across networks:

                    ------------
                   /  +------+  \
                  /   |Video |   \
        Server   |    |Server|    |
        Network  |    +--v---+    |
                  \      |       /
                   \     |      /
                 +-------V--------+
                 | Server Gateway |
                 +-------V--------+
                   /     |      \
                  /      |       \
                 | +-----v------+ |
                 | |Core Router | |
        Operator | +-----v------+ |
        Network  |       |        |
                 | +-----V-------+|
                 | |Access Router||
                 | +-----V-------+|
                  \      |       /
                   \     |      /
                  +------V-------+
                  | Home Gateway |
                  +------V-------+
                   /     |      \
                  /      |       \
        Home     |    +--v---+    |
        Network  |    |Video |    |
                  \   |Player|   /
                   \  +------+  /
                    ------------

                 Figure 1: Video Streaming across Networks

   Video streaming relies on the network to transport its data.  When a
   user experiences a degradation in the quality of video service, the
   user may complain and report the degradation to the network operator.
   Such reports usually include little information that helps the
   operator identify the problem.  This is because video streaming is
   an end-to-end service; it is not easy to tell which part of the
   delivery chain has gone wrong, given that the delivery chain can
   span multiple providers' networks.  In Figure 1, the delivery of a
   video stream crosses three networks: the server network, the operator
   network and the end user's home network.  In this case,
   troubleshooting the degradation is difficult and involves devices in
   the server network and the home network, both of which are outside
   the operator's control.

   To address such issues, operators need a way to monitor and measure
   video streaming performance at various nodes in the network along
   the delivery path.  Operators may deploy probes on the network nodes
   (e.g., home gateway, access router, or core router) to measure and
   report information at flow level.  With such information at hand, it
   should be easier for operators to diagnose service degradation.
   Moreover, operators can detect service degradation and act on it
   proactively.

   To summarize, passive measurements in the network help operators
   detect application problems at large.  Without them, operators may
   have to resort to traditional methods: performing tests in the
   network and analyzing the results.  This can be time consuming and
   may miss the problem, since not all problems are reproducible by
   tests.

2.1.  Passive Measurements for TCP

   Section 3 of [I-D.stephan-quic-interdomain-troubleshooting] also
   mentions these measurements.  Here, a more detailed description is
   given.  As indicated in Figure 2, TCP passive measurements require
   setting up a measurement point on the path of a TCP connection.  The
   measurement point virtually splits the path into halves.  The half
   close to the server is called "upstream"; the half close to the
   client is called "downstream".  The following sections describe
   methods for measuring Round Trip Time (RTT) and loss in both
   upstream and downstream.  "Inbound" and "Outbound" are used to
   denote stream direction.  "Inbound" denotes a stream toward the
   server, whereas "Outbound" denotes a stream toward the client.

                             Measurement Point
        +--------+                   |                    +--------+
        | Server |<------------------|------------------->| client |
        +--------+      Upstream     |     Downstream     +--------+

                    Figure 2: Passive Measurement Point

   One caveat about passive measurement is that it has no way to know
   the processing time at the end points.  For example, if the server or
   client adds some delay before sending a packet, the Measurement Point
   cannot separate that delay from the network delay when calculating
   RTT.

2.1.1.  RTT Measurements

   TCP connection setup is a three-way handshake.  Usually the client
   initiates a connection to the server.  The signals "SYN -> SYN-ACK ->
   ACK" can be used to determine the initial upstream/downstream RTT.

   o  Initial Upstream RTT: The time difference between SYN and SYN-ACK

   o  Initial Downstream RTT: The time difference between SYN-ACK and
      ACK
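
   The two definitions above can be sketched in a few lines of Python.
   This is a minimal illustration, not part of any specification: the
   HandshakeTimes type and the initial_rtts function are hypothetical
   names, and the timestamps are assumed to be capture times (in
   seconds) recorded at the Measurement Point for a client-initiated
   connection.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HandshakeTimes:
    """Capture times (seconds) of the handshake packets at the Measurement Point."""
    syn: float      # SYN seen travelling toward the server
    syn_ack: float  # SYN-ACK seen travelling toward the client
    ack: float      # final ACK seen travelling toward the server


def initial_rtts(t: HandshakeTimes) -> Tuple[float, float]:
    """Return (initial_upstream_rtt, initial_downstream_rtt) in seconds."""
    upstream = t.syn_ack - t.syn    # Measurement Point -> server -> back
    downstream = t.ack - t.syn_ack  # Measurement Point -> client -> back
    if upstream < 0 or downstream < 0:
        raise ValueError("handshake packets were captured out of order")
    return upstream, downstream
```

   Both values include any processing time at the end points, which,
   as noted earlier, the Measurement Point has no way to observe.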

   After the connection setup phase, the initial RTT should be updated
   by sequence number matching.  TCP's reliability mechanism requires
   all data to be acknowledged.  By matching sequence numbers against
   acknowledgement numbers, it is possible to pair a segment with its
   corresponding acknowledgement at the Measurement Point.  The time
   difference between the segment and its acknowledgement is a strong
   candidate for the RTT.  However, this method is not without
   measurement error.  Measurement error can occur in the following
   situations:

   o  Delayed ACK.  For good reasons, the TCP endpoint may decide to
      delay sending an acknowledgement for a little while.  The
      measurement error contributed by delayed ACK can be up to 500
      milliseconds, according to [RFC1122].

   o  Packet Loss.  Another source of measurement error is packet loss.
      A segment that has passed the Measurement Point can still be lost
      on the way to its destination.  An acknowledgement can be lost
      before arriving at the Measurement Point.  At times a segment
      cannot be matched to its corresponding acknowledgement, but only
      to a later one, thus contributing to measurement error.
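
   Sequence number matching could be sketched as follows.  This is an
   illustrative sketch under stated assumptions, not a definitive
   implementation: the SeqMatchRtt class and its method names are
   hypothetical, sequence number wraparound is ignored, and segments
   seen more than once are skipped as ambiguous (an assumption made
   here to limit the error from packet loss; the effect of delayed ACK
   is not removed).

```python
from typing import Dict, List, Optional, Set


class SeqMatchRtt:
    """Pair data segments with their acknowledgements at a Measurement Point."""

    def __init__(self) -> None:
        self._pending: Dict[int, float] = {}  # expected ACK number -> first capture time
        self._ambiguous: Set[int] = set()     # expected ACK numbers seen more than once
        self.samples: List[float] = []

    def on_data(self, seq: int, length: int, now: float) -> None:
        """Record a data segment travelling away from the sender."""
        if length <= 0:
            return
        expected_ack = seq + length
        if expected_ack in self._pending:
            # Retransmission: the ACK could answer either copy, so skip it.
            self._ambiguous.add(expected_ack)
        else:
            self._pending[expected_ack] = now

    def on_ack(self, ack: int, now: float) -> Optional[float]:
        """Process an ACK from the receiver; return an RTT sample if one matches."""
        sample = None
        sent = self._pending.get(ack)
        if sent is not None and ack not in self._ambiguous:
            sample = now - sent
            self.samples.append(sample)
        # A cumulative ACK covers every segment ending at or before it.
        for key in [k for k in self._pending if k <= ack]:
            del self._pending[key]
            self._ambiguous.discard(key)
        return sample
```

   Fed with data flowing from server to client and ACKs flowing back,
   the samples approximate the downstream RTT.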

   Note that bidirectional streams are required to measure both
   downstream and upstream RTT when using sequence number matching.  A
   unidirectional stream from server to client yields only the
   downstream RTT.  For the upstream RTT, a unidirectional stream from
   client to server is required.

   An alternative method of measuring RTT is described in Section 4 of
   [RFC7323], which utilizes the TCP Timestamps option.  The method
   results in less measurement error than sequence number matching.
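
   [RFC7323] defines RTT measurement for end points.  One way a
   Measurement Point might adapt it is to remember when each TSval was
   first seen in one direction and to wait for the first packet in the
   opposite direction that echoes it in TSecr.  The sketch below
   assumes exactly that adaptation; the TimestampRtt class is a
   hypothetical name and timestamp wraparound is ignored.

```python
from typing import Dict, List, Optional


class TimestampRtt:
    """Match TSval values seen in one direction with TSecr echoes in the other."""

    def __init__(self) -> None:
        self._first_seen: Dict[int, float] = {}  # TSval -> first capture time
        self.samples: List[float] = []

    def on_outbound(self, tsval: int, now: float) -> None:
        """Packet travelling away from the measured sender: remember its TSval."""
        # setdefault keeps only the first capture time for a given TSval.
        self._first_seen.setdefault(tsval, now)

    def on_inbound(self, tsecr: int, now: float) -> Optional[float]:
        """Packet travelling back: use its TSecr to look up the echoed TSval."""
        sent = self._first_seen.pop(tsecr, None)  # only the first echo is used
        if sent is None:
            return None
        sample = now - sent
        self.samples.append(sample)
        return sample
```

   With outbound packets heading toward the client, the samples
   approximate the downstream RTT; swapping the directions gives the
   upstream RTT.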

2.1.2.  Loss Measurements

   TCP uses a sliding window at both endpoints to coordinate data
   transmission.  The sending endpoint uses the send window to control
   how much data it can send; the receiving endpoint uses the receive
   window as a buffering mechanism for incoming data and to report
   window size updates.  The sliding window mechanism exchanges
   information by using fields and options of the TCP header; thus it is
   visible to the network.  Such information can be obtained at the
   Measurement Point, and the following loss measurements can be
   performed:

   o  Downstream Loss Rate Measurement.

   o  Upstream Loss Rate Measurement.

   Downstream Loss Rate can be measured by monitoring outbound streams
   at the Measurement Point.  From the sequence numbers exposed in the
   TCP header, two values can be calculated: the total amount of
   application data and the total amount of retransmitted data.  The
   total amount of application data represents the number of bytes the
   application wants to send.  The total amount of retransmitted data
   represents the number of bytes that have previously been seen at the
   Measurement Point.  Downstream Loss Rate is calculated as:

   Total Amount of Retransmitted Data / Total Amount of Application Data
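
   The ratio above could be computed with a simple high-water-mark
   approximation, sketched below.  The DownstreamLossMeter class is a
   hypothetical name.  The sketch assumes no upstream loss or
   reordering, so that any byte below the highest sequence number
   already seen is a retransmission; it also ignores sequence number
   wraparound.

```python
from typing import Optional


class DownstreamLossMeter:
    """Estimate Downstream Loss Rate from outbound segments at a Measurement Point."""

    def __init__(self) -> None:
        self._next_seq: Optional[int] = None  # one past the highest byte seen so far
        self.application_bytes = 0            # bytes seen for the first time
        self.retransmitted_bytes = 0          # bytes seen again

    def on_outbound_segment(self, seq: int, length: int) -> None:
        if length <= 0:
            return
        end = seq + length
        if self._next_seq is None:
            self._next_seq = seq
        # Only the part above the high-water mark counts as new data.
        new_bytes = max(0, end - max(seq, self._next_seq))
        self.application_bytes += new_bytes
        self.retransmitted_bytes += length - new_bytes
        self._next_seq = max(self._next_seq, end)

    def loss_rate(self) -> float:
        """Retransmitted bytes divided by application bytes (0.0 if no data yet)."""
        if self.application_bytes == 0:
            return 0.0
        return self.retransmitted_bytes / self.application_bytes
```

   A fuller implementation would track the exact byte ranges already
   seen, so that late segments filling upstream "holes" are not
   mistaken for retransmissions.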

   For Upstream Loss Rate, monitoring outbound streams can only give
   estimates.  This is due to the difficulty of counting the amount of
   data that is lost in the upstream before arriving at the Measurement
   Point.  Data loss in the upstream causes the Measurement Point to see
   "holes" in the received sequence numbers.  The amount of data
   represented by the "holes" can be used as an estimate of upstream
   data loss.  However, to make the estimate more practical, two issues
   need to be considered.  A) Out-of-order packets can also cause
   "holes", so the measurement should account for out-of-order arrival.
   B) If a lost segment carries a sequence number newer than any
   recorded at the Measurement Point, it leaves no "hole", and there is
   no way to detect such loss at the Measurement Point.
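
   A hole-based estimate that accounts for issue A could look like the
   sketch below.  The UpstreamLossEstimator class and its
   reorder_window parameter (with an arbitrary default of 50 ms) are
   hypothetical: a hole filled within the window is treated as
   out-of-order arrival, and a hole filled later is counted as loss.
   Issue B is not addressed, and sequence number wraparound is ignored.

```python
from typing import List, Optional, Tuple

Hole = Tuple[int, int, float]  # (start_seq, end_seq, time the hole was noticed)


class UpstreamLossEstimator:
    """Estimate upstream loss from "holes" in outbound sequence numbers."""

    def __init__(self, reorder_window: float = 0.05) -> None:
        self.reorder_window = reorder_window  # seconds; assumed tolerance
        self._first_seq: Optional[int] = None
        self._next_seq: Optional[int] = None
        self._holes: List[Hole] = []
        self.lost_bytes = 0

    def on_outbound_segment(self, seq: int, length: int, now: float) -> None:
        if length <= 0:
            return
        end = seq + length
        if self._next_seq is None:
            self._first_seq = self._next_seq = seq
        if seq < self._next_seq:
            # Old data: it may fill (part of) a previously noticed hole.
            self._fill(seq, min(end, self._next_seq), now)
        elif seq > self._next_seq:
            # Gap in the sequence space: open a new hole.
            self._holes.append((self._next_seq, seq, now))
        self._next_seq = max(self._next_seq, end)

    def _fill(self, seq: int, end: int, now: float) -> None:
        remaining: List[Hole] = []
        for start, stop, opened in self._holes:
            lo, hi = max(start, seq), min(stop, end)
            if lo >= hi:  # no overlap with this hole
                remaining.append((start, stop, opened))
                continue
            if now - opened > self.reorder_window:
                self.lost_bytes += hi - lo  # filled late: treated as loss
            if start < lo:
                remaining.append((start, lo, opened))
            if hi < stop:
                remaining.append((hi, stop, opened))
        self._holes = remaining

    def loss_rate(self) -> float:
        """Late-filled hole bytes divided by the sequence space seen so far."""
        if self._next_seq is None or self._next_seq == self._first_seq:
            return 0.0
        return self.lost_bytes / (self._next_seq - self._first_seq)
```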

   The situation is reversed when monitoring inbound streams instead of
   outbound streams.  In this case, Upstream Loss Rate can be measured
   more precisely and Downstream Loss Rate can only be estimated.

2.2.  Video Delivery Problems Troubleshooting

   This section describes how TCP passive measurements are used for
   troubleshooting video delivery problems.  As depicted in Figure 1,
   three network segments are involved: the home network, the operator's
   network and the server network.  The following subsections address
   problems in each of these network segments.

2.2.1.  Locating WIFI Problems in Home Network

   It is common for WIFI to be used in home networks to share Internet
   access wirelessly.  This brings mobility to people accessing the
   Internet at home.  However, it comes at a cost: wireless access
   performance is often worse than wired access, since the wireless
   signal suffers more from varying environmental conditions.  Wireless
   access inherently incurs more packet loss and often results in large
   delay.  Performance of network applications is often degraded in
   wireless networks.

   When network application performance degrades, WIFI is often blamed.
   It is desirable for operators to know how much WIFI has contributed
   to the degradation.  Passive measurement methods are needed to help
   visualize the problem.  One method is to profile the RTT in the home
   network.  High RTT values may be seen for home networks that use
   WIFI.

   One important reason that WIFI causes high RTT values is that WIFI
   retransmits lost frames in its Medium Access Control (MAC) layer, in
   order to alleviate high loss induced by poor wireless conditions.
   Due to this trade-off at the MAC layer, WIFI traffic often has the
   trait of high delay and relatively low packet loss rate.  This trait
   makes traffic over WIFI distinguishable from traffic over other
   media.

   To profile the RTT in the home network, the Measurement Point should
   be set at the home gateway if it is controlled by the operator.
   Otherwise, the Measurement Point has to be deployed one level above
   the home gateway in the access network, usually at the next IP hop
   from the home gateway.  In this case, the Measurement Point is
   distant from the home network it measures.  Congestion on the link
   between the Measurement Point and the home network can affect the
   measurement result.  The Measurement Point must exclude RTT samples
   affected by congestion on that link.  When congestion occurs, loss
   and delay both increase, making congested traffic distinguishable
   from ordinary WIFI traffic, which has high delay but low loss.
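
   The distinction between WIFI-like and congestion-like samples can
   be illustrated with a simple threshold rule.  The sketch below is
   only an illustration: the function names and both threshold values
   (50 ms and 1% loss) are assumptions chosen for the example and
   would need to be calibrated for a real network.

```python
from typing import Iterable, List, Tuple


def classify_sample(rtt_ms: float, loss_rate: float,
                    rtt_threshold_ms: float = 50.0,
                    loss_threshold: float = 0.01) -> str:
    """Label one (downstream RTT, downstream loss rate) sample.

    The thresholds are illustrative assumptions, not measured values.
    """
    high_delay = rtt_ms > rtt_threshold_ms
    high_loss = loss_rate > loss_threshold
    if high_delay and high_loss:
        return "congestion-suspected"  # loss and delay both increased
    if high_delay:
        return "wifi-suspected"        # high delay but low loss
    if high_loss:
        return "other"                 # loss without added delay
    return "normal"


def wifi_profile(samples: Iterable[Tuple[float, float]]) -> List[float]:
    """Return the RTTs to profile, excluding congestion-suspected samples."""
    return [rtt for rtt, loss in samples
            if classify_sample(rtt, loss) != "congestion-suspected"]
```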

   Passive measurement on TCP traffic is crucial to the RTT profiling
   method introduced above, since TCP traffic makes up the majority of
   traffic on the Internet.  TCP traffic is therefore a viable source
   from which to collect downstream RTT.

2.2.2.  Locating Network Devices Problems

   Sometimes application performance degradation is caused by problems
   in the network.  One faulty or misconfigured node in the network may
   cause unusual packet loss or unnecessary delay for packets.  When
   this happens, it is often difficult for operators to locate the
   faulty or misconfigured node, due to the complex architecture of the
   network.  Operators have to find out whether the problem exists at
   the access level, the aggregation level, or even in the core.

   To help locate the problem, it is useful to identify which network
   segment causes it.  For that, passive measurement can serve as a
   vital means of problem demarcation between network segments.
   Probes can be deployed on suitable nodes along the whole network
   path, as indicated in Figure 3.

                   +--------------------+
                   | Measurement Center |
                   +--------------------+
                      \               \
                       \ Probe         \ Probe
                        \               \
      +-------+         ++              ++         +-------+
      |Network| Access  || Aggregation  || Core    |Network|
      |Ingress|---------++--------------++---------|Egress |
      |Node   | Network || Network      || Network |Node   |
      +-------+         ++              ++         +-------+

                 Figure 3: Probes Deployed on Network Path

   The purpose of the probes deployed on the network path is to measure
   TCP traffic passively and report the collected downstream/upstream
   RTT and loss information to the Measurement Center.  The Measurement
   Center can then build a baseline of the normal characteristics of
   the network segments.  If a network node becomes faulty or is
   misconfigured, its behavior will deviate from the baseline and thus
   be detected by the Measurement Center.  This will greatly aid
   operators in troubleshooting problems that are caused by the network.
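
   This document does not specify how the baseline is built or how a
   deviation is detected.  As one possible illustration, the sketch
   below keeps a list of past RTT values per network segment and flags
   a segment when its latest value lies more than k standard
   deviations from the baseline mean.  The function names, the segment
   names used as keys, and the default k = 3 are all assumptions made
   for the example.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def deviates_from_baseline(baseline: Sequence[float], observed: float,
                           k: float = 3.0) -> bool:
    """True if 'observed' is more than k standard deviations from the mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) > k * sigma


def suspect_segments(baselines: Dict[str, Sequence[float]],
                     latest: Dict[str, float],
                     k: float = 3.0) -> List[str]:
    """Return the segments whose latest value deviates from their baseline."""
    return [name for name, value in latest.items()
            if name in baselines
            and deviates_from_baseline(baselines[name], value, k)]
```

   For example, given per-segment RTT histories for "access",
   "aggregation" and "core", a sudden jump in the aggregation value
   alone would point the operator at the aggregation network.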

2.2.3.  Locating Server Side Problems

   For end-to-end applications such as video streaming, there is a
   possibility that performance degradation is caused by problems in
   the upstream of the service chain, located on the server side and
   owned by server network providers.  In this case, the operator
   network provider can present passive measurement results as evidence
   to server network providers, improving their understanding of how
   the network is performing outside the server gateway.  Using this
   information, server network providers can focus on the potential
   problem area, rather than looking outside.

3.  Conclusion

   The information exposed by the TCP header enables network operators
   to perform passive measurements such as RTT and packet loss.  This
   information is useful for network operators when troubleshooting.
   This document describes several use cases for passive measurement.
   The conclusion that can be drawn from those use cases is that passive
   measurement is a viable means of diagnosing application performance
   degradation, especially for problem demarcation between network
   segments.

   It is a recommendation for future transport protocols that passive
   measurements of RTT and packet loss be supported.  New transport
   protocols may use different mechanisms than TCP does.  It is
   required that the information needed for passive measurements is
   exposed to the network.

   For more information and discussion on solutions see also
   [I-D.stephan-quic-interdomain-troubleshooting] and
   [I-D.ietf-quic-manageability].

4.  Security Considerations

   T.B.D.

5.  IANA Considerations

   This document has no requirement on IANA.

6.  Informative References

   [I-D.ietf-quic-manageability]
              Kuehlewind, M., Trammell, B., and D. Druta, "Manageability
              of the QUIC Transport Protocol", draft-ietf-quic-
              manageability-00 (work in progress), July 2017.

   [I-D.ietf-quic-transport]
              Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
              and Secure Transport", draft-ietf-quic-transport-05 (work
              in progress), August 2017.

   [I-D.mm-wg-effect-encrypt]
              Moriarty, K. and A. Morton, "Effect of Pervasive
              Encryption on Operators", draft-mm-wg-effect-encrypt-12
              (work in progress), June 2017.

   [I-D.stephan-quic-interdomain-troubleshooting]
              Emile, S., Cayla, M., Braud, A., and F. Fieau, "QUIC
              Interdomain Troubleshooting", draft-stephan-quic-
              interdomain-troubleshooting-00 (work in progress), July
              2017.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989, <https://www.rfc-
              editor.org/info/rfc1122>.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014, <https://www.rfc-editor.org/info/rfc7258>.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, Ed., "TCP Extensions for High Performance",
              RFC 7323, DOI 10.17487/RFC7323, September 2014,
              <https://www.rfc-editor.org/info/rfc7323>.

   [RFC7624]  Barnes, R., Schneier, B., Jennings, C., Hardie, T.,
              Trammell, B., Huitema, C., and D. Borkmann,
              "Confidentiality in the Face of Pervasive Surveillance: A
              Threat Model and Problem Statement", RFC 7624,
              DOI 10.17487/RFC7624, August 2015, <https://www.rfc-
              editor.org/info/rfc7624>.

Authors' Addresses

   Roni Even
   Huawei

   Email: roni.even@huawei.com


   Hui Zheng (Marvin)
   Huawei

   Email: marvin.zhenghui@huawei.com


   Liang Geng
   ChinaMobile

   Email: gengliang@chinamobile.com


   Rachel Huang
   Huawei

   Email: rachel.huang@huawei.com