Network Working Group R. Even
Internet-Draft H. Zheng
Intended status: Informational Huawei
Expires: March 16, 2018 L. Geng
ChinaMobile
R. Huang
Huawei
September 12, 2017
Passive Measurements in Network for troubleshooting Video Delivery
Problems
draft-even-quic-troubleshooting-video-delivery-00
Abstract
This document provides a detailed description of the passive
measurements that operators are using to troubleshoot network
problems when delivering streaming video and multimedia services.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on March 16, 2018.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
Even, et al. Expires March 16, 2018 [Page 1]
Internet-Draft passive-measurements-in-network September 2017
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Passive Measurements for troubleshooting Video Delivery
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1. Passive Measurements for TCP . . . . . . . . . . . . . . 5
2.1.1. RTT Measurements . . . . . . . . . . . . . . . . . . 5
2.1.2. Loss Measurements . . . . . . . . . . . . . . . . . . 6
2.2. Video Delivery Problems Troubleshooting . . . . . . . . . 7
2.2.1. Locating WIFI Problems in Home Network . . . . . . . 7
2.2.2. Locating Network Devices Problems . . . . . . . . . . 8
2.2.3. Locating Server Side Problems . . . . . . . . . . . . 9
3. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 9
4. Security Considerations . . . . . . . . . . . . . . . . . . . 10
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
6. Informative References . . . . . . . . . . . . . . . . . . . 10
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 11
1. Introduction
Privacy protection has been a growing concern in the IETF. [RFC7258]
states that Pervasive Monitoring (PM) is a technical attack that
should be mitigated where possible through the design of protocols.
Emerging protocols such as QUIC [I-D.ietf-quic-transport] answer this
call. QUIC is a new transport protocol designed to be secure. Once a
connection is set up, the packets exchanged by QUIC are largely
encrypted during transmission; only a minimal amount of information
in the protocol header is exposed. The encryption protects QUIC
packets from being tampered with by network middleboxes.
Driven by privacy concerns, the Internet has been accelerating the
shift from plaintext traffic to encrypted traffic
[I-D.mm-wg-effect-encrypt]. Google started to offer end-to-end
encryption for Gmail in 2010 and for searches in 2013. YouTube
traffic has been carried via HTTPS (or QUIC) since 2014. In
addition, the Snowden revelations [RFC7624] appear to have caused a
surge in encrypted traffic.
However, [RFC7258] also states that making networks unmanageable to
mitigate PM is not an acceptable outcome. The prevalence of
encryption prevents operators from obtaining some of the traffic
information they use to estimate application service quality. For
example, operators need ways to locate faults and perform diagnosis
when users report a degradation in quality of service.
For traditional transport protocols such as TCP, passive measurements
are easy to perform because TCP exposes protocol state information in
its header, even when HTTPS or TCPinc is used. Passive measurements
of TCP traffic enable operators to manage and diagnose TCP traffic in
many ways, depending on the particular needs of a specific
application.
This draft describes how operators use passive measurements derived
from the TCP header for video applications, for example, to monitor
video service quality degradation and to troubleshoot and locate
problematic devices in the network. This information can serve as a
reference for future transport protocols on how passive measurements
can be useful.
2. Passive Measurements for troubleshooting Video Delivery Problems
Below is a blueprint of video streaming across networks:
                        ------------
                      /   +------+   \
                     /    |Video |    \
           Server   |     |Server|     |
           Network  |     +--v---+     |
                     \       |        /
                      \      |       /
                     +-------V--------+
                     | Server Gateway |
                     +-------V--------+
                      /      |       \
                     /       |        \
                    |  +-----v------+  |
                    |  |Core Router |  |
           Operator |  +-----v------+  |
           Network  |        |         |
                    |  +-----V-------+ |
                    |  |Access Router| |
                    |  +-----V-------+ |
                     \       |        /
                      \      |       /
                      +------V-------+
                      | Home Gateway |
                      +------V-------+
                      /      |       \
                     /       |        \
           Home     |     +--v---+     |
           Network  |     |Video |     |
                     \    |Player|    /
                      \   +------+   /
                        ------------
Figure 1: Video Streaming across Networks
Video streaming relies on the network to transport its data. When a
user experiences a degradation in the quality of the video service,
the user may report the degradation to the network operator. Such a
report usually contains little information that helps the operator
identify the problem. This is because video streaming is an
end-to-end service: it is not easy to tell which part of the delivery
chain has gone wrong, given that the chain can span the networks of
multiple providers. In Figure 1, the delivery of a video stream
crosses three networks: the server network, the operator network, and
the end user's home network. In this case, troubleshooting the
degradation is difficult because it involves devices in the server
network and the home network, both of which are outside the
operator's control.
To help with such issues, operators need a way to monitor and measure
video streaming performance at various nodes along the delivery path.
Operators may deploy probes on network nodes (e.g., the home gateway,
access router, or core router) to measure and report information at
the flow level. With such information at hand, it is easier for
operators to diagnose service degradation, and to detect and act on
service degradation proactively.
To summarize, passive measurements in the network help operators
detect application problems at scale. Without them, operators may
have to resort to traditional methods: performing tests in the
network and analyzing the results. This can be time-consuming and
happens after the fact, and not all problems can be reproduced by
tests.
2.1. Passive Measurements for TCP
Section 3 of [I-D.stephan-quic-interdomain-troubleshooting] also
mentions these measurements; a more detailed description is given
here. As shown in Figure 2, TCP passive measurements require a
measurement point on the path of a TCP connection. The measurement
point virtually splits the path into two halves. The half closer to
the server is called "upstream"; the half closer to the client is
called "downstream". The following sections describe methods for
measuring Round Trip Time (RTT) and loss in both the upstream and the
downstream. "Inbound" and "Outbound" denote stream direction:
"Inbound" denotes a stream toward the server, whereas "Outbound"
denotes a stream toward the client.
                        Measurement Point
   +--------+                   |                    +--------+
   | Server |<------------------|------------------->| Client |
   +--------+     Upstream      |     Downstream     +--------+
Figure 2: Passive Measurement Point
One caveat of passive measurement is that it cannot observe the
processing time at the endpoints. For example, if the server or the
client adds some delay before sending a packet, the Measurement Point
cannot separate that delay from network delay, so the delay is
included in the calculated RTT.
2.1.1. RTT Measurements
TCP connection setup is a three-way handshake. Usually the client
initiates the connection to the server. The segments "SYN -> SYN-ACK
-> ACK" observed at the Measurement Point can be used to determine
the initial upstream and downstream RTT.
o Initial Upstream RTT: The time difference between SYN and SYN-ACK
o Initial Downstream RTT: The time difference between SYN-ACK and
ACK
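As an illustration only, the following Python sketch computes the two
initial RTTs from the times at which the Measurement Point observes
the handshake segments. The names "HandshakeTimes" and "initial_rtts"
are hypothetical, and the sketch assumes all times come from a single
clock at the Measurement Point.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HandshakeTimes:
    """Times (seconds) at which the Measurement Point saw each handshake segment."""
    syn: float      # client -> server
    syn_ack: float  # server -> client
    ack: float      # client -> server


def initial_rtts(t: HandshakeTimes) -> tuple[float, float]:
    """Return (initial_upstream_rtt, initial_downstream_rtt).

    Upstream: SYN to SYN-ACK, i.e. Measurement Point -> server -> Measurement Point.
    Downstream: SYN-ACK to ACK, i.e. Measurement Point -> client -> Measurement Point.
    """
    if not t.syn <= t.syn_ack <= t.ack:
        raise ValueError("handshake segments observed out of order")
    return t.syn_ack - t.syn, t.ack - t.syn_ack


up, down = initial_rtts(HandshakeTimes(syn=0.0, syn_ack=0.25, ack=0.375))
print(up, down)  # 0.25 0.125
```

Both values include any endpoint processing time, as noted in the
caveat above.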
After the connection setup phase, the initial RTT should be updated
by sequence number matching. TCP acknowledges every segment it
receives, although possibly cumulatively. By matching sequence
numbers to acknowledgement numbers, the Measurement Point can pair a
segment with its corresponding acknowledgement. The time difference
between the segment and its acknowledgement is then a good estimate
of the RTT.
However, this method is not free of measurement error. Errors can
occur in the following situations:
o Delayed ACK. For good reasons, a TCP endpoint may delay sending an
acknowledgement for a short while. According to [RFC1122], the
measurement error caused by delayed ACK can be up to 500
milliseconds.
o Packet loss. A segment that has passed the Measurement Point can
still be lost on the way to its destination, and an
acknowledgement can be lost before arriving at the Measurement
Point. In such cases a segment cannot be matched to its
corresponding acknowledgement, only to a later one, which
contributes to measurement error.
Note that when sequence number matching is used, data has to flow in
both directions to measure both downstream and upstream RTT. Data
flowing from the server to the client yields downstream RTT; upstream
RTT requires data flowing from the client to the server.
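The sequence number matching described above can be sketched as
follows for the downstream direction (outbound data, inbound ACKs).
This is a hypothetical illustration, not a method specified in this
document: the "SeqMatchRtt" class assumes a single connection, no
sequence-number wraparound, and retransmissions that keep the original
segment boundaries. It drops samples for retransmitted segments (in
the spirit of Karn's algorithm) and keeps only ACKs that exactly match
the end of a segment, discarding segments that are only acknowledged
cumulatively. A delayed ACK that exactly matches a segment still
inflates that sample.

```python
from __future__ import annotations


class SeqMatchRtt:
    """Downstream RTT samples by sequence-number matching (hypothetical sketch)."""

    def __init__(self) -> None:
        self._pending: dict[int, float] = {}  # expected ACK number -> time segment seen
        self._seen_ends: set[int] = set()     # used to spot retransmissions
        self.samples: list[float] = []        # downstream RTT samples (seconds)

    def on_outbound_segment(self, t: float, seq: int, length: int) -> None:
        if length == 0:
            return  # pure ACKs carry no data to be acknowledged
        end = seq + length
        if end in self._seen_ends:
            # Retransmission: a later ACK would be ambiguous, so drop the sample.
            self._pending.pop(end, None)
            return
        self._seen_ends.add(end)
        self._pending[end] = t

    def on_inbound_ack(self, t: float, ack: int) -> None:
        for end in [e for e in self._pending if e <= ack]:
            sent = self._pending.pop(end)
            if end == ack:
                # Exact match: this ACK answers this segment.
                self.samples.append(t - sent)
            # Segments covered only cumulatively are dropped: an earlier
            # ACK may have been lost, which would inflate the sample.


rtt = SeqMatchRtt()
rtt.on_outbound_segment(0.0, seq=1000, length=100)
rtt.on_outbound_segment(0.25, seq=1100, length=100)
rtt.on_inbound_ack(0.5, ack=1100)
rtt.on_inbound_ack(1.0, ack=1200)
print(rtt.samples)  # [0.5, 0.75]
```

Swapping the roles of the two directions (inbound data, outbound ACKs)
would give upstream samples in the same way.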
An alternative method of measuring RTT, described in Section 4 of
[RFC7323], uses the TCP Timestamps option. This method can result in
less measurement error than sequence number matching.
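A similarly hedged sketch of the Timestamps-based alternative follows.
It adapts the TSval/TSecr echo idea of [RFC7323] to an on-path
observer, which is an assumption of this illustration rather than the
RFC's own sender-side procedure: the first time a TSval value is seen
in one direction is recorded, and the first segment in the opposite
direction that echoes it in TSecr closes an RTT sample. The
"TimestampRtt" class and the "in"/"out" direction labels are
hypothetical.

```python
from __future__ import annotations


class TimestampRtt:
    """RTT samples from TSval/TSecr echoes seen at a Measurement Point (sketch).

    Assumes one connection and that both endpoints use the Timestamps option.
    direction: "out" = server -> client, "in" = client -> server.
    """

    def __init__(self) -> None:
        self._last_tsval: dict[str, int] = {}
        self._first_seen: dict[tuple[str, int], float] = {}
        self.downstream: list[float] = []
        self.upstream: list[float] = []

    def on_segment(self, t: float, direction: str, tsval: int, tsecr: int) -> None:
        if direction not in ("in", "out"):
            raise ValueError("direction must be 'in' or 'out'")
        # Record only the first segment carrying a new TSval, since many
        # segments can share one TSval within a timestamp clock tick.
        if self._last_tsval.get(direction) != tsval:
            self._last_tsval[direction] = tsval
            self._first_seen[(direction, tsval)] = t
        # Does TSecr echo a TSval first seen in the opposite direction?
        other = "out" if direction == "in" else "in"
        sent = self._first_seen.pop((other, tsecr), None)
        if sent is None:
            return  # nothing to match, or this echo was already used
        if direction == "in":
            self.downstream.append(t - sent)  # MP -> client -> MP
        else:
            self.upstream.append(t - sent)    # MP -> server -> MP


ts = TimestampRtt()
ts.on_segment(0.0, "out", tsval=100, tsecr=0)
ts.on_segment(0.5, "in", tsval=7000, tsecr=100)
ts.on_segment(1.0, "out", tsval=101, tsecr=7000)
print(ts.downstream, ts.upstream)  # [0.5] [0.5]
```

Unlike sequence number matching, this yields samples in both
directions as long as each endpoint echoes the other's timestamps,
even when data flows in only one direction.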
2.1.2. Loss Measurements
TCP uses a sliding window at both endpoints to coordinate data
transmission. The sending endpoint uses its send window to control
how much data it can send; the receiving endpoint uses its receive
window to buffer incoming data and to advertise window size updates.
The sliding window mechanism exchanges information in fields and
options of the TCP header, so it is visible to the network. This
information can be obtained at the Measurement Point, and the
following loss measurements can be performed:
o Downstream Loss Rate Measurement.
o Upstream Loss Rate Measurement.
Downstream Loss Rate can be measured by monitoring outbound streams
at the Measurement Point. From the sequence numbers exposed in the
TCP header, two values can be calculated: the total amount of
application (original) data and the total amount of retransmitted
data. The total amount of application data is the number of bytes
the application wants to send, each counted once. The total amount
of retransmitted data is the number of bytes that have already been
seen at the Measurement Point. Downstream Loss Rate is calculated
as:
Total Amount of Retransmitted Data / Total Amount of Application Data
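The following Python sketch shows one possible way to compute the two
totals and the Downstream Loss Rate from outbound segments. The
"DownstreamLoss" class is hypothetical; it assumes no sequence-number
wraparound and counts any byte whose sequence number has already been
seen at the Measurement Point as retransmitted.

```python
from __future__ import annotations


class DownstreamLoss:
    """Downstream Loss Rate from outbound segments (hypothetical sketch)."""

    def __init__(self) -> None:
        self._seen: list[tuple[int, int]] = []  # sorted, disjoint [start, end) ranges
        self.application_bytes = 0               # original data, counted once
        self.retransmitted_bytes = 0             # bytes already seen before

    def on_outbound_segment(self, seq: int, length: int) -> None:
        if length <= 0:
            return  # pure ACKs carry no data
        start, end = seq, seq + length
        # Bytes of this segment that fall into already-seen ranges.
        overlap = sum(max(0, min(end, e) - max(start, s)) for s, e in self._seen)
        self.retransmitted_bytes += overlap
        self.application_bytes += length - overlap
        self._add_range(start, end)

    def _add_range(self, start: int, end: int) -> None:
        merged: list[tuple[int, int]] = []
        for s, e in sorted(self._seen + [(start, end)]):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self._seen = merged

    def loss_rate(self) -> float:
        if self.application_bytes == 0:
            return 0.0
        return self.retransmitted_bytes / self.application_bytes


dl = DownstreamLoss()
for seq in (1000, 1100, 1200, 1100):  # the last segment is a retransmission
    dl.on_outbound_segment(seq, 100)
print(dl.application_bytes, dl.retransmitted_bytes)  # 300 100
```

Spurious retransmissions (for example, after an ACK is lost between
the Measurement Point and the server) are counted as well, so this
value can overstate downstream loss.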
For Upstream Loss Rate, monitoring outbound streams can only give
estimates, because it is difficult to count the amount of data that
is lost upstream before reaching the Measurement Point. Data loss in
the upstream causes the Measurement Point to see "holes" in the
received sequence numbers. The amount of data represented by the
holes can be used as an estimate of upstream data loss. However, to
make the estimate more practical, two issues need to be considered:
A) out-of-order packets can also cause holes, so the measurement
should account for out-of-order arrival; B) if lost segments have
higher sequence numbers than any segment recorded at the Measurement
Point (for example, at the tail of a transfer), no hole appears and
the loss cannot be detected at the Measurement Point.
The situation is reversed when monitoring inbound streams instead of
outbound streams. In this case, Upstream Loss Rate can be measured
more precisely and Downstream Loss Rate can only be estimated.
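For the outbound case, the hole-based estimate of Upstream Loss Rate
could look like the following sketch. The reordering rule (a hole
filled within a short "reorder_window" is treated as reordering rather
than loss) and the window value are assumptions of this illustration,
not something this document specifies; "UpstreamLossEstimate" is
likewise a hypothetical name. Losses beyond the highest sequence
number seen (issue B) remain invisible to it.

```python
from __future__ import annotations


class UpstreamLossEstimate:
    """Upstream loss estimate from sequence holes in the outbound stream (sketch).

    Assumptions: no sequence wraparound, and a segment that fills a hole
    starts at the hole's first byte.
    """

    def __init__(self, reorder_window: float = 0.05) -> None:  # hypothetical value
        self.reorder_window = reorder_window
        self._first_seq: int | None = None
        self._next_seq = 0  # one past the highest byte seen so far
        self._holes: dict[int, tuple[int, float]] = {}  # start -> (end, time opened)
        self._lost_bytes = 0  # holes filled too late to be reordering

    def on_outbound_segment(self, t: float, seq: int, length: int) -> None:
        end = seq + length
        if self._first_seq is None:
            self._first_seq, self._next_seq = seq, end
            return
        if seq > self._next_seq:
            # Bytes [next_seq, seq) have not passed the Measurement Point.
            self._holes[self._next_seq] = (seq, t)
        if seq >= self._next_seq:
            self._next_seq = max(self._next_seq, end)
            return
        hole = self._holes.pop(seq, None)
        if hole is None:
            return  # duplicate of data already seen
        hole_end, opened = hole
        if end < hole_end:
            self._holes[end] = (hole_end, opened)  # rest of the hole still missing
        if t - opened > self.reorder_window:
            # Filled late: most likely a retransmission of data lost upstream.
            self._lost_bytes += min(end, hole_end) - seq

    def loss_rate(self, now: float) -> float:
        if self._first_seq is None or self._next_seq == self._first_seq:
            return 0.0
        # Holes still open after the reordering window also count as lost.
        stale = sum(h_end - h_start for h_start, (h_end, opened) in self._holes.items()
                    if now - opened > self.reorder_window)
        return (self._lost_bytes + stale) / (self._next_seq - self._first_seq)


est = UpstreamLossEstimate(reorder_window=0.05)
est.on_outbound_segment(0.00, 1000, 100)
est.on_outbound_segment(0.01, 1200, 100)  # hole [1100, 1200)
est.on_outbound_segment(0.02, 1100, 100)  # filled quickly: reordering
est.on_outbound_segment(0.03, 1400, 100)  # hole [1300, 1400)
est.on_outbound_segment(0.50, 1300, 100)  # filled late: counted as lost
print(est.loss_rate(now=0.5))  # 0.2
```

The same logic applied to inbound streams would estimate Downstream
Loss Rate instead.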
2.2. Video Delivery Problems Troubleshooting
This section describes how TCP passive measurements are used for
troubleshooting video delivery problems. As depicted in Figure 1,
three network segments are involved: the home network, the
operator's network and the server network. The following subsections
address problems in each of these network segments.
2.2.1. Locating WIFI Problems in Home Network
WIFI is commonly used in home networks to share Internet access
wirelessly. This brings mobility to people accessing the Internet at
home. However, it comes at a cost: wireless access often performs
worse than wired access, since wireless signals suffer more from
varying environmental conditions. Wireless access inherently incurs
more packet loss and often results in larger delay, so the
performance of network applications is often degraded over wireless
networks.
When network application performance degrades, WIFI is often blamed.
Operators would like to know how much WIFI has contributed to the
degradation, and passive measurement methods are needed to make the
problem visible. One method is to profile the RTT in the home
network: high RTT values may be seen for home networks that use
WIFI.
One important reason that WIFI causes high RTT values is that WIFI
retransmits lost frames at its Medium Access Control (MAC) layer to
alleviate the high loss induced by poor wireless conditions. Because
of this trade-off at the MAC layer, WIFI traffic often shows high
delay but a relatively low packet loss rate. This trait makes traffic
over WIFI distinguishable from traffic over other link types.
To profile the RTT in the home network, the Measurement Point should
be placed at the home gateway if the home gateway is controlled by
the operator. Otherwise, the Measurement Point has to be deployed one
level above the home gateway in the access network, usually at the
next IP hop from the home gateway. In this case, the Measurement
Point is distant from the home network it measures, and congestion on
the link between the Measurement Point and the home network can
affect the result. RTT samples affected by congestion on that link
must therefore be excluded. When congestion occurs, loss and delay
both increase, which distinguishes it from ordinary WIFI traffic,
which has high delay but low loss.
Passive measurement of TCP traffic is crucial to the RTT profiling
method described above, since TCP carries the majority of Internet
traffic. TCP traffic is therefore a viable source of downstream RTT
samples.
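As a hypothetical illustration of this profiling logic, the following
sketch classifies per-flow downstream reports and excludes likely
congestion before summarizing home-network RTT. The thresholds
"HIGH_RTT_S" and "HIGH_LOSS" are made-up example values, not values
from this document, and would need calibration in a real deployment;
"FlowReport", "classify" and "home_rtt_profile" are likewise
hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median

# Hypothetical example thresholds; real values need per-deployment calibration.
HIGH_RTT_S = 0.030  # downstream RTT above this counts as "high"
HIGH_LOSS = 0.02    # downstream loss rate above this counts as "high"


@dataclass
class FlowReport:
    downstream_rtt: float   # seconds, Measurement Point <-> client
    downstream_loss: float  # fraction of bytes


def classify(r: FlowReport) -> str:
    if r.downstream_rtt <= HIGH_RTT_S:
        return "normal"
    if r.downstream_loss <= HIGH_LOSS:
        return "wifi-like"  # high delay, low loss: MAC-layer retransmissions
    return "congestion"     # delay and loss both high


def home_rtt_profile(reports: list[FlowReport]) -> float | None:
    """Median downstream RTT over reports not attributed to congestion."""
    kept = [r.downstream_rtt for r in reports if classify(r) != "congestion"]
    return median(kept) if kept else None


reports = [FlowReport(0.010, 0.0), FlowReport(0.080, 0.001),
           FlowReport(0.090, 0.0), FlowReport(0.200, 0.05)]
print([classify(r) for r in reports])
print(home_rtt_profile(reports))  # 0.08
```

A median is used here only because it is robust to a few outliers;
other summaries would work equally well for illustration.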
2.2.2. Locating Network Devices Problems
Sometimes application performance degradation is caused by problems
in the network. A single faulty or misconfigured node may cause
unusual packet loss or unnecessary delay. When this happens, it is
often difficult for operators to locate the faulty or misconfigured
node because of the complex architecture of the network. Operators
have to find out whether the problem lies at the access level, the
aggregation level, or even in the core.
To help locate the problem, it is useful to identify which network
segment causes it. For that purpose, passive measurement can serve as
a vital means of problem demarcation between network segments.
Probes can be deployed on suitable nodes along the whole network
path, as shown in Figure 3.
              +--------------------+
              | Measurement Center |
              +--------------------+
                  \               \
                   \ Probe         \ Probe
                    \               \
   +-------+         ++              ++         +-------+
   |Network| Access  || Aggregation  ||  Core   |Network|
   |Ingress|---------++--------------++---------|Egress |
   |Node   | Network ||   Network    || Network |Node   |
   +-------+         ++              ++         +-------+
Figure 3: Probes Deployed on Network Path
The probes deployed along the network path measure TCP traffic
passively and report the collected downstream/upstream RTT and loss
information to the Measurement Center. The Measurement Center can
then build a baseline of normal behavior for each network segment.
If a network node becomes faulty or misconfigured, its behavior will
deviate from the baseline and can thus be detected by the Measurement
Center. This greatly helps operators troubleshoot problems caused by
the network.
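One simple way a Measurement Center might flag deviation from a
baseline is a mean-plus-k-standard-deviations test per network
segment, sketched below. This technique and the value k = 3 are
assumptions of this illustration; the document does not specify how
the baseline is built. "deviates" and "suspect_segments" are
hypothetical names.

```python
from __future__ import annotations

from statistics import mean, stdev


def deviates(history: list[float], current: float, k: float = 3.0) -> bool:
    """True if `current` exceeds the baseline mean by more than k standard deviations."""
    if len(history) < 2:
        return False  # not enough data for a baseline yet
    mu, sigma = mean(history), stdev(history)
    return current > mu + k * sigma


def suspect_segments(reports: dict[str, tuple[list[float], float]],
                     k: float = 3.0) -> list[str]:
    """Network segments whose latest report deviates from their own baseline."""
    return [name for name, (history, current) in reports.items()
            if deviates(history, current, k)]


# Per-segment RTT reports in milliseconds: (history, latest value).
reports = {
    "access": ([10.0, 11.0, 9.0, 10.0, 10.0], 11.0),
    "aggregation": ([5.0, 5.0, 6.0, 5.0, 4.0], 20.0),
    "core": ([2.0, 2.0, 2.0, 2.0], 2.0),
}
print(suspect_segments(reports))  # ['aggregation']
```

The same test could be applied to loss reports; with a perfectly
constant history (zero standard deviation), any increase is flagged.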
2.2.3. Locating Server Side Problems
For end-to-end applications such as video streaming, performance
degradation may be caused by problems in the upstream part of the
service chain, on the server side, which is owned by server network
providers. In this case, the operator can use passive measurement
results as evidence for the server network provider, improving that
provider's understanding of how the network beyond its server gateway
is performing. With this information, the server network provider
can focus on the likely problem area in its own network rather than
looking elsewhere.
3. Conclusion
The information exposed in the TCP header enables network operators
to perform passive measurements such as RTT and packet loss. This
information is useful for network operators when troubleshooting.
This document describes several use cases for passive measurement.
These use cases show that passive measurement is a viable means of
diagnosing application performance degradation, especially for
problem demarcation between network segments.
This document recommends that future transport protocols support
passive measurement of RTT and packet loss. New transport protocols
may expose this information differently than TCP does, but the
information needed for passive measurements needs to be exposed to
the network.
For more information and discussion on solutions see also
[I-D.stephan-quic-interdomain-troubleshooting] and
[I-D.ietf-quic-manageability].
4. Security Considerations
T.B.D.
5. IANA Considerations
This document has no requirement on IANA.
6. Informative References
[I-D.ietf-quic-manageability]
Kuehlewind, M., Trammell, B., and D. Druta, "Manageability
of the QUIC Transport Protocol",
draft-ietf-quic-manageability-00 (work in progress), July
2017.
[I-D.ietf-quic-transport]
Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
and Secure Transport", draft-ietf-quic-transport-05 (work
in progress), August 2017.
[I-D.mm-wg-effect-encrypt]
Moriarty, K. and A. Morton, "Effect of Pervasive
Encryption on Operators", draft-mm-wg-effect-encrypt-12
(work in progress), June 2017.
[I-D.stephan-quic-interdomain-troubleshooting]
Stephan, E., Cayla, M., Braud, A., and F. Fieau, "QUIC
Interdomain Troubleshooting",
draft-stephan-quic-interdomain-troubleshooting-00 (work in
progress), July 2017.
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122,
DOI 10.17487/RFC1122, October 1989,
<https://www.rfc-editor.org/info/rfc1122>.
[RFC7258] Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
2014, <https://www.rfc-editor.org/info/rfc7258>.
[RFC7323] Borman, D., Braden, B., Jacobson, V., and R.
Scheffenegger, Ed., "TCP Extensions for High Performance",
RFC 7323, DOI 10.17487/RFC7323, September 2014,
<https://www.rfc-editor.org/info/rfc7323>.
[RFC7624] Barnes, R., Schneier, B., Jennings, C., Hardie, T.,
Trammell, B., Huitema, C., and D. Borkmann,
"Confidentiality in the Face of Pervasive Surveillance: A
Threat Model and Problem Statement", RFC 7624,
DOI 10.17487/RFC7624, August 2015,
<https://www.rfc-editor.org/info/rfc7624>.
Authors' Addresses
Roni Even
Huawei
Email: roni.even@huawei.com
Hui Zheng (Marvin)
Huawei
Email: marvin.zhenghui@huawei.com
Liang Geng
ChinaMobile
Email: gengliang@chinamobile.com
Rachel Huang
Huawei
Email: rachel.huang@huawei.com