Internet DRAFT - draft-xie-v6ops-network-happyeyeballs


Internet Engineering Task Force                                   C. Xie
Internet-Draft                                             China Telecom
Intended status: Informational                                   L. Song
Expires: May 29, 2019                         Beijing Internet Institute
                                                       J. Palet Martinez
                                                        The IPv6 Company
                                                       November 25, 2018

     Network-side Happy Eyeballs based on accurate IPv6 measurement

Abstract

   During the period of IPv6 transition, both ISPs and ICPs (Internet
   Content Providers) care about users' experience in dual-stack
   networks.  They hesitate to provide IPv6 to their users for fear of
   poor IPv6 performance.  This memo proposes Network-side Happy
   Eyeballs (NHE) as an approach to help ISPs identify IPv6
   connectivity issues and provide better connectivity to end users.
   NHE performs accurate measurement and comparison of IPv6 and IPv4
   performance on the network side, rather than on the client side as
   in Happy Eyeballs v2 (HEv2) [RFC8305].  It works independently of
   whether clients adopt HEv2, and the two coexist without conflict.

   REMOVE BEFORE PUBLICATION: The source of the document with test
   script is currently placed at GitHub [NHE-GitHub].  Comments and pull
   requests are welcome.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 29, 2019.

Xie, et al.               Expires May 29, 2019                  [Page 1]
Internet-Draft   Network-side Happy Eyeballs based on accurate   November 2018

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   ( in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Overview of NHE Framework
   3.  IPv6/IPv4 Performance Measurement
     3.1.  Performance metrics
     3.2.  Location of IPv6/IPv4 Measurement
     3.3.  Reducing measurement traffic
   4.  Reporting IPv6 failures using syslog
     4.1.  Discovery of the syslog collector NSP
   5.  One Use Case of Troubleshooting action
   6.  Security considerations
   7.  IANA Considerations
   8.  Acknowledgments
   9.  References
   Authors' Addresses

1.  Introduction

   During the period of IPv6 transition, both ISPs and ICPs (Internet
   Content Providers) care about users' experience in dual-stack
   networks.  They hesitate to provide IPv6 to their users for fear of
   poor IPv6 performance.  Happy Eyeballs v2 (HEv2) [RFC8305] provides
   an approach that enables clients to attempt multiple connections in
   parallel, which helps them work around blocked, broken, or
   sub-optimal networks.  Because it gives priority to IPv6 by design,
   HEv2 helps increase IPv6 traffic in networks while also reducing
   client-side delay when IPv6 connectivity is poorer than IPv4.  So
   far, most modern web browsers support HEv2 well, thanks to popular
   browser engines such as WebKit and Trident.  However, in practice
   barriers remain for mobile developers who build Apps with APIs and
   libraries that do not implement HEv2.

   First, HEv2 adds complexity and uncertainty to both development and
   operation.  For example, Section 8 of [RFC8305] lists six
   configurable values, such as Resolution Delay and Connection Attempt
   Delay.  This raises the bar for small application developers, who
   must produce a "nuanced implementation" that tunes these values
   according to network dynamics.  Second, the parallel connections
   initiated by HEv2 produce a larger volume of traffic, which consumes
   both mobile data charges and battery power.  As a result, mobile
   application developers may choose not to adopt HEv2, or may postpone
   their IPv6 transition, because of these issues.  Third, client-based
   HEv2 hides some possible IPv6 connectivity issues from the operator:
   users don't notice anything broken, so they do not report it to
   their providers.  These issues are more notable in regions where
   IPv6 performance is worse than IPv4 in terms of RTT and failure
   rate.

   This memo proposes Network-side Happy Eyeballs (NHE), an approach to
   improve IPv6 connectivity through network-side IPv6 measurement and
   failure reporting.  Instead of requiring the client to race IPv6 and
   IPv4 connections, NHE runs the "race" on the network side.  NHE aims
   to provide helpful alert information so that ISPs can fix
   networking issues themselves.  In addition, this memo introduces a
   potential use case of NHE to work around networking issues that
   can't be resolved locally (for example, issues with third parties on
   the path to the destination, or with the destination itself).

   The rationale of the NHE approach is simple.  ISPs, typically the
   mobile and broadband network providers, have more resources,
   capability, and motivation to perform accurate IPv6/IPv4 performance
   measurement.  By using existing protocols for immediate alerting and
   reporting of failures, they can analyze and resolve those failures,
   improving network reliability for the good of their users.  With
   sufficient and accurate troubleshooting information, ISPs gain a
   clear view of their IPv6 network performance and can act to improve
   it.

2.  Overview of NHE Framework

   As shown in Figure 1, the NHE framework consists of three key
   components: IPv6/IPv4 performance measurement, IPv6 failure
   reporting, and troubleshooting actions on these failures.  Because
   client-based HEv2 conceals operational issues of the IPv6 network,
   IPv6 failure reporting is a key element of NHE: it reports and
   collects precise performance information about the IPv6 network.
   Note that an IPv6 failure event is not necessarily triggered only by
   disconnection or severe packet loss.  It includes every event in
   which the IPv6 connection is slower than IPv4 in the race (by more
   than a configurable threshold, as described in Section 3.1).
   Section 4 describes how IPv6 failure reporting works.

   The IPv6/IPv4 performance measurement component feeds IPv6 failure
   reporting by performing IPv6/IPv4 RTT measurement on a special list
   of domains (called the Measuring list).  The list can start from a
   well-populated cache and then be updated to follow the changing
   popularity of domains in the network.  To achieve better measurement
   accuracy, probes may be located adjacent to clients at the edge of
   the network.  The criteria for putting a domain into the list, and
   how to perform the measurement, are described in Section 3.

     +-------------+     +--------------+     +---------------+
     | IPv6/IPv4   +---> | IPv6 Failure +---> |Troubleshooting|
     | Measurement |     |   Reporting  |     |    Actions    |
     +-------------+     +--------------+     +---------------+

                    Figure 1: High-Level NHE Framework

   After IPv6 failure information is collected and analyzed, various
   troubleshooting actions can be taken accordingly.  Most of these
   actions are similar to IPv4 network troubleshooting.  For example,
   if the problem is local, operators should resolve the networking
   issue as soon as possible.  If the problem is caused by the far end
   or by third parties, the ISP may check with upstream ISPs or transit
   and peering ASes to clear the issue (for example, by withdrawing
   some BGP peerings).  Section 5 describes an action that can be
   adopted temporarily to reduce the impact of poor IPv6 performance
   for a specific domain.

   Note that NHE works independently of whether clients adopt HEv2, and
   the two coexist without conflict in the NHE framework.

3.  IPv6/IPv4 Performance Measurement

   Accurate IPv6 performance measurement is vital to the success of
   NHE.  Accuracy depends on what is measured, where, and how.

3.1.  Performance metrics

   Client-side HEv2 effectively uses a round-trip delay, or round-trip
   time (RTT), metric: it races IPv6 and IPv4 connections, starting
   from domain name resolution and continuing with TCP setup on both
   address families.  NHE adopts a similar approach within the ISP
   network, simulating a client that runs the race for each domain in
   a list:

   o  Look up the domain name.  If a positive response with at least one
      valid AAAA record is received, continue the process.  If a
      negative response with no AAAA record is received, stop and move
      on to the next domain in the list.  Note that if a negative
      response with ServFail is received, which indicates an error at
      the far-end server, the domain should be flagged "alert to
      operator" to report a "ServFail" incident.  It has been observed
      that some clients keep sending AAAA queries after receiving a
      ServFail response.

   o  Make TCP connections to all returned IPv6 and IPv4 destination
      addresses.  Note that NHE uses neither the address sorting nor
      the Connection Attempt Delay that are important in the design of
      client-side HEv2; the NHE measuring server can connect to all
      returned addresses concurrently.

   o  Measure the round-trip delay as the RTT of the domain name
      resolution plus the RTT of the TCP setup (starting when the SYN is
      sent and ending when the SYN-ACK is received).  If there is more
      than one IP address in either the AAAA or A record responses, the
      round-trip delay should be measured for every address.

   o  Calculate the difference in round-trip delay (Diff-RTT) between
      the two address families.  If there is more than one IP address
      in either the AAAA or A record responses, the minimum RTT within
      each address family is used, that is:

         Diff-RTT = Min(RTT-IPv6) - Min(RTT-IPv4)

   o  For each domain, if the Diff-RTT is larger than a configurable
      threshold, the domain is recorded in a local list and flagged
      "alert to operator" with a "Poor IPv6 Performance" incident.
      This triggers the reporting algorithm (described in Section 4).
      If the domain is already in the local list with an "alert to
      operator" flag, nothing should be done (in order to avoid
      repetitive reports).

   o  When a follow-up measurement shows that a domain previously
      flagged "alert to operator" no longer has an issue, the "alert to
      operator" flag must be cleared and the reporting algorithm
      triggered.
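
   The measurement steps above could be approximated in Python with
   only the standard library, as in the following minimal sketch.  It
   is not a reference implementation: it assumes that the probe's stub
   resolver (getaddrinfo) is adequate for timing the lookup, it uses
   TCP port 443 as the target port, and the function names
   tcp_setup_rtt, resolve, and measure_domain are hypothetical.
   Because getaddrinfo does not expose the DNS RCODE, a real probe
   would need a DNS library that distinguishes NODATA from ServFail.

```python
import asyncio
import socket
import time
from typing import Dict, List, Optional, Tuple


async def tcp_setup_rtt(addr: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Milliseconds until a TCP connect() to addr completes, or None on failure."""
    start = time.monotonic()
    try:
        _, writer = await asyncio.wait_for(asyncio.open_connection(addr, port), timeout)
    except (OSError, asyncio.TimeoutError):
        return None
    elapsed_ms = (time.monotonic() - start) * 1000.0
    writer.close()
    await writer.wait_closed()
    return elapsed_ms


async def resolve(domain: str, family: int, port: int) -> Tuple[List[str], float]:
    """Resolve one address family; return (addresses, lookup time in ms)."""
    loop = asyncio.get_running_loop()
    start = time.monotonic()
    try:
        infos = await loop.getaddrinfo(domain, port, family=family, type=socket.SOCK_STREAM)
    except socket.gaierror:
        infos = []
    dns_ms = (time.monotonic() - start) * 1000.0
    return sorted({info[4][0] for info in infos}), dns_ms


async def measure_domain(domain: str, port: int = 443) -> Dict[str, List[float]]:
    """Race IPv6 and IPv4 for one domain; return per-family delays in ms.

    An empty dict means there was no AAAA answer, so the domain is skipped.
    Each delay is the lookup time plus the TCP setup time for one address.
    """
    (v6, dns6), (v4, dns4) = await asyncio.gather(
        resolve(domain, socket.AF_INET6, port),
        resolve(domain, socket.AF_INET, port),
    )
    if not v6:
        return {}

    async def total_rtt(addr: str, dns_ms: float) -> Optional[float]:
        rtt = await tcp_setup_rtt(addr, port)
        return None if rtt is None else dns_ms + rtt

    # No address sorting and no Connection Attempt Delay: connect to all at once.
    results = await asyncio.gather(
        *(total_rtt(a, dns6) for a in v6), *(total_rtt(a, dns4) for a in v4)
    )
    r6, r4 = results[: len(v6)], results[len(v6):]
    return {
        "ipv6": [r for r in r6 if r is not None],
        "ipv4": [r for r in r4 if r is not None],
    }
```

   For example, asyncio.run(measure_domain("www.example.com")) would
   return the per-family round-trip delays (in ms) used for Diff-RTT.
   A completed connect() call means the SYN-ACK has arrived, so its
   duration approximates the TCP setup RTT.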

   Note that the threshold should be tunable by the network provider,
   to strike a better trade-off between IPv6 and IPv4 performance and
   to allow adjustment of the local IPv6-versus-IPv4 priority policy.
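
   The Diff-RTT calculation and the "alert to operator" flag handling
   described in this section can be sketched as follows.  This is an
   illustrative sketch only: the names diff_rtt and AlertTracker are
   hypothetical, and treating a domain whose IPv6 connections all fail
   (while IPv4 succeeds) as an infinite Diff-RTT is an assumption based
   on disconnection counting as an IPv6 failure event.

```python
import math
from typing import Iterable, Optional, Set, Tuple

POOR_IPV6 = "Poor IPv6 Performance"
SERVFAIL = "ServFail"


def diff_rtt(rtt_v6: Iterable[float], rtt_v4: Iterable[float]) -> Optional[float]:
    """Diff-RTT = Min(RTT-IPv6) - Min(RTT-IPv4).

    Assumption: no IPv6 result but some IPv4 result counts as infinitely
    slow IPv6; without any IPv4 result no comparison is possible (None).
    """
    v6, v4 = list(rtt_v6), list(rtt_v4)
    if not v4:
        return None
    if not v6:
        return math.inf
    return min(v6) - min(v4)


class AlertTracker:
    """Local list of (domain, incident) pairs flagged "alert to operator"."""

    def __init__(self, threshold_ms: float) -> None:
        self.threshold_ms = threshold_ms  # operator-tunable Diff-RTT threshold
        self.flagged: Set[Tuple[str, str]] = set()

    def _update(self, domain: str, incident: str, failing: bool) -> Optional[str]:
        key = (domain, incident)
        if failing and key not in self.flagged:
            self.flagged.add(key)
            return "raise"  # new incident: trigger the reporting algorithm
        if not failing and key in self.flagged:
            self.flagged.remove(key)
            return "clear"  # issue gone: clear the flag and report that too
        return None  # unchanged: report nothing, avoiding repetitive reports

    def on_diff_rtt(self, domain: str, diff_ms: Optional[float]) -> Optional[str]:
        if diff_ms is None:
            return None
        return self._update(domain, POOR_IPV6, diff_ms > self.threshold_ms)

    def on_lookup_rcode(self, domain: str, rcode: str) -> Optional[str]:
        return self._update(domain, SERVFAIL, rcode == "SERVFAIL")
```

   With a 50 ms threshold, a domain whose best IPv6 delay is 90 ms and
   best IPv4 delay is 40 ms is not flagged, because only a Diff-RTT
   larger than the threshold raises an alert.  A "raise" or "clear"
   result marks the point at which the reporting algorithm of Section 4
   would be triggered.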

   In the measurement process described above, an important domain
   list, called the Measuring list, contains the targeted domains.  The
   list can be formed and updated from the popular domains visited by
   users of the network.  This measurement should be done periodically
   on each domain, every given configurable number of

   Compared with client-side HEv2, NHE operated by an ISP has more
   resources for better performance measurement.  For example, the race
   on a handset measures the round-trip delay of a single instantaneous
   connection, which does not fully represent the performance of an
   address family over a sustained period.  Erratic variation in delay
   (caused by network jitter), for instance, makes it difficult to
   support many interactive real-time applications.  Statistics of
   round-trip delay therefore help ISPs build more sophisticated
   measurements.  Section 4 of [RFC2681] specifies statistics for
   round-trip delay that can be used for advanced round-trip delay
   measurement, such as percentile, median, minimum, and inverse
   percentile.
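
   The statistics listed in Section 4 of [RFC2681] could be
   approximated over a series of RTT samples as below.  This is a
   simplified sketch rather than a conformance implementation: it
   assumes samples in milliseconds, with None marking a probe that got
   no reply, which is treated as an infinitely large (undefined) delay
   in the spirit of the RFC's definitions.  The function names are
   hypothetical.

```python
import math
from typing import List, Optional, Sequence

Sample = Optional[float]  # RTT in ms; None = no reply (undefined delay)


def _sorted_delays(samples: Sequence[Sample]) -> List[float]:
    """Sort samples, treating undefined delays as infinitely large."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sorted(math.inf if s is None else s for s in samples)


def rtt_percentile(samples: Sequence[Sample], x: float) -> float:
    """Smallest delay d such that at least x percent of samples are <= d."""
    delays = _sorted_delays(samples)
    k = max(1, math.ceil(len(delays) * x / 100.0))
    return delays[k - 1]


def rtt_median(samples: Sequence[Sample]) -> float:
    """Central value, or the mean of the two central values for an even count."""
    delays = _sorted_delays(samples)
    mid = len(delays) // 2
    if len(delays) % 2:
        return delays[mid]
    return (delays[mid - 1] + delays[mid]) / 2.0


def rtt_minimum(samples: Sequence[Sample]) -> float:
    """Smallest delay; infinite if no probe received a reply."""
    return _sorted_delays(samples)[0]


def rtt_inverse_percentile(samples: Sequence[Sample], threshold: float) -> float:
    """Percentage of samples whose delay is <= threshold."""
    delays = _sorted_delays(samples)
    return 100.0 * sum(d <= threshold for d in delays) / len(delays)
```

   For the samples [12, 15, no reply, 11, 14], the median is 14 ms,
   because the lost probe sorts as the largest value.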

   Also, HEv2 measurements may be influenced by access network
   problems, which don't affect NHE measurements.  The ISP should
   measure access network problems by other means.

3.2.  Location of IPv6/IPv4 Measurement

   Because the measurement must accurately simulate user experience,
   the location where it is performed is very important.  The
   intuitive approach is to place the measuring probes or servers at
   the edge, close to the end users.  In 4G LTE cellular networks, as a
   typical case, the performance measurement servers can be located
   close to base stations (or an aggregation point).  The end-to-end
   path from such a probe to a destination then differs by at most one
   hop from the path of a real end user.

   In the case of broadband networks, the measuring probes can be
   co-located with the BRAS, OLTs, or equivalent aggregation points,
   depending on each access technology.

   Setting up probes at different parts of the network, including core,
   or close to the upstream provider connectivity, can help to determine
   the source of the issues, especially if they affect many domains.

   Probes can also be built into special mobile devices for reporting
   purposes, and there are many ways to do so.  For example, a
   dedicated application can perform the probing and report to a
   collector operated by the operator.  Modern APM (application
   performance measurement) and NPM (network performance measurement)
   technologies allow ordinary application software to integrate a
   special SDK (Software Development Kit) for measurement purposes.  It
   is therefore possible for mobile App providers to adopt the NHE
   framework themselves to avoid poor IPv6 performance, although their
   troubleshooting actions differ from those in the ISP case.

3.3.  Reducing measurement traffic

   Because IPv6/IPv4 performance varies per domain, there is concern
   that NHE may generate a lot of measurement traffic.  Two approaches
   can help reduce it.  The first is to keep the Measuring list to a
   moderate size, for example the top 1k popular domains in the cache.
   The size of the Measuring list can also be configured according to
   the ISP's local policy.  One optional way to limit the size of the
   list is to focus on the top 1k Apps rather than domains; ISPs can
   cooperate with ICPs to maintain a list of the domains used by the
   top Apps for NHE.
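
   A bounded Measuring list could be derived from query popularity as
   in this sketch.  It assumes, hypothetically, that the probe can read
   the queried names from the ISP resolver's logs, and it models the
   ICP-supplied list of top-App domains as a "pinned" input that is
   always measured; all names here are illustrative.

```python
from collections import Counter
from typing import Iterable, List


def normalize(name: str) -> str:
    """Compare domain names case-insensitively and without the trailing dot."""
    return name.strip().lower().rstrip(".")


def build_measuring_list(queried_names: Iterable[str], size: int = 1000,
                         pinned: Iterable[str] = ()) -> List[str]:
    """Pinned domains first, then the most-queried domains, up to `size`."""
    seen = set()
    result: List[str] = []
    for name in map(normalize, pinned):
        if name not in seen:
            seen.add(name)
            result.append(name)
    counts = Counter(map(normalize, queried_names))
    for name, _ in counts.most_common():
        if len(result) >= size:
            break
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result[:size]
```

   Keeping "size" as a parameter mirrors the requirement that the list
   size be configurable by local policy.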

   The second approach is to use passive measurement.  The round-trip
   delay of a DNS lookup for a particular domain is negligible in most
   cases when there is a cache hit, so passive measurement should focus
   on monitoring TCP connections to specific destinations.  For
   example, with the top 1000 popular domains in the Measuring list,
   TCP connections to those thousand destinations would be monitored
   passively to measure the round-trip delay.
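
   Passive measurement could pair each client SYN with the server's
   SYN-ACK as seen at the probe.  The sketch below assumes a
   hypothetical record format for captured TCP segments and a set of
   watched server addresses (for example, those returned for domains
   in the Measuring list).  Because timing starts at the client's SYN
   and ends at the server's SYN-ACK as observed at the probe, it
   measures the path between the probe and the destination rather than
   the full client RTT.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Segment:
    """One captured TCP segment (hypothetical record format)."""
    ts: float  # capture time in seconds
    src: str
    sport: int
    dst: str
    dport: int
    syn: bool
    ack: bool


def handshake_rtts(segments: Iterable[Segment], watched: Set[str]) -> Dict[str, List[float]]:
    """Map each watched server address to SYN -> SYN-ACK delays (ms) seen at the probe."""
    pending: Dict[Tuple[str, int, str, int], float] = {}
    rtts: Dict[str, List[float]] = {}
    for seg in sorted(segments, key=lambda s: s.ts):
        if seg.syn and not seg.ack and seg.dst in watched:
            # Keep the first SYN so that SYN retransmissions count against the path.
            pending.setdefault((seg.src, seg.sport, seg.dst, seg.dport), seg.ts)
        elif seg.syn and seg.ack and seg.src in watched:
            start = pending.pop((seg.dst, seg.dport, seg.src, seg.sport), None)
            if start is not None:
                rtts.setdefault(seg.src, []).append((seg.ts - start) * 1000.0)
    return rtts
```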

4.  Reporting IPv6 failures using syslog

   To keep the reporting of NHE failures simple, syslog [RFC5424] over
   UDP [RFC5426] MUST be used, on the default port (514) and over IPv6
   only.

   The intent is to make this reporting very simple, so no choice of
   alternative ports or transport protocols is offered.

   Operators willing to use this reporting MUST configure at least one
   syslog collector.
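
   A probe could emit such a report as in the following sketch, which
   formats an RFC 5424 message and sends it over UDP on port 514 using
   IPv6 only.  Because this memo leaves the NHE message encoding to be
   defined, every NHE-specific field here (the MSGID values, the SD-ID
   "nhe@32473", the parameter names, the APP-NAME, the facility, and
   the severity) is a hypothetical placeholder; 32473 is the private
   enterprise number reserved for documentation use.

```python
import socket
from datetime import datetime, timezone
from typing import Dict, Optional

SYSLOG_PORT = 514    # default syslog port required by this memo
FACILITY = 16        # assumption: local0, chosen by the operator
SEVERITY = 4         # assumption: warning
SD_ID = "nhe@32473"  # hypothetical SD-ID under the documentation PEN


def sd_escape(value: str) -> str:
    """Escape backslash, double quote and ']' as RFC 5424 PARAM-VALUE requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def format_nhe_message(hostname: str, msgid: str, params: Dict[str, str],
                       text: str, timestamp: Optional[datetime] = None) -> bytes:
    """Build one RFC 5424 message; all NHE-specific fields are placeholders."""
    pri = FACILITY * 8 + SEVERITY
    ts = (timestamp or datetime.now(timezone.utc)).isoformat(timespec="milliseconds")
    sd_params = "".join(f' {name}="{sd_escape(value)}"' for name, value in params.items())
    # HEADER is <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID, then SD and MSG.
    return f"<{pri}>1 {ts} {hostname} nhe - {msgid} [{SD_ID}{sd_params}] {text}".encode("utf-8")


def send_to_collector(message: bytes, collector: str) -> None:
    """Send one message over UDP, IPv6 only, to the collector's port 514."""
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (collector, SYSLOG_PORT))
```

   For example, a "Poor IPv6 Performance" incident for www.example.com
   might be sent with send_to_collector(format_nhe_message(
   "probe1.example.net", "POOR_IPV6", {"domain": "www.example.com"},
   "Poor IPv6 Performance"), "2001:db8::514"), where the collector
   address is taken from the documentation prefix.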

   The configuration can be done in a static way, providing dedicated
   IPv4/IPv6 addresses for the syslog collectors and probes.

   As an alternative, a more automated procedure can be done by
   configuring at least one syslog collector at the IPv6 prefix formed

   Network-Specific Prefix::

   The Network-Specific Prefix (NSP) MUST be chosen by the operator from
   its RIR allocated IPv6 addressing space.

   Additional collectors can be made available by using anycast at the
   NSP + prefix

   Note that the encoding of NHE messages in syslog is yet to be
   defined.  As introduced in Section 3.1, NHE syslog messages should
   cover two kinds of incident: "Poor IPv6 Performance" and "ServFail".

4.1.  Discovery of the syslog collector NSP

   If the automated procedure is used, the same mechanism described in
   [RFC7050] should be used to discover the address of the syslog
   collector(s).
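
   The core of the [RFC7050] heuristic is to query AAAA records for
   "ipv4only.arpa" and locate one of its well-known IPv4 addresses
   (192.0.0.170 or 192.0.0.171) at one of the RFC 6052 embedding
   positions, which reveals the NSP and its length.  The sketch below
   shows only that step, with hypothetical function names; it omits the
   additional validation RFC 7050 describes, and forming the collector
   address from the NSP, as described in Section 4, is not reproduced.

```python
import ipaddress
import socket
from typing import List, Optional

# ipv4only.arpa resolves to these well-known IPv4 addresses (RFC 7050).
WELL_KNOWN_V4 = {bytes([192, 0, 0, 170]), bytes([192, 0, 0, 171])}


def extract_nsp(addr: str) -> Optional[ipaddress.IPv6Network]:
    """Return the NSP if a well-known IPv4 address is embedded per RFC 6052."""
    packed = ipaddress.IPv6Address(addr).packed
    if packed[12:16] in WELL_KNOWN_V4:
        return ipaddress.IPv6Network(f"{addr}/96", strict=False)
    # For /32 to /64 prefixes, the embedded IPv4 bits skip bits 64-71 (the "u" octet).
    without_u = packed[:8] + packed[9:]
    for plen in (32, 40, 48, 56, 64):
        start = plen // 8
        if without_u[start:start + 4] in WELL_KNOWN_V4:
            return ipaddress.IPv6Network(f"{addr}/{plen}", strict=False)
    return None


def discover_nsps() -> List[ipaddress.IPv6Network]:
    """Query AAAA for ipv4only.arpa and derive the candidate NSPs."""
    try:
        infos = socket.getaddrinfo("ipv4only.arpa", None, socket.AF_INET6)
    except socket.gaierror:
        return []
    candidates = {extract_nsp(info[4][0]) for info in infos}
    return sorted(nsp for nsp in candidates if nsp is not None)
```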

   Because the collectors will be using an IPv6 address with the 32
   low-order bits from the reserved range, this will not conflict with
   any public addresses used in the Internet, so this mechanism is
   compatible with the expected usage of the NSP for NAT64.

5.  One Use Case of Troubleshooting action

   Besides the usual troubleshooting measures that network operators
   take in IPv4 networks, there are other troubleshooting actions that
   serve as temporary but urgent workarounds.

   Before [RFC6555] and [RFC8305] were documented, selective filtering
   of DNS AAAA records (returning NODATA) was proposed as a practice to
   make the IPv6 transition less painful [Less-painful].  The basic
   idea is that ISP DNS recursive servers do not return AAAA records to
   users who have broken IPv6 connectivity.  There are working
   implementations of such a filter-AAAA option in BIND 9.

   However, selective filtering carries two security risks.  First, it
   may break DNSSEC, omitting the RRSIG records covering type AAAA as
   well as the AAAA records themselves.  Second, filtering AAAA records
   causes DNS incoherency from the end user's perspective, which may
   create risks if the end user's applications depend on the integrity
   of DNS data.

   To reduce both security risks, an ISP using NHE could instead run a
   special resolver that artificially delays the AAAA answers for
   targeted domain names.  A domain name is targeted when its IPv6
   performance has been measured and reported as poor.  So, instead of
   filtering the AAAA record, postponing the AAAA response by a
   configurable timer (e.g., 300 ms) may cause the IPv6 connection to
   lose the race on the client side, avoiding concurrent IPv6 and IPv4
   connection attempts.  This helps HEv2 clients.  Non-HE clients will
   fall back sooner to IPv4 without IPv6 connections and

   Note a corner case: when a negative response with ServFail is
   received for a domain name lookup, no ServFail response should be
   returned to the client, because some clients have been observed to
   keep querying for AAAA RRs after receiving a ServFail response.  In
   this case, the resolver could silently drop the query without
   responding to the client.
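
   The special resolver described in this section might be structured
   as in the following asyncio sketch.  It is hypothetical throughout:
   the upstream coroutine stands in for a real recursive lookup, Answer
   is a simplified stand-in for a DNS message rather than any real
   library type, the set of targeted domains would come from the
   domains currently flagged with a "Poor IPv6 Performance" incident,
   and restricting the silent drop to AAAA queries is an assumption.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Optional, Set


@dataclass
class Answer:
    """Simplified DNS answer (hypothetical; not a real library type)."""
    rcode: str  # e.g. "NOERROR", "NXDOMAIN", "SERVFAIL"
    records: List[str] = field(default_factory=list)


Upstream = Callable[[str, str], Awaitable[Answer]]


class DelayingResolver:
    """Front end that delays AAAA answers for domains targeted by NHE."""

    def __init__(self, upstream: Upstream, targeted: Set[str], aaaa_delay: float = 0.3) -> None:
        self.upstream = upstream
        self.targeted = {d.lower().rstrip(".") for d in targeted}
        self.aaaa_delay = aaaa_delay  # configurable timer, in seconds

    async def handle(self, qname: str, qtype: str) -> Optional[Answer]:
        """Return the answer to send, or None to drop the query silently."""
        answer = await self.upstream(qname, qtype)
        if qtype == "AAAA" and answer.rcode == "SERVFAIL":
            return None  # corner case: do not relay ServFail to the client
        if qtype == "AAAA" and qname.lower().rstrip(".") in self.targeted:
            await asyncio.sleep(self.aaaa_delay)  # let the A answer win the race
        return answer
```

   Only AAAA answers for targeted names are postponed; A answers pass
   through immediately, so a client racing both address families sees
   the A answer first while still receiving an unmodified AAAA record.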

6.  Security considerations


7.  IANA Considerations

   There are no IANA considerations for this memo.

8.  Acknowledgments

   Acknowledgments are given to Geoff Huston, David Schinazi, Marc
   Blanchet, and Paul Vixie who gave comments and suggestions on the
   conception of NHE.

   Thanks to Tony Finch and Tommy Pauly, who gave positive comments on
   the part on which the -01 version is based.

9.  References

              APNIC, "APNIC IPv6 Performance Monitoring".

              "Filter AAAA option in BIND 9", August 2017.

   [Less-painful]
              Yahoo, "IPv6 and recursive resolvers: How do we make the
              transition less painful?", March 2010.

   [NHE-GitHub]
              BII, "GitHub Repository of Network-side Happy Eyeballs".

   [RFC2681]  Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip
              Delay Metric for IPPM", RFC 2681, DOI 10.17487/RFC2681,
              September 1999.

   [RFC5424]  Gerhards, R., "The Syslog Protocol", RFC 5424,
              DOI 10.17487/RFC5424, March 2009.

   [RFC5426]  Okmianski, A., "Transmission of Syslog Messages over UDP",
              RFC 5426, DOI 10.17487/RFC5426, March 2009.

   [RFC6555]  Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with
              Dual-Stack Hosts", RFC 6555, DOI 10.17487/RFC6555, April
              2012.

   [RFC7050]  Savolainen, T., Korhonen, J., and D. Wing, "Discovery of
              the IPv6 Prefix Used for IPv6 Address Synthesis",
              RFC 7050, DOI 10.17487/RFC7050, November 2013.

   [RFC8305]  Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2:
              Better Connectivity Using Concurrency", RFC 8305,
              DOI 10.17487/RFC8305, December 2017.

Authors' Addresses

   Chongfeng Xie
   China Telecom
   No.118 Xizhimennei street, Xicheng District
   Beijing  100035
   P. R. China


   Linjian Song
   Beijing Internet Institute
   2nd Floor, Building 5, No.58 Jing Hai Wu Lu, BDA
   Beijing  100176
   P. R. China


   Jordi Palet Martinez
   The IPv6 Company
   Molino de la Navata, 75
   Madrid, La Navata - Galapagar  28420

