Internet Engineering Task Force                             M. Sivaraman
Internet-Draft                             Akira Systems Private Limited
Intended status: Experimental                                     C. Liu
Expires: December 27, 2020                                      Infoblox
                                                           June 25, 2020


                    The DNS thundering herd problem
                draft-muks-dnsop-dns-thundering-herd-00

Abstract

   This document describes an observed regular pattern of spikes in
   queries that affects caching resolvers, and recommends software
   mitigations for it.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 27, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.




Sivaraman & Liu         Expires December 27, 2020               [Page 1]

Internet-Draft       The DNS thundering herd problem           June 2020


Table of Contents

   1.  Problem Description . . . . . . . . . . . . . . . . . . . . .   2
   2.  Requirements Notation . . . . . . . . . . . . . . . . . . . .   4
   3.  Mitigations . . . . . . . . . . . . . . . . . . . . . . . . .   4
     3.1.  Combine identical queries to upstream nameservers . . . .   4
     3.2.  Include noise in response TTLs from caching resolvers . .   4
     3.3.  Other mitigations . . . . . . . . . . . . . . . . . . . .   4
   4.  Security Considerations . . . . . . . . . . . . . . . . . . .   5
   5.  IANA considerations . . . . . . . . . . . . . . . . . . . . .   5
   6.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   5
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   5
     7.1.  Normative references  . . . . . . . . . . . . . . . . . .   5
     7.2.  Informative references  . . . . . . . . . . . . . . . . .   5
   Appendix A.  Change history (to be removed before publication)  .   6
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   6

1.  Problem Description

   Typically, DNS caching resolvers prepare answers for multiple clients
   from a single cached RRset [RFC1034].  The later a client makes its
   query, the lower the TTL in the resolver's reply, until the cached
   RRset from which the answers are prepared expires.  Clients
   themselves may cache and use their copies of the RRset until the TTL
   that the resolver replied with expires.  A key property is that all
   these copies of the answer, and the cached RRset from which they are
   prepared, expire at the same absolute time.

   As an example, consider the following query sequence received by a
   resolver from 10 clients all querying for a popular
   www.example.org./A RRset.  We use this example to illustrate two
   kinds of spikes in queries.

   +--------+------------+--------+------------------------------------+
   | Client | Query time | Answer | Notes                              |
   |        | (seconds   | RRset  |                                    |
   |        | since      | TTL    |                                    |
   |        | epoch)     |        |                                    |
   +--------+------------+--------+------------------------------------+
   | C1     | 1591441620 | 600    | Answer was not found in cache.     |
   |        |            |        | Resolver performs a resolution     |
   |        |            |        | and caches authoritative answer    |
   |        |            |        | with TTL=600.                      |
   | C2     | 1591441626 | 594    | Answered from cache.               |
   | C3     | 1591441713 | 507    | Answered from cache.               |
   | C4     | 1591441780 | 440    | Answered from cache.               |
   | C5     | 1591441866 | 354    | Answered from cache.               |
   | C6     | 1591442006 | 214    | Answered from cache.               |
   | C7     | 1591442070 | 150    | Answered from cache.               |
   | C8     | 1591442070 | 150    | Answered from cache.               |
   | C9     | 1591442213 | 7      | Answered from cache.               |
   | C3     | 1591442220 | 600    | Previously cached answer had       |
   |        |            |        | expired in the resolver's          |
   |        |            |        | cache. So the resolver performs a  |
   |        |            |        | fresh resolution and caches        |
   |        |            |        | authoritative answer with TTL=600. |
   | C5     | 1591442220 | 600    | Ditto if not joined with previous. |
   | C2     | 1591442220 | 600    | Ditto if not joined with previous. |
   | C6     | 1591442220 | 600    | Ditto if not joined with previous. |
   | C1     | 1591442221 | 599    | Answered from cache.               |
   | C9     | 1591442221 | 599    | Answered from cache.               |
   | C4     | 1591442221 | 599    | Answered from cache.               |
   | C8     | 1591442221 | 599    | Answered from cache.               |
   | C7     | 1591442221 | 599    | Answered from cache.               |
   | C10    | 1591442227 | 593    | Answered from cache.               |
   | C7     | 1591442820 | 600    | Previously cached answer had       |
   |        |            |        | expired in the resolver's          |
   |        |            |        | cache. So the resolver performs a  |
   |        |            |        | fresh resolution and caches        |
   |        |            |        | authoritative answer with TTL=600. |
   | C4     | 1591442820 | 600    | Ditto if not joined with previous. |
   | C1     | 1591442820 | 600    | Ditto if not joined with previous. |
   | C2     | 1591442820 | 600    | Ditto if not joined with previous. |
   | C10    | 1591442820 | 600    | Ditto if not joined with previous. |
   | C8     | 1591442820 | 600    | Ditto if not joined with previous. |
   | C3     | 1591442821 | 599    | Answered from cache.               |
   | C9     | 1591442821 | 599    | Answered from cache.               |
   | C5     | 1591442821 | 599    | Answered from cache.               |
   | C6     | 1591442821 | 599    | Answered from cache.               |
   +--------+------------+--------+------------------------------------+
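
   The TTL column of the table, and the shared expiry time described
   earlier in this section, can be reproduced with a few lines of
   Python.  This is a minimal sketch only: the function and constant
   names are hypothetical and are not taken from any resolver
   implementation.

```python
# Illustrative sketch; all names here are hypothetical.
AUTH_TTL = 600            # TTL of the authoritative answer in the table
FIRST_FILL = 1591441620   # time at which C1's query filled the cache


def answer_ttl(cached_at: int, query_time: int,
               auth_ttl: int = AUTH_TTL) -> int:
    """TTL placed in an answer that is served from a cached RRset."""
    remaining = cached_at + auth_ttl - query_time
    if remaining <= 0:
        raise ValueError("cached RRset expired; fresh resolution needed")
    return remaining


def client_expiry(query_time: int, ttl: int) -> int:
    """Absolute time at which the client's own copy expires."""
    return query_time + ttl


# A few of the cache hits from the table above.
queries = {"C2": 1591441626, "C3": 1591441713,
           "C6": 1591442006, "C9": 1591442213}

for client, when in queries.items():
    ttl = answer_ttl(FIRST_FILL, when)
    print(client, ttl, client_expiry(when, ttl))
```

   Every client copy expires at 1591442220, which is also the time at
   which the resolver's cached RRset expires and at which the next
   group of queries in the table arrives.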




2.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119] [RFC8174] when, and only when, they appear in all capitals,
   as shown here.

3.  Mitigations

3.1.  Combine identical queries to upstream nameservers

   At a resolver, when multiple queries have arrived together asking the
   same question and there is no existing unexpired cached answer, DNS
   resolutions have to be performed to answer these queries.  De-
   duplication of these multiple resolutions into a single DNS
   resolution by the resolver is RECOMMENDED where possible.

   If such de-duplication is not performed, the client queries will
   effectively be forwarded 1:1 by the resolver to upstream nameservers,
   causing spikes in the upstream nameservers' query rate.  Some
   nameserver operators may have deployed measures such as response rate
   limiting [RRL] and other IP-address based rate limiting, which may
   cause them to deny service to the resolver because of these spikes of
   identical queries.
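
   One way to picture the de-duplication described above is a table of
   in-flight resolutions keyed by question, so that only the first
   query for a question triggers an upstream resolution and later
   identical queries wait for its result.  The following Python sketch
   assumes a thread-per-query model.  The class name and the
   resolve_upstream callable are hypothetical, and the sketch leaves out
   caching, timeouts and the query class.

```python
import threading
from concurrent.futures import Future


class SingleFlightResolver:
    """Joins identical concurrent questions into one upstream resolution.

    Hypothetical sketch: resolve_upstream(qname, qtype) stands in for
    whatever performs the real DNS resolution.
    """

    def __init__(self, resolve_upstream):
        self._resolve_upstream = resolve_upstream
        self._lock = threading.Lock()
        self._in_flight = {}  # (qname, qtype) -> Future

    def resolve(self, qname, qtype):
        key = (qname.lower(), qtype)  # owner names compare case-insensitively
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First query for this question: register the resolution.
                future = Future()
                self._in_flight[key] = future
        if leader:
            try:
                future.set_result(self._resolve_upstream(qname, qtype))
            except Exception as exc:
                # Waiting queries see the same failure as the leader.
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Followers block here until the leader has finished.
        return future.result()
```

   A real resolver would also store the result in its cache before
   removing the in-flight entry, so that queries arriving just after
   the resolution completes are answered from cache.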

3.2.  Include noise in response TTLs from caching resolvers

   Caching resolvers are permitted to lower the TTLs of RRsets in their
   answers as they please [RFC2181].  This can be used to spread the
   expiry of the RRset copies received by clients from a single absolute
   time over a time interval.  However, this has to be done with care so
   that the thundering herd doesn't re-converge at the expiry time of
   the cached RRset that is used to generate answers to the clients.
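
   As a rough illustration of the idea, and not a method specified by
   this document, a resolver could subtract a bounded random amount
   from the remaining TTL before answering.  The function name, the
   max_jitter parameter and its default of 30 seconds in the Python
   sketch below are assumptions made for the example only.

```python
import random


def jittered_ttl(remaining_ttl: int, max_jitter: int = 30,
                 rng=random) -> int:
    """Lower remaining_ttl by a random amount of at most max_jitter.

    The result never exceeds remaining_ttl, and it stays at 1 or above
    whenever remaining_ttl is at least 1.  Hypothetical sketch only.
    """
    if remaining_ttl <= 1:
        return max(remaining_ttl, 0)
    # randint() is inclusive at both ends.
    jitter = rng.randint(0, min(max_jitter, remaining_ttl - 1))
    return remaining_ttl - jitter


# Two clients querying in the same second are likely to receive
# different TTLs, so their copies expire at different times.
print(jittered_ttl(150), jittered_ttl(150))
```

   This simple form only lowers TTLs, and it does nothing by itself to
   stop clients from re-converging when the cached RRset expires: a
   client whose copy expires early is answered from the same cached
   RRset again, with an even smaller TTL.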

   TBD.

3.3.  Other mitigations

   With very low authoritative RRset TTLs (such as under 60s) for
   popular questions, the frequency of the thundering herd increases and
   including noise in response TTLs is less effective because the
   maximum TTL to work with is low.  In other words, there is a shorter
   interval over which the thundering herd can be distributed by adding
   noise.  Some implementations permit an operator to set a minimum TTL
   value such that authoritative RRset TTLs with lower values are
   increased and clamped to the minimum TTL value.  This breaks the
   currently accepted DNS protocol, and hence this document does not
   make any recommendation about it.

4.  Security Considerations

   There are no security considerations.

5.  IANA considerations

   There are no IANA considerations.

6.  Acknowledgements

   This document was prepared from thundering herd client query patterns
   noticed at resolvers of ISPs and large institutions, which resulted
   in traffic spikes that caused performance issues and lookup failures.
   The authors acknowledge the contribution of Ramesh Damodaran, who
   participated in the analysis of these patterns.

7.  References

7.1.  Normative references

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,
              <https://www.rfc-editor.org/info/rfc1034>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, DOI 10.17487/RFC2181, July 1997,
              <https://www.rfc-editor.org/info/rfc2181>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

7.2.  Informative references

   [RRL]      Vixie, P. and V. Schryver, "DNS Response Rate Limiting
              (DNS RRL)", 2012,
              <https://ftp.isc.org/isc/pubs/tn/isc-tn-2012-1.txt>.






Appendix A.  Change history (to be removed before publication)

   o  draft-muks-dnsop-dns-thundering-herd-00
      * Initial draft.

Authors' Addresses

   Mukund Sivaraman
   Akira Systems Private Limited
   1 Coleman Street, #05-05 The Adelphi
   Singapore  179803
   SG

   Email: muks@akira.org
   URI:   https://akira.org/


   Cricket Liu
   Infoblox
   3111 Coronado Drive
   Santa Clara  95054
   US

   Email: cricket@infoblox.com
   URI:   http://www.infoblox.com/