Network Working Group                                    W. Kumari, Ed.
Internet-Draft                                                    Google
Intended status: Informational                          P. Hoffman, Ed.
Expires: December 1, 2014                                 VPN Consortium
                                                            May 30, 2014


                        Distributing the DNS Root
                    draft-wkumari-dnsop-dist-root-00

Abstract

   This document recommends that recursive DNS resolvers transfer the
   root zone, securely validate it, and then populate their caches with
   the information.

   [[ Note: This document is largely a discussion starting point. ]]

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 1, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements notation
   2.  Requirements
   3.  Pros and Cons of this Technique
     3.1.  Pros
     3.2.  Cons
   4.  Open Questions
     4.1.  Transfer Mechanism
     4.2.  Transfer Source
     4.3.  Channel / Object Security
     4.4.  Load Estimates
     4.5.  Behavior on Failures
       4.5.1.  Bad Zone Data / Scaling
       4.5.2.  Failover to the Next Transfer Server
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  Contributors
   9.  Normative References
   Appendix A.  Changes / Author Notes
   Authors' Addresses

1.  Introduction

   One of the main advantages of a DNSSEC-signed root zone is that it
   doesn't matter where you get the data from, as long as you validate
   the contents of the zone using DNSSEC information.
   When a recursive resolver starts up, it has an empty cache and the
   addresses of the root servers.  As it begins answering queries, it
   populates its cache by making a number of queries to the set of root
   servers, and caching the results.  This is a somewhat inefficient
   process, and a large number of the queries that hit the root are so-
   called "junk" queries, such as queries for second-level domains in
   non-existent TLDs.

   This document describes a means to populate caches in recursive
   resolvers with the contents of the full root zone so that the
   recursive resolvers have the root zone content cached.  This
   decreases latency for requests to the resolver, increases
   reliability and stability of the DNS, and increases DoS resilience
   for the root servers.

   This technique can be viewed as pre-populating a resolver's cache
   with the root zone information, using a transfer operation to do the
   transfer.

1.1.  Requirements notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Requirements

   [[ Note: We have tried to keep this document easily readable, and to
   drive discussions.  This means that we might be somewhat loose in
   terminology at the moment.  We will firm that up later. ]]

   [[ Note: (Written as a separate note for emphasis!): This document
   proposes one way to populate caches with the root zone information.
   It is a starting point - we have made some choices / trade-offs, and
   written the document as though they are the right answer.  We did
   this to make reading the document easier - reading a simple (but
   possibly wrong) solution is easier than having multiple "You could
   do X, Y, Z" choices at each point.  There is a section of open
   questions at the end of this document. ]]

   In order to follow these guidelines, a recursive server MUST support
   DNSSEC, and MUST have an up-to-date copy of the DNS root key.

   On startup, recursive servers follow these steps:

   1.  The resolver SHOULD perform a priming query to get the full list
       and addresses of root zone transfer servers.  If a priming query
       is not performed, the resolver MUST have pre-configured
       knowledge of a list of root zone transfer servers, and (for
       stability purposes) that list MUST have at least four servers
       listed.

   2.  The resolver SHOULD randomly sort the list of answers from the
       priming query.

   3.  The resolver SHOULD attempt to transfer the root zone using AXFR
       from each one of the servers until either success is achieved or
       the list has been exhausted.  If the root zone cannot be
       transferred, the resolver logs this as an error, and falls back
       to "legacy" operation.  The resolver MAY attempt to transfer in
       parallel to minimize startup latency.  The resolver MAY store
       the contents of the root zone to disk.  If the resolver has a
       stored copy of the root zone, and the data in the zone is not
       expired, and that copy was written within the refresh time
       listed in the zone, the resolver MAY load that zone instead of
       transferring it.

   4.  The resolver MUST validate the records in the zone using DNSSEC
       before relying on any of the records.  If any of the records do
       not validate, the resolver MUST log an error and SHOULD try the
       next server in the list.
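   A minimal, non-normative sketch of these startup steps, written in
   Python with the dnspython library (version 2 or later), is shown
   below.  The helper validate_zone_with_dnssec() is a hypothetical
   placeholder for full DNSSEC validation against the configured root
   key, and the pre-configured server names are placeholders; neither
   is part of dnspython or of this document.

     import random

     import dns.query
     import dns.resolver
     import dns.zone

     # Pre-configured root zone transfer servers, used only if the
     # priming query fails (placeholder names; a real list MUST have
     # at least four entries).
     ROOT_TRANSFER_SERVERS = ["xfr1.example.net", "xfr2.example.net",
                              "xfr3.example.net", "xfr4.example.net"]

     def validate_zone_with_dnssec(zone):
         """Placeholder only: a real implementation performs full
         DNSSEC validation of the zone against the configured root
         key."""
         raise NotImplementedError

     def fetch_root_zone():
         """Transfer and validate the root zone per steps 1-4 above."""
         # Step 1: priming query for the candidate transfer servers.
         try:
             servers = [str(rr.target)
                        for rr in dns.resolver.resolve(".", "NS")]
         except Exception:
             servers = list(ROOT_TRANSFER_SERVERS)

         # Step 2: randomly sort the list of candidates.
         random.shuffle(servers)

         # Step 3: attempt an AXFR from each server until one succeeds.
         for name in servers:
             try:
                 answer = dns.resolver.resolve(name, "A")
                 addr = str(next(iter(answer)))
                 zone = dns.zone.from_xfr(dns.query.xfr(addr, "."))
             except Exception:
                 continue  # transfer failed; try the next server

             # Step 4: validate with DNSSEC before relying on anything.
             if validate_zone_with_dnssec(zone):
                 return zone
             # Validation failed: log an error, try the next server.

         # List exhausted: log an error, fall back to "legacy"
         # operation.
         return None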
   Until the server has transferred (and validated) the zone, it MUST
   NOT act as though it has a copy of the root zone.  Once the resolver
   has transferred and validated the zone, it MUST act as though it has
   a copy of the root zone.  This includes following the refresh,
   retry, and expire logic, with certain modifications:

   1.  If the zone expires (for example, because it cannot re-transfer
       the zone due to blocked TCP connections), the resolver MUST fall
       back to "legacy" operation and MUST log an error.  It MUST NOT
       return SERVFAIL to queries simply because its copy of the root
       zone expired.

   2.  The resolver MUST validate the contents of the records in the
       zone using DNSSEC for every transfer.  The resolver SHOULD try
       alternate servers if the validation fails.  If the resolver is
       unable to transfer a copy of the zone that validates, it MUST
       treat this as an error, MUST discard the received records, and
       fall back to "legacy" operation.  The resolver SHOULD attempt to
       restart this process at every retry interval for the root zone.

   3.  The resolver SHOULD set the AD bit on responses to queries for
       records in the root zone.  This action is the same as if it had
       inserted the entry into its cache through a "normal" query.

   4.  The resolver MUST validate all of the zone contents, and MUST
       NOT start using the new contents until all have been validated;
       the resolver MUST NOT use "lazy validation".  This means that
       the replacement of the existing zone data with the refreshed
       data MUST be an atomic operation.
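   Item 4 above amounts to a "validate, then swap" pattern: build and
   fully validate a candidate copy of the zone, and only then replace
   the copy being served, in a single step.  The following rough,
   non-normative illustration in Python assumes the same hypothetical
   validate_zone_with_dnssec() helper as the earlier startup sketch.

     import threading

     _lock = threading.Lock()
     _current_root_zone = None  # the copy currently being served

     def refresh_root_zone(candidate_zone):
         """Install a freshly transferred root zone only after the
         whole zone has been validated (no "lazy validation")."""
         global _current_root_zone

         # Validate the entire candidate before serving any of it.
         # validate_zone_with_dnssec() is the same hypothetical helper
         # used in the earlier startup sketch.
         if not validate_zone_with_dnssec(candidate_zone):
             # Discard the candidate; keep serving the previously
             # validated copy (or fall back to "legacy" operation).
             return False

         # Atomic replacement: queries never see a mixture of old,
         # new, and unvalidated root data.
         with _lock:
             _current_root_zone = candidate_zone
         return True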
   Compliant nameserver software MUST include an option to securely
   cache the root zone (an example name for this option could be
   "transfer-and-validate-root [yes|no]").  That is, the mechanism
   described in this document MUST be optional, and the cache operator
   MUST be able to turn it off and on.

   [[ Note: TODO: define "legacy operation" - this basically means
   "just how things operate now; you go ask a root server where each
   TLD is". ]]

   [ Ed: This fallback to legacy operation solution might only work
   until most people are doing this.  As the number of folks querying
   the root directly decreases, the scale of the root will presumably
   decrease.  Once this happens, if there is a large failure and
   everyone falls back to "legacy" operation, will the root still be
   big enough to cope with the load?  Should we address this in this
   document (e.g., "after 10 years from today, the fallback-to-legacy
   option should be disabled")?  Or just note this and suggest that a
   new document be written, updating this one and disabling the root
   fallback? ]

3.  Pros and Cons of this Technique

   [[ Note: This section is likely to be removed or significantly
   revised before publication. ]]

   This is primarily a tracking / discussion section, and the text is
   kept even looser than in the rest of this document.  These are not
   ordered.

3.1.  Pros

   o  Decrease in latency to the client - The recursive resolver
      already knows about all the TLDs and all of their information, so
      the first query for a particular TLD will always be faster.

   o  DoS against the root servers - By distributing the root to many
      recursive resolvers, the DoS protection for the root servers is
      significantly increased.  A DDoS may still be able to take down
      some recursive servers, but there is no root infrastructure to
      attack.  Of course, there is still a zone distribution system
      that could be attacked (but it would need to be kept down for a
      much longer time to cause significant damage, and so far the root
      has stood up just fine to DDoS).

   o  No central monitoring point (see also Cons!) - This proposal
      provides a small increase to the privacy of requests, and removes
      a place where attackers could collect information.  Although
      query name minimization also achieves some of this, it does still
      leak the TLDs that people behind a resolver are querying for,
      which may in itself be a concern (for example, someone in a
      homophobic country who is querying for a name in .gay).

   o  Junk queries / negative caching - Currently, a significant number
      of queries to the root servers are "junk" queries.  Many of these
      queries are for TLDs that do not (and may never) exist in the
      root.  Another significant source of junk is queries where the
      negative TLD answer did not get cached because the queries are
      for second-level domains (a negative cache entry for
      "foo.example" will not cover a subsequent query for
      "bar.example").

   o  More use of DNSSEC - In order for a recursive resolver to use
      this system, it needs to fully deploy DNSSEC.  Many large ISP-run
      resolvers do so today, but many smaller resolvers do not.  This
      might be the impetus for them to do so.

3.2.  Cons

   o  No central monitoring point (also see Pros!) - DNS operators lose
      the ability to monitor the root system.  While there is work
      underway to implement better instrumentation of the root server
      system, this (potentially) removes the thing to monitor.

   o  Loss of agility in making root zone changes - Currently, if there
      is an error in the root zone (or someone needs to make an
      emergency change), a new root zone can be created, and the root
      server operators can be notified and start serving the new zone
      quickly.  Of course, this does not invalidate the bad information
      in (long-TTL) cached answers.  Notifying every recursive resolver
      is not feasible.

   o  Increased complexity in nameserver software and their operations
      - Any proposal for recursive servers to copy and serve the root
      inherently means more code to write and execute.  Note that many
      recursive resolvers are on inexpensive home routers that are
      rarely (if ever) updated.

   o  Changes the nature and distribution of traffic hitting the root
      servers - If all the "good" recursive resolvers deploy root
      copying, then the root servers end up servicing only "bad"
      recursive resolvers and attack traffic.  The roots (could) become
      what AS112 is for RFC1918.

4.  Open Questions

   [[ Lots of food for thought here. ]]

4.1.  Transfer Mechanism

   The current document uses AXFR as the way to get the zone.  This may
   not be the best way to transfer the data.  AXFR is an easy way to
   explain what we are trying to achieve, and everyone in the DNS world
   is familiar with transferring a copy of a zone with AXFR.  There are
   many technologies that might be better for distributing this type of
   data to lots of locations.  A short list of alternatives includes
   FTP, HTTP, and BitTorrent.  The whole point of DNSSEC is that it
   doesn't matter where the data comes from.

4.2.  Transfer Source

   We need a source for the data.  Currently, some of the root server
   operators allow open AXFR (B, C, F, G, K), and IANA provides a
   service as well.
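   As a concrete (and purely illustrative) example of a non-AXFR
   source, the sketch below fetches a published copy of the root zone
   file over HTTPS and parses it with dnspython.  The InterNIC URL
   shown is an assumption of the sketch, not a source recommended by
   this document, and the resulting zone would still have to be
   validated with DNSSEC exactly as if it had been transferred with
   AXFR.

     import urllib.request

     import dns.zone

     # Illustrative location of a published root zone file; this URL
     # is an assumption of this sketch, not part of the draft.
     ROOT_ZONE_URL = "https://www.internic.net/domain/root.zone"

     def fetch_root_zone_https():
         """Fetch the root zone file over HTTPS and parse it."""
         with urllib.request.urlopen(ROOT_ZONE_URL) as response:
             text = response.read().decode("ascii")
         # Where the bytes came from does not matter; the zone MUST
         # still be validated with DNSSEC before it is used.
         return dns.zone.from_text(text, origin=".")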
   Should we continue to use the root servers as a source, or should
   there be a new infrastructure created for getting copies of the full
   root zone?  Will the current set of operators / nodes be willing /
   able to scale to the number of transfers?  Will additional letters
   be willing to enable AXFR?  What if we changed the transfer
   mechanism?  Should we stand up a new service?

4.3.  Channel / Object Security

   Currently the root zone is signed.  Unfortunately, because of the
   way DNSSEC works, only the authoritative information in the zone is
   signed; non-authoritative information, particularly glue records, is
   not signed.  Does this matter?

   o  No.  The non-authoritative information is not signed in the
      current design.  All that the "copy the root zone" idea does is
      pre-populate the cache en masse, and so we should do exactly what
      we currently do.

   o  Yes.  It would be good to be able to get all the zone information
      from anywhere.  An attacker might be stripping or modifying the
      non-authoritative information.

   An option that has been mentioned would be to wrap the AXFR
   transfers in SIG(0), but this has serious load implications for the
   transfer servers.  A simple solution would be to sort the records
   into a canonical format, make a hash of that, and then append and
   sign the result.  This would require a new protocol, but it is
   something that has been done many times in areas outside the DNS.

4.4.  Load Estimates

   [[ Note: these are quick, back-of-the-envelope calculations.  They
   could be very wrong. ]]

   People estimate that there are roughly 180,000 "real" recursive
   servers that talk to the root servers.  To account for restarts,
   we'll call it 200,000.  We would like to keep something like the
   current agility of the root, so we should try to transfer twice a
   day.  This is 400,000 transfers per day, or less than 5 transfers
   per second.  The root zone is currently around 550 KB.  At 5 qps
   this is roughly 21 Mbps.  That's quite a low number relative to what
   the root servers currently serve.  Yes, the root zone is growing in
   size, but even growth of a few orders of magnitude would still be
   reasonable.  While this could be handled by a single box, it
   (obviously!) shouldn't be.  We still need DoS protection,
   redundancy, overhead, etc. - but as a scaling number this is
   interesting.
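   The arithmetic behind these estimates can be checked directly; the
   numbers below are the draft's own rough assumptions, not
   measurements:

     # Back-of-the-envelope check of the load estimates above.
     resolvers = 200_000                # ~180,000 plus restart margin
     transfers_per_day = 2 * resolvers  # two transfers per day each
     transfers_per_second = transfers_per_day / 86_400
     zone_size_bits = 550 * 1024 * 8    # ~550 KB root zone
     aggregate_mbps = transfers_per_second * zone_size_bits / 1e6

     print(round(transfers_per_second, 1))  # ~4.6 (under 5 per second)
     print(round(aggregate_mbps, 1))        # ~20.9 (roughly 21 Mbps)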
4.5.  Behavior on Failures

   This is actually two questions:

4.5.1.  Bad Zone Data / Scaling

   Once most recursive servers start using this, the load on the root
   will be significantly less and / or different.  This means that the
   root might no longer be adequately scaled to deal with *everyone*
   suddenly querying it if there is a bad root zone pushed out.  This
   is a longer-term issue, but how should we address it?  After N years
   remove the legacy fallback?  What is N?  Or, in a few years, someone
   writes a new document that updates this one and removes the legacy
   fallback?

4.5.2.  Failover to the Next Transfer Server

   If you try to transfer from a transfer server and get bad data, you
   should try another one -- but how do we avoid causing a DoS if a bad
   root zone is pushed out?  We could solve this with something like
   "try the next N servers, then start exponential backoff, capping at
   M".  This seems like it might be another form of an existing issue:
   what happens if someone publishes a bad DS in the root for a very
   popular TLD - does everyone start hammering on the door demanding
   better data?

5.  IANA Considerations

   Currently this document requires no action from the IANA.  Depending
   on some of the Open Questions discussions, this may change.

6.  Security Considerations

   [[ Note: This needs to be filled in more when there is agreement on
   the actual mechanism. ]]

7.  Acknowledgements

   The editors fully acknowledge that this is not a new concept, and
   that we have chatted with many people about this.  If we have spoken
   to you and your name is not listed below, let us know.

8.  Contributors

   The general concept in this document is not new; there have been
   discussions regarding recursive resolvers copying the root zone for
   many years.  The fact that the root zone is now signed with DNSSEC
   makes implementing some of these techniques more feasible.

   The following is an unordered list of individuals who have
   contributed text and / or significant discussions to this document.

      Steve Crocker - Shinkuro
      Jaap Akkerhuis - NLnet Labs
      David Conrad - Virtualized, LLC.
      Lars-Johan Liman - Netnod
      Suzanne Woolf - Individual
      Roy Arends - Nominet
      Olaf Kolkman - NLnet Labs
      Danny McPherson - Verisign
      Joe Abley - Dyn
      Jim Martin - ISC
      Jared Mauch - NTT America
      Rob Austien - Dragon Research Labs
      Sam Weiler - Parsons
      Duane Wessels - Verisign

9.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Appendix A.  Changes / Author Notes

   [ RFC Editor: Please remove this section before publication. ]

   Initial to -00:

   o  Text!

Authors' Addresses

   Warren Kumari (editor)
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA  94043
   US

   Email: Warren@kumari.net


   Paul Hoffman (editor)
   VPN Consortium

   Email: paul.hoffman@vpnc.org