Network Working Group                                          V. Pappas
Internet-Draft                                                      UCLA
Expires: August 5, 2006                                         B. Zhang
                                                    Colorado State Univ.
                                                            E. Osterweil
                                                                    UCLA
                                                               D. Massey
                                                    Colorado State Univ.
                                                                L. Zhang
                                                                    UCLA
                                                           February 2006


         Improving DNS Service Availability by Using Long TTLs
                     draft-pappas-dnsop-long-ttl-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 5, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   Due to the hierarchical tree structure of the Domain Name System


Pappas, et al.           Expires August 5, 2006                 [Page 1]

Internet-Draft     Improving DNS Service Availability      February 2006


   [RFC1034][RFC1035], losing all of the authoritative servers that
   serve a zone can disrupt services to not only that zone but all of
   its descendants.  This problem is particularly severe if all the
   authoritative servers of the root zone, or of a top level domain's
   zone, fail.  Although proper placement of secondary servers, as
   discussed in [RFC2182], can be an effective means against isolated
   failures, it is insufficient to protect the DNS service against a
   distributed denial of service (DDoS) attack.  This document proposes
   to mitigate the impact of DDoS attacks against top level DNS servers
   by setting long TTL values for NS records and their associated A
   records.  Our proposed changes are purely operational and can be
   deployed incrementally.  Our analysis shows that this simple
   operational tuning has a small impact on DNS performance but can
   significantly reduce the impact felt by client resolvers as a result
   of a successful DDoS attack on the DNS service.


Table of Contents

   1.  Introduction
     1.1.  Terminology
   2.  Infrastructure RRsets Definitions and Conflicts
     2.1.  Infrastructure Records and DNS Caching
     2.2.  Infrastructure RRset Conflicts
   3.  Setting Long Infrastructure TTLs
     3.1.  Cases of Secondary Servers outside the Zone
     3.2.  Intuition
     3.3.  Handling Name Server Changes
     3.4.  Impact on Cache Memory Size
   4.  Measurement Results on Infrastructure RRSets Changes
   5.  Effectiveness of Long TTL on Zone's Availability
     5.1.  Further Enhancement Through Prefetching
   6.  Backwards Compatibility
   7.  Security Considerations
   8.  Recommendations
   9.  Acknowledgments
   10. References
   Authors' Addresses
   Intellectual Property and Copyright Statements


1.  Introduction

   [RFC2182] provides operational guidelines for selecting and operating
   authoritative servers to maximize a zone's availability.  Proper
   placement of authoritative servers can be an effective means to guard
   DNS service against unintentional failures or errors, but it cannot
   effectively protect DNS services against intentional attacks.  A
   distributed denial of service attack could target all of the
   authoritative servers for a zone, regardless of where they are
   placed.  By disabling all of a zone's authoritative servers, an
   attacker can disrupt service for that zone and all the zones below
   it.  In particular, attacks against domains such as the root, generic
   top level domains (gTLDs), country code top level domains (ccTLDs),
   and other zones serving popular DNS domains (such as co.uk. or
   co.jp.) could have a severe global impact.  For example, knocking out
   all of the root zone servers may effectively render the entire
   Internet unreachable.  Successful attacks against all authoritative
   servers for a large generic top level domain (gTLD) such as "com."
   can also impact availability for tens of millions of DNS zones.

   DNS caching can effectively help mitigate the impact of denial of
   service attacks.  A caching resolver only consults an authoritative
   server if the requested data is not already present in the cache.
   The cache contains both specific records such as www.example.com and
   infrastructure records such as the name servers for example.com.  In
   this document, we focus primarily on the caching of infrastructure
   records (defined formally in the next section) and show how setting
   long TTLs on these records can help mitigate the impact of DDoS
   attacks.  For example, consider the case of a successful attack
   against all of the DNS root servers and suppose all root servers are
   unavailable for some time period P. Despite the attack, resolvers can
   still access commonly used gTLDs and ccTLDs as long as these NS
   records and their corresponding A/AAAA resource record sets (RRsets)
   remain in a locally available cache during the period P. Generally
   speaking, access to the root servers is only used for looking up top
   level domain entries that are not presently available in the cache.
   Similar arguments apply to attacks against servers of other top level
   domains, or any DNS domain for that matter.  If the NS and associated
   A/AAAA RRSets for a domain are cached, an attack against higher level
   domains will have little or no impact on descendant domains.
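
   The timing argument above can be sketched in a few lines of Python.
   This is only an illustration under simplifying assumptions: the
   function name, the single cached RRset, and the example times are
   hypothetical, and real resolvers may also evict or refresh entries
   for other reasons.

```python
DAY = 24 * 60 * 60  # seconds


def usable_during_outage(cached_at, ttl, outage_start, outage_end):
    """Return True if an RRset cached at `cached_at` (seconds) with the
    given TTL is present in the cache for the whole outage period P."""
    expires_at = cached_at + ttl
    return cached_at <= outage_start and expires_at >= outage_end


# Hypothetical scenario: the NS RRset is cached at t=0 and the parent's
# servers are unreachable from day 1.5 to day 2.
outage_start = 3 * DAY // 2
outage_end = 2 * DAY

print(usable_during_outage(0, 1 * DAY, outage_start, outage_end))  # False
print(usable_during_outage(0, 7 * DAY, outage_start, outage_end))  # True
```

   With a one-day TTL the cached copy has expired before the outage
   begins, so the resolver would need the unreachable parent; with a
   one-week TTL the copy outlives the outage.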

   Based on the above observation, this document suggests an operational
   change regarding the setting of the TTL value for NS resource record
   sets and the A and AAAA resource records associated with these NS
   records.  Throughout the remainder of the draft, we refer to these
   types of records as "infrastructure resource record sets" or simply
   "infrastructure RRsets"; they are defined more fully in Section 2.
   As with all DNS RRsets, the cache lifetime for these infrastructure
   RRsets is determined by the time to live (TTL) field, which is
   typically set to a value between a few hours and two days.  This
   draft recommends the use of a significantly longer TTL value (such as
   one week) for infrastructure RRsets in order to improve the DNS
   service's availability in the event of a successful attack or an
   unexpected correlated failure.

   This change is feasible because of the relatively stable nature of
   infrastructure RRsets, and the DNS's tolerance for occasional partial
   discrepancies in these RRsets.  The recommendation for a longer TTL
   value in this draft applies only to DNS infrastructure RRsets; other
   RRsets such as those for end hosts should continue to use whatever
   TTL values local administrators deem appropriate for how often
   those records change.

   Currently, some of the root and TLD servers use shared unicast
   addresses [RFC3258] to improve availability during denial of service
   attacks.  This approach can be effective when the number of
   replicated servers is large; however, the interactions between shared
   unicast addresses and BGP routing dynamics are still not fully
   understood.  Furthermore, the use of shared unicast addresses
   requires one entry in the global BGP routing table for each protected
   zone.  Therefore, it may not be a generic solution for protecting a
   large number of zones.  In contrast, our proposal for using a long
   TTL for infrastructure RRsets to mitigate the impact of DDoS attacks
   is much simpler in operation, does not require any additional
   hardware support, and can be applied to any DNS domain that desires
   high availability in the face of top level DNS service failures.  One
   can also combine the use of long TTL values for infrastructure RRsets
   with the shared unicast address approach to further enhance DNS
   availability.

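   Section 2 defines infrastructure RRsets and discusses conflicts
   between the copies held by parent and child zones.  Section 3
   describes our proposal and the related technical and operational
   issues that we have identified.  Section 4 presents measurement
   results on infrastructure RRset changes, and Section 5 examines the
   effectiveness of long TTLs on a zone's availability.  Section 6
   covers backwards compatibility, Section 7 discusses potential impacts
   on DNS security, and Section 8 presents a specific recommendation for
   setting TTL values for infrastructure RRsets.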

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


2.  Infrastructure RRsets Definitions and Conflicts

   The DNS contains two very distinct types of data: general DNS data
   records and infrastructure resource records.  However, the DNS design
   did not explicitly distinguish between general data records and
   infrastructure records.  As a result, the rules for identifying
   infrastructure resource records are somewhat complex.  As a general
   rule of thumb, an infrastructure record is used by a resolver to
   navigate across a delegation.  Applying this general rule allows us
   to distinguish between general records and infrastructure records as
   follows:

   All NS resource records are infrastructure records since a resolver
   uses NS RRsets to navigate between delegations.  Similarly, the A
   RRsets associated with name servers (more precisely associated with
   names in the data portion of a NS RR) are also considered
   infrastructure records.  An iterative resolver could not navigate
   from example.com to subzone.example.com without these records.  All
   other resource record types defined in [RFC1035] are general data
   records.

   As the DNS evolves, a large number of new general data records have
   been added such as SRV resource records, LOC resource records, and so
   forth.  A small number of new infrastructure resource records have
   also been added.  In particular, AAAA (IPv6) records are associated
   with name servers, and are considered infrastructure records; an IPv6
   resolver needs to know AAAA RRs in order to navigate from example.com
   to subzone.example.com.  DNS Security Extensions [RFC4034] also
   introduced several new infrastructure records.  The DS RR is an
   infrastructure record since it is needed by security aware resolvers
   to navigate between zones.  DNSKEY RRs are infrastructure records if
   they are used to match DS RRs, or are configured as trust anchors in
   resolvers.  Similarly, NSEC RRs that are associated with a delegation
   name are also infrastructure records since they may be used to
   indicate the lack of a DS RR (or DNSKEY RR) and thus play a role in
   securely navigating between zones.  Finally, RRSIGs are
   infrastructure records if they sign an infrastructure RRset.

   In summary, NS and DS resource records are always infrastructure
   records.  The A and AAAA resource records are infrastructure records
   if and only if the name associated with the A or AAAA RR exactly
   matches a name in the data portion of some NS RR.  An NSEC RR is an
   infrastructure RR if and only if its owner name is a delegation.  A
   DNSKEY RR is an infrastructure record if and only if it matches a DS
   RR or is configured as a trust anchor in some resolver.  An RRSIG is
   an infrastructure record if and only if it signs an infrastructure
   RRset.  All other resource records defined at the time of this draft
   are general data records.
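
   The summary rules above can be written as a small classifier.  The
   following Python sketch is illustrative only: the function name and
   its parameters are hypothetical, names are assumed to be fully
   qualified with a trailing dot, and the caller is assumed to already
   know which names appear in NS data, which names are delegations, and
   whether a DNSKEY matches a DS RR or trust anchor.

```python
def is_infrastructure(rr_type, owner, ns_targets=frozenset(),
                      delegation_names=frozenset(),
                      matches_ds_or_trust_anchor=False,
                      signs_infrastructure_rrset=False):
    """Classify one resource record using the summary rules.

    ns_targets:       names found in the data portion of some NS RR
    delegation_names: owner names that are delegation points
    """
    rr_type = rr_type.upper()
    owner = owner.lower()
    if rr_type in ("NS", "DS"):
        return True                      # always infrastructure
    if rr_type in ("A", "AAAA"):
        return owner in ns_targets       # only name server addresses
    if rr_type == "NSEC":
        return owner in delegation_names
    if rr_type == "DNSKEY":
        return matches_ds_or_trust_anchor
    if rr_type == "RRSIG":
        return signs_infrastructure_rrset
    return False                         # everything else is general data


targets = {"ns1.foo.example."}
print(is_infrastructure("A", "ns1.foo.example.", ns_targets=targets))  # True
print(is_infrastructure("A", "www.foo.example.", ns_targets=targets))  # False
print(is_infrastructure("TXT", "www.foo.example."))                    # False
```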


2.1.  Infrastructure Records and DNS Caching

   Infrastructure records play a large role in DNS caching.  An end host
   typically sends DNS queries to a local caching resolver.  If an exact
   match for a query is not found in the cache, the caching resolver
   uses cached infrastructure records to determine where to start an
   iterative search.  Initially the cache will check whether the
   infrastructure records (e.g. the NS RRset and corresponding A RRsets
   in the case of IPv4 deployment without DNSSEC) for the requested zone
   are present in the cache.  If the infrastructure records for the zone
   are not found, the cache checks for infrastructure records of the
   zone's ancestors.  The cache then begins its sequence of iterative
   queries by first contacting the nearest ancestor that is in the cache
   (e.g. the zone itself in the best case and DNS root in the worst
   case).

   For example, suppose a caching resolver wants to obtain the TXT RRSet
   for www.subzone.example.com.  The caching resolver first searches its
   local cache for the requested www.subzone.example.com TXT RRset and
   simply returns this answer if it is found.  If this is not found, the
   caching resolver searches its local cache for the subzone.example.com
   NS RRSet.  If both this NS RRSet and its corresponding A/AAAA RRsets
   are found, the caching resolver directly contacts these servers.  If
   no NS RRsets for subzone.example.com are found in the cache, the
   caching resolver will search its local cache for the example.com NS
   RRset and begin the iterative query at these servers if they are
   found in the cache.  If no NS RRsets for example.com are found, the
   caching resolver will search its local cache for com NS RRsets and
   begin the iterative query at this point.  Finally, if no com NS
   RRsets are found in the cache, the caching resolver will begin its
   iterative query at the root servers.
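
   The search order in this example can be sketched as follows.  This
   is a simplified Python illustration, not a resolver implementation:
   the function name and the dictionary-based cache are hypothetical,
   and a zone is assumed to appear in the cache only when both its NS
   RRset and the corresponding A/AAAA RRsets are present.  The sketch
   also tests the query name itself, since it may be a zone apex.

```python
def closest_cached_zone(qname, cached_ns):
    """Return the nearest enclosing zone of `qname` whose NS RRset is in
    `cached_ns` (a dict keyed by zone name), or "." for the root."""
    labels = qname.rstrip(".").split(".")
    # Strip one leading label at a time: a.b.c. -> b.c. -> c.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if candidate in cached_ns:
            return candidate
    return "."  # nothing cached: start at the root servers


cache = {
    "com.": ["a.gtld-servers.net."],
    "example.com.": ["ns1.example.com."],
}
print(closest_cached_zone("www.subzone.example.com.", cache))  # example.com.
print(closest_cached_zone("www.example.org.", cache))          # .
```

   The longer the infrastructure RRsets for example.com stay in the
   cache, the less often the search falls back to com or the root.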

   From the above description we can see that, once the infrastructure
   RRsets for a zone are cached in a local resolver, the resolver can go
   directly to the zone's servers to resolve DNS queries, even when the
   higher level DNS servers are unavailable.  As stated in [Mock88], a
   high TTL value not only minimizes DNS traffic but also "allows
   caching to mask periods of server unavailability due to network or
   host problems."  Following this hint, we propose a simple operational
   tuning of the TTL values of the infrastructure RRsets toward longer
   values.  We show how this minimizes the dependency on top level zones
   and increases the overall availability of DNS zones, while still
   maintaining acceptable operational policies.

2.2.  Infrastructure RRset Conflicts

   One unfortunate complication is that infrastructure records often
   appear on both sides of a DNS delegation.  In other words, the
   records appear in both the parent and child zone.  For example, a
   non-authoritative copy of the subzone.example NS RRset is stored in
   the example zone and an authoritative copy of the subzone.example NS
   RRset is stored in the subzone.example zone.  When an infrastructure
   RRset appears in multiple zones, ideally all RRsets contain the same
   data.  [RFC1034] states "The administrators of both (parent and
   child) zones should insure that the NS and glue RRs which mark both
   sides of the cut are consistent and remain so."  However, this
   mandate is very optimistic, and is not always satisfied in real
   operations.  This reality poses a problem that is particularly acute
   in the context of this draft.  We propose to increase the TTL
   associated with infrastructure records such as the NS RRsets, but
   this change has different scopes if the parent and child zones do not
   use the same TTL for the same RRset.  For example, should one
   increase the TTL of the example.com NS RRset stored in the
   example.com zone or increase the TTL of the example.com NS RRset
   stored in the .com zone, or increase both?  To address this question,
   we first consider how resolvers deal with the multiple copies of a
   particular RRset.

   A caching resolver may cache the non-authoritative NS RRset stored at
   the parent or cache the authoritative copy from the zone itself.
   Each set may have its own TTL that is determined by the zone storing
   the data.  The parent zone operator sets the TTL for the non-
   authoritative copy stored at the parent zone.  The zone operator sets
   the TTL for the authoritative copy stored at the zone itself.  A
   caching resolver must not combine the two NS RRsets.  When presented
   with a choice of which copy to use, a cache should always prefer the
   authoritative copy of the NS RRset over any non-authoritative
   copy [RFC2181], but a cache may not always encounter the
   authoritative copy.
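
   A minimal Python sketch of this preference rule is shown below.  It
   is an illustration under stated assumptions rather than the behavior
   of any particular resolver: the CachedRRset type and store_ns
   function are hypothetical, and TTL expiry is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedRRset:
    servers: frozenset     # names from the NS RRset
    ttl: int               # TTL chosen by the zone that supplied the copy
    authoritative: bool    # True if learned from the zone itself


def store_ns(cache, zone, incoming):
    """Store an NS RRset, preferring an authoritative copy.

    The two copies are never merged: the cache holds exactly one of
    them, and a parent-side (non-authoritative) copy never replaces an
    authoritative copy that is already cached.
    """
    current = cache.get(zone)
    if current is not None and current.authoritative \
            and not incoming.authoritative:
        return current
    cache[zone] = incoming
    return incoming


cache = {}
parent_copy = CachedRRset(frozenset({"ns1.subzone.example."}), 172800, False)
child_copy = CachedRRset(frozenset({"ns1.subzone.example."}), 604800, True)

store_ns(cache, "subzone.example.", parent_copy)
store_ns(cache, "subzone.example.", child_copy)   # replaces the parent copy
store_ns(cache, "subzone.example.", parent_copy)  # ignored
print(cache["subzone.example."].ttl)  # 604800
```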

   A caching resolver first relies on the NS RRset stored at the parent.
   For example, a caching resolver first fetches the subzone.example NS
   RRset from the example zone.  This non-authoritative NS RRset stored
   at the parent is needed to reach the zone and is stored in the local
   cache.  The lifetime of this non-authoritative copy of the NS RRset
   depends on the TTL set in the parent zone.

   The non-authoritative NS RRset from the parent is replaced by the
   actual authoritative version that is stored in the zone itself only
   if one of the following happens.  First, the caching resolver can
   explicitly query the subzone.example servers for the subzone.example
   NS RRset.  This ensures the cache obtains the authoritative copy, but
   is rarely done in practice.  Second, the caching resolver may query
   the zone's name servers for a record that is currently stored in the
   zone, and the server returns the requested record together with the
   authoritative NS RRset in the authority section of the response.  For
   example, suppose the resolver requests the www.subzone.example TXT
   RRset.  The caching resolver initially queries the example servers to
   learn the (non-authoritative) subzone.example NS RRset, and the
   resulting subzone.example NS RRset from the example zone is stored in
   the cache using a TTL selected by the example zone administrator.
   The caching resolver next queries a subzone.example server to learn
   the www.subzone.example TXT RRset.  If this RRset exists, the
   response includes the www.subzone.example TXT RRset in the answer
   section and includes the (authoritative) copy of the subzone.example
   NS RRset in the authority section.  In this way, the caching resolver
   learns the authoritative copy of the NS RRset, and this authoritative
   copy has a TTL set by the subzone.example administrator.

   Note that if a zone is used purely for referrals, then the caching
   resolver never learns the authoritative NS RRset for the zone.  For
   example, suppose the resolver requests the www.sub2.subzone.example
   TXT records.  The resolver first queries the example servers and
   learns the (non-authoritative) subzone.example NS RRset stored at the
   example zone.  The caching resolver next queries the subzone.example
   servers and learns the (non-authoritative) sub2.subzone.example NS
   RRset stored at the subzone.example zone.  Finally the caching
   resolver queries the sub2.subzone.example servers and learns the
   www.sub2.subzone.example TXT RRset and the authoritative
   sub2.subzone.example NS RRset (from the authority section, assuming
   the www.sub2.subzone.example TXT RRset exists).  During this entire
   process, the resolver never learns the authoritative copy of the
   subzone.example NS RRset.  The TTL set by the subzone.example
   administrator makes no difference; only the TTL set by the example
   administrator has an impact on this caching resolver.

   Finally we note that a server may be authoritative for both a zone
   and one of its ancestors and thus further complicate which copy of
   the NS RRset is stored at a cache.  For example, suppose a server is
   authoritative for both the example and sub2.subzone.example zones.
   The example above now works as follows.  The caching resolver first
   queries this example server, which does not provide any referral
   since it is authoritative for both example and sub2.subzone.example.
   The server replies with the www.sub2.subzone.example TXT RRset and
   includes the authoritative sub2.subzone.example NS RRset (in the
   authority section, assuming the www.sub2.subzone.example TXT RRset
   exists).

   Similar reasoning applies to other infrastructure records that are
   stored in multiple places.  For example, non-authoritative copies of
   infrastructure A and AAAA records are often encountered by resolvers.
   A cache should always prefer authoritative answers when available.
   But whether a cache obtains the authoritative or non-authoritative
   version depends on the sequence of queries as illustrated above.


3.  Setting Long Infrastructure TTLs

   To reduce the dependency on top level DNS servers, and hence increase
   the availability of a zone, we recommend that DNS zone operators
   substantially increase the TTL values of their zones' infrastructure
   RRsets.  In other words, the long TTL value should be set on the
   authoritative copy of the NS RRset and any related A or AAAA RRsets
   present in the zone (authoritative or not) that correspond to names
   listed in the NS RRset.

   Given that the TTL value is part of the RRs, we recommend that the
   non-authoritative copies of the infrastructure RRsets stored at the
   parent zone also be assigned the long TTL value.  This recommendation
   is especially important for those zones that mainly provide referral
   answers for their child zones, rather than answers for records
   stored by them.  For example, TLD zones mainly provide referrals for
   their delegated zones.  As a general guide, we suggest the TTL value
   of a non-authoritative record be no longer than the TTL of the
   authoritative copy.  This presumes the authoritative copy has
   implemented the long TTL recommendation and has selected the longest
   possible TTL value given the expected dynamics of this RRset.  Note
   also that the authoritative answers of the NS and associated A RRsets
   from the zone itself are preferred over any copy stored at the
   parent.  Thus, a shorter TTL value set by the parent zone will not
   reduce the effectiveness of the long TTL values set by the child
   zone, provided a resolver learns the authoritative version.

3.1.  Cases of Secondary Servers outside the Zone

   The following common case in DNS configuration deserves a special
   explanation.  When the name servers for foo.example are located
   inside the zone, the operator for foo.example can configure the TTL
   for both the NS RRset and the A/AAAA records to a long time period.
   However, some of foo.example's authoritative servers may be located
   in other domains, as illustrated in the following NS RRset:

   foo.example.  NS ns1.foo.example.
   foo.example.  NS ns2.foo.example.
   foo.example.  NS ns.bar.example2.

   The foo.example zone is authoritative for the A and AAAA RRsets at
   both ns1.foo.example and ns2.foo.example, and can set a longer TTL
   value for its NS RRset and for these address records.  However, the
   TTL value of the third server's address records is configured by the
   bar.example2 zone, which may or may not be set to the longer value.
   Nevertheless, a short TTL for the A record of the third server should
   not have a big impact: when the parent zone of foo.example is
   unavailable, the A record of the third server may still be resolved
   even if it is not in the local cache, because an outage of the
   example zone does not necessarily imply a failure of the bar.example2
   zone.  This example also illustrates the benefit of locating
   secondary servers under different branches of the DNS tree.

3.2.  Intuition

   The motivation for extending the TTLs on infrastructure RRsets is
   partially derived from the general caching model used by the DNS.
   Given the DNS' long-standing use of caching, longer TTL values simply
   reflect that some DNS data is more stable (i.e. the infrastructure
   RRsets don't change very often, so they can be cached for longer).

   There are practical limitations to increasing the TTL value of
   infrastructure RRsets.  For example, current implementations of BIND,
   and other DNS server distributions, limit the maximum TTL used for
   RRsets.  Therefore, extending the TTL on RRsets may still encounter
   limitations after being served (i.e. in the client's cache).  In
   addition, the interactions with DNSSEC must be taken into account.
   For example, DNSSEC's key roll-over process is partially a function
   of an RRset's TTL.  Therefore, a long TTL may extend the roll-over
   period.  See draft for more details.

   As a result of the above considerations, one week seems to be a long
   enough period to greatly improve DNS availability in the face of an
   outage, and still a short enough period to avoid undesirable
   interactions with server implementations or DNSSEC signature
   lifetimes and policies.

3.3.  Handling Name Server Changes

   A primary concern of an increased TTL value is data consistency.  DNS
   servers do change from time to time: new servers are added, and
   existing servers are removed or have their IP addresses changed due
   to network reconfigurations.  Such changes can lead to
   inconsistencies between the cached infrastructure RRsets for DNS
   servers and the actual name servers.

   When changes in DNS name servers or their IP addresses do occur, the
   following operational practices should be followed.  First, as stated
   in [RFC1034], "If a change can be anticipated, the TTL can be reduced
   prior to the change to minimize inconsistency during the change, and
   then increased back to its former value following the change."
   Second, a planned change should involve a grace period.  When the
   information in authoritative DNS servers has been modified, the
   obsolete nameserver and/or obsolete IP address should continue
   answering queries for at least the TTL period, during which the
   cached information can still be used to resolve DNS queries.
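
   The two practices can be turned into a simple schedule.  The Python
   sketch below is an illustration only; the function name, the
   parameter names, and the example values (a one-week long TTL and a
   one-hour reduced TTL) are assumptions, and times are plain seconds.

```python
HOUR = 3600
WEEK = 7 * 24 * HOUR


def plan_server_change(change_time, long_ttl, short_ttl):
    """Return (lower_ttl_by, retire_old_after) in seconds.

    lower_ttl_by:     latest time to publish the reduced TTL so that
                      every copy cached with the long TTL has expired
                      by change_time.
    retire_old_after: earliest time the obsolete server or address may
                      stop answering, one (reduced) TTL after the
                      change.
    """
    lower_ttl_by = change_time - long_ttl
    retire_old_after = change_time + short_ttl
    return lower_ttl_by, retire_old_after


change_time = 10 * WEEK
lower_by, retire_after = plan_server_change(change_time, WEEK, HOUR)
print(lower_by == 9 * WEEK)              # True
print(retire_after - change_time)        # 3600
```

   If the TTL could not be reduced in advance, passing the long TTL as
   short_ttl gives the grace period needed for an unplanned change: the
   old server would have to keep answering for a full week.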

   The prescribed TTL adjustment and graceful transition represent the
   ideal handling of DNS server changes, but they may not always be
   possible.  In cases where unexpected changes happen, some caches will
   inevitably contain invalid nameserver information for a zone.
   However, DNS can operate effectively even when some authoritative
   servers may not be reachable.  As long as not all the servers for a
   zone have changed during the TTL period, the zone will continue to be
   accessible even by those resolvers that have cached the now partially
   obsolete zone data.  By continuing to operate at least a single
   server from the original set, during the TTL period, queries that use
   cached data will still be answered, even when the data for the
   changed server is obsolete.

   We should also note that, after some zone server changes, when the
   query is answered by a working authoritative server, this server can
   include the updated NS RRset in the authority section of the
   reply.  Such an inclusion will override the obsolete RRset that is
   cached at the caching resolver.  Thus the only penalty paid by a
   caching server is possibly a longer resolution time for the first
   query issued after the DNS server changes, if that query goes to one
   or more no-longer-existing servers before hitting a working one.

   One effective way to assure DNS availability in the face of
   unexpected changes is for each zone to set up an adequate number of
   secondary servers in diverse locations.  In the earlier example,
   suppose ns1.foo.example suddenly fails and has to be reinstalled on a
   different host.  Although the cached data for ns1.foo.example can
   stay in some resolvers for a long time before it gets flushed out of
   the cache, queries for the foo.example zone can still be served by
   the remaining servers.  This remains true as long as not all the RRs
   in the NS or A/AAAA RRset change at the same time.  The only negative
   impact is a longer query time in the event that a caching resolver
   first sends a query to ns1.foo.example (the recently failed server).
   In this case, the query will time out after a few seconds and the
   resolver will try the next server and succeed.
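
   The cost of a stale entry can be illustrated with a small simulation.
   This Python sketch does not send any DNS queries; the function name,
   the reachability callback, and the 3-second timeout are hypothetical
   values chosen only to show the effect described above.

```python
def query_with_failover(servers, is_reachable, timeout=3.0):
    """Try cached servers in order and return (server, seconds_lost).

    seconds_lost is the time spent waiting on servers that never
    answered.  Returns (None, seconds_lost) if every server fails.
    """
    seconds_lost = 0.0
    for server in servers:
        if is_reachable(server):
            return server, seconds_lost
        seconds_lost += timeout      # wait for the timeout, then move on
    return None, seconds_lost


cached_ns = ["ns1.foo.example.", "ns2.foo.example.", "ns.bar.example2."]
still_up = {"ns2.foo.example.", "ns.bar.example2."}

print(query_with_failover(cached_ns, lambda s: s in still_up))
# ('ns2.foo.example.', 3.0)
```

   As long as one server from the cached set still answers, the stale
   entry costs only the timeouts spent before reaching it.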

   Overall, if DNS operators place secondary servers in appropriate
   locations and follow the above rules (pertaining to the updating of
   infrastructure RRsets and in managing server changes), a long TTL
   value should have little negative impact on DNS performance.  We also
   conducted measurements over a large set of randomly chosen DNS zones
   to gauge the frequency of zone server changes in the current DNS
   system.  As described in the next section, our results show that the
   majority of DNS zones do not change their NS RRset and the associated
   A/AAAA records frequently.  This observation provides further support
   for the feasibility of a long TTL value for the infrastructure
   RRsets.

3.4.  Impact on Cache Memory Size

   Introducing longer TTLs has the potential to increase the caching
   server's memory requirements.  We believe that this is not an issue
   with current typical hardware.  For example, if the working set of a
   very popular caching server is 10 million zones (around 10% of the
   world's DNS zones), and assuming that each zone's infrastructure
   records take less than 100 bytes of memory, then the memory
   requirements will be under one gigabyte.
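
   The estimate above is simple arithmetic and can be checked with a
   few lines of Python.  The two inputs are the figures quoted in the
   text; because 100 bytes is stated as an upper bound per zone, the
   result is an upper bound as well.

```python
# Figures quoted in the text; 100 bytes is an upper bound per zone.
zones_in_working_set = 10_000_000
bytes_per_zone = 100

total_bytes = zones_in_working_set * bytes_per_zone
total_gb = total_bytes / 10**9  # decimal gigabytes

print(f"upper bound: {total_gb:.2f} GB")
```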


4.  Measurement Results on Infrastructure RRset Changes

   The previous section proposed to set the TTL values of the
   infrastructure RRsets to a long period.  A long TTL value for
   infrastructure RRsets presumes that each zone has a stable set of DNS
   servers.  To assess the stability of currently deployed DNS servers,
   we conducted a measurement study.  From a crawl over 15 million DNS
   zones (the crawl was initiated at DMOZ.ORG), we randomly selected
   100,000 zones and measured their infrastructure RRsets over a 4-month
   period.

   During this 4-month period we queried each of the 100,000 zones twice
   a day to obtain its infrastructure RRset.  Our data shows that 75% of
   the measured zones did not change either the NS or corresponding A
   RRsets during the entire study period.  Of the measured zones, 11%
   showed changes to their NS RRset during this 4-month period, and 5%
   made the changes in less than 2 months.  The A records of the
   measured zone servers changed more often than the NS RRsets: 22% of
   the zones had their servers' A records changed within 4 months, and
   10% of the zones made servers' A record changes in less than 2
   months.  All in all, our measurement results show that the current
   DNS servers, in the majority of the zones, are very stable.  Even the
   zones that made changes during our measurement period changed their
   DNS servers rather infrequently.  We believe that, with special care,
   the changes to DNS servers can be further reduced, and that a TTL
   value of 1 week is indeed feasible for infrastructure RRsets.


5.  Effectiveness of Long TTL on Zone's Availability

   The following is a quick, back-of-the-envelope calculation of the
   increased zone availability that would result from increasing the TTL
   value of an infrastructure RRset.  Assume foo.example is a popular
   zone and its infrastructure RRset (with a TTL of 4 hours) tends to be
   cached in many cache resolvers.  Assume also that the remaining
   lifetimes of the cached copies are spread uniformly over the TTL.  If
   a DDoS attack takes the example zone out of service for 2 hours, then
   on average 50% of the cache resolvers will evict the foo.example
   zone's infrastructure RRset (due to expiration) by the end of the 2
   hours.  This would leave them unable to resolve foo.example or any
   name under it.  If we increase the TTL value of foo.example's
   infrastructure RRset to 1 day, then during a 2-hour outage of the
   example zone, only 1/12, or 8%, of the cache resolvers would flush
   out foo.example's infrastructure RRset from the cache.  If we
   increase the TTL value to one week, then after the same 2-hour
   outage of the example zone, foo.example's infrastructure RRset would
   stay valid in the caches of 98.8% of those cache resolvers that had
   fetched the RRset earlier.  The longer the TTL is, the greater the
   number of cache resolvers that will have valid DNS server information
   in their cache.  Hence, we see an increased DNS availability in the
   face of temporary outages of top level servers.
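
   The calculation above can be reproduced with the short sketch below.
   It rests on one simplifying assumption that the text leaves implicit:
   the remaining lifetimes of the cached copies are uniformly
   distributed over the TTL, so the fraction of caches whose copy
   expires during an outage is the outage duration divided by the TTL.
   The function name is hypothetical.

```python
def expired_fraction(outage_hours, ttl_hours):
    """Fraction of caches whose copy expires during the outage.

    Assumes remaining cache lifetimes are uniformly distributed
    between 0 and the TTL (a simplification).
    """
    return min(outage_hours / ttl_hours, 1.0)


OUTAGE_HOURS = 2
for label, ttl_hours in [("4 hours", 4), ("1 day", 24), ("1 week", 168)]:
    lost = expired_fraction(OUTAGE_HOURS, ttl_hours)
    print(f"TTL {label}: {lost:.1%} expired, {1 - lost:.1%} still valid")
```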

   In order to gauge the effectiveness of a longer TTL value for the DNS
   infrastructure records, we used a real DNS trace that was captured
   at a UCLA caching server over 2 weeks.  Based on this trace, we
   simulated a DoS attack on all root and TLD servers and measured the
   percentage of queries that were not resolved (excluding negative
   answers from the root and TLD zones), both with the current TTL
   values and with hypothetical TTL values of 3, 5, 7, and 9 days for
   all zones.  The attack durations were 3, 6, 12, and 24 hours, and
   each attack started on the eighth day (in simulation time).  The
   following table shows the absolute number as well as the percentage
   of the queries that were not resolved for each combination of attack
   duration and TTL value.  The row marked "-" corresponds to the
   current TTL values, and each column heading gives the total number
   of queries issued during the attack:


   ---------------------------------------------------------------------
   |     ||                 Attack Duration (Hours)                    |
   ---------------------------------------------------------------------
   |     ||      3       |       6      |      12      |       24      |
   | TTL ||-------------------------------------------------------------
   |(day)|| 7776 Queries | 13799 Queries| 23586 Queries| 53636 Queries |
   |--------------------------------------------------------------------
   |  -  || 2227 - 28.6% | 3829 - 27.7% | 6807 - 28.8% | 17099 - 31.8% |
   |  3  || 1132 - 14.5% | 1884 - 13.6% | 3154 - 13.3% |  7218 - 13.4% |
   |  5  ||  917 - 11.7% | 1530 - 11.0% | 2562 - 10.8% |  5947 - 11.0% |
   |  7  ||  767 -  9.8% | 1256 -  9.1% | 2092 -  8.8% |  4766 -  8.8% |
   |  9  ||  711 -  9.1% | 1165 -  8.4% | 1898 -  8.0% |  4157 -  7.7% |
   ---------------------------------------------------------------------

   Figure 1

   Clearly, by using a longer TTL value we can increase the overall
   system availability under denial of service attacks.  The table
   shows that with a TTL value of seven days we can decrease the impact
   of such an attack on the root and TLD servers by roughly 70% (from
   about 28-32% of queries failing to about 9-10%), largely independent
   of the attack duration.  The table also shows that the system
   becomes more resilient to attacks as the TTL value increases.  Based
   on these results, we believe that a TTL value of seven days is
   adequate to considerably improve the resilience of the DNS system
   against denial of service attacks.
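
   The relative improvement can be derived directly from the table.  The
   sketch below uses only the unresolved-query counts from the rows for
   the current TTL values and for a 7-day TTL; the variable names are
   ours.

```python
# Unresolved queries from the table: attack duration (hours) -> count.
unresolved_current_ttl = {3: 2227, 6: 3829, 12: 6807, 24: 17099}
unresolved_7_day_ttl = {3: 767, 6: 1256, 12: 2092, 24: 4766}

# Relative reduction in unresolved queries for each attack duration.
reduction = {
    hours: 1 - unresolved_7_day_ttl[hours] / unresolved_current_ttl[hours]
    for hours in unresolved_current_ttl
}

for hours, value in reduction.items():
    print(f"{hours:>2}-hour attack: {value:.1%} fewer unresolved queries")
```

   The reduction falls between about 65% and 72% across the four attack
   durations.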

5.1.  Further Enhancement Through Prefetching

   Although our above analysis shows that a long TTL value alone can be
   effective in increasing DNS service availability, we note that at any
   given time some cache resolvers will have the infrastructure RRsets
   in their caches expire.  Thus, if some top level zones are out of
   service when a resolver's cache entries expire, that resolver loses
   the ability to directly contact the destination zones whose
   infrastructure RRsets got flushed out.  To further improve DNS
   service availability, we suggest that cache resolvers pre-fetch all
   the infrastructure RRsets that have an initial TTL value > 2 days
   (which is currently the default TTL value).  We can interpret a long
   TTL value for an infrastructure RRset to mean that the zone is "long
   TTL aware" and desires high availability.  We suggest that the
   pre-fetch be performed when the remaining cache lifetime of an
   infrastructure RRset drops below TTL/2.  Such pre-fetching assures
   that a cache resolver will have valid infrastructure RRsets in the
   cache, and hence be able to reach zone servers directly, even when
   some zones along the DNS lookup path may have failed.  This remains
   true as long as the outage is shorter than TTL/2.
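
   A minimal sketch of the pre-fetch rule suggested above is given
   below.  It assumes that a cache keeps, for each infrastructure RRset,
   both the TTL it was originally received with and the time remaining
   before expiry; the function and parameter names are hypothetical and
   are not taken from any existing resolver.

```python
TWO_DAYS = 2 * 24 * 3600  # threshold from the text, in seconds


def should_prefetch(initial_ttl, remaining_ttl, threshold=TWO_DAYS):
    """Decide whether a cached infrastructure RRset should be re-queried.

    initial_ttl   -- TTL (seconds) the RRset carried when it was cached
    remaining_ttl -- seconds left before the cached copy expires
    Only "long TTL aware" RRsets (initial TTL above the threshold) are
    pre-fetched, and only once less than half the TTL remains.
    """
    return initial_ttl > threshold and remaining_ttl < initial_ttl / 2


ONE_WEEK = 7 * 24 * 3600
ONE_DAY = 24 * 3600

print(should_prefetch(ONE_WEEK, 4 * ONE_DAY))  # more than half left
print(should_prefetch(ONE_WEEK, 3 * ONE_DAY))  # less than half left
print(should_prefetch(ONE_DAY, 3600))          # not a long TTL
```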


6.  Backwards Compatibility

   The advantages of this approach stem largely from its simplicity.
   The operational practice of using long TTLs for infrastructure
   records does not require any modifications to currently deployed
   caches.  The proposal is, therefore, backwards compatible with
   existing infrastructure, and has no dependency on any specific
   implementation of a DNS cache (such as BIND, djbdns, etc.).

   Additional features associated with the use of the long TTL, such as
   pre-fetching, may be incrementally deployed without adversely
   affecting any existing or neighboring caches.  All additional logic
   pertains to an instance's local cache and does not have the ability
   to affect or exploit other caches.

   Some DNS resolvers set a maximum value of the TTL that they are
   willing to cache.  Any TTL value larger than the maximum is trimmed
   down to the maximum value.  For example, BIND sets one week as the
   maximum value for caching resource records.  Thus, zones with a TTL
   value larger than one week will not achieve any additional
   improvement over zones with a TTL value of just one week.  For this
   reason, we recommend a TTL value of one week in this document.  If
   future caching server implementations have a larger maximum
   acceptable TTL value, we recommend increasing the TTL value of the
   infrastructure records even more (up to one month).
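
   The trimming behavior can be illustrated with the sketch below.  It
   is a simplified model of a cache with a configurable cap, not the
   code of any specific resolver; the one-week default mirrors the BIND
   limit mentioned above, and the function name is hypothetical.

```python
ONE_DAY = 24 * 3600
ONE_WEEK = 7 * ONE_DAY


def effective_ttl(record_ttl, max_cache_ttl=ONE_WEEK):
    """TTL (seconds) a cache actually honors after applying its cap."""
    return min(record_ttl, max_cache_ttl)


# A one-month TTL is trimmed to one week; shorter TTLs are unchanged.
print(effective_ttl(30 * ONE_DAY))
print(effective_ttl(4 * 3600))
```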


7.  Security Considerations

   The long TTL solution prescribes an operational practice that
   facilitates DNS queries during prolonged outages.  Such outages may
   result from extended DDoS attacks against key servers in the DNS.
   The use of long TTLs does not reduce the vulnerability of targeted
   servers to DDoS attacks.  However, the use of long TTLs limits the
   effect of a DDoS attack on the global DNS.  While a DDoS attack may
   disrupt the availability of some critical nameservers, the NS records
   for the zones that are delegated by them will be available in remote
   caches for much longer.  Therefore, while a DDoS attack is no less
   likely, its impact is dramatically reduced.

   Though the long TTL extends the roll-over period that should be
   followed when updating NS records for a zone, there exist no
   additional operational requirements beyond what is recommended now.
   The current guidelines recommend that operators continue to operate
   existing nameservers during the period between the date of a change
   to the NS records and that date plus the value of the old TTL.  The
   only difference that results from this proposal is that the
   roll-over period is increased in proportion to the TTL.
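
   As an illustration of the roll-over rule, the sketch below computes
   the earliest time at which the old nameservers can be retired: the
   time of the NS change plus the old TTL.  The change date used here
   is hypothetical, and the function name is ours.

```python
from datetime import datetime, timedelta


def old_servers_retire_after(change_time, old_ttl_seconds):
    """Earliest time at which no cache can still hold the old NS RRset."""
    return change_time + timedelta(seconds=old_ttl_seconds)


change = datetime(2006, 2, 1, 12, 0)  # hypothetical change date

# With a 4-hour TTL the old servers are needed for 4 more hours;
# with a one-week TTL they are needed for a full week.
print(old_servers_retire_after(change, 4 * 3600))
print(old_servers_retire_after(change, 7 * 24 * 3600))
```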

   Failure to adhere to these guidelines has one of two effects (both of
   which also exist in the current mode of operation of the DNS).  If
   some nameservers appear in both the old NS RRset and the new one,
   then any cache that is making use of a cached set may have to issue
   multiple queries and time out before reaching an active nameserver.
   However, if there is no intersection between the nameservers in the
   old and the new RRset, then there exists a period, between the time
   that the last cache fetched the old values and that time plus one
   TTL, during which a cache will direct resolvers to inoperable
   nameservers.  Neither of these scenarios is a concern if operators
   follow the standard procedure of maintaining both sets of servers
   (or at least an overlapping set) during roll-overs.
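
   The two cases above can be told apart by comparing the old and new
   NS sets.  The sketch below assumes that every server dropped from the
   NS RRset is shut down at the moment of the change (that is, the
   roll-over guideline is not followed); the server names and the
   returned labels are hypothetical.  The first branch, where every old
   server is still listed, is our own addition for completeness.

```python
def rollover_effect(old_ns, new_ns):
    """Classify what caches holding the old NS RRset will experience.

    Assumes servers removed from the NS RRset stop answering at once.
    """
    old_ns, new_ns = set(old_ns), set(new_ns)
    if old_ns <= new_ns:
        return "no impact"  # every cached server is still in service
    if old_ns & new_ns:
        return "delays"     # some cached servers time out first
    return "outage"         # no cached server answers until expiry


old = ["ns1.foo.example", "ns2.foo.example"]

print(rollover_effect(old, ["ns2.foo.example", "ns3.foo.example"]))
print(rollover_effect(old, ["ns3.foo.example", "ns4.foo.example"]))
```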


8.  Recommendations

   Our analysis shows that using long TTL values for infrastructure
   RRsets can be a simple and effective way to increase DNS service
   availability in the face of top level DNS server outages, and that
   this simple operational tuning should have negligible impact on the
   DNS system and its performance.  Our measurements over a large set of
   randomly selected DNS zones also suggest that, in today's practice,
   the infrastructure RRsets for the majority of DNS zones are indeed
   stable and change very infrequently.

   Based on our analysis and measurements, we make the following
   recommendations.  First, we recommend that the TTL value for
   infrastructure RRsets be increased to one week.  Second, we
   recommend conducting a trial deployment of this long TTL value with a
   controlled set of zones and measuring the zones' availability,
   performance (in terms of name resolution delays), and changes in the
   zones' server load (we expect a decrease in the server load).  If the
   trial deployment succeeds without exposing any unexpected issues, we
   would recommend wide deployment of long TTL settings for
   infrastructure RRsets, both for top level zones and for any zones
   that desire high availability.

   It is noteworthy that extending the TTL of infrastructure RRsets to
   one week constitutes a concrete step toward ensuring the robustness
   of the DNS.  Caching in the DNS is already invaluable for many
   reasons, and with this enhancement it also contributes to DDoS
   protection.  Our analysis has shown that this long-standing staple of
   the DNS is capable of offering a tangible level of protection with
   almost no overhead and with no new code.  As future work, we plan to
   conduct further analysis on much longer TTL values (such as one
   month) for infrastructure RRsets and to consider the impact on DNSSEC
   deployment.


9.  Acknowledgments

   We would like to express our thanks to Greg Minshall for an early
   discussion on the feasibility of using long TTLs to improve DNS
   availability, to Pete Resnick for his support and the suggestion of
   using one week or even longer TTL values, and to Rob Austin and
   Patrik Faltstrom who also provided constructive comments on our
   proposal.

10.  References

   [Mock88]   Mockapetris, P. and K. Dunlap, "Development of the Domain
              Name System", SIGCOMM, 1988.

   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2181]  Elz, R. and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, July 1997.

   [RFC2182]  Elz, R., Bush, R., Bradner, S., and M. Patton, "Selection
              and Operation of Secondary DNS Servers", BCP 16, RFC 2182,
              July 1997.

   [RFC3258]  Hardie, T., "Distributing Authoritative Name Servers via
              Shared Unicast Addresses", RFC 3258, April 2002.

   [RFC4034]  Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "Resource Records for the DNS Security Extensions",
              RFC 4034, March 2005.


Authors' Addresses

   Vasileios Pappas
   University of California, Los Angeles, Department of Computer Science
   4805 Boelter Hall
   Los Angeles, CA  90095-1596
   US

   Email: vpappas@cs.ucla.edu


   Bin Zhang
   Colorado State University, Department of Computer Science
   Fort Collins, CO  80523-1873
   US

   Email: zhangb@cs.colostate.edu


   Eric Osterweil
   University of California, Los Angeles, Department of Computer Science
   4805 Boelter Hall
   Los Angeles, CA  90095-1596
   US

   Email: eoster@cs.ucla.edu


   Dan Massey
   Colorado State University, Department of Computer Science
   Fort Collins, CO  80523-1873
   US

   Email: massey@cs.colostate.edu


   Lixia Zhang
   University of California, Los Angeles, Department of Computer Science
   3713 Boelter Hall
   Los Angeles, CA  90095-1596
   US

   Email: lixia@cs.ucla.edu


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

