DHCWG                                                           L. Zhang
Internet-Draft                                                   W. Wang
Intended status: Informational                           BUPT University
Expires: August 1, 2018                                          Y. Chen
                                                     Tsinghua University
                                                                  L. Sun
                                                         BUPT University
                                                        January 28, 2018


         Detection of Primary Server Failure in DHCPv6 Failover
              draft-zhang-dhc-dhcpv6-failure-detection-02

Abstract

   In DHCPv6 failover and other multi-server deployment scenarios, an
   automatic failure detection capability may be desirable.  This
   document describes a method by which the secondary server can detect
   a link failure between the primary server and clients.  This
   document does not define any protocol details.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 1, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect



Zhang, et al.            Expires August 1, 2018                 [Page 1]

Internet-Draft       DHCPv6 Server Failure Detection        January 2018


   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Problem Statement and Applicability
   4.  Detection of Primary Server Failure
   5.  Security Considerations
   6.  IANA Considerations
   7.  Normative References
   Authors' Addresses

1.  Introduction

   [RFC7031] describes the requirements for DHCPv6 failover, and
   [RFC6853] discusses deployment considerations for a simpler form of
   DHCPv6 redundancy.  Both scenarios deploy multiple servers to
   improve the reliability and availability of DHCPv6.  In such
   scenarios, two categories of DHCPv6 servers, primary and secondary,
   serve the clients in the domain.  Both servers should provide
   essential DHCPv6 service and maintain consistent configuration and
   lease information.  The primary server is responsible for answering
   clients' requests, while the secondary server is expected to become
   responsive if the primary server fails.

   Common implementations of failover and redundancy designs typically
   allow one server to detect the failure of its partner.  This can be
   achieved through various mechanisms, such as timer-based solutions.
   However, such failure detection methods are not sufficient, because
   they do not cover the situation in which the connection between the
   primary and secondary servers is normal while the link between the
   primary server and clients is down.  Under these circumstances, it
   would be desirable for the secondary server to detect the failure
   automatically and take over responsibility for providing DHCPv6
   service.

   This document describes a method for the secondary server to detect
   such a failure between the primary server and clients in an ordinary
   multi-server deployment.  It also discusses the potential preference
   conflict between a responsive secondary server and the primary
   server.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Problem Statement and Applicability

   [RFC3315] allows multiple servers to operate in one domain for high
   availability and other benefits.  One of the main purposes of
   multi-server DHCPv6 deployment and failover is to solve the single
   point of failure problem.  Server failures can be divided into two
   categories: the first is a failure between the primary server and
   the secondary server; the second is a failure between the primary
   server and clients.  Existing failover implementations have focused
   mostly on the former and already provide several automatic detection
   methods.

   A common scenario of the second kind of failure is a (physical) link
   failure between the primary server and clients.  Such a link failure
   may not harm the primary server itself, but it can make the primary
   server unreachable for clients.  If the secondary server is not able
   to detect such a failure, it will assume that everything is normal
   and will not provide DHCPv6 service for redundancy.

   Section 5.1.1 of [RFC7031] illustrates the first kind of server
   failure and states that the secondary server can easily detect such
   a failure from the lack of responses from the primary server.
   However, that method does not work for the second kind of failure
   discussed in this document.  Thus, this document proposes a new
   method to automatically discover a failure between the primary
   server and clients.

4.  Detection of Primary Server Failure

   The failure detection method described in this document is based on
   the following assumptions.

   o  The secondary server is reachable by clients while the primary
      server is not (at least for some of the clients).
   o  The primary server is not down, and the link between the primary
      and secondary servers is normal.

   Based on the assumptions above, if the primary server is not
   reachable for a client, the client may keep sending SOLICIT or
   REQUEST messages (if stateless DHCPv6 is used, the client may keep
   sending INFORMATION-REQUEST messages).
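
   As an illustration of how often a secondary server might see such
   repeated messages, the following minimal sketch computes the nominal
   send times of a Solicit message and its retransmissions, using the
   retransmission rules of Section 14 of [RFC3315] (SOL_TIMEOUT of 1
   second, SOL_MAX_RT of 120 seconds).  It assumes that the
   randomization factor RAND is 0 and ignores the initial random delay
   (SOL_MAX_DELAY).  The function name and the 60-second horizon are
   hypothetical and chosen only for illustration.

```python
def retransmission_times(irt, mrt, horizon):
    """Return the send times (in seconds) of a message and its
    retransmissions up to `horizon`, assuming RAND = 0."""
    times = [0.0]            # initial transmission
    rt = float(irt)          # first retransmission timeout (RT = IRT)
    t = rt
    while t <= horizon:
        times.append(t)
        rt = 2 * rt          # RT = 2 * RTprev when RAND = 0
        if mrt and rt > mrt:
            rt = float(mrt)  # cap at MRT; an MRT of 0 means no cap
        t += rt
    return times


SOL_TIMEOUT, SOL_MAX_RT = 1, 120   # values defined in RFC 3315

print(retransmission_times(SOL_TIMEOUT, SOL_MAX_RT, 60))
# prints [0.0, 1.0, 3.0, 7.0, 15.0, 31.0]
```

   Under these assumptions, a client that receives no reply sends six
   Solicit messages during the first minute.  A figure of this kind may
   help an operator choose a threshold and time period that suit a
   particular deployment.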

   To achieve automatic detection, the secondary server should
   implement an internal counter.  This counter is incremented each
   time the secondary server receives a duplicated message (e.g., a
   SOLICIT message) from the same client.  A threshold value and a time
   period should also be configured on the secondary server.  If the
   count exceeds the threshold within the configured time period, and
   the secondary server cannot find anything wrong with the primary
   server (i.e., responses from the primary server arrive as usual), it
   will conclude that there is a failure between the primary server and
   clients.  If the count does not reach the threshold within the time
   period, the counter is cleared.  The threshold and time period may
   differ between deployments; their specific values and the detailed
   implementation of the counter are therefore out of scope for this
   document.
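
   The counter logic described above could be sketched as follows.
   This is only one possible interpretation and not a defined
   implementation.  The class name DuplicateCounter, the method
   observe(), and the primary_healthy flag are hypothetical.  The
   sketch assumes that duplicates are recognized by a client identifier
   (for example, the client's DUID), that the time period starts at the
   first message seen from a client, and that the caller determines
   separately whether the primary server is responding as usual.

```python
class DuplicateCounter:
    """Per-client counter of duplicated messages (illustrative only)."""

    def __init__(self, threshold, period):
        self.threshold = threshold  # duplicates tolerated per period
        self.period = period        # length of the time period, seconds
        self._state = {}            # client_id -> [period_start, count]

    def observe(self, client_id, now, primary_healthy=True):
        """Record one message received at time `now`.

        Returns True when a failure between the primary server and
        clients is suspected."""
        entry = self._state.get(client_id)
        if entry is None or now - entry[0] > self.period:
            # First message from this client, or the period elapsed
            # without the threshold being exceeded: clear the counter.
            self._state[client_id] = [now, 0]
            return False
        entry[1] += 1  # this message duplicates an earlier one
        # Only suspect a client-side link failure while the primary
        # server itself still appears to be working.
        return primary_healthy and entry[1] > self.threshold


counter = DuplicateCounter(threshold=3, period=60)
arrivals = (0, 1, 3, 7, 15, 31)   # example arrival times in seconds
print([counter.observe("duid-1", t) for t in arrivals])
# prints [False, False, False, False, True, True]
```

   In this example the fifth message is the fourth duplicate, which
   exceeds the threshold of 3 within the 60-second period, so the
   failure is reported from that point on.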

   The detection method described in this document is likely to lead to
   a situation in which both the primary server and the secondary
   server are responsive, at least for the clients whose link to the
   primary server is not down.  The reason is that the primary server
   cannot detect that there is a failure between itself and some of the
   clients.  Thus, it will continue to provide DHCPv6 service, which
   may conflict with the secondary server.  As a result, some clients
   may receive responses from both servers and be unable to decide
   which one to use.

   One possible solution is that every time the secondary server
   decides to take responsibility for providing DHCPv6 service as a
   responsive server, it should inform the primary server.  Such a
   notification should be sent regardless of whether the primary server
   is available, since the purpose is to ensure that two servers do not
   offer service at the same time.

   Once the primary server failure is detected and the notification
   process is finished, the secondary server may start to serve as a
   responsive server, or it may just report the condition and do
   nothing else.
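
   The ordering of notification and takeover described above might be
   sketched as follows.  This document does not define how the
   notification is carried, so notify_primary and report are
   hypothetical callbacks supplied by the implementation, and the
   Policy names are assumptions made only for this example.

```python
from enum import Enum


class Policy(Enum):
    SERVE = "serve"              # start acting as a responsive server
    REPORT_ONLY = "report-only"  # report the condition, do nothing else


def on_failure_detected(notify_primary, report, policy):
    """Run after a primary-to-client failure has been detected.

    Returns True if the secondary server should become responsive."""
    # The notification is attempted whether or not the primary server
    # turns out to be available.
    try:
        acknowledged = bool(notify_primary())
    except OSError:
        acknowledged = False
    report("primary-to-client failure detected "
           f"(primary acknowledged: {acknowledged})")
    # Only after the notification step does the secondary act.
    return policy is Policy.SERVE


events = []
responsive = on_failure_detected(
    notify_primary=lambda: events.append("notified") or True,
    report=events.append,
    policy=Policy.SERVE,
)
print(responsive, events[0])
# prints True notified
```

   Keeping the notification as the first step reflects the goal that
   two servers should not offer service at the same time.  How the
   primary server reacts to the notification is not covered by this
   sketch.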

5.  Security Considerations

   A form of DoS attack can be performed by a malicious client that
   floods the network with SOLICIT messages, thus making the secondary
   server become responsive while the primary server is actually
   responsive to the other clients.

   Further security considerations are TBD.

6.  IANA Considerations

   This document does not include an IANA request.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3315]  Droms, R., Ed., Bound, J., Volz, B., Lemon, T., Perkins,
              C., and M. Carney, "Dynamic Host Configuration Protocol
              for IPv6 (DHCPv6)", RFC 3315, DOI 10.17487/RFC3315, July
              2003, <https://www.rfc-editor.org/info/rfc3315>.

   [RFC6853]  Brzozowski, J., Tremblay, J., Chen, J., and T. Mrugalski,
              "DHCPv6 Redundancy Deployment Considerations", BCP 180,
              RFC 6853, DOI 10.17487/RFC6853, February 2013,
              <https://www.rfc-editor.org/info/rfc6853>.

   [RFC7031]  Mrugalski, T. and K. Kinnear, "DHCPv6 Failover
              Requirements", RFC 7031, DOI 10.17487/RFC7031, September
              2013, <https://www.rfc-editor.org/info/rfc7031>.

Authors' Addresses

   Lanshan Zhang
   BUPT University
   Beijing University of Posts and Telecommunications (BUPT)
   Beijing  100876
   P.R. China

   Phone: +86-13146885878
   Email: zls326@sina.com


   Wendong Wang
   BUPT University
   Beijing University of Posts and Telecommunications (BUPT)
   Beijing  100876
   P.R. China

   Email: wdwang@bupt.edu.cn


   Yuchi Chen
   Tsinghua University
   Beijing  100084
   P.R. China

   Phone: +86-10-6278-5822
   Email: chenycmx@gmail.com


   Linhui Sun
   BUPT University
   Beijing University of Posts and Telecommunications (BUPT)
   Beijing  100084
   P.R. China

   Email: sunlinhui@bupt.edu.cn