Network Working Group                                        Ralph Droms
INTERNET DRAFT                                       Bucknell University

                                                              Greg Rabil
                                                             Mike Dooley
                                                              Arun Kapur
                                                       Quadritek Systems

                                                             Kim Kinnear
                                                              Mark Stapp
                                                           Cisco Systems

                                                            Steve Gonczi
                                                             Bernie Volz
                                                        Process Software

                                                           November 1998
                                                       Expires June 1999


                         DHCP Failover Protocol
                    <draft-ietf-dhc-failover-03.txt>

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
   Europe), ftp.nic.it (Southern Europe), munnari.oz.au (Pacific Rim),
   ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).


Abstract

   DHCP [RFC 2131] allows for multiple servers to be operating on a
   single network. Some sites are interested in running multiple servers
   in such a way as to provide redundancy in case of server failure.
   In order for this to work reliably, the cooperating primary and


Droms, et. al.                                                  [Page 1]

DRAFT                                                      November 1998


   secondary servers must maintain a consistent database of the lease
   information.  This implies that servers will need to coordinate any
   and all lease activity so that this information is synchronized in
   case of failover.

   This document defines a protocol to provide this synchronization
   between two servers. One server is designated the "Primary" server,
   the other is the "Secondary" server. Additionally, this document
   describes a protocol for the automatic transfer of control
   (failover) from the primary to the secondary in the case of server
   failure or network partition.

   This document further develops the concepts presented in draft-ietf-
   dhc-failover-02.txt.

1.  Introduction

   As the use of DHCP servers in networked environments grows, the
   dependency of those networks on the DHCP server increases.  This is
   particularly true of the hosts that receive their configuration
   information from the DHCP server.  Therefore, it is very important to
   be able to provide reliable, continuous availability of DHCP ser-
   vices.

   This specification describes a protocol to support automatic failover
   from a primary to its secondary server.  The failover mechanism
   allows the secondary server to perform DHCP actions while the primary
   is down, or when a network failure prevents the primary and secondary
   from communicating.  The protocol also specifies how reintegration is
   achieved when the primary again becomes operational or when the pri-
   mary and secondary can again communicate.

   In providing the specification for the failover, the protocol speci-
   fies how to guarantee reliable delivery of binding changes to the
   partner server.  This is required to synchronize lease data between
   the primary and the secondary.  The protocol further specifies a
   mechanism to allow either server to determine if it can communicate
   with its partner.  The secondary will automatically begin to service
   DHCP requests whenever it cannot communicate with the primary.  When
   the primary server becomes available again, the secondary will convey
   any changes that occurred since the time of failover back to the pri-
   mary.

   Through careful control of the difference between the lease times
   offered to DHCP clients and the lease time known by the secondary
   server, the protocol allows the primary to communicate with the
   secondary after the primary has completed communication with the DHCP
   client (a technique known as "lazy" update) and still guarantee that
   duplicate IP address allocations do not occur.  Thus, the protocol
   does not directly impact the ability of a DHCP server to respond to
   DHCP client requests.

1.1.  Requirements Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC 2119].


1.2.  DHCP Terminology

   This document uses the following terms:

      o "DHCP client" or "client"

        A DHCP client is an Internet host using DHCP to obtain confi-
        guration parameters such as a network address.

      o "DHCP server" or "server"

        A DHCP server is an Internet host that returns configuration
        parameters to DHCP clients.

      o "binding"

        A binding is a collection of configuration parameters, including
        at least an IP address, associated with or "bound to" a DHCP
        client.  Bindings are managed by DHCP servers.

      o "binding database"

        The collection of bindings managed by a primary and secondary.

      o "subnet address pool"

        A subnet address pool is the set of IP addresses associated with
        a particular network number and subnet mask.  In the
        simple case, there is a single network number and subnet mask
        and a set of IP addresses.  In the more complex case (sometimes
        called "secondary subnets", sometimes "superscopes"), several
        (apparently unrelated) network number and subnet mask combina-
        tions with their associated IP addresses may all be configured
        together into one subnet address pool.

      o "Primary server" or "Primary"


        A DHCP server configured to provide primary service to a set of
        DHCP clients for a particular set of subnet address pools.

      o "Secondary server" or "Secondary"

        A DHCP server configured to act as backup to a primary server
        for a particular set of subnet address pools.

      o "stable storage"

        Every DHCP server is assumed to have some form of what is called
        "stable storage".  Stable storage is used to hold information
        concerning IP address bindings (among other things) so that this
        information is not lost in the event of a server failure which
        requires restart of the server.


1.3.  Requirements for this protocol

   The following requirements must be (and are) met by this protocol.

      1.  Implementations of this protocol must work with existing DHCP
          client implementations based on the DHCP protocol [RFC 2131].

      2.  Implementations of the protocol must work with existing BOOTP
          relay implementations.

      3.  The protocol must provide failover redundancy between servers
          that are not located on the same subnet.


1.4.  Goals for this protocol

      1.  Provide for continued service to DHCP clients through an
          automated mechanism in the event of failure of the primary
          server.

      2.  Avoid binding an IP address to a client while that binding is
          currently valid for another client.  In other words, do not
          allocate the same IP address to two clients.

      3.  Minimize any need for manual administrative intervention.

      4.  Introduce no additional delays in server response time as a
          result of the communications required to implement the Fail-
          over protocol.


      5.  Share IP address ranges between primary and secondary servers;
          i.e., impose no requirement that the pool of available
          addresses be divided between servers.

      6.  Continue to meet the goals and objectives of this protocol in
          the event of server failure or network partition.

      7.  Provide graceful reintegration of full protocol service after
          server failure or network partition.

      8.  Allow for one computer to act as a secondary server for
          multiple primary servers.  Other topologies (e.g., mesh) are
          also possible.  Primary and secondary servers SHOULD be viewed
          as "logical" servers and not necessarily physical computers.

      9.  Ensure that an existing client can keep its existing IP
          address binding if it can communicate with either the primary
          or secondary DHCP server implementing this protocol, not just
          the server that originally offered it the binding.

      10. Ensure that a new client can get an IP address from some
          server. Ensure that in the face of partition, where servers
          continue to run but cannot communicate with each other, the
          above goals and requirements may be met. In addition, when the
          partition condition is removed, allow graceful automatic re-
          integration without requiring human intervention.

      11. If either primary or secondary server loses all of the
          information that it has stored in stable storage, it should be
          able to refresh its stable storage from the other server.


1.5.  Limitations of this Protocol

   The following are explicit limitations of this protocol.

      1.  Under normal operation, only one server at a time will hand
          out new IP addresses, but client lease renewals are serviced
          by both servers; the protocol provides reliability through
          redundancy and some degree of load balancing of lease
          renewals.

      2.  This protocol provides only one level of redundancy through a
          single secondary server for each primary server.

      3.  The protocol provides a way to detect when the primary and
          secondary server cannot communicate, but once this condition
          has been detected, does not (indeed, cannot) provide any way
          to further distinguish between network failure and failure of
          one of the servers. The protocol allows detection of an ord-
          erly shutdown of a participating server.

      4.  A subset of the address pool is reserved for secondary server
          use.  In order to handle the failure case where both servers
          are able to communicate with DHCP clients, but unable to com-
          municate with each other, a subset of the IP address pool must
          be set aside as a private address pool for the secondary
          server. The secondary can use these to service newly arrived
          DHCP clients during such a period.  The size of this private
          pool SHOULD be based only on the arrival rate of new DHCP
          clients and the length of expected down-time, and is not
          influenced in any way by the total number of DHCP clients sup-
          ported by the server pair.

      5.  The primary and secondary servers do not respond to client
          requests at all while recovering from a failure that could
          have resulted in duplicate IP assignments (that is, while
          synchronizing in POTENTIAL-CONFLICT state).
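
   As a rough illustration of limitation 4, the private pool can be
   sized from the two quantities named there.  The formula (arrival
   rate multiplied by expected down-time, rounded up) and the function
   name are assumptions made for this sketch; this document does not
   prescribe a formula.

```python
import math


def private_pool_size(new_clients_per_hour: float,
                      expected_downtime_hours: float) -> int:
    """Estimate addresses to reserve for the secondary (illustrative).

    Assumption: each newly arriving client needs one address for the
    whole outage, so demand is arrival rate times down-time.
    """
    if new_clients_per_hour < 0 or expected_downtime_hours < 0:
        raise ValueError("rate and down-time must be non-negative")
    return math.ceil(new_clients_per_hour * expected_downtime_hours)
```

   Under this assumption, 5 new clients per hour over an expected
   8-hour outage suggests 40 addresses, regardless of how many clients
   the server pair supports in total.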


2.  Protocol Operations

   The protocol features a small number of messages to communicate
   binding information and operational status, and to manage various
   disconnect-reconnect scenarios between servers.


2.1.  Message Addressing and Configuration granularity

   When discussing messages, an important question is "to whom are mes-
   sages sent" and "from whom are messages sent".  What is the address-
   able entity from which and to which messages are sent?

   At one level, this would seem to be a single DHCP server, but in fact
   there are many situations where additional flexibility in configura-
   tion is useful.  For instance, there might be several servers which
   are each primary for a distinct set of address pools, and one server
   which is secondary for all of those address pools.  The situation
   with the primaries is straightforward, but the secondary will need to
   maintain a separate failover state, partner state, and communications
   up/down status for each of the separate primary servers for which it
   is acting as a secondary.

   The protocol allows for there to be a unique failover entity per
   partner per role (where role is primary or secondary).  This failover
   entity can take actions and hold unique states.  There are thus a
   maximum of two failover entities per partner (one for the partner as
   a primary and one for that same partner as a secondary.)

   Thus, in the case where there are two primary servers A and B each
   backed up by a single common secondary server C, there is one fail-
   over entity on each of A and B, and two different failover entities
   on C.  The two different failover entities on C each have unique
   states and message xid ranges.  As far as the protocol described in
   this draft is concerned, they constitute different "servers",
   although they are certainly part of one server (as the term is com-
   monly used) if they reside in the same process.

   It is not the case that there is subnet granularity for each failover
   entity.  On one server, there is one failover entity per "partner-
   role", regardless of how many subnets or address pools are managed by
   that combination of partner and role.  Conversely, any given subnet
   or pool will be associated with exactly one failover entity on a sin-
   gle server (but it will also be associated with the corresponding
   partner's failover entity.)

   When a message is received from the partner, the unique failover
   entity to which the message is directed is determined solely by the
   IP address of the partner and the setting of the SECONDARY bit in the
   'flags' field of the message header.
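
   The lookup described above might be sketched as follows.  The class
   and function names, and the use of a dictionary keyed by the pair
   (partner IP address, partner role), are assumptions made for
   illustration only; the SECONDARY bit is bit 7 of the 'flags' field
   as defined in section 2.6.

```python
from typing import Dict, Tuple

SECONDARY_FLAG = 0x80  # bit 7 (MSB) of the 'flags' field

# Key: (partner IP address, True if the partner acts as secondary).
EntityKey = Tuple[str, bool]


class FailoverEntity:
    """Holds per-partner, per-role state (hypothetical container)."""

    def __init__(self, partner_ip: str, partner_is_secondary: bool):
        self.partner_ip = partner_ip
        self.partner_is_secondary = partner_is_secondary


def find_entity(entities: Dict[EntityKey, FailoverEntity],
                partner_ip: str, flags: int) -> FailoverEntity:
    """Select the entity using only the partner IP and SECONDARY bit."""
    partner_is_secondary = bool(flags & SECONDARY_FLAG)
    return entities[(partner_ip, partner_is_secondary)]
```

   In the A, B, C example, server C would hold two entries, one keyed
   by A's address and one by B's, both with the partner acting as
   primary.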

   Throughout this document, the states and actions taken by "servers"
   are described.  The terms "server", "primary server", and "secondary
   server" are commonly used to describe the entity taking these states
   and taking actions.  This description is wholly accurate only for the
   simplest of cases, where all of the address pools on one server are
   backed up by all of the address pools on another server.  In this
   case, there is a "true" primary and secondary server.  In all other
   cases, the term "server" is used to describe one of the two possible
   failover entities per partner.


2.2.  Packet transport

   All messages sent by this protocol are sent in UDP packets.  All mes-
   sages are unicast from the sender to the receiver.  The next section
   discusses the port to use when sending DHCP failover UDP packets.

   DISCUSSION:

      See section 8, Extended discussion #1, for a discussion of the
      reasons to use UDP as the protocol.


2.3.  Port usage

   Compliant servers SHOULD use port 647 (assigned to dhcp-failover by
   IANA) for sending and receiving Failover protocol messages, though
   they MAY be configured to use a different port (including ports 67 or
   68).

   Since the use of port 67 and 68 is allowed, the messages are format-
   ted in such a way that they can be distinguished from DHCP or BOOTP
   messages by the use of distinct message 'op' codes.  Note that
   sending failover messages on port 67 to servers not designed to
   support them may not only fail, but may also cause those servers to
   operate incorrectly or to crash.

   DISCUSSION:

      Some implementors have a strong requirement for using a separate
      port for the Failover protocol, and the use of the allocated port
      647 will accommodate them.  Some other implementors seem equally
      committed to allowing failover packets to be sent to the standard
      DHCP port, port 67.  The above language strongly suggests that the
      failover port be used (by using SHOULD), but leaves open the pos-
      sibility of using the standard DHCP port (or any other) for
      servers designed to operate in that fashion.


2.4.  Time synchronization between communicating servers

   Each Binding update message carries a "sent time stamp" (the time
   when the message was sent in GMT). This provides a simple mechanism
   to determine any "time drift" between communicating servers.

   DISCUSSION:

      If a UDP packet is successfully transmitted (i.e., it does not get
      lost), the packet travel time is negligible in the framework of
      DHCP leases.  By providing a GMT "sent time" stamp, the recipient
      can compare this with its notion of the current GMT time at the
      time it receives the packet.  The difference (plus the packet
      travel time, which we ignore) is the time drift.  The recipient
      MUST use this time drift value to bias "absolute time" values it
      receives from the sender.
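
   A minimal sketch of the drift calculation follows.  The function
   names are hypothetical, and the direction of the correction (adding
   the drift to absolute times received from the sender) is an
   assumption that follows from taking drift as arrival time minus
   sent time.

```python
import time
from typing import Optional


def time_drift(sent_time: int, arrive_time: Optional[int] = None) -> int:
    """Return drift in seconds: local arrival time minus sender's stamp."""
    if arrive_time is None:
        arrive_time = int(time.time())  # seconds since the epoch, GMT
    return arrive_time - sent_time


def to_local_time(remote_absolute_time: int, drift: int) -> int:
    """Bias an absolute time from the sender into the local clock."""
    return remote_absolute_time + drift
```

   For instance, if a packet stamped 1000 arrives when the local clock
   reads 1100, the drift is 100 seconds, and an expiration time of 5000
   reported by the sender is stored locally as 5100.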

2.5.  Failover Protocol Messages

   The Failover protocol messages are sent using UDP and encoded using a
   packet format specific to the Failover protocol. To allow easy
   recognition of and separation of Failover protocol messages from
   BOOTP and DHCP messages, BOOTP packet 'op' field values 3..11 are
   used to indicate various Failover protocol message types. A Failover
   protocol message is always unicast from the source to the destination
   using the port defined in section 2.3. The sender, and never the
   recipient, is responsible for retransmission when necessary.

2.6.  Failover protocol packet header format

   All of the fields in the fixed portion of the packet MUST be filled
   with correct data in every message sent.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     op (1)    |     rev (1)   |        payload offset (2)     |
   +---------------+---------------+---------------+---------------+
   |                            xid (4)                            |
   +---------------------------------------------------------------+
   |              sending server ID ( IP address ) (4)             |
   +---------------------------------------------------------------+
   |                          time stamp (4)                       |
   +---------------------------------------------------------------+
   |     state (1) |    flags(1)   |       reserved (2)            |
   +---------------+---------------+---------------+---------------+
   |     0 or more additional header bytes  (variable)             |
   +---------------------------------------------------------------+
   |     Payload Data, formatted as DHCP-style options             |
   |     (although using a unique option number space)             |
   |                     (variable)                                |
   +---------------------------------------------------------------+
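
   The fixed portion of the header shown above occupies 20 bytes.  One
   possible way to encode and decode it is sketched below; the type and
   function names are hypothetical, and writing zero into the reserved
   bytes is an assumption of this sketch.

```python
import socket
import struct
from typing import NamedTuple

# op, rev, payload offset, xid, server ID, time stamp, state, flags,
# reserved; "!" selects network byte order with no padding.
HEADER = struct.Struct("!BBHI4sIBBH")


class FailoverHeader(NamedTuple):
    op: int
    rev: int
    payload_offset: int
    xid: int
    server_id: str  # dotted-quad IP address
    time_stamp: int
    state: int
    flags: int


def pack_header(h: FailoverHeader) -> bytes:
    """Encode the fixed header; reserved bytes are written as zero."""
    return HEADER.pack(h.op, h.rev, h.payload_offset, h.xid,
                       socket.inet_aton(h.server_id), h.time_stamp,
                       h.state, h.flags, 0)


def unpack_header(packet: bytes) -> FailoverHeader:
    """Decode the fixed header from the start of a received packet."""
    (op, rev, offset, xid, sid, stamp,
     state, flags, _reserved) = HEADER.unpack_from(packet)
    return FailoverHeader(op, rev, offset, xid, socket.inet_ntoa(sid),
                          stamp, state, flags)
```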


   op - 1 byte

   These values extend the number space of the existing BOOTP message
   type "Op" field.

   The following message types are defined:

   Value   Message Type
   -----   ------------
   0       reserved to BOOTP/DHCP, unused by failover
   1       BOOTREQUEST (reserved to BOOTP/DHCP, unused by failover)
   2       BOOTREPLY   (reserved to BOOTP/DHCP, unused by failover)
   3       DHCPPOOLREQ         request allocation of addresses
   4       DHCPPOOLRESP        respond with allocation count
   5       DHCPBNDUPD          update partner with binding info
   6       DHCPBNDACK          acknowledge receipt of binding update
   7       DHCPPOLL            probe partner for comm. integrity
   8       DHCPPRPL            acknowledge comm. integrity
   9       DHCPUPDATEREQALL    request full transfer of binding info
   10      DHCPUPDATEDONE      ack send and ack of req'd binding info
   11      DHCPUPDATEREQ       req transfer of un-acked binding info
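
   The table above could be represented in code as an enumeration, with
   a helper that separates failover messages from BOOTP/DHCP traffic
   arriving on a shared port.  The names FailoverOp and is_failover_op
   are hypothetical.

```python
from enum import IntEnum


class FailoverOp(IntEnum):
    DHCPPOOLREQ = 3
    DHCPPOOLRESP = 4
    DHCPBNDUPD = 5
    DHCPBNDACK = 6
    DHCPPOLL = 7
    DHCPPRPL = 8
    DHCPUPDATEREQALL = 9
    DHCPUPDATEDONE = 10
    DHCPUPDATEREQ = 11


def is_failover_op(op: int) -> bool:
    """True for failover message types, False for BOOTP/DHCP values."""
    return FailoverOp.DHCPPOOLREQ <= op <= FailoverOp.DHCPUPDATEREQ
```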


   rev - 1 byte

   Failover protocol version supported.  Set to 1 for the Failover
   protocol described in this draft.  The value 255 is reserved for
   experimental implementations.  Such implementations SHOULD use the
   DHCP Vendor Class option to recognize a partner server which is using
   the same vendor's experimental implementation.


   payload offset - 2 bytes, network byte order

   The byte offset of the Payload area, from the beginning of the
   Failover packet header. The value for the current protocol version is
   20.


   xid - 4 bytes, network byte order

   The sender of a Failover protocol packet is responsible for setting
   this number, and the receiver of the packet copies the number over
   into any response packet, treating it as opaque data.  The sender
   SHOULD ensure that every packet sent to a particular IP address and
   port combination has a unique transaction id unless that packet is a
   re-transmission.


   sending server ID - 4 bytes, network byte order

   The IP address of the sending server.  In conjunction with the
   setting of the SECONDARY flag, this uniquely determines the failover
   entity sending the message as well as that destined to receive the
   message.

   This is placed in the packet instead of being recovered from the IP
   header for security purposes (see section 8).


   time stamp - 4 bytes, unsigned, network byte order

   A time stamp, indicating the time when the packet was sent.  The time
   is a 32 bit unsigned long value in network byte order, in units of
   seconds (GMT since EPOCH).

   It is used to determine the time drift between the sender and the
   recipient. The time drift is defined as the difference between
   "Arrive Time (GMT)" and "Send Time (GMT)".  The actual packet travel
   time is assumed to be negligible in this context. All Date-Time
   values contained in Failover messages MUST be corrected by the time
   drift before being stored by the recipient.


   state - 1 byte

   This field indicates the state of the sender, at the time the packet
   was sent.  The field MUST be set in every Failover message.  The
   server state value can be one of the following:

   Value   Server State
   -----   -------------------------------------------------------------
   0       NO-STATE                     May only occur in POLL messages.
                                        The partner should reply, but
                                        should not react with any state
                                        transition.
   1       STARTUP                      Startup state (see Note 1)
   2       NORMAL                       Normal state
   3       COMMUNICATIONS-INTERRUPTED   Communication interrupted (safe)
   4       PARTNER-DOWN                 Partner down (unsafe mode)
   5       POTENTIAL-CONFLICT           Synchronizing
   6       RECOVER                      Recovering bindings from partner
   7       PAUSED                       Shutting down for a short period.
   8       SHUTDOWN                     Shutting down for an extended
                                        period.
   9       RECOVER-DONE                 Interlock state prior to NORMAL


   Note 1: The STARTUP state is never set in the State field of the mes-
   sage, but rather is represented by the setting of the STARTUP flag
   (see the description of the Flags field immediately below).  When the
   server is in the STARTUP state, the state transmitted in the State
   byte is the PREVIOUS state (usually, but not always, the last
   recorded in stable storage prior to a server going down -- see sec-
   tion 6.3 for details.)

   flags - 1 byte

   Currently, bits 7 (MSB), 6, and 5 are defined.  All other bits are
   reserved, and must be set to 0.

      o SECONDARY

        Bit 7 is the SECONDARY flag and defines the server role.  Bit 7
        is 0 if the sender is a primary server, 1 if it is a secondary
        server.  Note that this role is fixed for the duration of the
        relationship between primary and secondary server.  In particu-
        lar, it does not change when and if the secondary server "takes
        over" for the primary server when it enters COMMUNICATIONS-
        INTERRUPTED or PARTNER-DOWN state -- each server retains its
        role throughout all of its state transitions.

      o RESTART

        Bit 6 is the RESTART flag.  If bit 6 is 1, the sender is res-
        tarting.  A server MUST set this bit every time it is re-
        started, and it MUST clear the bit upon receiving the first
        DHCPPRPL to a DHCPPOLL message it has sent with the bit set.

        Whenever a DHCPPOLL message is sent with the RESTART bit set in
        the 'flags' field, the MCLT Option, Option 235, MUST be
        included.

        Whenever a message with the RESTART bit set is received by a
        server, it MUST go through the communications failed state
        transition.  The RESTART bit signals that the partner server has
        been restarted, and if communications is already considered to
        have failed, then nothing need be done.  If, however, the
        partner server appeared to be operating correctly, then it was
        able to restart without the receiving server noticing that it
        was ever gone.  The communications failed transition is forced
        in this case to restart any on-going resynchronization processes
        that were operating with the partner server.  See section 6.3
        for additional information.

        Whenever a DHCPPOLL message is sent with the RESTART bit set,
        the server SHOULD include a Vendor Class Identifier, Option 60,
        in the message to identify the server to its partner.

      o STARTUP

        Bit 5 is the STARTUP flag.  Bit 5 MUST be set to 1 whenever the
        server is in STARTUP state, and set to 0 otherwise.  (Note that
        when in STARTUP state, the state transmitted in the 'state'
        field is usually the last recorded state from stable storage,
        but see section 6.3 for details.)
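
   The three defined flag bits might be decoded and encoded as in the
   sketch below.  The names are hypothetical, and rejecting a message
   whose reserved bits are non-zero is an assumption; this document
   only requires that senders set those bits to 0.

```python
from typing import NamedTuple

SECONDARY = 0x80  # bit 7 (MSB): sender is a secondary server
RESTART = 0x40    # bit 6: sender is restarting
STARTUP = 0x20    # bit 5: sender is in STARTUP state
RESERVED_MASK = 0x1F  # bits 4..0 are reserved


class Flags(NamedTuple):
    secondary: bool
    restart: bool
    startup: bool


def decode_flags(flags: int) -> Flags:
    """Split the 'flags' byte into its three defined bits."""
    if flags & RESERVED_MASK:
        raise ValueError("reserved flag bits must be 0")
    return Flags(bool(flags & SECONDARY), bool(flags & RESTART),
                 bool(flags & STARTUP))


def encode_flags(f: Flags) -> int:
    """Build the 'flags' byte; reserved bits are left as 0."""
    return ((SECONDARY if f.secondary else 0)
            | (RESTART if f.restart else 0)
            | (STARTUP if f.startup else 0))
```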


   reserved - 2 bytes

   2 filler bytes, reserved.

2.7.  DHCPPOOLREQ and DHCPPOOLRESP

   A secondary server requests addresses for its unique use from the
   primary server by using the DHCPPOOLREQ message.  The primary is in
   complete charge of how many addresses the secondary receives.

   The primary server will allocate IP addresses to the secondary server
   upon receipt of a DHCPPOOLREQ message and inform the secondary server
   of the number of additional addresses allocated in this allocation
   cycle by sending the number in the DHCPPOOLRESP message.

   When the primary server gets a DHCPPOOLREQ message, it computes which
   addresses should be transferred to the secondary, and queues up
   DHCPBNDUPD transactions by setting the Status of the selected
   addresses to "BACKUP".  Having done this, it sends a DHCPPOOLRESP
   message.  The DHCPPOOLRESP message carries the "Number of addresses
   transferred" as its payload.  The primary server does not have to
   wait until all the above binding updates have been acknowledged
   before sending the DHCPPOOLRESP message.

   The secondary server keeps sending DHCPPOOLREQ messages until it
   receives a DHCPPOOLRESP with "Number of addresses transferred" = 0,
   or it decides that the partner is not responding.

   If the secondary server receives a DHCPPOOLRESP message with "Number
   of addresses transferred" > 0, it MUST send another DHCPPOOLREQ mes-
   sage, since additional addresses may still be waiting for it.  How-
   ever, the time at which it sends subsequent DHCPPOOLREQ messages is
   implementation dependent.  This mechanism makes it possible for the
   primary server to pace the transfer (e.g., it could generate all
   addresses at once, or one-by-one) and to some degree for the
   secondary to pace their receipt.


   The primary server MUST respond to each DHCPPOOLREQ message it
   receives. If it has already generated all private addresses, or it
   has no available addresses, it MUST send a DHCPPOOLRESP with "Number
   of addresses transferred" = 0.

   The secondary server MAY send a DHCPPOOLREQ message at any time, and
   although the primary server is under no obligation to allocate any
   additional addresses, it MUST respond with a DHCPPOOLRESP indicating
   how many new addresses it has allocated or 0 if no new addresses were
   allocated.
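
   From the secondary's point of view, the exchange described in this
   section amounts to a simple loop.  In the sketch below, send_poolreq
   is a hypothetical callable standing in for one DHCPPOOLREQ and its
   matching DHCPPOOLRESP, and max_requests is an illustrative safety
   bound; detection of an unresponsive partner is not modeled.

```python
from typing import Callable


def request_private_pool(send_poolreq: Callable[[], int],
                         max_requests: int = 100) -> int:
    """Repeat DHCPPOOLREQ until the primary reports 0 addresses.

    send_poolreq sends one DHCPPOOLREQ and returns the "Number of
    addresses transferred" carried in the DHCPPOOLRESP.
    Returns the total number of addresses transferred.
    """
    total = 0
    for _ in range(max_requests):
        transferred = send_poolreq()
        if transferred == 0:
            return total  # the primary has nothing more to allocate
        total += transferred
    raise RuntimeError("no zero-count DHCPPOOLRESP within the bound")
```

   How long the secondary waits between iterations is left to the
   implementation, as noted above.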


2.8.  DHCPUPDATEREQ, DHCPUPDATEREQALL and DHCPUPDATEDONE

   Whenever either server wishes to be updated with information the
   other server knows but has not yet transmitted, it will send a
   DHCPUPDATEREQ or DHCPUPDATEREQALL message.

   When either server gets a DHCPUPDATEREQ or DHCPUPDATEREQALL message,
   it computes which updates should be transferred to the partner, and
   queues up DHCPBNDUPD transactions as appropriate.  Once all such
   updates have been acknowledged, it sends a DHCPUPDATEDONE message.

   If the message that initiated this process was a DHCPUPDATEREQ mes-
   sage, the receiving server will transmit only DHCPBNDUPD messages for
   IP addresses which its information indicates that its partner has not
   acked.

   If, however, the message that initiated this process was a DHCPUP-
   DATEREQALL message, the receiving server will transmit DHCPBNDUPD
   messages for all IP addresses involved in failover with this partner
   in this role.

   The secondary server periodically re-transmits the DHCPUPDATEREQ mes-
   sage, until it receives a DHCPUPDATEDONE message with a matching
   'xid' field, or until it decides that the partner is not responding.

   This approach is similar to the DHCPPOOLREQ/DHCPPOOLRESP message
   exchange, with one critical difference: the DHCPPOOLRESP is sent as
   soon as the binding updates are queued up, but the DHCPUPDATEDONE
   message is deferred until all of the sender's DHCPBNDUPD messages
   have been successfully transmitted and a corresponding DHCPBNDACK
   message has been received for each of them.

   The server processing a DHCPUPDATEREQ message MUST NOT send a
   corresponding DHCPUPDATEDONE message until all of the DHCPBNDUPD mes-
   sages have been acked by the partner with a DHCPBNDACK message.


   Any retransmissions of the DHCPUPDATEREQ message MUST have the same
   transaction ID.  Use of a new transaction ID may cause rebuilding of
   the outgoing binding update queue or other processing in the server
   with a negative effect on performance.
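
   A minimal Python sketch of how a receiving server might choose which
   bindings to send, and when it may send DHCPUPDATEDONE.  The Binding
   class and its acked_by_partner flag are hypothetical; a real server
   would derive this from its own bindings database.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    address: str
    acked_by_partner: bool  # hypothetical flag: partner acked the latest state


def select_updates(bindings, request_type):
    """Return the bindings to queue as DHCPBNDUPD for a given request type."""
    if request_type == "DHCPUPDATEREQALL":
        return list(bindings)
    if request_type == "DHCPUPDATEREQ":
        return [b for b in bindings if not b.acked_by_partner]
    raise ValueError(f"unexpected message type: {request_type}")


def may_send_updatedone(queued, acked_addresses):
    """DHCPUPDATEDONE is allowed only once every queued update is acked."""
    return all(b.address in acked_addresses for b in queued)
```

   Keying the queue on the request's transaction ID would let a
   retransmitted DHCPUPDATEREQ be recognized without rebuilding it.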

2.9.  DHCPBNDUPD

   One server notifies its partner of a binding state change by using
   the DHCPBNDUPD message.

   Every DHCPBNDUPD message MUST contain:

      o An Assigned IP Address Option (Option 50).

      o A DHCP Binding Status Option (Option 230).

      o Where the Binding Status is ACTIVE, EXPIRED, RELEASED, or RESET,
        it MUST also contain one or both of the Client Identifier
        (Option 61) and the Client Hardware Address (Option 233). In the
        case where the Binding Status is ACTIVE, it MUST also contain
        the Lease Duration Option (Option 51).

      o Where dynamic DNS updates are being used by the sending server,
        the Client FQDN Option, Option 81, is used by the sender to
        communicate the status of the binding update to its partner.

   In response to a binding update, the recipient server MUST respond
   with a DHCPBNDACK message.

   Multiple binding updates MAY be batched up, and sent in one Failover
   protocol message (see section 3.1).
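
   The mandatory-option rules above could be checked along the
   following lines.  This is a sketch under assumptions: one binding
   update is modelled as a dict keyed by option code, the codes are
   those given in section 3 (230 for Binding Status, 233 for Client
   Hardware Address), and missing_options is a hypothetical helper.

```python
ASSIGNED_IP, LEASE_DURATION, CLIENT_ID = 50, 51, 61
BINDING_STATUS, CLIENT_HW_ADDR = 230, 233
ACTIVE, EXPIRED, RELEASED, RESET = 2, 3, 4, 6


def missing_options(update: dict) -> list:
    """Return codes of mandatory options absent from one binding update."""
    missing = [c for c in (ASSIGNED_IP, BINDING_STATUS) if c not in update]
    status = update.get(BINDING_STATUS)
    if status in (ACTIVE, EXPIRED, RELEASED, RESET):
        if CLIENT_ID not in update and CLIENT_HW_ADDR not in update:
            missing.append(CLIENT_ID)  # either 61 or 233 would satisfy this
    if status == ACTIVE and LEASE_DURATION not in update:
        missing.append(LEASE_DURATION)
    return missing
```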


2.10.  DHCPBNDACK

   This message implements either a positive or negative acknowledgment
   of one or more binding updates.

   A binding update (or a batch of binding updates sent as one message)
   is matched with its associated acknowledgment by having the same
   'xid' field value in the message header.

   The server sending a DHCPBNDACK message MAY include any of the
   options that are acceptable in a DHCPBNDUPD message when the
   DHCPBNDACK message is returned to the sender.  It MUST include at
   least the Assigned IP Address Option.

   If any of this information differs from the information in the
   DHCPBNDUPD message, the receiver MUST NOT update its bindings
   database with that information upon receipt of the DHCPBNDACK mes-
   sage, since the sender will have no way of knowing if the receiver
   actually received the message.

   The DHCPBNDACK MAY selectively reject one or more updates, by includ-
   ing one or more IP address - Reject Reason option pairs in the mes-
   sage body.

   The DHCPBNDACK implicitly acknowledges any binding updates it replies
   to, except those it enumerates using Reject Reason Codes.

   Implementations of this protocol MAY send batched updates, and they
   MUST be prepared to receive batched updates.


2.11.  DHCPPOLL

   In the absence of other messages, a DHCPPOLL message is used to
   verify the communications integrity of the link between the primary
   and secondary servers.  It is used by either server whenever there is
   some question about either the communications integrity or running
   status of the other server.

   Since current state and other status information is transmitted in
   every DHCPPOLL and in every DHCPPRPL message, the DHCPPOLL and
   DHCPPRPL exchange can also be used to signal a change in status by a
   server or as a way to request an update of the status of its partner.

   Whenever a DHCPPOLL message is generated it MUST have a unique value
   in the 'xid' field, unless it is a retransmission of a previously
   un-acked DHCPPOLL message.


2.12.  DHCPPRPL

   This message simply replies to the DHCPPOLL message (PRPL = Poll
   reply).  Like all messages, it needs to have all of the fixed
   portions of the failover packet header filled in, including the state
   and the flags fields.

3.  Protocol Payload Data Format

   Payload data is encoded as a set of flexible DHCP/BOOTP style options
   [RFC 2132], each consisting of the usual 1 byte option code, 1 byte
   length, and "length" bytes of data.  The options are placed after the
   header, after skipping PayloadOffset bytes.  The payload data options
   are not preceded by a "cookie" value.


   Since the packet is NOT a DHCP/BOOTP protocol packet, the options
   used here do not conflict with any existing "proper" DHCP/BOOTP
   options.  In fact, these options are allocated in relationship to the
   DHCP option space in the following way.

   In cases where the syntax and semantics of a Failover Payload Option
   are identical to those of a DHCP/BOOTP option, the same option number
   is used.  For options unique to the Failover protocol, option numbers
   starting at 230 are used.

   Thus, all new Failover protocol option numbers are assigned from a
   continuous range beginning with 230.

   The protocol is permissive in allowing various other DHCP options in
   binding updates.  As long as the sender wishes to use an option, it
   MAY include it.  On the other hand, the recipient MUST ignore any
   option it is not prepared to process.
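
   The code/length/data layout can be illustrated with a short Python
   sketch.  It assumes the payload bytes have already been located
   (i.e., PayloadOffset has been applied) and it does not treat any
   option code specially; encode_option, decode_options and keep_known
   are hypothetical helper names.

```python
def encode_option(code: int, data: bytes) -> bytes:
    """Encode one option as code byte, length byte, then the data."""
    if not 0 <= code <= 255 or len(data) > 255:
        raise ValueError("code and length must each fit in one byte")
    return bytes([code, len(data)]) + data


def decode_options(payload: bytes) -> list:
    """Return (code, data) pairs in wire order."""
    options, i = [], 0
    while i < len(payload):
        if i + 2 > len(payload):
            raise ValueError("truncated option header")
        code, length = payload[i], payload[i + 1]
        data = payload[i + 2:i + 2 + length]
        if len(data) != length:
            raise ValueError("truncated option data")
        options.append((code, data))
        i += 2 + length
    return options


def keep_known(options: list, known: set) -> list:
    """Drop options the recipient is not prepared to process."""
    return [(code, data) for code, data in options if code in known]
```

   Returning the options as an ordered list, rather than a dict,
   preserves the position of each Assigned IP Address option, which
   matters when updates are batched.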

3.1.  Batching multiple binding updates in one packet

   Implementations of this protocol MAY send batched updates, and they
   MUST be prepared to receive batched updates.

   Multiple DHCPBNDUPD transactions MAY be batched together in one
   protocol message.  Data sets for individual transactions MUST always
   begin with the Assigned IP Address (Option 50).  Option ordering
   between the Assigned IP Address options is not significant.

   If batched updates are sent, they MUST be formatted as follows:


       Non-IP Address/Non-client specific options first
       Assigned IP address option (50) for the first address
           Options pertaining to first address, including
           at least DHCP Binding Status (230)
       Assigned IP address option (50) for the second address
           Options pertaining to second address, including
           at least DHCP Binding Status (230)
        ...


   In case an implementation chooses to reject some or all of the IP
   address binding information in a DHCPBNDUPD message in a DHCPBNDACK
   reply, the DHCPBNDACK message MUST contain one or more Assigned IP
   Address (Option 50) / Reject Reason Code pairs to indicate that the
   updates for the address(es) were not accepted.  The Assigned IP
   Address option communicates which updates out of the batch are being
   rejected, and the Reject Reason Code indicates why.  Any IP addresses
   present in the DHCPBNDUPD message without corresponding Option 50/
   Reject Reason Code pairs in the DHCPBNDACK message are implicitly
   acked by the DHCPBNDACK message.  If the DHCPBNDUPD message only con-
   tains one binding update and that update is rejected, a DHCPBNDACK
   with a single Assigned IP Address / Reject Reason Code pair MUST be
   sent.
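
   One way to process a batch, sketched in Python.  The sketch assumes
   the options have already been decoded into an ordered list of
   (code, data) pairs; split_batch and rejected_addresses are
   hypothetical helpers, and the Reject Reason Code is taken to be
   option 234 as in section 3.12.

```python
ASSIGNED_IP, REJECT_REASON = 50, 234


def split_batch(options):
    """Split ordered (code, data) pairs into (common, per_address_groups)."""
    common, groups = [], []
    for code, data in options:
        if code == ASSIGNED_IP:
            groups.append([(code, data)])    # option 50 opens a new data set
        elif groups:
            groups[-1].append((code, data))  # belongs to the latest address
        else:
            common.append((code, data))      # precedes the first option 50
    return common, groups


def rejected_addresses(ack_options):
    """Map each rejected address in a DHCPBNDACK to its reason code."""
    rejected, current = {}, None
    for code, data in ack_options:
        if code == ASSIGNED_IP:
            current = data
        elif code == REJECT_REASON and current is not None:
            rejected[current] = data[0]
    return rejected
```

   Every address that appears in the DHCPBNDUPD but not in the mapping
   returned by rejected_addresses is then treated as implicitly acked.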


3.2.  DHCP Binding Status

   This option is used to convey the current state of a binding. This
   option is mandatory for DHCPBNDUPD messages.

   Code     Len  Type
   +-----+-----+-----+
   | 230 |  1  | 1-7 |
   +-----+-----+-----+
   Legal values for this option are:

   Value Binding Status
   ----- ------------------------------------------------
   1     FREE       Lease has never been used
   2     ACTIVE     Lease is assigned to a client
   3     EXPIRED    Lease has expired
   4     RELEASED   Lease has been released by client
   5     ABANDONED  A server or client flagged the address as unusable
   6     RESET      Lease was freed by some external agent
   7     BACKUP     Lease belongs to secondary's private address pool


3.3.  Assigned IP address

   Uses identical code and format to DHCP Option 50 (requested IP
   address).  This option is mandatory for DHCPBNDUPD messages and in
   any DHCPBNDACK message where a Reject Reason Code option appears.

   Code   Len          Address
   +-----+-----+-----+-----+-----+-----+
   |  50 |  4  |  a1 |  a2 |  a3 |  a4 |
   +-----+-----+-----+-----+-----+-----+


3.4.  Absolute time

   This absolute time is used for the lease grant time as well as the
   partner-down time.  When used in a DHCPBNDUPD or DHCPBNDACK
   message, it represents the lease grant time.  When used in a DHCPPOLL
   message, it represents the partner-down time.

   The value is an absolute GMT time, which is meaningful to both
   servers because time synchronization has already been achieved
   between the source and the target server using the time field in the
   message.  It is represented as seconds elapsed since Jan 1, 1970
   (i.e., ANSI C time_t time value representation).  Note that this is
   (at present) a signed field.

   Code   Len           Time
   +------+-----+-----+-----+-----+-----+
   | 231  |  4  |  t1 |  t2 |  t3 |  t4 |
   +------+-----+-----+-----+-----+-----+
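
   A short Python sketch of encoding and decoding this option.  It
   assumes the four time bytes are in network byte order, which this
   section does not state explicitly (sections 3.5 and 3.14 do so for
   their own options); the function names are hypothetical.

```python
import struct

ABSOLUTE_TIME = 231


def encode_absolute_time(seconds: int) -> bytes:
    # "!BBi": network byte order, two single bytes, then a signed 32-bit
    # integer, matching the signed field described above.
    return struct.pack("!BBi", ABSOLUTE_TIME, 4, seconds)


def decode_absolute_time(option: bytes) -> int:
    code, length, seconds = struct.unpack("!BBi", option)
    if code != ABSOLUTE_TIME or length != 4:
        raise ValueError("not an Absolute time option")
    return seconds
```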


3.5.  Number of addresses transferred to Secondary Server

   A 32 bit unsigned long in network byte order. Reports the number of
   addresses transferred by the primary to the secondary server
   (addresses to be used for the secondary server's private address
   pool).

   Code   Len     Number of Addresses
   +-----+-----+-----+-----+-----+-----+
   | 232 |  4  |  n1 |  n2 |  n3 |  n4 |
   +-----+-----+-----+-----+-----+-----+


3.6.  Lease Duration

   Uses the format and code of the standard DHCP IP Address Lease Time
   option (51).  The time is in units of seconds, and is specified as a
   32-bit unsigned integer. A Lease Duration of 0xFFFFFFFF indicates an
   infinite lease.

   Code   Len         Lease Time
   +-----+-----+-----+-----+-----+-----+
   |  51 |  4  |  t1 |  t2 |  t3 |  t4 |
   +-----+-----+-----+-----+-----+-----+


3.7.  Client Identifier

   The format, code and conventions used are identical to DHCP option
   61.

   Code   Len   Type  Client-Identifier
   +-----+-----+-----+-----+-----+---
   |  61 |  n  |  t1 |  i1 |  i2 | ...
   +-----+-----+-----+-----+-----+---


3.8.  Client Hardware Address

   The format is similar to DHCP option 61.  The type byte (t1) MUST be
   set to the proper ARP hardware address code, as defined in the ARP
   section of RFC 1700; it MUST NOT be zero.

   Code   Len   Type   MAC address
   +-----+-----+-----+-----+-----+---
   | 233 |  n  |  t1 |  m1 |  m2 | ...
   +-----+-----+-----+-----+-----+---

   Either Client Id, Client Hardware Address or BOTH MAY be present in
   binding update transactions. At least one of them MUST be present.
   If both are present, the Client Id MUST be used to uniquely identify
   the owner of the binding (exactly as in RFC 2131).

3.9.  Host Name

   Uses the format and code of DHCP option 12.

   Code   Len                 Host Name
   +-----+-----+-----+-----+-----+-----+-----+-----+--
   |  12 |  n  |  h1 |  h2 |  h3 |  h4 |  h5 |  h6 |  ...
   +-----+-----+-----+-----+-----+-----+-----+-----+--


3.10.  Domain Name

   Uses the format and code of DHCP option 15.

   Code   Len   Domain Name
   +-----+-----+-----+-----+-----+-----+--
   |  15 |  n  |  d1 |  d2 |  d3 |  d4 |  ...
   +-----+-----+-----+-----+-----+-----+--


3.11.  Client FQDN

   If an implementation supports Dynamic DNS updates, this option can be
   used to communicate the DNS name that was set. Uses the format and
   code of the Client FQDN option (81) as described in <draft-ietf-dhc-
   dhcp-dns-08.txt>.

   Code   Len   Flags Rcode1 Rcode2 Domain Name
   +-----+-----+-----+------+------+-----+------
   |  81 |  n  |  f  |  r1  |  r2  |  d1 | d2...
   +-----+-----+-----+------+------+-----+------


3.12.  Reject Reason Code

   This option is used to selectively reject binding updates. It MAY be
   used in a DHCPBNDACK message, always following an option 50.  Option
   50 contains the IP address of the specific update being rejected.

   Note that a Message option, DHCP Option 56, may be included to give a
   human readable error indication along with the Reject Reason Code.

   Code   Len   Reason code
   +-----+-----+----------+
   | 234 |  1  |    R1    |
   +-----+-----+----------+

   Reason codes:

   Value    Meaning
   -----    ------------------------------------------------
   0        Reserved
   1        Illegal IP address (not part of any address pool)
   2        Fatal conflict exists: address in use by other client
   3 - 253  Reserved for new Reason Codes
   254      Unknown: Error occurred but does not match any reason code
   255      Reserved for code expansion


3.13.  Message

   This option is used to supply a human readable message.  It may be
   used in association with the Reject Reason Code to provide a human
   readable error message for the reject.


   Code   Len      Text
   +-----+-----+------+-----+--
   | 56  |  n  |  c1  | c2  | ...
   +-----+-----+------+-----+--


3.14.  MCLT - Maximum Client Lead Time

   Maximum Client Lead Time, in seconds.  A 32 bit integer value, in
   network byte order. This option MUST be used in DHCPPOLL and DHCPPRPL
   messages, when the server is NOT in normal state.

   Code   Len           Time
   +------+-----+-----+-----+-----+-----+
   | 235  |  4  |  t1 |  t2 |  t3 |  t4 |
   +------+-----+-----+-----+-----+-----+


3.15.  Vendor Class Identifier

   A string which identifies the vendor of the failover protocol
   implementation.

   The code for this option is 60, and its minimum length is 1.

   Code    Len    Vendor Class Identifier
   +-----+-----+-----+-----+-----+--
   | 60  |  n  |  i1 |  i2 |  i3 | ...
   +-----+-----+-----+-----+-----+--


4.  Challenging scenarios for a Failover protocol

   There exist a number of failure scenarios which will challenge the
   correctness guarantees of the Failover protocol.  Two of the
   scenarios that the Failover protocol was specifically designed to
   handle correctly are detailed in this section in order to motivate
   some of the more unusual aspects of the protocol's operations.


4.1.  Primary Server crash before "lazy" update:

   In the case where the primary server sends a DHCPACK to a client for
   a newly allocated IP address and then crashes prior to sending the
   corresponding update to the secondary server, the secondary server
   will have no record of the IP address allocation.  When the secondary
   server takes over, it may well try to allocate that IP address to a
   different client.  If the first client to receive the IP address is
   not on the net at that time (although its lease has not yet run
   out), an ICMP echo (i.e., ping) will not prevent the secondary
   server from allocating that IP address to a different client.

   This is handled in the protocol by having the primary and secondary
   allocate addresses for new clients from distinct address pools.

   A more likely (in that DHCPRENEWs are presumably more common than
   DHCPDISCOVERs) and more subtle version of this problem is where the
   primary server crashes after extending a client's lease time, and
   before updating the secondary with a new time using a lazy update.
   After the secondary takes over, if the client is not connected to the
   network the secondary will believe the client's lease has expired
   when, in fact, it has not.  In this case as well, the IP address
   might be reallocated to a different client while the first client is
   still using it.

   This scenario is handled by the Failover protocol through control of
   the lease time and the use of the maximum client lead time (MCLT).
   See the next section for details.

4.2.  Network partition where servers can't communicate but each can
talk to clients:

   Several conditions are required for this situation to occur. First,
   due to a network failure, the primary and secondary servers cannot
   communicate.  As well, some of the DHCP clients must be able to
   communicate with the primary server, and some of the clients must now
   only be able to communicate with the secondary server.  When this
   condition occurs, both primary and secondary servers could attempt to
   allocate IP addresses for new clients from the same pool of available
   addresses. At some point, then, two clients will end up being
   allocated the same IP address. This will cause potentially serious
   problems when the network failure that created this situation is
   corrected.

   This is handled in the protocol by having the primary and secondary
   servers allocate addresses for new clients from distinct address
   pools.

   The specifics of how these two scenarios are handled are supplied in
   the next section.

5.  Duplicate Address Assignment Control

   There are several ways that the Failover protocol avoids the possi-
   bility of duplicate address assignment.

5.1.  Control of lease time

   The key problem with lazy update is that when a server fails
   after updating a client with a particular lease time and before
   updating its partner, the partner will believe that a lease has
   expired even though the client still retains a valid lease on that IP
   address.

   In order to handle this problem, a period of time known as the "Max-
   imum Client Lead Time" (MCLT) is defined and must be known to both
   the primary and secondary servers.  Proper use of this time interval
   places an upper bound on the difference allowed between the lease
   time provided to a DHCP client by a server and the lease time known
   by that server's partner.  In order that this is not the maximum
   lease time that a server can ever provide to a client, during a lazy
   update the updating server typically updates its partner with lease
   time information which is longer than the lease time previously given
   to the client.  This allows that server to give a longer lease time
   to the client the next time the client renews its lease.

   When moving to the PARTNER-DOWN state (where a server is allowed to
   reallocate the partner's IP addresses), a server will wait the Max-
   imum Client Lead Time before allocating any IP addresses from its
   partner's pool to any new DHCP clients.  Thus, any clients which have
   a lease on an IP address with a lease time greater than that known by
   the server moving into PARTNER-DOWN state will either have contacted
   that server during the MCLT period or their leases will have expired.

   When a server has transitioned to PARTNER-DOWN state, it MUST NOT
   reallocate an IP address from one client to another client until an
   additional maximum client lead time interval after the lease on the
   first client expires. (Actually, until the maximum client lead time
   after what it believes to be the lease expiration time of the first
   client.)
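
   The two waiting periods described above amount to simple time
   arithmetic, sketched here in Python.  All times are seconds on a
   common clock, the function names are hypothetical, and treating the
   boundary instant itself as allowed is an assumption of this sketch
   rather than something the text specifies.

```python
def partner_pool_usable_at(partner_down_at: int, mclt: int) -> int:
    """Earliest time addresses from the partner's pool go to new clients."""
    return partner_down_at + mclt


def reallocation_allowed_at(believed_expiry: int, mclt: int) -> int:
    """Earliest time a leased address may move to a different client."""
    return believed_expiry + mclt


def may_reallocate(now: int, believed_expiry: int, mclt: int) -> bool:
    # believed_expiry is what this server thinks the first client's lease
    # expiration is; the client may hold a lease up to MCLT longer.
    return now >= reallocation_allowed_at(believed_expiry, mclt)
```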

   The fundamental relationship on which much of the correctness of this
   protocol depends is that the lease expiration time known to a DHCP
   client MUST NOT be more than the maximum client lead time greater
   than the lease expiration time known to a server's partner.

   The remainder of this section makes the above fundamental relation-
   ship more explicit.

   This protocol requires a DHCP server to deal with several different
   lease intervals and places specific restrictions on their relation-
   ships. The purpose of these restrictions is to allow the other server
   in the pair to be able to make certain assumptions in the absence of
   an ability to communicate between servers.

   The different lease times are:

      o desired client lease interval

        The desired client lease interval is the lease interval that a
        DHCP server would like to give to a DHCP client in the absence
        of any restrictions imposed by the Failover protocol.  Its
        determination is outside of the scope of this protocol. Typi-
        cally this is the result of external configuration of a DHCP
        server.

      o actual client lease interval

        The actual client lease interval is the lease interval that a
        DHCP server gives out to a DHCP client.  It may be shorter than
        the desired client lease interval (as explained below).

      o desired partner server lease interval

        The desired partner server lease interval is the lease expira-
        tion interval the local server tells to its partner.

      o acknowledged partner server lease interval

        The acknowledged partner server lease interval is the interval
        the partner server has most recently acknowledged.

   The key restriction (and guarantee) that any server makes with
   respect to lease intervals is that the actual client lease interval
   never exceeds the acknowledged partner server lease interval (if any)
   by more than a fixed amount.  This fixed amount is called the "Max-
   imum Client Lead Time" (MCLT).

   The MCLT MAY be configurable, but for correct server operation it
   MUST be the same and known to both the primary and secondary servers.

   It is transmitted from the primary to the secondary in every message
   sent with the RESTART bit set, and also in every poll and poll reply
   message.  The secondary MUST ensure that its value agrees with that
   of the primary.  See section 3.14 concerning the MCLT Option.

   A server MUST record in its stable storage both the local server
   lease interval and the most recently acknowledged partner server
   lease interval for each IP address binding.  It is assumed that the
   desired client lease interval can be determined through techniques
   outside of the scope of this protocol.

   Again, the fundamental relationship among these times which MUST be
   maintained is:

       actual client lease interval <=
       ( acknowledged partner lease interval + MCLT )

   The "acknowledged partner lease interval" is the acknowledged secon-
   dary server lease interval for the primary server, and it would be
   the acknowledged primary server lease interval for the secondary
   server when it is operating out of contact with the primary server.
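
   Expressed in terms of expiration times, the relationship can be
   checked with a predicate such as the following Python sketch.  It
   reads the requirement as "not more than the MCLT greater", as the
   prose earlier in this section puts it, and it treats a binding with
   nothing acknowledged yet as having a remaining partner interval of
   zero.  The function and parameter names are hypothetical.

```python
from typing import Optional


def lease_is_safe(now: int, client_expiry: int,
                  partner_acked_expiry: Optional[int], mclt: int) -> bool:
    """True if the client's expiry leads the partner's view by at most MCLT."""
    # With nothing acknowledged yet, the partner's remaining interval is zero.
    partner_view = now if partner_acked_expiry is None else partner_acked_expiry
    return client_expiry <= partner_view + mclt
```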

   Figure 5.1-1 illustrates an initial lease to a client using the rules
   discussed in the example which follows it.


          DHCP                 Primary             Secondary
          Client               Server               Server

            |                     |                    |
            | >-DHCPDISCOVER->    |                    |
            |     <---DHCPOFFER-< |                    |
            |                     |                    |
            | >-DHCPREQUEST->     |                    |
            |   (selecting)       |                    |
            |                     |                    |
            |  <--------DHCPACK-< |                    |
            |      ^    (MCLT)    |                    |
            |      :              | >-DHCPBNDUPD-->    |
            |      :              |  (1/2 MCLT + X )   |
            |      :              |                    |
            |      :              |     <-DHCPBNDACK-< |
            |   MCLT / 2          |                    |
           ...     :             ...                  ...
            |      :              |                    |
            |      V              |                    |
            | >-DHCPREQUEST->     |                    |
            |      (renew)        |                    |
            |                     |                    |
            |  <--------DHCPACK-< |                    |
            |      ^    (X)       |                    |
            |      :              | >-DHCPBNDUPD-->    |
            |      :              |   ( 1/2 X + X )    |
            |      :              |                    |
            |      :              |     <-DHCPBNDACK-< |
            |    X / 2            |                    |
            |      :              |                    |
           ...    ...            ...                  ...

           Figure 5.1-1:  Lazy Update Message Traffic
                          X = Desired Client Lease Interval


   DISCUSSION:

      This protocol mandates no algorithm concerning these lease inter-
      vals, as long as the above fundamental relationship is preserved.

      In the interests of clarity, however, let's examine a specific
      example.  The MCLT in this case is 1 hour.  The desired client
      lease interval is 3 days, and its renewal time is half the lease
      interval.


      The rules for this example are:

      o What to tell the client:

        Take the remainder of the acknowledged partner server lease
        interval.  If this is a new lease, then this value will be zero.
        If this remainder plus the MCLT is greater than the desired
        client lease interval, give the client the desired client lease
        interval; otherwise, give the client the remainder plus the
        MCLT.

      o What to tell the failover partner server:

        Take the renewal interval (typically half of the actual client
        lease interval), and add to it the desired client lease inter-
        val.
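
      These two rules can be written as a pair of small functions.  The
      Python sketch below is one possible rendering, not a required
      algorithm: the function names are hypothetical, all intervals are
      in seconds, and the renewal interval is taken to be exactly half
      of the actual client lease interval.  It uses the MCLT of 1 hour
      and the desired client lease interval of 3 days from this
      example.

```python
HOUR = 3600
DAY = 24 * HOUR


def actual_client_lease(desired: int, partner_remaining: int, mclt: int) -> int:
    """What to tell the client: desired interval, capped at remainder + MCLT."""
    return min(desired, partner_remaining + mclt)


def desired_partner_lease(actual: int, desired: int) -> int:
    """What to tell the partner: renewal interval (half of actual) + desired."""
    return actual // 2 + desired


mclt, desired = 1 * HOUR, 3 * DAY

# New lease: nothing acknowledged yet, so the partner remainder is zero.
first = actual_client_lease(desired, partner_remaining=0, mclt=mclt)
acked = desired_partner_lease(first, desired)

# The client renews at T1, i.e. first // 2 seconds after the partner update.
remaining = acked - first // 2
second = actual_client_lease(desired, remaining, mclt)
```

      Here first is 1 hour, acked is 3 days plus 1/2 hour, and second
      is 3 days.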

      In operation this might work as follows:

      When a primary server makes an offer for a new lease on an IP
      address to a DHCP client, it determines the desired client lease
      interval (in this case, 3 days).  It then examines the ack-
      nowledged partner lease interval (which in this case is zero) and
      determines the remainder of the time left to run, which is also
      zero.  To this it adds the MCLT.  Since the actual client
      lease interval cannot be allowed to exceed the remainder of the
      current partner lease interval plus the MCLT, the offer made to
      the client is for the remainder of the current partner lease
      interval (i.e., zero) plus the MCLT.  Thus, the actual client
      lease interval is 1 hour.

      Once the primary server has performed the ACK to the DHCP client,
      it will update the secondary server with the lease information.
      However, the desired partner server lease interval will be com-
      posed of one half of the current actual client lease interval
      added to the desired client lease interval. Thus, the secondary
      server is updated with a DHCPBNDUPD with a lease interval of 3
      days + 1/2 hour specified in the Lease Duration Option (Option
      51).

      When the primary server receives an ACK to its update of the
      secondary server's (partner's) lease interval, it records that as
      the acknowledged partner server lease interval.  A server MUST NOT
      send a DHCPBNDACK in response to a DHCPBNDUPD message until it is
      sure that the information in the DHCPBNDUPD message resides in its
      stable storage.  Thus, the primary server in this case can be sure
      that the secondary server has recorded the desired partner server
      lease interval in its stable storage when the primary server
      receives a DHCPBNDACK message from the secondary server.


      When the DHCP client attempts to renew at T1 (approximately one
      half an hour from the start of the lease), the primary server
      again determines the desired client lease interval, which is still
      3 days.  It then compares this with the remaining acknowledged
      partner server lease interval (3 days + 1/2 hour) and adjusts for
      the time passed since the secondary was last updated (1/2 hour).
      Thus the remaining time on the acknowledged partner server lease
      interval is 3 days.  Adding the MCLT to this yields 3 days plus 1
      hour, which is greater than the desired client lease interval of
      3 days.  So the client is renewed for the desired client lease
      interval -- 3 days.

      When the primary DHCP server updates the secondary DHCP server
      after the DHCP client's renewal ACK is complete, it will calculate
      the desired partner server lease interval as the T1 fraction of
      the actual client lease interval (1/2 of 3 days this time = 1.5
      days).  To this it will add the desired client lease interval of 3
      days, yielding a total desired partner server lease interval of
      4.5 days.  In this way, the primary attempts to have the secondary
      always "lead" the client in its understanding of the client's
      lease interval so as to be able to always offer the client the
      desired client lease interval.

      Once the initial actual client lease interval of the MCLT is past,
      the protocol operates effectively like the DHCP protocol does
      today in its behavior concerning lease intervals. However, the
      guarantee that the actual client lease interval will never exceed
      the remaining acknowledged partner server lease interval by more
      than the MCLT allows full recovery from a variety of failures.

5.2.  Controlled re-allocation of IP addresses

   When in PARTNER-DOWN state (after a period defined in detail in sec-
   tion 6.5.2 has passed), there are no restrictions on reallocating a
   lease from one client to another.

   In any other state, a server cannot reallocate an address from one
   client to another without first notifying (through a DHCPBNDUPD mes-
   sage) and receiving acknowledgement (through a DHCPBNDACK message)
   that its partner is aware that the first client is not using the
   address.

   This could be modeled in the following way (though this specific
   implementation is in no way required).  An "available" IP address on
   a server may be allocated to any client.  An IP address which was
   leased to a client and which expired or was released by that client
   would take on a new state, say "pending-available".  When an IP
   address became "pending-available", the partner server would be
   notified that this IP address was "available" through a DHCPBNDUPD.
   When the sending server received the DHCPBNDACK for that IP address
   showing it was "available", it would move the IP address from
   "pending-available" to "available", and it would be available for
   allocation to any clients.

   A server MAY reallocate an IP address in "pending-available" state to
   the same client with no restrictions.
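
   The model described above might be sketched as follows.  As with the
   model itself, nothing here is required by the protocol; the class,
   method and state names are hypothetical, and the sending of the
   DHCPBNDUPD message is only indicated by a comment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AddrState(Enum):
    AVAILABLE = "available"
    LEASED = "leased"
    PENDING_AVAILABLE = "pending-available"


@dataclass
class Address:
    state: AddrState = AddrState.AVAILABLE
    client: Optional[str] = None  # last client to hold this address

    def may_allocate_to(self, client_id: str) -> bool:
        if self.state is AddrState.AVAILABLE:
            return True
        if self.state is AddrState.PENDING_AVAILABLE:
            # Only the previous holder may get the address back
            # before the partner has acknowledged the update.
            return client_id == self.client
        return False  # LEASED: renewals are not modeled in this sketch

    def lease_to(self, client_id: str) -> None:
        if not self.may_allocate_to(client_id):
            raise ValueError("address may not be allocated to this client")
        self.state = AddrState.LEASED
        self.client = client_id

    def expire_or_release(self) -> None:
        # A DHCPBNDUPD would be sent to the partner at this point.
        self.state = AddrState.PENDING_AVAILABLE

    def partner_acked(self) -> None:
        # DHCPBNDACK received: the address may now go to any client.
        if self.state is AddrState.PENDING_AVAILABLE:
            self.state = AddrState.AVAILABLE
            self.client = None
```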


5.3.  Secondary renewal of leases

   When operating in NORMAL state, a secondary server MAY process
   DHCPREQUEST messages for renewal or rebinding leases.  In this case,
   the requirements for control of lease time and re-allocation of IP
   addresses are the same as that of the primary server.


6.  Server Operation

   This section discusses the operation of a server implementing the
   Failover protocol using the state transition diagram in Figure 6.2-1.
   This is the common state transition diagram for both servers in a
   pair.

6.1.  Server Initialization

   When a server starts, it begins in STARTUP state.  See section 6.4
   below for details.

6.2.  Establishing Communications Integrity

   Central to the operation of the Failover protocol is a notion of
   "communications okay" or "communications failed".  State transitions
   are taken in many cases when the status of communications with the
   partner changes.

   A specific discipline exists for establishing and verifying communi-
   cations integrity.  Communications is set to "okay" whenever a mes-
   sage sent is acked by the partner.  After an implementation dependent
   length of time from the communications "okay" event the communica-
   tions with the partner are deemed to have "failed" if no subsequent
   acknowledgments have been received.  Whenever a DHCPPRPL,
   DHCPUPDATEDONE, DHCPPOOLRESP or DHCPBNDACK is received this time
   period is restarted.

   Obviously, as the time period elapses, a server SHOULD send DHCPPOLL
   messages in order to elicit a DHCPPRPL message in reply, which will
   reset the time period.

   While an implementation SHOULD restart this time period on every
   DHCPUPDATEDONE, DHCPPOOLRESP, DHCPBNDACK or DHCPPRPL, it MAY choose
   to only restart it on a DHCPPRPL.

   This technique ensures that two-way communications integrity exists
   between the servers.  Were the timeout period to be reset on the
   receipt of any message from the partner, a network failure where one
   server could send but not receive messages to the partner could lead
   to failure of the entire redundant DHCP subsystem.  For example, in a
   situation where the primary could send but not receive any messages,
   the secondary would never take over from the primary and yet DHCP
   clients would not receive any service.
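
   One way to picture this discipline is the small sketch below.  It is
   illustrative only: the class name, the use of plain numbers for time
   and the timeout value are assumptions, and message types are
   represented simply by their names.  The point it shows is that only
   acknowledgment messages restart the time period; other traffic from
   the partner does not.

```python
from typing import Optional

# Messages that acknowledge something this server sent.
ACK_TYPES = {"DHCPPRPL", "DHCPUPDATEDONE", "DHCPPOOLRESP", "DHCPBNDACK"}


class CommsMonitor:
    def __init__(self, timeout: float) -> None:
        self.timeout = timeout            # implementation dependent
        self.last_ack: Optional[float] = None

    def on_message(self, msg_type: str, now: float) -> None:
        # Restart the time period only for acknowledgments.  A DHCPPOLL
        # from the partner proves one-way reachability only.
        if msg_type in ACK_TYPES:
            self.last_ack = now

    def is_okay(self, now: float) -> bool:
        if self.last_ack is None:
            return False
        return now - self.last_ack < self.timeout
```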

6.3.  Server State Transitions

   Figure 6.2-1 is the diagram of the server state transitions. The
   remainder of this section contains information important to the
   understanding of that diagram.

   The server stays in the current state until all of the actions speci-
   fied on the state transition are complete.  If communications fails
   during one of the actions, the server simply stays in the current
   state and attempts a transition whenever the conditions for a transi-
   tion are later fulfilled.

   In the state transition diagram below, the "+" or "-" in the upper
   right corner of each state is a notation about whether communication
   is ongoing with the other server.

   The legend "responsive", "partially-responsive", or "unresponsive" in
   each state indicates whether the server is responsive to DHCP client
   requests in the respective state.  The terms "responsive" and
   "unresponsive" have the obvious meanings, while "partially-
   responsive" means that a DHCP server may respond to DHCPREQUEST mes-
   sages that are RENEWAL or REBINDING, but to no other messages.

   In the state transition diagram below, when communication is reesta-
   blished between the two servers, each must record the state of the
   partner when communication was restored.  State transitions on one
   server in some cases imply state transitions on the partner server,
   so a record of the current state of the partner server must be kept
   by each server.

   If a message is received from a partner with the state equal to zero
   (0), then the receiving server should respond to that message with a
   DHCPPRPL if it was a DHCPPOLL, but under no circumstances should it
   consider communications to be "okay", nor take any state transitions
   based on receipt of that message.

   If the state of the partner changes while communicating, a server
   moves through the communications-failed transition and into whatever
   state results.  It then immediately moves through whatever state
   transition is appropriate given the current state of the partner
   server.

   DISCUSSION:

      The point of this technique is simplicity, both in explanation of
      the protocol and in its implementation.  The alternative to this
      technique of memory of partner state and automatic state transi-
      tion on change of partner state is to have every state in the fol-
      lowing diagram have a state transition for every possible state of
      the partner.  With the approach adopted, only the states in which
      communications are reestablished require a state transition for
      each possible partner state.

   The current state of a server must be recorded in stable storage and
   thus be available to the server after a server restart.


        +---------------+  V  +--------------+
        |    RECOVER  - |  |  |   STARTUP  - |
        |(unresponsive) |  +->|(unresponsive)|
        +---------------+     +--------------+
           Comm. OK             +-----------------+
          Other State:-RECOVER  |  PARTNER DOWN - |<-----+
          |      |              | (responsive)    |      |
         All   POTENTIAL-       +-----------------+      |
       Others  CONFLICT------------ | --------+  ^(see   |
          |                     Comm. OK      |  | 6.93) |
         UPDATEREQ(ALL)       Other State:    |  +-----+ |
       Wait UPDATEDONE         |        |     | Comm.  | |
     Wait MCLT from fail   RECOVER  All Others| Failed | |
      +--------------+         |        V     V  |     | |
      |RECOVER-DONE +|      +--+    +--------------+   | |
      |(unresponsive)|      |       |  POTENTIAL + |<--+ |
      +--------------+   Wait for +>|  CONFLICT    |     |
         Comm. OK         Other   | |(unresponsive)|<--- | --+
     +--Other State:-+    State:  | +--------------+     |   |
     |   |           |   RECOVER  |         |            |   |
     |   All      POTENT.  DONE   | Resolve Conflict     |   |
     |  Others:  CONFLICT-- | ----+     (see 6.9)        |   |
     | Wait for             V               V            |   |
     | Other State: NORMAL +-----------------+           |   |
     |   V                 |     NORMAL    + | External  |   |
     |   +--+----------+-->|(see 6.72, 6.73) |-Command-->+   |
     |      ^          ^   +-----------------+           |   |
     |      |          |            |                    |   |
     |  Wait for   Comm. OK       Comm.            External  |
     |   Other      Other        Failed            Command   |
     |   State:     State:          |                or  |   |
     |RECOVER-DONE  NORMAL     Start Safe        Safe    |   |
     |      |     COMM. INT.  Period Timer       Period  |   |
     |   Comm. OK.     |            V            expiration  |
     |  Other State:   |  +------------------+           |   |
     |    RECOVER      +--| COMMUNICATIONS - |-----------+   |
     V      +-------------|   INTERRUPTED    |   Comm. OK    |
    RECOVER               |  (responsive)    |--Other State:-+
    RECOVER-DONE--------->+------------------+   All Others

           Figure 6.2-1:  Server state diagram.


6.4.  STARTUP state

   The STARTUP state affords an opportunity for a server to probe its
   partner server, before starting to service DHCP clients.

   DISCUSSION:

      Without the STARTUP state, a server would likely start in a state
      derived from its previously stored state (held in stable storage),
      if any.  However, this may be inconsistent with the current state
      of the partner.  The STARTUP state affords the opportunity for a
      server to potentially learn the partner's state and determine if
      that state is consistent with its derived starting state or
      whether some significant state change has occurred at the partner
      that forces the server to start in another state.  This is
      especially critical if significant time has elapsed while the
      server was down.


6.4.1.  Operation while in STARTUP state

   Whenever a server is in STARTUP state, it MUST be unresponsive to
   DHCP client requests, and so the time spent in the STARTUP state is
   necessarily short, typically on the order of a few seconds to a few
   tens of seconds.  The exact time spent in the STARTUP state is imple-
   mentation dependent, and the primary and secondary server are not
   required to spend the same amount of time in the STARTUP state.

   Whenever any message is sent to the partner while in STARTUP state
   the STARTUP bit MUST be set in the 'flags' field of the message
   header.


6.4.2.  Transition out of STARTUP state

   Each server starts out in STARTUP state every time it initializes
   itself, and performs the following algorithm as part of its initiali-
   zation:

      1.  Ensure that the RESTART bit is set in the 'flags' field of the
          failover message header.  Once set, the RESTART bit must
          remain set in all failover messages sent by the server to the
          partner until the first acknowledgment of a message is
          received from that partner.  This is required to assure that
          the partner knows that the server has restarted, even if the
          partner itself is unreachable for a long while.

          Do not send any messages until step 5.

      2.  Is there any record in stable storage of a previous failover
          state?  If yes, set previous-state to the last recorded state
          in stable storage, and continue with step 3.

          Is there any configuration information that indicates that
          this server was previously running but lost its stable
          storage?  Such information must typically come from some
          administrative intervention, since it is difficult for a
          server to distinguish first startup from a startup after it
          has lost its stable storage.  If yes, then set the previous-
          state to RECOVER, and set the time-of-failure to whatever time
          was configured, and go on to step 3.  This time-of-failure
          will be used in the transition out of the RECOVER state into
          the RECOVER-DONE state, below.

          If there is no record of any previous failover state in stable
          storage nor of any previous operational activity for this
          server, then set the previous-state to RECOVER and set the
          time-of-failure to a time earlier than the current time minus
          the maximum-client-lead-time.  If using standard Posix times,
          0 would typically do quite well.

      3.  Is the previous-state NORMAL?  If yes, set the previous-state
          to COMMUNICATIONS-INTERRUPTED.

      4.  Start the STARTUP state timer.  The time that a server remains
          in the STARTUP state (absent any communications with its
          partner) is implementation dependent (and would typically be
          configurable).  It should be long enough to poll several times
          and stand a good chance to receive a response to at least one
          poll from a heavily loaded partner across a slow network.

      5.  Start sending DHCPPOLL messages (with both the RESTART and
          STARTUP bits set in the 'flags' field).

      6.  Wait for "communications okay", i.e., the receipt of a
          DHCPPRPL message.

          When a DHCPPRPL message is received, clear the RESTART flag,
          clear the STARTUP flag, and set the current state to the
          previous-state.

          If the partner is in PARTNER-DOWN state, and if its partner-
          down time (received in the DHCPPRPL message in the Absolute
          Time Option) is later than the last recorded time of operation
          of this server, then set the current state to RECOVER.

          Then, transition to the current state and take the "communica-
          tions okay" state transition based on the current state of
          this server and the partner.

      7.  If the STARTUP state timer expires, take an implementation
          dependent action:  The server MAY go to the previous-state, or
          the server MAY wait.

          Reasons to go to previous-state and begin processing:

          If the current server is the only operational server, then if
          it waits, there will be no operational DHCP servers.  This
          situation could occur very easily where one server fails and
          then the other crashes and reboots.  If the rebooting server
          doesn't start processing DHCP client requests without first
          being in communication with the other server, then the level
          of DHCP redundancy is not particularly high.  This is an
          appropriate approach if the possibility of partition is low,
          or if the safe period expiration time is well beyond the time
          at which an operator would notice and react to a partition
          situation.  It is also quite appropriate if the safe period
          will never expire.

          Reasons to wait:

          If the current server has been down for longer than the
          maximum-client-lead-time, and it is partitioned from the other
          server, then when it returns it will attempt to use its own
          available addresses to allocate to new DHCP clients, and the
          other server may well be in PARTNER-DOWN state and may have
          already allocated some of those available addresses to DHCP
          clients.  In cases where the possibility of partition is high,
          and the safe period expiration time is less than the likely
          operator reaction time, this is a good approach to use.
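
   The state selection in steps 2, 3 and 6 of the algorithm above could
   be sketched as follows.  This is a sketch under stated assumptions,
   not a required implementation: states are represented as strings,
   times as Posix seconds, and both function names and all parameter
   names are hypothetical.  Timers, flag handling and message
   transmission (steps 1, 4, 5 and 7) are omitted.

```python
from typing import Optional, Tuple


def derive_previous_state(stored_state: Optional[str],
                          configured_failure_time: Optional[float]
                          ) -> Tuple[str, Optional[float]]:
    """Steps 2 and 3: return (previous-state, time-of-failure)."""
    if stored_state is not None:
        # A failover state was found in stable storage.
        previous, failure_time = stored_state, None
    elif configured_failure_time is not None:
        # Configuration says stable storage was lost.
        previous, failure_time = "RECOVER", configured_failure_time
    else:
        # First startup: Posix time 0 is safely in the past.
        previous, failure_time = "RECOVER", 0.0
    if previous == "NORMAL":
        previous = "COMMUNICATIONS-INTERRUPTED"
    return previous, failure_time


def state_after_first_reply(previous_state: str,
                            partner_state: str,
                            partner_down_time: Optional[float],
                            last_operation_time: Optional[float]) -> str:
    """Step 6: choose the current state when a DHCPPRPL arrives."""
    if (partner_state == "PARTNER-DOWN"
            and partner_down_time is not None
            and last_operation_time is not None
            and partner_down_time > last_operation_time):
        return "RECOVER"
    return previous_state
```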

6.5.  PARTNER-DOWN state

   PARTNER-DOWN state is a state either server can enter.  When in this
   state, the server does not assume that the other server could still
   be operating and servicing a different set of clients, but instead
   assumes that it is the only server operating.  For this reason, only
   one server should be operating in this state at a time.


6.5.1.  Upon Entry to PARTNER-DOWN state

   When entering PARTNER-DOWN state a server MUST record the time of
   entry, and MUST transmit it in every DHCPPOLL message or DHCPPRPL
   message sent while in PARTNER-DOWN state.


6.5.2.  Operation while in PARTNER-DOWN state

   A server in PARTNER-DOWN state MUST respond to DHCP client requests.
   It will allow renewal of all outstanding leases on IP addresses, and
   will allocate IP addresses from its own pool, and after a fixed
   period of time (the MCLT interval) has elapsed from entry into
   PARTNER-DOWN state, it will allocate IP addresses from the set of all
   available IP addresses.

   Once a server has entered NORMAL state, the PARTNER-DOWN state is
   entered only on command of an external agency (typically an adminis-
   trator of some sort) or after the expiration of an externally config-
   ured minimum safe-time after the beginning of COMMUNICATIONS-
   INTERRUPTED state.

   Any available IP address tagged as belonging to the other server (at
   entry to PARTNER-DOWN state) MUST NOT be used until the maximum-
   client-lead-time beyond the entry into PARTNER-DOWN state has
   elapsed.

   A server in PARTNER-DOWN state MUST NOT allocate an IP address to a
   DHCP client different from that to which it was allocated at the
   entrance to PARTNER-DOWN state until the maximum-client-lead-time
   beyond its expiration time has elapsed.  If this time would be
   earlier than the current time plus the maximum-client-lead-time, then
   the current time plus the maximum-client-lead-time is used.
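
   The rule in the preceding paragraph amounts to taking the later of
   two times.  The sketch below is illustrative only.  The function
   name is hypothetical, times are Posix seconds, and it assumes that
   "the current time" is sampled once (for example at entry to
   PARTNER-DOWN state) and passed in as reference_time; the text does
   not say when it is sampled.

```python
def earliest_reallocation(lease_expiration: float,
                          reference_time: float,
                          mclt: float) -> float:
    """Earliest time an address may be given to a different client."""
    # Wait the MCLT beyond the lease expiration, but never less than
    # the MCLT beyond the reference time.
    return max(lease_expiration + mclt, reference_time + mclt)
```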

   Two options exist for lease times given out while in PARTNER-DOWN
   state, with different ramifications flowing from each.

   If the server wishes the Failover protocol to protect it from loss of
   stable storage in PARTNER-DOWN state, then it should ensure that the
   MCLT based lease time restrictions in Section 5.1 are maintained,
   even in PARTNER-DOWN state.

   If the server wishes to forego the protection of the Failover proto-
   col in the event of loss of stable storage, then it need recognize no
   restrictions on actual client lease times while in PARTNER-DOWN
   state.

   A server in PARTNER-DOWN state MUST poll its partner and attempt to
   establish communications and synchronization.

   While a server is in PARTNER-DOWN state, it MUST send the absolute
   time of entry into PARTNER-DOWN using the absolute time option in
   every DHCPPOLL and DHCPPRPL message sent.

6.5.3.  Transitions out of PARTNER-DOWN state

   When a server in PARTNER-DOWN state succeeds in contacting its
   partner, its actions are conditional on the state and flags received
   in the message from the other server.

   If the STARTUP bit is set in the 'flags' field of a received DHCPPOLL
   message, the server in PARTNER-DOWN state will send a DHCPPRPL mes-
   sage with its current state (and with the absolute PARTNER-DOWN time
   in the DHCPPRPL).  A server in PARTNER-DOWN state MUST NOT take any
   state transitions based on reestablishing communications if the
   STARTUP bit is set in the 'flags' field of the messages that reesta-
   blished communications.

   If the STARTUP bit is not set in the 'flags' field then a server in
   PARTNER-DOWN state will move into POTENTIAL-CONFLICT state if the
   other server is in the NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-
   DOWN, or POTENTIAL-CONFLICT state.

   If the STARTUP bit is not set in the 'flags' field, then a server in
   PARTNER-DOWN state will stay in PARTNER-DOWN state if it detects that
   the other server is in RECOVER state.

   If the STARTUP bit is not set in the 'flags' field, then a server in
   PARTNER-DOWN state moves into NORMAL state if it detects that the
   other server is in RECOVER-DONE state.
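
   The four rules above can be collected into a single function, as in
   the following sketch.  The function name and the use of strings for
   state names are assumptions made for illustration; partner states
   not mentioned in this section are rejected rather than guessed at.

```python
def next_state_from_partner_down(partner_state: str,
                                 startup_bit: bool) -> str:
    """State a server in PARTNER-DOWN moves to on regaining contact."""
    if startup_bit:
        # The partner is still in STARTUP: reply, but do not move.
        return "PARTNER-DOWN"
    if partner_state in {"NORMAL", "COMMUNICATIONS-INTERRUPTED",
                         "PARTNER-DOWN", "POTENTIAL-CONFLICT"}:
        return "POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    raise ValueError("no transition described for " + partner_state)
```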

6.6.  RECOVER state

   This state indicates that the server has no information in its stable
   storage or that it is re-integrating with a server in PARTNER-DOWN
   state after it has been down.  A server in this state will attempt to
   refresh its stable storage from the other server.

6.6.1.  Operation in RECOVER state

   A server in RECOVER state MUST NOT respond to DHCP client requests.

   A server in RECOVER state will attempt to reestablish communications
   with the other server.

6.6.2.  Transitions out of RECOVER state

   If the other server is in POTENTIAL-CONFLICT state when communica-
   tions are reestablished, then the server in RECOVER state will move
   to POTENTIAL-CONFLICT state itself.

   If the other server is in RECOVER state, then this server SHOULD sig-
   nal an error and halt processing.

   If the other server is in any other state, then the server in RECOVER
   state will request an update of missing binding information.  If the
   server has been configured to indicate that it has lost its stable
   storage, it will send an UPDATEREQALL message; otherwise it will send
   an UPDATEREQ message.

   It will wait for an UPDATEDONE message, and upon receipt of that
   message it will start a timer whose expiration is set to a time equal
   to the time the server went down (if known) or the current time (if
   the down-time is unknown) plus the maximum-client-lead-time.  When
   this timer goes off, the server will go into RECOVER-DONE state.
   This is to allow any clients holding IP addresses that were allocated
   by this server prior to loss of its client binding information in
   stable storage to contact the other server or for their leases to
   time out.

   See Figure 6.6-1.

   DISCUSSION:

      The actual requirement on this wait period in RECOVER is that it
      start when the recovering server went down, not necessarily when
      it came back up.  If the time when the recovering server failed is
      known, then it could be communicated to the recovering server, and
      the wait period could be reduced to the maximum-client-lead-time
      less the difference between the current time and the time the
      server failed. In this way, the waiting period could be minimized.
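
      The wait computation might be sketched as follows.  This is an
      illustration only; the function names are hypothetical and times
      are taken to be Posix seconds.

```python
from typing import Optional


def recover_done_time(now: float, mclt: float,
                      time_went_down: Optional[float] = None) -> float:
    """Time at which the server may leave RECOVER for RECOVER-DONE."""
    # Count from the failure time when it is known, otherwise from now.
    start = time_went_down if time_went_down is not None else now
    return start + mclt


def remaining_wait(now: float, mclt: float,
                   time_went_down: Optional[float] = None) -> float:
    """Seconds still to wait; zero if the MCLT has already passed."""
    return max(0.0, recover_done_time(now, mclt, time_went_down) - now)
```

      With an MCLT of 3600 seconds, a server that failed 600 seconds ago
      waits 3000 seconds, while one whose failure time is unknown waits
      the full 3600 seconds.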

   If an UPDATEDONE message isn't received within an implementation
   dependent amount of time, and no DHCPBNDUPD messages are being
   received, then the UPDATEREQ(ALL) message will be re-transmitted.


                A                                        B
              Server                                  Server

                |                                        |
             RECOVER                               PARTNER-DOWN
                |                                        |
                | >--DHCPUPDATEREQ------------->         |
                |                                        |
                |        <-----------------DHCPBNDUPD--< |
                | >--DHCPBNDACK---------------->         |
               ...                                      ...
                |                                        |
                |        <-----------------DHCPBNDUPD--< |
                | >--DHCPBNDACK---------------->         |
                |                                        |
                |        <-------------DHCPUPDATEDONE--< |
                |                                        |
       Wait MCLT from last known                         |
          time of operation                              |
                |                                        |
           RECOVER-DONE                                  |
                |                                        |
                | >--DHCPPOLL-(RECOVER-DONE)--->         |
                |        <-------------------DHCPPRPL--< |
                |                                        |
                |                                     NORMAL
                |                                        |
                |        <----------(NORMAL)-DHCPPOLL--< |
                | >--DHCPPRPL------------------>         |
                |                                        |
             NORMAL                                      |
                |                                        |
                |                                        |

              Figure 6.6-1:  Transition out of RECOVER state


6.7.  NORMAL state

   NORMAL state is the state used by a server when it can communicate
   with the other server.  When in this state, the primary responds to
   all DHCP client requests, while the secondary responds only to the
   renewal or rebinding requests which it receives.  This is one of the
   few states where the operation of the primary and secondary servers
   is quite different.


6.7.1.  Upon Entry to NORMAL state

   When entering NORMAL state, a server will send to the other server
   all currently unacknowledged DHCPBNDUPD messages.

   When the above process is complete, if the server entering NORMAL
   state is a secondary server, then it will request IP addresses
   for allocation using the DHCPPOOLREQ message and the techniques
   described in section 2.5.


6.7.2.  Operation in NORMAL state: Primary Server

   When in NORMAL state, the primary server takes the following actions
   to implement the Failover protocol:

      o Lease Time Calculations

        As discussed in section 5.1, "Control of lease time", the lease
        interval given to a DHCP client can never be more than the
        maximum-client-lead-time greater than the acknowledged partner-
        server-lease-interval.

        As long as the primary server adheres to this constraint, the
        specifics of the lease intervals that it gives to either the
        DHCP client or the secondary DHCP server are implementation
        dependent. One possible approach is shown in section 5.1, but
        that particular approach is in no way required by this protocol.

      o Lazy Update of Secondary Server

        After an ACK of an IP address binding, the primary server
        attempts to update the secondary with the binding information.
        The lease time used in the update of the secondary MUST be at
        least that given to the DHCP client in the DHCPACK.  It MAY,
        however, be longer.

      o Reallocation of IP Addresses Between Clients

        Whenever a client binding is released, a DHCPBNDUPD message must
        be sent to the secondary server, setting the binding state to
        RELEASED. However, until a DHCPBNDACK is received for this mes-
        sage, the IP address cannot be allocated to another client.  It
        can be allocated to the same client again.


6.7.3.  Operation in NORMAL state: Secondary Server

   In NORMAL state, the secondary server receives binding updates from
   the primary server in DHCPBNDUPD messages.  It records these in its
   client binding database in stable storage and then sends the
   corresponding DHCPBNDACK message to the primary server.  It MUST
   ensure that the information is recorded in stable storage prior to
   sending the DHCPBNDACK message back to the primary server.
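
   The ordering requirement above (record first, acknowledge second)
   could look like the following sketch.  Everything concrete in it is
   an assumption made for illustration: the append-only file standing
   in for the client binding database, the dictionary layout of the
   update and of the acknowledgment, and the function name.  The only
   point being shown is that the data is forced to stable storage
   before the DHCPBNDACK is produced.

```python
import json
import os


def handle_bndupd(update: dict, db_path: str) -> dict:
    """Record a binding update durably, then build the acknowledgment."""
    with open(db_path, "a", encoding="utf-8") as db:
        db.write(json.dumps(update) + "\n")
        db.flush()
        # Force the record to the storage device before acknowledging.
        os.fsync(db.fileno())
    # Only now is it safe to acknowledge the update to the primary.
    return {"type": "DHCPBNDACK", "ip": update["ip"]}
```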

   While in NORMAL state, the secondary server MUST also acquire a
   series of IP addresses from the primary server to be used to satisfy
   DHCPDISCOVER requests from DHCP clients when in COMMUNICATIONS-
   INTERRUPTED state.  See section 2.5 for details of this acquisition
   process.

   The secondary server periodically polls the primary server with the
   DHCPPOLL message.  If it fails to receive a DHCPPRPL message in reply
   after a configured number of retries or some administratively deter-
   mined time, the secondary server transitions into COMMUNICATIONS-
   INTERRUPTED state.  Both the DHCPPOLL and DHCPPRPL messages carry the
   current state of the sender.

   When in NORMAL state, a secondary server is responsive to DHCP client
   requests if they are RENEWAL or REBINDING. Any changes it makes to
   any leases based on these responses should be sent to the primary
   server using DHCPBNDUPD messages.


6.7.4.  Transitions out of NORMAL state

   If an external command is received by a server in NORMAL state
   informing it that its partner is down, then it will transition into
   PARTNER-DOWN state.

   If a server in NORMAL state fails to receive acks to any messages
   sent to its partner for an implementation dependent period of time,
   it will move into COMMUNICATIONS-INTERRUPTED state. (See section
   6.2).

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed state from that expected by the server
   in NORMAL state, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran-
   sition from there.  For example, it would be expected for the partner
   to transition from POTENTIAL-CONFLICT into NORMAL state, but not for
   the partner to transition from NORMAL into POTENTIAL-CONFLICT state.

6.8.  COMMUNICATIONS-INTERRUPTED State

   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
   unable to communicate with the other server.  Primary and secondary
   servers cycle automatically (without administrative intervention)
   between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
   connection between them fails and recovers, or as the partner server
   cycles between operational and non-operational.  No duplicate IP
   address allocation can occur while the servers cycle between these
   states.


6.8.1.  Upon Entry to COMMUNICATIONS-INTERRUPTED state

   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
   configured to support an automatic transition out of COMMUNICATIONS-
   INTERRUPTED state and into PARTNER-DOWN state, then a timer MUST be
   started for an implementation dependent period.

   It is anticipated that some alarm condition would be raised upon the
   transition from NORMAL state to COMMUNICATIONS-INTERRUPTED state.


6.8.2.  Operation in COMMUNICATIONS-INTERRUPTED State

   In this state a server may respond to DHCP client requests.  When
   allocating new IP addresses, each server allocates from its own IP
   address pool.  When responding to renewal requests, each server will
   allow continued renewal of a DHCP client's current lease on an IP
   address, although the renewal period MUST NOT exceed the maximum
   client lead time (MCLT) beyond the lease time already acknowledged by
   the other server.

   A server operates in COMMUNICATIONS-INTERRUPTED state as the primary
   server does in NORMAL state.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged-partner-lease-time will not be updated in any
   new bindings.  This is likely to eventually cause the actual-client-
   lease-times to be the current-time plus the maximum-client-lead-time
   (unless this is greater than the desired-client-lease-time).
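
   The effect described above can be seen in a small sketch of the
   lease time limit.  It is illustrative only; the function name is
   hypothetical, times are in seconds, and the partner's acknowledged
   lease is represented by its expiration time.

```python
def actual_client_lease(desired: float,
                        acked_partner_expiry: float,
                        now: float,
                        mclt: float) -> float:
    """Lease interval that may be granted to the client at time 'now'."""
    # Time remaining on the lease the partner has acknowledged.
    remaining = max(0.0, acked_partner_expiry - now)
    # Never exceed that remainder by more than the MCLT.
    return min(desired, remaining + mclt)
```

   For example, with a desired lease of 3 days (259200 seconds), an
   MCLT of 3600 seconds and a partner acknowledgment that expires at
   time 5000, a renewal at time 0 yields 8600 seconds.  Once the
   acknowledged time has passed and cannot be updated, every renewal
   yields only the MCLT of 3600 seconds.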


6.8.3.  Transition out of COMMUNICATIONS-INTERRUPTED State

   If the safe period timer expires while a server is in the
   COMMUNICATIONS-INTERRUPTED state, it will go immediately into
   PARTNER-DOWN state.

   If an external command is received by a server in COMMUNICATIONS-
   INTERRUPTED state informing it that its partner is down, it will go
   immediately into PARTNER-DOWN state.

   If communications are restored with the other server, then the
   server in COMMUNICATIONS-INTERRUPTED state will go into another state
   based on the state of the partner:

      o partner in NORMAL or COMMUNICATIONS-INTERRUPTED

        The server will transition into the NORMAL state.

      o partner in RECOVER

        Stay in COMMUNICATIONS-INTERRUPTED state.

      o partner in RECOVER-DONE

        Transition into NORMAL state.

      o partner in PARTNER-DOWN or POTENTIAL-CONFLICT

        Transition into POTENTIAL-CONFLICT state.

      o partner in PAUSED

        Stay in COMMUNICATIONS-INTERRUPTED state.

      o partner in SHUTDOWN

        Transition into PARTNER-DOWN state.
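
   The transitions listed in this section can be summarized as a lookup
   table.  The Python sketch below is illustrative only: the event
   names and the function name are hypothetical, and only the state
   names come from this draft.

```python
# Next state for a server in COMMUNICATIONS-INTERRUPTED state, keyed by
# the partner's state at the moment communications are restored.
NEXT_STATE_ON_RECONNECT = {
    "NORMAL": "NORMAL",
    "COMMUNICATIONS-INTERRUPTED": "NORMAL",
    "RECOVER": "COMMUNICATIONS-INTERRUPTED",
    "RECOVER-DONE": "NORMAL",
    "PARTNER-DOWN": "POTENTIAL-CONFLICT",
    "POTENTIAL-CONFLICT": "POTENTIAL-CONFLICT",
    "PAUSED": "COMMUNICATIONS-INTERRUPTED",
    "SHUTDOWN": "PARTNER-DOWN",
}


def leave_communications_interrupted(event, partner_state=None):
    """Return the state to enter for a given (hypothetical) event name."""
    # Safe period expiry and an operator's "partner is down" command
    # both lead directly to PARTNER-DOWN.
    if event in ("safe-period-expired", "partner-down-command"):
        return "PARTNER-DOWN"
    if event == "communications-restored":
        return NEXT_STATE_ON_RECONNECT[partner_state]
    raise ValueError("unknown event: %r" % (event,))
```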


             Primary                                Secondary
              Server                                  Server

              NORMAL                                  NORMAL
                | >--DHCPPOLL----->:                     |
                |                  :<--------DHCPPOLL--< |
                |                  :                     |
           COMMUNICATIONS          :              COMMUNICATIONS
             INTERRUPTED           :                INTERRUPTED
                |                  :                     |
                | >--DHCPPOLL------------------>         |
                |        <-------------------DHCPPRPL--< |
              NORMAL                                     |
                |                                        |
                | >--DHCPBNDUPD---------------->         |
                |        <-----------------DHCPBNDACK--< |
                |                                        |
                |        <-------------------DHCPPOLL--< |
                | >--DHCPPRPL------------------>         |
                |                                     NORMAL
                |                                        |
                |        <-----------------DHCPBNDUPD--< |
                | >--DHCPBNDACK---------------->         |
               ...                                      ...
                |                                        |
                |        <----------------DHCPPOOLREQ--< |
                | >--DHCPPOOLRESP-(2)---------->         |
                |                                        |
                | >--DHCPBNDUPD-(#1)----------->         |
                |        <-----------------DHCPBNDACK--< |
                |                                        |
                |        <----------------DHCPPOOLREQ--< |
                | >--DHCPPOOLRESP-(0)---------->         |
                |                                        |
                | >--DHCPBNDUPD-(#2)----------->         |
                |        <-----------------DHCPBNDACK--< |
                |                                        |

       Figure 6.8-1:  Transition from NORMAL to COMMUNICATIONS-
                      INTERRUPTED and back (example with 2
                      addresses allocated to secondary)


6.9.  POTENTIAL-CONFLICT state

   This state indicates that the two servers are attempting to re-
   integrate with each other, but at least one of them was running in a
   state that did not guarantee automatic reintegration would be
   possible.  In POTENTIAL-CONFLICT state the servers may determine that
   the same IP address has been offered and accepted by two different
   DHCP clients.

   It is a goal of this protocol to minimize the possibility that
   POTENTIAL-CONFLICT state is ever entered.

6.9.1.  Upon Entry to POTENTIAL-CONFLICT state

   When a primary server enters POTENTIAL-CONFLICT state it should
   request that the secondary send it all updates of which it is
   currently unaware by sending an UPDATEREQ message to the secondary
   server.

   A secondary server entering POTENTIAL-CONFLICT state will wait for
   the primary to send it an UPDATEREQ message.

6.9.2.  Operation in POTENTIAL-CONFLICT state

   Any server in POTENTIAL-CONFLICT state MUST be unresponsive to
   incoming DHCP requests.


6.9.3.  Transitions out of POTENTIAL-CONFLICT state

   If communications fail with the partner while in POTENTIAL-CONFLICT
   state, then a primary server will transition to PARTNER-DOWN state
   and a secondary server will stay in POTENTIAL-CONFLICT state.

   Whenever either server receives an UPDATEDONE message from its
   partner, it MUST transition to NORMAL state.  This will cause the
   primary server to leave POTENTIAL-CONFLICT state prior to the
   secondary, since the primary sends an UPDATEREQ message and receives
   an UPDATEDONE before the secondary sends an UPDATEREQ message and
   receives its UPDATEDONE message.

   When a secondary server receives an indication that the primary
   server has transitioned from POTENTIAL-CONFLICT to NORMAL state, it
   SHOULD send an UPDATEREQ message to the primary server.
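
   The ordering described in this section (the primary leaves
   POTENTIAL-CONFLICT first, then the secondary requests its own
   updates) can be sketched as a small event-driven model.  The class
   and method names below are hypothetical, and the "sent" list is only
   a stand-in for real message transmission.

```python
class FailoverServer:
    """Models only the POTENTIAL-CONFLICT behaviour of one server."""

    def __init__(self, role):
        if role not in ("primary", "secondary"):
            raise ValueError("role must be 'primary' or 'secondary'")
        self.role = role
        self.state = "POTENTIAL-CONFLICT"
        self.sent = []  # messages this server would transmit
        if role == "primary":
            # On entry, the primary asks for the updates it lacks.
            self.sent.append("UPDATEREQ")

    def answers_dhcp_clients(self):
        # Servers are unresponsive to clients in POTENTIAL-CONFLICT.
        return self.state != "POTENTIAL-CONFLICT"

    def on_update_done(self):
        # Receipt of UPDATEDONE moves either server to NORMAL.
        if self.state == "POTENTIAL-CONFLICT":
            self.state = "NORMAL"

    def on_partner_state(self, partner_state):
        # The secondary requests updates once the primary is NORMAL.
        if (self.role == "secondary"
                and self.state == "POTENTIAL-CONFLICT"
                and partner_state == "NORMAL"):
            self.sent.append("UPDATEREQ")

    def on_communications_failed(self):
        # Only the primary leaves the state when the link fails.
        if self.state == "POTENTIAL-CONFLICT" and self.role == "primary":
            self.state = "PARTNER-DOWN"
```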


              Primary                                Secondary
              Server                                  Server

                |                                        |
         POTENTIAL-CONFLICT                    POTENTIAL-CONFLICT
                |                                        |
                | >--DHCPUPDATEREQ------------->         |
                |                                        |
                |        <-----------------DHCPBNDUPD--< |
                | >--DHCPBNDACK---------------->         |
               ...                                      ...
                |                                        |
                |        <-----------------DHCPBNDUPD--< |
                | >--DHCPBNDACK---------------->         |
                |                                        |
                |        <-------------DHCPUPDATEDONE--< |
              NORMAL                                     |
                | >--DHCPPOLL--(NORMAL) ------->         |
                |        <-------------------DHCPPRPL--< |
                |                                        |
                |        <--------------DHCPUPDATEREQ--< |
                |                                        |
                | >--DHCPBNDUPD---------------->         |
                |        <-----------------DHCPBNDACK--< |
               ...                                      ...
                |                                        |
                | >--DHCPBNDUPD---------------->         |
                |        <-----------------DHCPBNDACK--< |
                |                                        |
                | >--DHCPUPDATEDONE------------>         |
                |                                        |
                |                                     NORMAL
                |                                        |
                |        <----------------DHCPPOOLREQ--< |
                | >--DHCPPOOLRESP-------------->         |
                |                                        |

           Figure 6.9-1:  Transition out of POTENTIAL-CONFLICT


6.10.  RECOVER-DONE state

   This state exists to allow an interlocked transition for one server
   from RECOVER state and another server from PARTNER-DOWN or
   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

6.10.1.  Operation in RECOVER-DONE state

   A server in RECOVER-DONE state is responsive only to RENEWAL and
   REBINDING DHCP messages.

6.10.2.  Transitions out of RECOVER-DONE state

   When a server in RECOVER-DONE state determines that its partner
   server has entered NORMAL state, then it will transition into NORMAL
   state as well.


6.11.  PAUSED state

   This state exists to allow one server to inform another that it will
   be out of service for what is predicted to be a relatively short
   time, and to allow the other server to transition to COMMUNICATIONS-
   INTERRUPTED state immediately and (if it is a secondary server) to
   begin servicing clients with no interruption.

   A server which is aware that it is shutting down temporarily SHOULD
   send one or more DHCPPOLL messages with the 'state' field containing
   PAUSED.

   While a server may or may not transition internally into PAUSED
   state, the 'previous' state determined when it is restarted MUST be
   the state the server was in before it received the command to shut
   down and restart, that is, before its entry into the PAUSED state.

6.11.1.  Upon entry to PAUSED state

   When entering PAUSED state, the server MUST remember the previous
   state, and use that state as the previous state when it is restarted.

6.11.2.  Transitions out of PAUSED state

   A server transitions out of PAUSED state by being restarted.  At that
   time, the previous state MUST be the state the server was in prior to
   entering the PAUSED state.


6.12.  SHUTDOWN state

   This state exists to allow one server to inform another that it will
   be out of service for what is predicted to be a relatively long time,
   and to allow the other server to transition immediately to PARTNER-
   DOWN state, and take over completely for the server going down.

   A server which is aware that it is shutting down SHOULD send one or
   more DHCPPOLL messages with the 'state' field containing SHUTDOWN.

   While a server may or may not transition internally into SHUTDOWN
   state, the 'previous' state determined when it is restarted MUST be
   the state active prior to the command to shut down, unless the server
   detects that its partner has moved to PARTNER-DOWN, in which case the
   'previous' state MUST be RECOVER.

6.12.1.  Upon entry to SHUTDOWN state

   When entering SHUTDOWN state, the server MUST record the previous
   state in stable storage for use when the server is restarted.  It
   also MUST record the current time as the last time operational.

   A DHCPPOLL message SHOULD be sent to the partner with the 'state'
   field containing SHUTDOWN state.

6.12.2.  Operation in SHUTDOWN state

   A server in SHUTDOWN state MUST be unresponsive to DHCP client input.

   If a server receives any message indicating that the partner has
   moved to PARTNER-DOWN state while it is in SHUTDOWN state (e.g., in
   response to the DHCPPOLL it sent containing SHUTDOWN state), then it
   MUST record RECOVER state as the previous state to be used when it is
   restarted.

   A server SHOULD wait for a few seconds after informing the partner of
   entry into SHUTDOWN state (if communications are okay) to determine
   whether the partner will enter PARTNER-DOWN state.
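
   One possible way to keep the 'previous' state across a shutdown is
   sketched below in Python.  The JSON file standing in for stable
   storage, the record layout, and all function names are assumptions
   made for illustration and are not part of this protocol.

```python
import json
import os
import tempfile
import time


def _write_record(path, record):
    # Write to a temporary file, then rename it over the target, so a
    # crash never leaves a half-written record behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def enter_shutdown(path, state_before_shutdown, now=None):
    # Record the previous state and the last time operational.
    _write_record(path, {
        "previous_state": state_before_shutdown,
        "last_operational": time.time() if now is None else now,
    })


def note_partner_state(path, partner_state):
    # If the partner reports PARTNER-DOWN while this server is in
    # SHUTDOWN, the state to use on restart becomes RECOVER.
    if partner_state != "PARTNER-DOWN":
        return
    with open(path) as f:
        record = json.load(f)
    record["previous_state"] = "RECOVER"
    _write_record(path, record)


def previous_state_on_restart(path):
    with open(path) as f:
        return json.load(f)["previous_state"]
```

   A server following this sketch would call enter_shutdown(), send its
   DHCPPOLL, wait briefly, and pass any state reported by the partner
   to note_partner_state() before exiting.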


6.12.3.  Transitions out of SHUTDOWN state

   A server transitions out of SHUTDOWN state by being restarted.

7.  Safe Period

   Due to the restrictions imposed on each server while in
   COMMUNICATIONS-INTERRUPTED state, long-term operation in this state
   is not feasible for either server.  One reason that this state
   exists at all is to allow the servers to easily survive transient
   network communications failures of a few minutes to a few days
   (although the actual time periods will depend a great deal on the
   DHCP activity of the network in terms of arrival and departure of
   DHCP clients on the network).

   Eventually, when the servers are unable to communicate, they will
   have to move into a state where they no longer can re-integrate
   without some possibility of a duplicate IP address allocation.
   There are two ways that they can move into this state (known as
   PARTNER-DOWN).

   First, they can be informed by external command that the partner
   server is, indeed, down.  In this case, there is no difficulty in
   moving into the PARTNER-DOWN state, since it is an accurate
   reflection of reality and the protocol has been designed to operate
   correctly (even during reintegration) if the partner is, indeed, down
   while the server is in PARTNER-DOWN state.

   The second, more difficult scenario is when the servers are running
   unattended for extended periods.  For this case an option is provided
   to configure a "safe-period" into each server.  This OPTIONAL
   safe-period is the period after which either the primary or secondary
   server will automatically transition to PARTNER-DOWN from
   COMMUNICATIONS-INTERRUPTED state.  If this transition is completed
   and the partner is not down, then the possibility of duplicate IP
   address allocations will exist.

   The goal of the "safe-period" is to allow network operations staff
   some time to react to a server moving into COMMUNICATIONS-INTERRUPTED
   state.  During the safe-period the only requirement is that the
   network operations staff determine if both servers are still running
   -- and if they are, to either fix the network communications failure
   between them, or to take one of the servers down before the
   expiration of the safe-period.

   The length of the safe-period is installation-dependent, and depends
   in large part on the number of unallocated IP addresses within the
   subnet address pool and the expected frequency of arrival of
   previously unknown DHCP clients requiring IP addresses.  Many
   environments should be able to support safe-periods of several days.

   During this safe period, either server will allow renewals from any
   existing client.  The only limitation concerns the need for IP
   addresses for the DHCP server to hand out to new DHCP clients and the
   need to re-allocate IP addresses to different DHCP clients.


   The number of "extra" IP addresses required is equal to the expected
   total number of new DHCP clients encountered during the safe period.
   This is dependent only on the arrival rate of new DHCP clients, not
   the total number of outstanding leases on IP addresses.
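
   As a worked example of this sizing rule, the Python sketch below
   estimates the number of extra addresses needed for a given safe
   period, and conversely the longest safe period a pool of free
   addresses can support.  The function names and the assumption of a
   constant hourly arrival rate are illustrative only.

```python
import math


def extra_addresses_needed(new_clients_per_hour, safe_period_hours):
    """Expected number of new clients seen during the safe period."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


def longest_safe_period_hours(free_addresses, new_clients_per_hour):
    """Longest safe period the free addresses can cover, in hours."""
    if new_clients_per_hour <= 0:
        # No new clients ever arrive, so the pool is never exhausted.
        return math.inf
    return free_addresses / new_clients_per_hour
```

   For instance, a subnet that sees 2 previously unknown clients per
   hour needs 144 spare addresses to sustain a 72 hour safe period,
   regardless of how many leases are already outstanding.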

   In the unlikely event that a relatively short safe period of an hour
   is all that can be used (given a dearth of IP addresses or a very
   high arrival rate of new DHCP clients), even that can provide
   substantial benefits in allowing the DHCP subsystem to ride through
   minor problems that could occur and be fixed within that hour.  In
   these cases, no possibility of duplicate IP address allocation
   exists, and re-integration after the failure is solved will be
   automatic and require no operator intervention.

8.  Security

   The Failover protocol MAY be secured with a simple shared secret
   message digest which covers each message.  Since there are a number
   of configuration parameters that must be the same on each server in a
   pair, it is not unreasonable to require a shared secret be configured
   as well.

   Only information within the packet and covered by the message digest
   is used for operation of the protocol.  It is for this reason that
   the IP address of the sending server is sent in the 'sending server
   id' field of the fixed header of the failover message when it might
   seem that the same information could be recovered from the source
   address of the IP packet.
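
   The idea of a digest that covers the whole message, including the
   sender's identity, can be illustrated with the Python sketch below.
   This section does not name a digest algorithm or a wire layout, so
   the choice of HMAC-SHA-256, the 4-byte sender address prefix, the
   trailing digest, and the function names are all assumptions made for
   this example and do not describe the actual failover message format.

```python
import hashlib
import hmac
import ipaddress

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(secret, sending_server_ip, payload):
    """Prefix the sender's address, then append a keyed digest."""
    body = ipaddress.IPv4Address(sending_server_ip).packed + payload
    return body + hmac.new(secret, body, hashlib.sha256).digest()


def open_sealed(secret, packet):
    """Return (sender_ip, payload), or None if the digest is wrong."""
    if len(packet) < 4 + DIGEST_LEN:
        return None
    body, digest = packet[:-DIGEST_LEN], packet[-DIGEST_LEN:]
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(digest, expected):
        return None
    # The sender is read from the covered bytes, never from the IP
    # header, so it cannot be altered without invalidating the digest.
    return str(ipaddress.IPv4Address(body[:4])), body[4:]
```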


9.  Extended Discussion

   Some areas in the draft above warranted more extended discussion than
   was feasible to insert directly into the text.

      1.  UDP or TCP

          There has been debate about the utility of using UDP for the
          Failover protocol, since it doesn't supply guaranteed
          delivery.  UDP has been chosen as the transport for the
          Failover protocol due to the following factors:

          First, it is important to recognize that mere receipt of a
          packet by the other server in the pair (e.g., receipt of a
          DHCPBNDUPD packet by the secondary server) is not sufficient
          for the primary to update its own bindings database with new
          information about what the secondary knows.  In all cases of
          transfers of binding information, the receiver of a DHCPBNDUPD
          message MUST update its own stable storage prior to replying
          with a DHCPBNDACK message (except in the marginal case where
          all of the updates are rejected).  An action is required by
          the receiving server and an explicit ACK is needed by the
          sending server to ensure the integrity of the protocol.  So,
          just knowing that the other server has received a Failover
          protocol packet is not intrinsically interesting.

          Second, the DHCP protocol, both the client and server side, is
          being implemented in progressively smaller and smaller
          machines.  While this progression is most evident in DHCP
          clients, there exist implementations today of DHCP servers
          embedded in devices that are by no stretch of the imagination
          traditional "servers" running mainstream operating systems.
          In many ways, the Failover protocol is very well suited to
          such devices.  Adding additional protocol infrastructure
          requirements to implement the Failover protocol might prevent
          its implementation in devices that in some ways need it most
          (devices with limited stable storage of their own).

          Third, there are only a few cases where the Failover protocol
          requires guaranteed delivery of packets.  In particular, the
          normal Primary to Secondary DHCPBNDUPD messages do not have to
          be delivered reliably.  The consequences of lost DHCPBNDUPD
          messages are handled by the use of the MCLT, for the simple
          reason that since these messages are "lazy", they may not get
          delivered because of a server failover prior to their
          transmission.  The protocol is robust in the face of loss of
          either a DHCPBNDUPD message or a DHCPBNDACK message.

          Furthermore, a technique known as "fire and forget" may be
          used with this protocol and two cooperating implementations.
          If the DHCPBNDACK message contains all of the information
          originally in the DHCPBNDUPD message, then the DHCPBNDUPD
          message may be transmitted and forgotten by the sending server
          (typically the primary).  When and if the secondary receives
          the DHCPBNDUPD and replies with a DHCPBNDACK message and the
          primary receives it, the primary will update its stable
          storage with a new picture of what the secondary knows about
          the lease time.  If either of these messages is lost, the only
          downside is that the DHCP client associated with the binding
          in question may receive a shorter lease for one lease period
          than it would otherwise.  This "fire and forget" technique
          could substantially ease both the complexity of implementation
          and memory requirements of an implementation of the Failover
          protocol, especially where two servers were communicating over
          a very slow link.
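
          A minimal Python sketch of the sending side of "fire and
          forget" follows.  It assumes, as the technique requires, that
          the DHCPBNDACK echoes every field of the DHCPBNDUPD.  The
          class name, the dictionary message layout, and the in-memory
          dictionary standing in for stable storage are hypothetical.

```python
class FireAndForgetSender:
    """Sends binding updates without keeping retransmission state."""

    def __init__(self, send):
        self.send = send  # callable that transmits one message
        # Stand-in for stable storage: IP address -> lease expiration
        # that the partner is known to have recorded.
        self.acked_partner_lease_expiry = {}

    def lease_granted(self, ip, client_id, lease_expiry):
        # Transmit once and keep nothing: no queue, no retry timer.
        self.send({
            "type": "DHCPBNDUPD",
            "ip": ip,
            "client": client_id,
            "lease_expiry": lease_expiry,
        })

    def on_bndack(self, ack):
        # The ACK carries everything the update did, so it can be
        # processed without looking up the original DHCPBNDUPD.
        previous = self.acked_partner_lease_expiry.get(ack["ip"], 0)
        self.acked_partner_lease_expiry[ack["ip"]] = max(
            previous, ack["lease_expiry"])
```

          If either message is lost, on_bndack() is simply never called
          for that update, and the sender continues to limit the next
          lease by the older acknowledged time plus the MCLT.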


10.  Acknowledgments

   Ralph Droms started it all, by sketching out an initial interserver
   draft that embodied ideas from several past IETF meetings.  In that
   draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
   Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.

   Kim Kinnear and Bob Cole each extended that draft, separately and
   then together, until they created an interserver draft that supported
   any number of servers.  The complexity of that approach was just too
   great, and that draft wasn't greeted with enthusiasm by many,
   including its authors.

   It did however lead to a much simpler approach embodied in the first
   Failover draft by Greg Rabil, Mike Dooley, Arun Kapur and Ralph
   Droms.  This draft posited only two servers -- a primary and a
   secondary.

   Kim Kinnear then wrote the Safe Failover draft to layer on top of the
   Failover Draft and increase its robustness in the face of certain
   rare network failures.

   At the spring 1998 IETF meeting in LA, the DHC working group said
   that they wanted a merged Failover and Safe Failover draft.  Steve
   Gonczi and Bernie Volz stepped up and produced the raw material for
   such a merged draft, along with a new message format designed around
   DHCP options and other extensions and clarifications.  Kim Kinnear
   edited their work into draft format and made other changes in time
   for the Summer Chicago IETF meeting.

   During the summer and fall of 1998, two groups have been working on
   separate implementations of the evolving draft.  Bernie Volz and
   Steve Gonczi constitute one group, and Kim Kinnear, Mark Stapp and
   Paul Fox make up the other.  These two groups have worked together to
   produce considerable changes and simplifications of the protocol
   during this period, and Steve Gonczi and Kim Kinnear have edited
   these changes into this latest revision in time for submission to the
   December 1998 Orlando IETF meeting.

   These most recent changes have been reviewed by Ralph Droms, Greg
   Rabil, Bernie Volz, Steve Gonczi, Mark Stapp, Paul Fox, and Kim
   Kinnear.  This does not preclude any of these people from expressing
   disagreement with what is contained in this draft at any future time.

   Many people have reviewed the various earlier drafts that went into
   this result.  At American Internet, ideas were contributed by Brad
   Parker.  At Cisco Systems, Paul Fox and Ellen Garvey have contributed
   greatly to the form of the protocol.  Glenn Waters of Bay Networks
   contributed ideas and enthusiasm to make a Failover protocol that was
   both "safe" and "lazy".


11.  References


   [RFC 2131] Droms, R., "Dynamic Host Configuration Protocol", RFC
      2131, March 1997.

   [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate
      Requirement Levels", RFC 2119, March 1997.

   [RFC 2132] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
      Extensions", RFC 2132, March 1997.

12.  Authors' information

      Ralph Droms
      323 Dana Engineering
      Bucknell University
      Lewisburg, PA  17837

      Phone: (717) 524-1145
      EMail: droms@bucknell.edu


      Greg Rabil, Mike Dooley, Arun Kapur
      Lucent Technologies (Quadritek)
      10 Valley Stream Parkway, Suite 240
      Malvern, PA 19355

      Phone: (800) 208-2747

      EMail: grabil@lucent.com
             mdooley@lucent.com
             akapur@lucent.com


      Kim Kinnear
      Mark Stapp
      Cisco Systems
      250 Apollo Drive
      Chelmsford, MA  01824

      Phone: (978) 244-8000

      EMail: kkinnear@cisco.com
             mjs@cisco.com


      Steve Gonczi, Bernie Volz
      Process Software Corporation
      959 Concord St.
      Framingham, MA  01701

      Phone: (508) 879-6994

      EMail: gonczi@process.com
             volz@process.com

