HIPRG                                                        Tao Yuan
Internet Draft                                             Qinqin Chu
Intended status: Experimental                             Zhangfeng Hu
Expires: January 5, 2012                                        Bo Hu
                                                          Shanzhi Chen
                                                          BUPT, China
                                                          July 5, 2011


          HIP-based Failure Detection and Recovery for Multihoming
             draft-yuan-hiprg-failure-detection-recovery-02.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 5, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents carefully,
   as they describe your rights and restrictions with respect to this
   document.


Yuan, et al.           Expires January 5, 2012                [Page 1]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


Abstract

   This document defines a mechanism of failure detection as well as
   communication recovery for the scenarios of multi-homing of Host
   Identity Protocol (HIP). It can be applied to HIP by using newly
   defined HIP parameters, so that a HIP-enabled multihomed host can
   timely detect the failures and switch to an alternative locator if
   currently used link breaks down or currently used locator is
   deprecated.


Yuan, et al.           Expires January 5, 2012                [Page 2]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


Table of Contents

   1. Introduction................................................4
   2. Alternative Proposals........................................4
      2.1. Reap4hip...............................................4
   3. HIP locator management.......................................5
      3.1. Locator States.........................................5
      3.2. Locators of Multihomed Host.............................6
   4. Failure Detection and Communication Recovery for HIP..........7
      4.1. Initialization Module...................................7
      4.2. Link-Monitor Module.....................................8
      4.3. Link-Keep Module........................................9
      4.4. Recovery Module........................................10
   5. Packet Formats.............................................11
      5.1. New HIP Parameter......................................11
         5.1.1. KEEPALIVE_TIMEOUT.................................11
      5.2. NOTIFICATION Parameter.................................12
         5.2.1. KEEPALIVE........................................12
         5.2.2. INVALD_LOCATOR....................................12
   6. Security Considerations.....................................12
   7. IANA Considerations........................................13
   8. References.................................................13
      8.1. Normative References...................................13
      8.2. Informative References.................................13
   Authors' Addresses............................................14


Yuan, et al.           Expires January 5, 2012                [Page 3]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


1. Introduction

   Multihomed host often has multiple locators available to provide the
   capability of fault tolerance. This means it can use an alternative
   locator without re-establishing connections between two communicating
   hosts if currently used link breaks down. To achieve such failure
   tolerance capability, a HIP-enabled host needs a failure detection
   mechanism to decide when to process such a locator switching, and
   further how to recover ongoing HIP communications between two hosts.

   RFC5206 describes HIP basic support for multi-homing. It mainly
   focused on how to set up multiple connections between multihomed host
   and correspondent peers, but left multi-homing support of advanced
   scenarios for further study. So we propose a failure detection and
   communication recovery extension for HIP in this document. It can be
   implemented easily without too many modifications to HIP protocol.

2. Alternative Proposals

2.1. Reap4hip

   Reap4hip is a solution which provides fault tolerance capability for
   multihomed HIP hosts. In order to support such configurations,
   reap4hip introduced REAP protocol which described in RFC5534 for HIP,
   and thus updated the behavior for HIP multihomed hosts currently
   defined.

   The REAP protocol provides failure detection and locator pair
   exploration for the Shim6 protocol. Reap4hip applied REAP's features
   to HIP. It created a REAP instance to communicate with the ESP and
   HIP layers. For the on-going source/destination locator pair, rep4hip
   sets up two timers to detect communicating condition. Besides, a new
   message named Keepalive message is introduced to maintain the
   reachability of an active locator. Once the path breaks down, hosts
   try all possible locator pairs sequentially until a new available
   locator pair is found.

   To implement reap4hip, a new message named Probe message is defined
   for HIP. It is used to probe new locator pairs for two communicating
   hosts after currently used locator pair becomes invalid. The probe
   process would try all possible locator pairs by sending Probe
   messages sequentially until at least one working locator pair is
   found. Since the Probe message format is huge, the probe process may
   negatively impact the performance of both host and network. Besides,


Yuan, et al.           Expires January 5, 2012                [Page 4]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


   Probe message and Keepalive message is not a predefined HIP parameter,
   so it can not be applied to existing HIP implementation directly.

3. HIP locator management

3.1. Locator States

   A HIP locator can be in one of the three different states at any time.
   The state is used to track the reachability of a locator. The three
   states include UNVERIFIED, ACTIVE and DEPRECATED. UNVERYFIED locator
   can not be used unless its reachability has been verified. ACTIVE
   state indicates that the reachability of the locator has been
   verified and the locator has not been deprecated. DEPRECATED state
   indicates an expired locator.

   As described in RFC5206, the following state changes are allowed:

   If current state of a locator is UNVERIFIED, the state can be changed
   to ACTIVE if the reachability procedure completes successfully, or
   can be changed to DEPRECATED if the lifetime expires.

   If current state of a locator is ACTIVE, the state can be changed to
   UNVERIFIED if there has been no traffic on the address for a period
   of time, or can be changed to DEPRECATED if the lifetime expires.

   If current state of a locator is DEPRECATED, the state can be changed
   to UNVERIFIED if the host receives a new lifetime for the locator.

   A HIP host may receive a LOCATOR parameter in R1, I2, UPDATE and
   NOTIFY packets. For each locator listed in the LOCATOR parameter, the
   host should check whether it is already bound to the SPI indicated.
   If the locator is already bound, its lifetime is updated. If the
   state of the listed locator is DEPRECATED, the state is changed to
   UNVERIFIED. If the locator is not already bound, the locator is added,
   and its state is set to UNVERIFIED.  Mark all locators that were not
   listed in the LOCATOR parameter as DEPRECATED. Furthermore, if the
   LOCATOR parameter contains a new Preferred Locator, the host should
   initiate a change of the Preferred Locator.

   A HIP host must verify the reachability of an UNVERIFIED locator. The
   state of a newly learned locator MUST initially be set to UNVERIFIED
   unless the new locator is advertised in a R1 packet as a new
   Preferred Locator.  A host may also want to verify the reachability
   of an ACTIVE locator again after a period of time, in which case it
   would set the state of the locator to UNVERIFIED and reinitiate
   address verification.


Yuan, et al.           Expires January 5, 2012                [Page 5]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


3.2. Locators of Multihomed Host

   Before failure detection operations, the two communicating HIP hosts
   must be aware of the set of locators that can be used during later
   communication. As described in RFC5206, HIP multihomed hosts can
   convey multiple locators by sending UPDATE message, or by exchanging
   locators in R1 and I2 during the HIP Base Exchange.

   The locator set contains at least one pair of locators (the currently
   used locators) in state of ACTIVE, and may contain several UNVERIFIED
   or DEPRECATED locators. Along with the normal communication, host
   must verify the reachability of all UNVERIFIED locators. After the
   address verification procedure, additional ACTIVE locators may be
   determined. Due to such a locator management mechanism for HIP, in
   most case, the locator set of two communicating hosts contains more
   than one pair of ACTIVE locators in multi-homing scenarios.

   Furthermore, multihomed host must select one specific locator from
   the locator set as Preferred Locator on which it prefers to receive
   data for communicating peers. By default, the locators used in the
   HIP Base Exchange are selected as the Preferred Locators if they are
   not indicated explicitly. A host can change the Preferred Locator
   when needed through setting the "P" bit.

   This document considers a HIP multi-homing scenario in which the two
   communicating hosts have exchanged locator information, and indicated
   Preferred Locators. As shown in Figure 1, the function of failure
   detection and communication recovery is added at the MH block. Once
   currently used preferred Locators become unreachable, hosts should
   choose another locator as Preferred Locator to maintain communication.


Yuan, et al.           Expires January 5, 2012                [Page 6]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


                   +-------+
                   |  TCP  |  (sockets bound to HITs)
                   +-------+
                       |
                   +-------+
        ---------->|  ESP  |  {HIT_s, HIT_d} <-> SPI
       |           +-------+
       |               |
   +-------+       +-------+
   |  MH   |------>|  HIP  |  {HIT_s, HIT_d, SPI} <-> {IP_s, IP_d, SPI}
   +-------+       +-------+
                       |
                   +-------+
                   |  IP   |
                   +-------+
      Figure 1: Architecture for HIP Mobility and Multi-homing (MH)

4. Failure Detection and Communication Recovery for HIP

   The failure detection and communication recovery mechanism divides
   HIP multihomed hosts into two different roles: D (Detection) host and
   K (Keepalive) host. D host is responsible for monitoring currently
   used link, and maintains a Send Timer. When D host sends a payload
   packet, it turns on the Send Timer. When receiving a payload packet
   from peers, it turns off the Send Timer. K host is the corresponding
   host of D host. Its main task is to keep the reachability of
   currently used locator. If no other packet has been sent since the
   payload packet has been received, K host transmits a HIP NOTIFY
   message with KEEPALIVE parameter to D host. K host maintains a
   Keepalive Timer to control when to send the NOTIFY message. In
   addition, another procedure of failure detection and communication
   recovery can be added in the reverse direction when needed.

   The failure detection and communication recovery mechanism consists
   of four functional modules, initialization module, link-monitor
   module, link-keep module, and recovery module. These modules are
   added into the MH block. The details of each module are described as
   follows.

4.1. Initialization Module

   Both D host and K host runs initialization module firstly when D host
   triggers failure detection and communication recovery scheme. The
   initialization module makes D host notify the timeout value of its
   Send Timer to the K host as a KEEPALIVE_TIMEOUT parameter in the I2,
   R1, or UPDATE messages.


Yuan, et al.           Expires January 5, 2012                [Page 7]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


   For K host, upon receiving packets that contains KEEPALIVE_TIMEOUT
   parameter, initialization module will make it set its Keepalive Timer.
   The timeout value of Keepalive Timer is set to the value of
   KEEPALIVE_TIMEOUT parameter.

   The initialization module is responsible for the initial work of
   failure detection and communication recovery. After the work of
   initialization module, D host and K host have reached an agreement on
   how frequently NOTIFY message will be sent and how frequently
   detection of failure will take place. The time value of Send Timer
   can be modified in real time in order to adapt to unusual situations.
   The communication of the two hosts is capable of providing fault
   tolerance by initialization module's work.

4.2. Link-Monitor Module

   In HIP, monitoring currently used link aims at providing a mechanism
   to detect the path between one pair of Preferred Locators. If failure
   takes place, a new pair of locators should be chosen as Preferred
   Locators. D host runs link-monitor module to detect the potential
   failures. D host has four states under the control of link-monitor
   module. The behavior of D host is described as follows:

   Wait state. After the work of initialization module or having
      received packet from K host, D host is in Wait state. It is the
      initial state of D host. D host is waiting for the arrival of
      upper packet. Whenever outgoing payload packets are generated, D
      host enters Send Packet state.

   Send Packet state. D host in Send Packet state is prepared for
      sending packet to K host. It will turn on Send Timer, and then
      send out the packet. After that, D host enters Detect state.

   Detect state. While in Detect state, D host is monitoring the
      currently used link, and the Send Timer is on. If D host received
      packet from K host in Detect state, it will turn off Send Timer,
      and change back to Wait state. Otherwise, after Send Timer expires,
      if no return traffic from K host, D host will assume currently
      used link has broken down, and enters Failure state to explore a
      new pair of locators.

   Failure state. After the expiration of Send Timer, D host enters
      Failure state. It will turn off Send Timer, and get ready for
      communication recovery. Then, the recovery module will take in
      charge of D host.


Yuan, et al.           Expires January 5, 2012                [Page 8]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


4.3. Link-Keep Module

   K host runs Link-Keep Module to keep the reachability of currently
   used locator. K host has five states under the control of Link-Keep
   Module. The behavior of K host is described as follows:

   Wait state. After the work of initialization module, having sent out
      packet to D host or the expiration of Keepalive Timer, K host is
      in Wait state. It is the initial state of K host. K host is
      waiting for the arrival of packet from upper layer protocol or D
      host. Whenever outgoing payload packets are generated, D host
      enters Send Packet state to send out the packet. Whenever incoming
      payload packets are received, K host enters Receive Packet state
      to bring incoming packets into the proper process.

   Send Packet state. K host in Send Packet state is prepared for
      sending packet to D host. At this time, if Keepalive Timer is on,
      K host will turn it off, and then send out the packet, change back
      to Wait state.

   Receive Packet state. While incoming payload packets are received, K
      host is in Receive Packet state. If the received packet is a HIP
      NOTIFY packet with INVALD_LOCATOR parameter, K host will enter
      Failure state. If it is a payload packet, K host will enter Keep
      state.

   Keep state. K host maintains the reachability of currently used
      locator in Keep state, and the Keepalive Timer is on. If K host in
      Keep state is going to send packet, it will turn off Keepalive
      Timer, and enter Send Packet state. Otherwise, K host should turn
      on Keepalive Timer, and send HIP NOTIFY packet with KEEPALIVE
      parameter to keep the reachability of currently used locator. This
      process is under the control of Keepalive Timer. Usually, K host
      sends out HIP NOTIFY packet with KEEPALIVE parameter at the
      frequency of every one-third to one-half of the time value of
      Keepalive Timer until timer expires. After Keepalive Timer expires,
      K host changes back to Wait state.

   Failure state. K host is in Failure state for having received a HIP
      NOTIFY packet with INVALD_LOCATOR parameter. It will turn off Send
      Timer, and get ready for communication recovery. Then, the
      recovery module will take in charge of K host.


Yuan, et al.           Expires January 5, 2012                [Page 9]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


4.4. Recovery Module

   After the failure is detected, D host runs recovery module to start a
   process to choose new Preferred Locator pair to recover
   communications.  There are two work modes that D host can choose to
   recovery, Active mode and Passive mode.

   A. Active mode

   If D host is in active mode, it sends a LOCATOR parameter to K host
   in an UPDATE packet as described in RFC5206. In this packet, a new
   Preferred Locator is set. If no responses are received during a
   period of time, D host will assume the testing locator does not work.
   Then, it may enter passive mode or choose another locator as
   Preferred Locator, and re-send UPDATE packet. The locator selection
   order could be a local policy. This process may continue for a period
   of time until available locator pair is found. If D host have tried
   all its locators, and could not find any available locator, it must
   enter passive mode. The behavior of D host in active mode while under
   the control of Recovery Module is described as follows:

   Choose Address state. It is the initial state of Active mode. D host
      will choose a new address as Preferred Locator to send UPDATE
      packet to K host. Then it enters Wait Response state. D host can
      switch to Passive mode on this state.

   Wait Response state. After sending out the first UPDATE packet, D
      host is in Wait Response state. If no response received from K
      host, D host will back to Choose Address state. Otherwise, once
      receiving the replied UPDATE packet from K host, it will enter HIP
      Update state.

   HIP Update state. In this state, D host will finish the address
      change with K host as defined in HIP protocol. Then, the
      communication can be recovered.

   For K host under the control of Recovery Module, after receiving the
   UPDATE packet, it will send back an UPDATE packet as described in
   RFC5206 to D host. Then, the two hosts continue their communications
   using the new Preferred Locator pair.

   B. Passive mode

   If D host is in passive mode, it selects one ACTIVE locator from the
   locator set which has been exchanged with K host before. If there is
   no ACTIVE locator can be used, D host must enter active mode. D host
   sends a NOTIFY packet to K host by using the selected ACTIVE locator.


Yuan, et al.           Expires January 5, 2012               [Page 10]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


   In this NOTIFY packet, the Notify Message Type is set to
   INVALID_LOCATOR, which notifies errors in the communication path. The
   NOTIFY packet may fail to reach K host. In such a case, D host may
   enter active mode or try another ACTIVE locator if no responses are
   obtained during a period of time. After trying all ACTIVE locators,
   if no available locator is found, D host must enter active mode. The
   behavior of D host in passive mode while under the control of
   Recovery Module is described as follows:

   Choose K host's Address state It is the initial state of passive
      mode. D host will choose a new address of K host to send HIP
      NOTIFY packet with INVALD_LOCATOR parameter. Then it enters Wait
      Response state. D host can switch to active mode when in this
      state.

   Wait Response state. After sending out the HIP NOTIFY packet with
      INVALD_LOCATOR parameter, D host is in Wait Response state. If no
      response received from K host, D host will back to Choose K host's
      Address state. Otherwise, once receiving the replied UPDATE packet
      from K host, it will enter HIP Update state.

   HIP Update: D host and K host will finish the address change as
      defined in HIP protocol. Then, the communication can be recovered.

   For K host under the control of Recovery Module, after receiving the
   NOTIFY packet, it is aware of the failure of previously used path,
   and then it enters active mode. By default, K host may use the
   source/destination locator pair of the NOTIFY packet as new Preferred
   Locator to initiate the active mode. Additionally, K host is also
   allowed to choose Preferred Locator according to local policy.

   D host firstly enters active mode or passive mode is a local policy.
   The main difference in the two modes is the selection of preferential
   changing side of two communicating hosts. No matter which combination
   is chosen, the communication can always be recovered.

5. Packet Formats

5.1. New HIP Parameter

5.1.1. KEEPALIVE_TIMEOUT

   This parameter is used to indicate how quickly KEEPALIVE NOTIFY
   message will be send. The value of KEEPALIVE_TIMEOUT is set to the
   value of Send Timeout of the host. KEEPALIVE_TIMEOUT can be sent in
   the I2, R1, or UPDATE messages. After receiving a packet with


Yuan, et al.           Expires January 5, 2012               [Page 11]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


   KEEPALIVE_TIMEOUT, the peer maps this value to set its Keepalive
   Timer.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 10           |        Length  = 4            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |      Keepalive Timeout        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
              Figure 2: KEEPALIVE_TIMEOUT Parameter Format

   The fields are exactly the same as defined in RFC5534.

   Type: This field identifies the parameter and MUST be set to 10
   (Keepalive Timeout).

   Length: 4.

   Reserved: 16-bit field reserved for future use. Set to zero upon
   transmit and MUST be ignored upon receipt.

   Keepalive Timeout: Value in seconds corresponding to suggested
   Keepalive Timeout value for the peer.

5.2. NOTIFICATION Parameter

5.2.1. KEEPALIVE

   This document defines a new kind of NOTIFICATION parameter named
   KEEPALIVE. It is sent if there is no outgoing packet to inform
   corresponding peer that current path is OK. Types of NOTIFY message
   in the range 0-16383 are intended for reporting errors and in the
   range 16384-65535 for other status information.  Thus, the type value
   of KEEPALIVE should range among 16384-65535.

5.2.2. INVALD_LOCATOR

   This document defines a new kind of NOTIFICATION parameter named
   INVALD_LOCATOR. Host in passive mode sends it to corresponding peer
   to indicate an error in the path. The type value of INVALD_LOCATOR
   should range among 0-16383.

6. Security Considerations

   To be done.


Yuan, et al.           Expires January 5, 2012               [Page 12]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


7. IANA Considerations

   This document defines a KEEPALIVE Notify Message Type and a
   INVALD_LOCATOR Notify Message Type for HIP.

8. References

8.1. Normative References

   [RFC5206] P. Nikander, T. Henderson, C. Vogt, J.Arkko, "End-Host
             Mobility and Multihoming with the Host Identity Protocol",
             RFC 5206, April 2008.

   [RFC5201] R.Moskowitz, P. Nikander, P. Jokela, T. Henderson, "Host
             Identity Protocol", RFC 5201, April 2008.

8.2. Informative References

   [REAP4HIP] A. de la Oliva, M. Bagnulo, "Fault tolerance
              configurations for HIP multihoming", <draft-oliva-hiprg
              reap4hip-00>, July, 2007.

   [RFC5534] J. Arkko, I. van Beijnum, "Failure Detection and Locator
             Exploration Protocol for IPv6 Multihoming", RFC 5534, June,
             2009

   [RFC5533] E. Nordmark. and M. Bagnulo, "Shim6: Level 3 Multihoming
             Shim Protocol for IPv6", RFC 5533, June 2009

   [BFD]     D. Katz, D. Ward, "Bidirectional Forwarding Detection",
             <draftietf-bfd-base-11(work in progress)>, January, 2010


Yuan, et al.           Expires January 5, 2012               [Page 13]

Internet-Draft   HIP Failure Detection and Recovery          July 2011


Authors' Addresses

   Tao Yuan
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876
   Email: alberty2004@gmail.com

   Qinqin Chu
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876
   Email: qinqinchu@gmail.com

   Zhangfeng Hu
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876
   Email: zfhu2001@gmail.com

   Bo Hu
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876
   Email: bj.hubo@gmail.com

   Shanzhi Chen
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876


Yuan, et al.           Expires January 5, 2012               [Page 14]