HIPRG                                                         Tao Yuan
Internet Draft                                              Qinqin Chu
Intended status: Experimental                             Zhangfeng Hu
Expires: January 3, 2011                                         Bo Hu
                                                          Shanzhi Chen
                                                           BUPT, China
                                                          July 2, 2010


          HIP-based Failure Detection and Recovery for Multihoming
             draft-yuan-hiprg-failure-detection-recovery-00.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 2, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents carefully,
   as they describe your rights and restrictions with respect to this
   document.


Yuan, et al.           Expires January 3, 2011                [Page 1]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


Abstract

   Fault tolerance is one of greatest feature of multihomed host. This
   document proposes a failure detection and communication recovery
   mechanism for Host Identity Protocol. It can be applied to HIP by
   using new defined HIP parameters. A HIP-enabled multihomed host can
   switch to alternative locator if current link breaks down by adopting
   this mechanism.


Yuan, et al.           Expires January 3, 2011                [Page 2]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


Table of Contents


   1. Introduction................................................4
   2. Alternative Proposals.......................................4
      2.1. Reap4hip...............................................4
   3. HIP locator management......................................5
      3.1. Locator Status.........................................5
      3.2. Locators of Multihomed Host............................5
   4. Failure Detection and Communication Recovery for HIP........6
      4.1. Maintain Locator Status................................6
      4.2. Failure Detection......................................7
      4.3. Communication Recovery.................................8
   5. Packet Formats.............................................10
      5.1. New HIP Parameters....................................10
         5.1.1. KEEPALIVE_TIMEOUT................................10
      5.2. NOTIFICATION PARAMETER................................10
         5.2.1. KEEPALIVE........................................10
         5.2.2. INVALD_LOCATOR...................................11
   6. IANA Considerations........................................11
   7. References.................................................11
      7.1. Normative References..................................11
      7.2. Informative References................................11
   Authors' Addresses............................................12


Yuan, et al.           Expires January 3, 2011                [Page 3]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


1. Introduction

   Multihomed host has multiple locators available for fault tolerance.
   This means it can use an alternative locator if current link breaks
   down without rebuilding connection with correspondent peers. To
   achieve such failure tolerance capability, host needs a failure
   detection mechanism to decide when to process such a locator
   switching, and how to recover communication.

   RFC5206 describes HIP support for multihoming. It mainly focused on
   how to set up multiple connections between multihomed host and
   correspondent peers, but left multihoming support in advanced
   scenarios for further study. We propose a failure detection and
   recovery extension for HIP in this document. It can be implemented
   easily without too many modifications to HIP protocol.

2. Alternative Proposals

2.1. Reap4hip

   Reap4hip is a solution which considers fault tolerance capability in
   multihomed HIP hosts. In order to support such configurations,
   reap4hip introduced REAP protocol which described in RFC5534 for HIP,
   and thus updated the behavior for HIP multihomed hosts currently
   defined.

   The REAP protocol provides failure detection and locator pair
   exploration for the Shim6 protocol. Reap4hip applied REAP's features
   to HIP. It created a REAP instance to communicate with the ESP and
   HIP layers.  For the on-going source/destination locator pair,
   rep4hip sets up two timers to detect communicating condition. Besides,
   a new message named Keepalive message is introduced to maintain the
   reachability of a locator. Once the path breaks down, hosts try all
   possible locator pairs constantly until a new available locator pair
   is found.

   To implement reap4hip, a new message named Probe message is defined
   for HIP. It is used to probe new locator pair for two communicating
   hosts after currently used locator pair become invalid. The probe
   process would try possible locator pairs by sending Probe message
   constantly until at least one working locator pair is found. Since
   the Probe message format is huge, the probe process may negatively
   impact the performance of both host and network. Besides, Probe
   message and Keepalive message is not a HIP parameter, it may not be
   applied to existing HIP code easily.


Yuan, et al.           Expires January 3, 2011                [Page 4]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


3. HIP locator management

3.1. Locator Status

   A HIP locator can have three kinds of status. The status is used to
   track the reachability of a locator. They are: UNVERIFIED, ACTIVE and
   DEPRECATED. UNVERYFIED locator can not be used unless the
   reachability of it has been verified. ACTIVE indicates that the
   reachability of the locator has been verified and the locator has not
   been deprecated. DEPRECATED indicates an expired locator.

   As described in RFC5206, the following state changes are allowed:

   If current status of a locator is UNVERIFIED, the status can be
   changed to ACTIVE if the reachability procedure completes
   successfully, or can be changed to DEPRECATED if the lifetime expires.

   If current status of a locator is ACTIVE, the status can be changed
   to UNVERIFIED if there has been no traffic on the address for some
   time, or can be changed to DEPRECATED if the lifetime expires.

   If current status of a locator is DEPRECATED, the status can be
   changed to UNVERIFIED if the host receives a new lifetime for the
   locator.

3.2. Locators of Multihomed Host

   Before failure detection operates, the two communicating hosts must
   be aware of the set of locators which can be used. As described in
   RFC5206, HIP multihomed hosts can convey multiple locators by sending
   UPDATE message, and also by exchanging locators in R1 and I2 during
   the base exchange.

   Among these locators, multihomed host must declare one as Preferred
   locator on which it prefers to receive data. By default, the locators
   used in the HIP base exchange are the Preferred locators. A host can
   change the Preferred outgoing locator for different reasons, e.g.,
   because receiving a LOCATOR parameter that has the "P" bit set.

   This document considers a HIP multihoming scenario in which the two
   communicating hosts have exchanged locator information, and indicated
   Preferred locators. As shown in Figure 1, the function of failure
   detection and communication recovery is added at the MH block. Once
   currently used preferred locators become unreachable, hosts should
   choose another locator as Preferred locator to maintain communication.


Yuan, et al.           Expires January 3, 2011                [Page 5]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


                   +-------+
                   |  TCP  |  (sockets bound to HITs)
                   +-------+
                       |
                   +-------+
        ---------->|  ESP  |  {HIT_s, HIT_d} <-> SPI
       |           +-------+
       |               |
   +-------+       +-------+
   |  MH   |------>|  HIP  |  {HIT_s, HIT_d, SPI} <-> {IP_s, IP_d, SPI}
   +-------+       +-------+
                       |
                   +-------+
                   |  IP   |
                   +-------+
      Figure 1: Architecture for HIP Mobility and Multihoming (MH)

4. Failure Detection and Communication Recovery for HIP

4.1. Maintain Locator Status

   A host may receive a LOCATOR parameter in R1, I2, UPDATE and NOTIFY
   packets. For each locator listed in the LOCATOR parameter, the host
   should check whether it is already bound to the SPI indicated. If the
   locator is already bound, its lifetime is updated. If the status of
   it is DEPRECATED, the status is changed to UNVERIFIED. If the locator
   is not already bound, the locator is added, and its status is set to
   UNVERIFIED.  Mark all locators that were not listed in the LOCATOR
   parameter as DEPRECATED. Furthermore, if the LOCATOR parameter
   contains a new Preferred locator, the host should initiate a change
   of the Preferred locator.

   A host must verify the reachability of an UNVERIFIED locator. The
   status of a newly learned locator MUST initially be set to UNVERIFIED
   unless the new locator is advertised in a R1 packet as a new
   Preferred locator.  A host may also want to verify the reachability
   of an ACTIVE locator again after some time, in which case it would
   set the status of the locator to UNVERIFIED and reinitiate address
   verification.

   In multihoming scenario, two communicating hosts exchange locator set
   information. This set contains at least one pair of locators (the
   currently used locators) in status of ACTIVE, and may contain several
   UNVERIFIED or DEPRECATED locators. Along with the normal
   communication, host must verify the reachability of all UNVERIFIED
   locators. After the address verification procedure, additional ACTIVE
   locators are available. Due to such a locator management mechanism in


Yuan, et al.           Expires January 3, 2011                [Page 6]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


   HIP, in most case, the locator set of two communicating hosts
   contains more than one pair of ACTIVE locators in multihoming
   scenario.

4.2. Failure Detection

   In HIP, failure detection aims at providing a mechanism to detect the
   path between one pair of Preferred locators. If failure takes place,
   a new pair of locators should be chosen as Preferred locators.

   Bidirectional Forwarding Detection (BFD) is a protocol which can
   provide failure detection on bidirectional path between two hosts. A
   pair of host creates BFD session for the communications path. During
   the communication, hosts transmit BFD packets periodically over the
   path between them, and if one host stops receiving BFD packets for
   long enough, some component in the path to the correspondent peer is
   assumed to have failed. Hosts may negotiate to not send periodic BFD
   packets in order to reduce overhead. Furthermore, hosts should
   negotiate on how quickly BFD packets will be send in order to come to
   an agreement about how rapidly detection of failure will take place.
   The time value can be modified in real time in order to adapt to
   unusual situations.

   REAP protocol described a reachability verification approach similar
   to BFD. The REAP implementation relies on two timers, the Keep Alive
   Timer and the Send Timer, and a control message, named REAP Keepalive
   message. The Send Timer starts each time the host sends a packet and
   stopped each time the host receives a packet from the peer. If no
   answer is received in the Send Timer period, a failure is assumed and
   a locator path exploration is started. Consequently, the Send Timer
   reflects the requirement that when a host sends a payload packet
   there should be some return traffic within Send Timeout seconds. The
   Keep Alive Timer starts each time a host receives a packet from its
   peer, and stopped each time the host sends a payload packet to the
   peer. Keepalive Interval seconds after the last payload packet has
   been received, if no other packet has been sent since the payload
   packet has been received, a REAP Keepalive message is generated in
   question and transmitted so that the peers knows there is a
   reachability problem when it doesn't receive any incoming packets for
   the duration of a Send Timeout period. The Keepalive Timer reflects
   the requirement that when a host receives a payload packet there
   should a similar response towards the peer within Keepalive seconds.

   REAP failure detection is made as a lightweight implementation of BFD.
   REAP uses REAP Keepalive message as a kind of BFD message. The host
   communicates its Send Timeout value to the peer to indicate how
   rapidly detection of failure should take place. Payload data traffic


Yuan, et al.           Expires January 3, 2011                [Page 7]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


   in both directions is observed, when payload traffic is flowing in
   both directions, there is no need to send failure detection packets.
   Only when there is traffic in one direction does the failure
   detection mechanism generate REAP Keepalives in the other direction.
   As a result, whenever there is outgoing traffic and no incoming
   return traffic or REAP Keepalives, there must be failure.

   A specific implementing method of failure detection for HIP can be
   defined as follows:

   1. Hosts should reach an agreement on when to start the detection
      process. The host communicates its Send Timeout value to the peer
      as a KEEPALIVE_TIMEOUT parameter in the I2, R1, or UPDATE messages.
      The peer then maps this value to its Keepalive Timeout value to
      set its Keepalive Timer.

   2. Whenever outgoing payload packets are generated, Send Timer starts.
      The timeout value is set to the value of Send Timeout. Whenever
      incoming payload packets are received, Keepalive Timer starts.
      This timeout value is set to the value of Keepalive Timeout. The
      reception of a Keepalive message leads to stopping the timer
      associated with the return traffic from the peer.

   3. Keepalive Interval seconds after the last payload packet has been
      received, if no other packet has been sent since the payload
      packet has been received, a HIP Keepalive message is transmitted
      to the peer. By default, Keepalive Interval is set as one-third to
      one-half of the Keepalive Timeout value. Actually, NOTIFY message
      can take the role of Keepalive message in HIP scenario to response
      to the peer. The Notify Message Type is set to KEEPALIVE, and
      should continue to be sent at the Keepalive Interval until a
      packet has been received from the peer or the Keepalive Timeout
      expires. KEEPALIVE NOTIFY messages are not sent at all if one or
      more payload packets were sent within the Keepalive Interval

   4. Send Timeout seconds after the transmission of a payload packet
      with no return traffic, the path is assumed to have failed.

4.3. Communication Recovery

   The party that detects the path failure starts a process to choose
   new Preferred locator pair to recover communication. Specifically,
   when host A detects the failure, it initiates the recovery process as
   follows:

   A. Active mode


Yuan, et al.           Expires January 3, 2011                [Page 8]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


   If multihomed host A is in active mode, it sends a LOCATOR parameter
   to the peer host B in an UPDATE packet as described in RFC5206. In
   this packet, a new Preferred locator is set. After receiving the
   UPDATE packet, host B handles this packet as normal. Then, host A and
   host B continue their communication using the new Preferred locator
   pair.

   In the active mode, UPDATE packet is sent to find an available
   locator pair. If no responses are obtained during a period of time,
   host A will assume the testing locator does not work. Then, it may
   enter passive mode or choose another locator as Preferred locator,
   and re-send UPDATE packet. The locator selection order could be a
   local policy. This process may continue for a period of time until
   available locator pair is found. If host A have tried all its
   locators, and could not find any available locator, it must enter
   passive mode.

   B. Passive mode

   If multihomed host A is in passive mode, it selects one ACTIVE
   locator from the locator set which has been exchanged with B before.
   If there is no ACTIVE locator can be used, host A must enter active
   mode.

   Host A sends a NOTIFY packet to host B by using the selected ACTIVE
   locator. In this NOTIFY packet, the Notify Message Type is set to
   INVALID_LOCATOR, which notifies errors in the communication path.
   After receiving the NOTIFY packet, the receiver is aware of the
   failure of previously used path, and then it enters active mode. By
   default, the receiver may use the source/destination locator pair of
   the NOTIFY packet as new Preferred locator to initiate the active
   mode. Additionally, the receiver is also allowed to choose Preferred
   locator according to local policy.

   The NOTIFY packet may fail to reach host B. In such a case, host A
   may enter active mode or try another ACTIVE locator if no responses
   are obtained during a period of time. After trying all ACTIVE
   locators, if no available locator is found, host A must enter active
   mode.

   A host firstly enters active mode or passive mode is a local policy.
   The main difference in the two modes is the selection of preferential
   changing side of two communicating hosts. No matter which combination
   is chosen, the communication can always be recovered.


Yuan, et al.           Expires January 3, 2011                [Page 9]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


5. Packet Formats

5.1. New HIP Parameter

5.1.1. KEEPALIVE_TIMEOUT

   This parameter is used to indicate how quickly KEEPALIVE NOTIFY
   message will be send. The value of KEEPALIVE_TIMEOUT is set to the
   value of Send Timeout of the host. KEEPALIVE_TIMEOUT can be sent in
   the I2, R1, or UPDATE messages. After receiving a packet with
   KEEPALIVE_TIMEOUT, the peer maps this value to set its Keepalive
   Timer.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 10           |        Length  = 4            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Reserved            |      Keepalive Timeout        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
              Figure 2: KEEPALIVE_TIMEOUT Parameter Format

   The fields are exactly the same as defined in RFC5534.

   Type: This field identifies the paremeter and MUST be set to 10
   (Keepalive Timeout).

   Length: 4.

   Reserved: 16-bit field reserved for future use. Set to zero upon
   transmit and MUST be ignored upon receipt.

   Keepalive Timeout: Value in seconds corresponding to suggested
   Keepalive Timeout value for the peer.

5.2. NOTIFICATION Parameter

5.2.1. KEEPALIVE

   This document defines a new kind of NOTIFICATION parameter named
   KEEPALIVE. It is sent if there is no outgoing packet to inform
   corresponding peer that current path is OK. Types of NOTIFY message
   in the range 0-16383 are intended for reporting errors and in the
   range 16384-65535 for other status information.  Thus, the type value
   of KEEPALIVE should range among 16384-65535.


Yuan, et al.           Expires January 3, 2011               [Page 10]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


5.2.2. INVALD_LOCATOR

   This document defines a new kind of NOTIFICATION parameter named
   INVALD_LOCATOR. Host in passive mode sends it to corresponding peer
   to indicate an error in the path. The type value of INVALD_LOCATOR
   should range among 0-16383.

6. Security Considerations

   To be done.

7. IANA Considerations

   This document defines a KEEPALIVE Notify Message Type and a
   INVALD_LOCATOR Notify Message Type for HIP.

8. References

8.1. Normative References

   [RFC5206] P. Nikander, T. Henderson, C. Vogt, J.Arkko, "End-Host
             Mobility and Multihoming with the Host Identity Protocol",
             RFC 5206, April 2008.

   [RFC5201] R.Moskowitz, P. Nikander, P. Jokela, T. Henderson, "Host
             Identity Protocol", RFC 5201, April 2008.

8.2. Informative References

   [REAP4HIP] A. de la Oliva, M. Bagnulo, "Fault tolerance
              configurations for HIP multihoming", <draft-oliva-hiprg
              reap4hip-00>, July, 2007.

   [RFC5534] J. Arkko, I. van Beijnum, "Failure Detection and Locator
             Exploration Protocol for IPv6 Multihoming", RFC 5534, June,
             2009

   [RFC5533] E. Nordmark. and M. Bagnulo, "Shim6: Level 3 Multihoming
             Shim Protocol for IPv6", RFC 5533, June 2009

   [BFD]     D. Katz, D. Ward, "Bidirectional Forwarding Detection",
             <draftietf-bfd-base-11(work in progress)>, January, 2010


Yuan, et al.           Expires January 3, 2011               [Page 11]

Internet-Draft   HIP Failure Detection and Recovery          July 2010


Authors' Addresses

   Tao Yuan
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876
   Email: alberty2004@gmail.com

   Qinqin Chu
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876
   Email: qinqinchu@gmail.com

   Zhangfeng Hu
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876
   Email: zfhu2001@gmail.com

   Bo Hu
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876
   Email: bj.hubo@gmail.com

   Shanzhi Chen
   State Key Laboratory of Networking and Switching,
   Beijing University of Posts and Telecommunications
   Beijing, P.R.China, 100876


Yuan, et al.           Expires January 3, 2011               [Page 12]