Bidirectional Forwarding Detection (BFD) on Link Aggregation Group (LAG) Interfaces
draft-mmm-bfd-on-lags-00

Abstract

This document proposes a mechanism to run BFD on Link Aggregation Group (LAG) interfaces. It does so by running an independent BFD session on every LAG member link.

A dedicated well-known multicast IP address for both IPv4 and IPv6 is introduced as the destination IP address of the BFD packets when running BFD on the member links of the LAG.

There is currently no standard that describes how BFD should run on LAG interfaces. As a result multiple non-interoperable BFD implementations for LAG interfaces exist. This draft provides a short overview as a context for the new proposed mechanism.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http:/⁠/⁠datatracker.ietf.org/⁠drafts/⁠current/⁠.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on June 13, 2012.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http:/⁠/⁠trustee.ietf.org/⁠license-⁠info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction
2. BFD over LAG with a single session
2.1. Existing implementations
2.2. BFD over Big Pipe
3. BFD over member links of the LAG
3.1. BFD protocol details
3.2. BFD influence on the LAG Management Module
3.3. Concluded BFD state
3.4. Motivation for the technical design
4. BFD for LAG and layer-3 applications
5. Security Consideration
6. IANA Considerations
7. Acknowledgements
8. References
8.1. Normative References
8.2. Informative References
Authors' Addresses

1. Introduction

The Bidirectional Forwarding Detection (BFD) protocol [RFC5880] provides a mechanism to detect faults in the bidirectional path between two forwarding engines, including interfaces, data link(s), and to the extent possible the forwarding engines themselves, with potentially very low latency.

BFD can be used for detecting failures of the path between two network devices. Typically the application clients are not aware of any inner structure of the underlying interface, being layer 3 applications themselves like Open Shortest Path First (OSPF) [RFC2328] or Border Gateway Protocol (BGP)[RFC4271]. While this works for interfaces like Ethernet and Packet Over SONET (POS), it causes problems for bundled interfaces like LAG.

A LAG is used to bind together several physical ports between two adjacent nodes so they appear to higher-layer protocols as a single, higher bandwidth "virtual" pipe. A LAG interface thereby allows aggregation of multiple network interfaces as one virtual interface for the purpose of providing fault-tolerance and higher bandwidth.

The problem for BFD is that a single BFD session is subject to the load-balance algorithm used for the LAG, i.e. BFD has no control which physical links are used and in which sequence. This makes it impossible for BFD to guarantee a detection of anything but a full LAG shutdown. This LAG shutdown would be initiated by the LAG Management Module (LMM) and is typically multiple times slower than BFD detection times (multiple 100msec of LMM vs. multiple 10msec of BFD). The solution proposed in this document is to run a BFD session on every physical member link the LAG is built upon. This requires the LMM to request BFD sessions for every member link, using BFD as a fast detection mechanism. BFD can combine this information from LMM with the layer-3 centric session requests from OSPF and alike to provide fast detection for layer-3 applications.

2. BFD over LAG with a single session

2.1. Existing implementations

As mentioned, no standard exists on how to run BFD over LAG interfaces. As a result, a simple approach has been chosen by several implementations, that allows it to interoperate and solves the problem of establishing a BFD session over a LAG. This is typically done by treating LAG as a big, virtual pipe and ignoring the underlying structure (i.e., the component member links). This is not desirable as it does not allow deterministic and fast detection of individual member link failures. We call this conventional approach of running BFD as "BFD over Big Pipe" or "BBP" in short.

There are various ways of running BFD over LAG interfaces. Some implementations send BFD packets only over the primary or the active member link . Others spray BFD packets over all member links of the LAG. There are issues with each of these approaches.

In the first approach, BFD will remain up as long as the primary port is alive. It will go down once the primary port goes down till another port is selected as the primary. Another problem with this design is with BFD being oblivious to the presence of other member links in the LAG. If a non-primary member link goes down, then BFD remains unaffected as it can still send BFD packets over the primary link. BFD will thus remain up and all traffic sent over the failed member link will get dropped, till an upper layer protocol like Link Aggregation Control protocol (LACP) detects the failed link and removes it from the LAG.

In the second approach, BFD packets are sprayed over all the member links of a LAG. This is done naively via round-robin, where each BFD packet is sent using the subsequent member link, in a round-robin fashion. It solves the problem of BFD going down because of the primary port going down, but it still does not solve the problem of traffic getting lost when one of the member link goes down. This is because when a member link goes down, BFD still remains up and traffic continues to go over the link that has failed till a higher layer protocol detects this and removes the offending link from the LAG.

2.2. BFD over Big Pipe

This document proposes using one of the mechanisms described in the previous section in combination with the new mechanism of a separate BFD session per LAG member link, which will be defined in the next section.

For this reason we need to standardize the simple approach. The main task is to define what it means to treat LAG as a single "big pipe". It means:

BFD must work no matter what member link, packets are sent and received on.
the Rx/Tx link can change any time and/or regularly with every change pattern without causing BFD to fail

This allows to use the LAG like any other interface and RFC 5880 and RFC 5881 can be used without modification.

As described in the last section it is advantageous to spray the packets in a round-robin fashion across the LAG member links, as opposed to sending those over only the primary or the active port. It is thus RECOMMENDED that implementations do that. However, there are still some issues left with spraying the BFD packets, that will get addressed in the scheme described in the next section.

3. BFD over member links of the LAG

3.1. BFD protocol details

The proposal is to run a BFD session on every member link of the LAG. The BFD packets are IP/UDP based packets as defined in RFC 5880 [RFC5880] and RFC 5881 [RFC5881]. Currently, only asynchronous mode is considered in this document. The echo function is outside the document's scope. At least one system MUST take the Active role (possibly both). The BFD sessions on the member links are independent sessions. They use their own, unique local discriminator and their own set of state variables. Timer values MAY be different, even between the sessions belonging to the same LAG.

The destination IP address is a dedicated well-known multicast IP address (224.XXX.XXX.XXX for IPv4, FFXX:: for IPv6, to be assigned by IANA). On Ethernet-based LAG member links the corresponding destination multicast MACs will be 01:00:5e:XX:XX:XX for IPv4 and 33:33:XX:XX:XX:XX for IPv6. Each member link will use its own MAC address as the source MAC address.

The demultiplexing of a received packet is solely based on the Your Discriminator field, if this field is nonzero. A zero value may happen for the initial Down packet of a session. In this case demultiplexing a BFD for LAG packet MUST be based on some combination of other fields which MUST include either the destination IP or the destination MAC address.

The Address Family used is fixed per LAG, i.e. the BFD sessions on the member links of a particular LAG are either all using IPv4 or all using IPv6. An implementation MUST provide a configuration knob to select the address family and MAY extend this to some sort of auto-discovery. The default address family is IPv4.

3.2. BFD influence on the LAG Management Module

The LAG Management Module (LMM) is a client of BFD, requesting BFD sessions for all the LAG member links. For link failure detection the LMM can use BFD instead of or in parallel with LACP.

A member link of the LAG is not used anymore for data forwarding when the particular BFD session running over that link goes down. The member link MUST be removed from the LAG. The BFD session for the link remains, i.e. it is not deleted.

To add a member link to the LAG, LMM MAY wait for the BFD session on the link to come Up. There may be a deadlock situation since the link interface not being active (e.g., layer 3 protocol down) may prevent BFD packets, including other control protocols packets (e.g. ARP) that are tightly coupled with the status of the interface, to be transmitted between the pair of interfaces, thus failing to bring up the interfaces.

To avoid the deadlock, BFD packets SHOULD NOT be blocked by the layer N protocol status of the interface when the application depends on the BFD status to enable layer N of the interface. If this cannot be achieved then the BFD status MUST be ignored by the application when bringing up an interface. The BFD status can then be used afterwards to bring the interface down.

The behaviour of the LMM MUST be configurable if waiting for BFD status of Up to add a member link is supported, to allow an alternative mode of adding the member link irrespective of the BFD state for interoperability purpose.

3.3. Concluded BFD state

An additional state variable is introduced for BFD on LAG: the concluded state. The state values are Down and Up.

The details of how BFD derives the concluded state is outside the scope of the document. The idea is that the LMM may declare a LAG as down when a certain threshold has been hit, e.g. a minimum required bandwidth for the LAG. BFD could for example duplicate the LMM logic or it could use an API to LMM to learn about the decision of the LAG management module. What is relevant for BFD on LAG is that the concluded state is the overall state of the LAG.

The concluded state is important for layer-3 clients requesting BFD sessions over the LAG or over Vlans on the LAG. Details will be discussed in section 4.

3.4. Motivation for the technical design

The primary goal was to stay close to the existing standards RFC 5880 [RFC5880] and RFC 5881 [RFC5881], allowing the reuse of existing implementations for IP-based point-to-point BFD. At the same time BFD for LAG and the already existing BFD over Big Pipe drafted in Section 2 should be able to run in parallel. The destination IP address can be used to disambiguate the BFD packets over the member links and the BFD packets over the Big Pipe, should an implementation decide to support both.

To overcome the problem that a member link may not support ARP when not being an active LAG member a multicast MAC was chosen to allow the destination interface port to accept the BFD packet. Combining these two requirements results in using a well-defined IP Multicast address.

4. BFD for LAG and layer-3 applications

The information about the member links belonging to a LAG interface comes from the LAG management module (LMM). BFD helps the LMM to detect and converge fast. Layer-3 protocols may use BFD for LAG in one of the following ways:

For sessions requested by layer-3 clients like OSPF a virtual session is created. This virtual session is not creating actual BFD packets on the LAG interface. Instead it's state, which is reported to the layer-3 client, is identical with the concluded state.

BFD on LAG MUST support this mode, and it is the default mode.

With virtual sessions synchronization between the two BFD modules on either end of the LAG is guaranteed only by an identical configuration and logic on both ends.
BFD for LAG is requested by the LMM only. For sessions requests from layer-3 applications the BFD over Big Pipe (BBP) mechanism is used. BBP would run as an independent detection mechanism over the LAG, detecting when the LMM is bringing the LAG down.

BFD on LAG SHOULD support this mode.

This may result in two separate detection steps before the layer-3 clients are informed: first the detection of a member links session, then BBP detecting the LAG has been taken down. To improve the detection time BFD MAY use the concluded state and declare BBP sessions on the same LAG as Down when the concluded state is down.
A beneficial side effect of combining BFD on LAG member links with BBP is the synchronization of both BFD modules on either end of the LAG by the BBP BFD sessions.

An implementation MUST provide a configuration knob which lets the user select the mode if both modes are supported.

5. Security Consideration

This document does not introduce any additional security issues and the security mechanisms defined in [RFC5880] apply in this document.

6. IANA Considerations

The IANA is requested to assign a well-known multicast IP address: "224.XXX.XXX.XXX" for IPv4 and FFXX:: for IPv6.

7. Acknowledgements

Most of the text for this document came originally from draft-chen-bfd-interface-00.

We would also like to thank the members of the BFD WG who expressed strong support about needing such a mechanism.

8. References

8.1. Normative References

[RFC2119]	Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC5880]	Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, June 2010.
[RFC5881]	Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June 2010.

8.2. Informative References

[RFC2328]	Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
[RFC4271]	Rekhter, Y., Li, T. and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

Authors' Addresses

Manav Bhatia Alcatel-Lucent Bangalore, 560045 India EMail: manav.bhatia@alcatel-lucent.com

Mach(Guoyi) Chen Huawei Technologies Co., Ltd Q14 Huawei Campus, No. 156 Beiqing Road, Hai-dian District Beijing, 100095 China EMail: mach@huawei.com

Zuliang Wang Huawei Technologies Co., Ltd Q15 Huawei Campus, No. 156 Beiqing Road, Hai-dian District Beijing, 100095 China EMail: liang_tsing@huawei.com

Liang Guo China Telecom Guangzhou, China EMail: guoliang@gsta.com

Marc Binderberger Lausanne, Switzerland EMail: marc@sniff.de