Network Working Group                                          M. Hajduk
Internet-Draft                                                Individual
Intended status: Standards Track                         21 October 2023
Expires: 23 April 2024


             Link failure detection by Ethernet data plane
                         draft-hajduk-lfdedp-00

Abstract

   This document describes a method to detect link failures which relies
   solely on the Ethernet data plane.  An ordinary Ethernet frame can be
   modified and sent back to its sender to signal a loss of connection.
   No control plane protocol is used in the process, which makes the
   method very simple and fast to react.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 23 April 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Hajduk                    Expires 23 April 2024                 [Page 1]

Internet-Draft                    NFFRR                     October 2023

Table of Contents

   1.  Introduction
   2.  Overview
   3.  Layer 2 Point of Local Repair
   4.  Layer 3 Point of Local Repair
     4.1.  Updating FIB
     4.2.  Activating FRR
   5.  Network convergence
   6.  Interoperability considerations
   7.  Implementation status
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Author's Address

1.  Introduction

   To minimize packet loss after an unexpected failure, it is crucial to
   detect the failure as soon as possible after it occurs.  Ethernet is
   a well-known example of an L2 protocol that does not signal loss of
   connection between two endpoints.  Various protocols have been
   proposed to address this drawback.

   Traditional methods rely on periodic transmission of control plane
   messages.  Failure to receive such messages in time is interpreted by
   the receiver as a link failure.  Such methods may, and usually do,
   fail to detect the failure at the precise time it happened.  The
   event is detected with a varying delay during which a significant
   number of packets may be lost.  Attempts to minimize the delay lead
   to more frequent transmission of PDUs and more processor cycles
   consumed, which in turn results in higher energy consumption and a
   higher carbon footprint.

   This document introduces a novel method based solely on the existing
   Ethernet data plane.  It features near-zero packet loss and zero
   control plane overhead.
   Data plane frames are used to signal the inability to reach a
   destination.  Moreover, Ethernet switches can save frames from being
   dropped, in a way similar to how routers save MPLS or IP packets by
   using Fast Reroute (FRR).

2.  Overview

   A frame which cannot be forwarded by a switch is modified by the
   switch and sent back to the original sender.  The receiver of the
   returned frame understands that the original frame could not be
   forwarded to the destination and may activate FRR for the
   encapsulated PDU.  Thus, such a frame signals a failure and is also
   saved for further forwarding on a backup path.

   A switch which can react to a failure of its link in the way
   described in this document is an L2PLR (Layer 2 Point of Local
   Repair).  The end recipient of the returned frame, modified by the
   L2PLR, is an L3PLR (Layer 3 Point of Local Repair).  An L3PLR is
   typically, but not exclusively, a router.  The type of L3PLR depends
   on the higher-layer PDU encapsulated in the frame.

3.  Layer 2 Point of Local Repair

   When a switch interface goes down, the associated MAC table entries
   are put into a transitional state that indicates an unhandled failure
   event.  If a frame cannot be delivered using such an entry and its
   Ethertype value is supported, the process of returning the frame
   begins.  The L2PLR MUST modify the frame in this way:

   1.  Copy the source MAC address to the destination MAC address (or
       set the destination MAC address to a predefined value).

   2.  Set the source MAC address to the MAC address of the switch
       interface.

   3.  Change the innermost Ethertype value to a reserved value
       associated with the original Ethertype value.

   4.  Recalculate the FCS.

   Optionally, the L2PLR MAY apply MACsec protection to the frame
   (details are provided in Section 8).

   After the modification, the L2PLR MUST send the frame in the VLAN of
   the original frame.

   The receiver of the modified frame (L3PLR) MAY be the sender of the
   original frame.
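
   The four modification steps listed in this section can be
   illustrated with a short Python sketch.  It is not part of the
   specification: the function name return_frame, the collector_mac
   parameter, and the Ethertype mapping are assumptions made for the
   example.  In particular, no reserved Ethertype values exist yet, so
   the IEEE Local Experimental Ethertypes 0x88B5 and 0x88B6 stand in as
   placeholders.  The sketch assumes an untagged or VLAN-tagged frame
   that still carries its 4-octet FCS, and it ignores MACsec.

```python
import struct
import zlib

VLAN_TPIDS = {0x8100, 0x88A8}  # C-VLAN and S-VLAN tag identifiers

# Hypothetical mapping: original Ethertype -> "returned" Ethertype.
# No such values are assigned; the IEEE Local Experimental Ethertypes
# 0x88B5 and 0x88B6 are used here purely as placeholders.
RETURNED_ETHERTYPE = {0x0800: 0x88B5, 0x86DD: 0x88B6}


def return_frame(frame, switch_mac, collector_mac=None):
    """Build the returned frame from `frame` (FCS included).

    Returns None when the innermost Ethertype has no mapping, i.e. the
    Ethertype is not supported and the frame is not returned.
    """
    body = bytearray(frame[:-4])   # drop the old FCS
    offset = 12                    # first Ethertype/TPID follows both MACs
    while struct.unpack_from("!H", body, offset)[0] in VLAN_TPIDS:
        offset += 4                # skip TPID + TCI to reach the inner type
    ethertype = struct.unpack_from("!H", body, offset)[0]
    if ethertype not in RETURNED_ETHERTYPE:
        return None

    # Step 1: destination := original source (or a predefined address).
    body[0:6] = collector_mac or bytes(body[6:12])
    # Step 2: source := MAC address of the switch interface.
    body[6:12] = switch_mac
    # Step 3: replace the innermost Ethertype with its reserved value.
    struct.pack_into("!H", body, offset, RETURNED_ETHERTYPE[ethertype])
    # Step 4: recalculate the FCS (CRC-32, least significant octet first).
    fcs = struct.pack("<I", zlib.crc32(bytes(body)) & 0xFFFFFFFF)
    return bytes(body) + fcs
```

   VLAN tags are left untouched, so the returned frame stays in the
   VLAN of the original frame.  A frame whose Ethertype has no mapping
   yields None, matching the condition that only frames with a
   supported Ethertype value are returned.
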
   Alternatively, the L3PLR MAY be a different entity in the same
   broadcast domain, configured to collect frames it has not sent.  In
   the former case, the new destination address is the original source
   address; in the latter, the new destination address is taken from
   configuration.

   If a modified frame cannot be forwarded on its return path (because
   of another network failure), it MUST be dropped.

4.  Layer 3 Point of Local Repair

4.1.  Updating FIB

   The L3PLR MUST recognize the new Ethertype value of the received
   frame to understand that the frame was returned.  The Ethertype value
   is used to find the associated higher-layer protocol.  The L3PLR MUST
   update the FIB entry for the address from the higher-layer PDU.
   Updating the FIB entry MAY mean invalidating the ARP [RFC826] entry
   for the destination IPv4 address or the NDP [RFC4861] entry for the
   destination IPv6 address in the received frame.  The transition to a
   new entry state MAY be delayed in order to dampen flapping.  The new
   entry state and the delay MAY be configurable by the operator.

4.2.  Activating FRR

   Regardless of the FIB entry state, if the L3PLR supports FRR and has
   a valid backup path, it MUST forward PDUs from all frames returned by
   the L2PLR to the backup path.  If the FIB entry is invalid, FRR MUST
   be activated for all packets whose destination is impacted by the
   entry, not only those from returned frames.  PDUs to be sent over the
   backup path are encapsulated in the ordinary way.  If the L3PLR does
   not have a valid backup path, the packet MUST be dropped.

5.  Network convergence

   The L2PLR MUST stop returning frames when the respective MAC table
   entry is deleted or put into a valid state.  Either change restores
   normal operation for these frames.

   If a FIB entry (for a neighbor) has been invalid for sufficiently
   long, the routing process declares the neighbor unreachable.  The
   time interval for such a change is not defined in this document.
   Before it expires, the L3PLR SHOULD probe the logical address from
   the FIB entry.
   The entity being probed might reply either with the original MAC
   address (after the L2 network converges) or with a new MAC address.
   Once the FIB entry is valid again, reachability is considered
   restored.  Not receiving returned frames for a period of time MUST
   NOT be interpreted as restoration of reachability.

6.  Interoperability considerations

   Frames returned by an L2PLR can be forwarded by switches which do not
   support the method.  However, such switches and other L1 or L2
   devices may prevent the L2PLR from detecting a link failure.
   Operators should avoid using them if full support of the method is
   required.

   If an L3PLR that does not support the method received frames returned
   by an L2PLR, the frames would be discarded because of the
   unrecognized Ethertype value.  Advertising support of the method is
   not covered in this document.  The operator must ensure that frames
   are returned to an entity which can act as an L3PLR.  The default
   behavior of returning a frame to its original sender is overridden
   when the MAC address of the L3PLR is used as the destination MAC
   address of the returned frame.

7.  Implementation status

   The method described in this document has not been implemented.

8.  Security Considerations

   The method may be misused for malicious redirection of traffic.  If
   there are multiple paths from an L3PLR to a destination, the
   secondary path may use weaker encryption than the primary path.  Even
   worse, the secondary path may use no encryption while the primary
   path does.  In such a scenario, an attacker may intercept a frame on
   the primary path, modify it to look like a returned frame, and inject
   it back.  The L3PLR which receives such a frame would be deceived
   into believing that the primary path has failed.  The frame would be
   sent over the secondary path, where the attacker might be able to
   eavesdrop on its payload.

   To prevent such an attack, MACsec should be deployed on a link of the
   primary path.
   This would make it impossible for an unauthorized entity to modify a
   frame without the receiver being able to detect the spoofed frame.
   The countermeasure presumes that the L2PLR is trusted by the receiver
   of the returned frame.  Trust establishment is specified by the
   MACsec standard [IEEE802.1AE] and is not covered in this document.

   If a frame not protected by MACsec must be returned, the L2PLR
   applies MACsec integrity protection to the modified frame.  If a
   frame protected by MACsec must be returned, the L2PLR may decrypt the
   frame, modify it, re-encrypt it, and apply the integrity protection.
   In either case, the L3PLR should be able to detect a spoofed returned
   frame.

9.  IANA Considerations

   The method requires new Ethertype values.  A unique value is needed
   for each higher-layer protocol which needs to be supported.  This
   value would be used only for returned frames of the respective
   protocol.  Since IANA does not assign Ethertype values [RFC7042],
   this requirement is addressed to the IEEE.

10.  References

10.1.  Normative References

   [IEEE802.1AE]
              Seaman, M., "IEEE Standard for Local and metropolitan area
              networks–Media Access Control (MAC) Security", September
              2018.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              DOI 10.17487/RFC4861, September 2007.

   [RFC826]   Plummer, D., "An Ethernet Address Resolution Protocol: Or
              Converting Network Protocol Addresses to 48.bit Ethernet
              Address for Transmission on Ethernet Hardware", STD 37,
              RFC 826, DOI 10.17487/RFC0826, November 1982.

10.2.  Informative References

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC7042]  Eastlake 3rd, D. and J. Abley, "IANA Considerations and
              IETF Protocol and Documentation Usage for IEEE 802
              Parameters", BCP 141, RFC 7042, DOI 10.17487/RFC7042,
              October 2013.

Author's Address

   Martin Hajduk
   Individual
   Email: martin.hajduk.ietf@gmail.com