TRILL Working Group Tissa Senevirathne Internet Draft CISCO Intended status: Informational David Bond IBM Sam Aldrin Yizhou Li Huawei Rohit Watve CISCO Anoop Ghanwani DELL Jon Hudson Brocade Naveen Nimmu Broadcom Radia Perlman Intel Tal Mizrahi Marvell May 4, 2012 Expires: November 2012 Requirements for Operations, Administration and Maintenance (OAM) in TRILL draft-tissa-trill-oam-req-01.txt Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. This document may not be modified, and derivative works of it may not be created, and it may not be published except as an Internet-Draft. This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. This document may not be modified, and derivative works of it may not be created, except to publish it as an RFC and to translate it into languages other than English. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that Senevirathne Expires November 4, 2012 [Page 1] Internet-Draft TRILL OAM Requirements May 2012 other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on November 4, 2012. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Abstract OAM (Operations, Administration and Maintenance) is a general term used to identify functions and toolsets to troubleshoot and monitor networks. This document presents, OAM Requirements applicable to TRILL. Also presented in this document is the proposed frame structure for TRILL OAM messages. Table of Contents 1. Introduction...................................................3 1.1. Contributors..............................................4 2. Conventions used in this document..............................4 3. Terminology....................................................4 4. OAM Requirements...............................................5 4.1. Data Plane................................................5 4.2. Connectivity Verification.................................5 4.2.1. Unicast..............................................5 Senevirathne Expires November 4, 2012 [Page 2] Internet-Draft TRILL OAM Requirements May 2012 4.2.2. Multicast............................................6 4.3. Continuity Check..........................................6 4.4. Path Tracing..............................................7 4.5. General Requirements......................................7 4.6. Performance Monitoring....................................8 4.7. Packet Loss...............................................8 4.8. Packet Delay..............................................8 4.9. ECMP Utilization..........................................9 4.10. Security and Operational considerations..................9 4.11. Fault Indications........................................9 4.12. Defect Indications.......................................9 4.13. Live Traffic monitoring.................................10 5. General Format of TRILL OAM Messages..........................10 5.1. Requirements of OAM Message channel......................11 6. Security Considerations.......................................12 7. IANA Considerations...........................................12 8. References....................................................12 8.1. Normative References.....................................12 8.2. Informative References...................................13 9. Acknowledgments...............................................13 1. Introduction OAM (Operations, Administration and Maintenance) generally covers various production aspects of a network. In this document we use the term OAM as defined in [RFC6291]. Success of any mission critical network depends on the ability to proactively monitor networks for faults, performance, etc. as well as its ability to efficiently and quickly troubleshoot defects and failures. A well-defined OAM toolset is a vital requirement for wider adoption of TRILL as the next generation data forwarding technology in larger networks such as data centers. In this document we define the Requirements for TRILL OAM. Also the proposed format for OAM frames is presented. It is assumed that the readers are familiar with the OAM concepts and terminologies defined in other OAM standards such as [802.1ag], [RFC5860]. This document does not attempt to redefine the terms and concepts specified elsewhere. Senevirathne Expires November 4, 2012 [Page 3] Internet-Draft TRILL OAM Requirements May 2012 1.1. Contributors The following members were part of the design team that produced this document. Their names are listed below in alphabetical order. Anoop Ghanwani, David Bond, Donald Eastlake 3 , Jon Hudson, Naveen Nimmu, Radia Perlman, Rohit Watve, Sam Aldrin, Shivakumar Sundaram, Tal Mizrahi, Thomas Narten, Tissa Senevirathne, Yizhou Li. 2. Conventions used in this document The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119]. In this document, these words will appear with that interpretation only when in ALL CAPS. Lower case uses of these words are not to be interpreted as carrying RFC-2119 significance. 3. Terminology Section: The term Section refers to a partial segment of a path between any two given RBridges. As an example, consider the case where RB1 is connected to RBx via RB2,RB3 and RB4. The segment between RB2 to RB4 is referred to as a Section of the path RB1 to RBx. Flow: The term Flow indicates a set of packets that share the same path and per-hop behavior (such as priority). A flow is typically identified by a portion of the inner payload that affects the hop-by hop forwarding decisions. This may contain Layer 2 through Layer 4 information. All Least Cost Paths: The term "all least cost paths" refers to all potentially available least cost paths from a given source to a specified destination RBridge as determined by the TRILL network topology learned from IS-IS. Note, however, that differences in ECMP implementations at intermediate RBs may result in only a subset of all available paths actually being realizable for traffic forwarding. For example, an RB may be limted in how many equal cost next hops it can use, or a poor combination of hashing choices on different RBs can result in some equal cost next-hops never being selected. ECMP: Equal Cost Multi Path. Senevirathne Expires November 4, 2012 [Page 4] Internet-Draft TRILL OAM Requirements May 2012 Connectivity: The term connectivity indicates reachability between any two RBridges RB1 and RB2. Fault: The term Fault refers to an inability to perform a required action, e.g., an unsuccessful attempt to deliver a packet. Defect: The term Defect refers to an interruption in the normal operation, such that over a period of time no packets are delivered successfully. Failure: The term Failure refers to the termination of the required function over a longer period of time. Persistence of a defect for a period of time is interpreted as a failure. 4. OAM Requirements 4.1. Data Plane OAM frames, utilized for connectivity verification, continuity checks, performance measurements, etc., MUST follow the same path as the data frames. RBridges MUST have the ability to identify OAM frames destined for themselves or which require processing by the control plane. RBridges MUST have the ability to differentiate between OAM frames and data frames experiencing errors. TRILL OAM frames MUST be confined to the TRILL domain and MUST NOT be forwarded out of the TRILL domain. E.g. they must not be sent as native frames on an end station service enabled port OAM MUST have the capability to provide services for both IP and non-IP flows. OAM MUST be able to function in IP and non-IP infrastructure. 4.2. Connectivity Verification 4.2.1. Unicast OAM MUST have the ability to verify connectivity from an RBridge RB1 to a specific RBridge RB2. Senevirathne Expires November 4, 2012 [Page 5] Internet-Draft TRILL OAM Requirements May 2012 OAM MUST have the ability to verify connectivity from an RBridge RB1 to a specific RBridge RB2 for a specific flow. An RBridge SHOULD have the ability to verify the above connectivity tests on sections. As an example, assume RB1 is connected to RB5 via RB2, RB3 and RB4. An operator SHOULD be able to verify the RB1 to RB5 connectivity on the section from RB3 to RB5. The difference is that the ingress and egress TRILL nicknames in this case are RB1 and RB5 as opposed to RB3 and RB5, even though the message itself may originate at RB3. OAM MUST have the ability to invoke the above functions both proactively and on-demand. 4.2.2. Multicast OAM MUST have the ability to verify connectivity either to a specific set of RBridges or all member RBridges, for a specified multicast tree. This functionality is referred to as verification of the un-pruned multicast tree. OAM MUST have the ability to verify connectivity either to a specific set of RBridges or all member RBridges, for a specified multicast tree and for a specified flow. This functionality is referred to as verification of the pruned tree. OAM MUST have the ability to invoke the above multicast connectivity verification functions either proactively or on-demand. 4.3. Continuity Check [RFC5860] defines Continuity Check as the ability of end points to monitor liveliness of a path or a section of a path [RFC5860]. We use similar semantics in this document where end points are the ingress or egress RBridges. OAM MUST provide functions that allow an RBridge to perform a Continuity Check to any other RBridge. OAM MUST provide functions that allow an RBridge to perform a Continuity Check to any other RBridge for a specified flow. OAM SHOULD provide functions that allow an RBridge to perform a Continuity Check to any other RBridge over all least cost paths. OAM SHOULD provide the ability to perform a Continuity Check on sections of any path within the network. Senevirathne Expires November 4, 2012 [Page 6] Internet-Draft TRILL OAM Requirements May 2012 OAM SHOULD provide the ability to perform a multicast Continuity Check for specified multi-destination tree(s) as well as specified multi-destination tree and flow combinations. The former is referred to as an un-pruned multi-destination tree Continuity Check and the latter is referred to as a pruned tree Continuity Check. 4.4. Path Tracing OAM MUST provide the ability to trace a path between any two RBridges per specified unicast flow. OAM SHOULD provide the ability to trace all least cost paths between any two RBridges. OAM SHOULD provide functionality to trace all branches of a specified multi-destination tree (un-pruned tree) OAM SHOULD provide functionality to trace all branches of a specified multi-destination tree for a specified flow (pruned tree). 4.5. General Requirements OAM MUST provide the ability to initiate and maintain multiple concurrent sessions between any given two RBridges. OAM MUST NOT require extensions to or modifications of the TRILL header. OAM MUST provide a single OAM framework for all TRILL OAM functions OAM, as practical and as possible, SHOULD provide a single framework between TRILL and other similar standards. OAM MUST maintain related error and operational counters. Operations of OAM MUST NOT result in errors on end devices. OAM MAY be required to provide the ability to specify a desired response mode for a specific OAM message. The desired response mode can be either in-band, out-of band or none. The OAM Framework MUST be extensible to future needs of TRILL and the needs of other standard organizations. OAM MAY provide methods to verify control plane and forwarding plane alignments. Senevirathne Expires November 4, 2012 [Page 7] Internet-Draft TRILL OAM Requirements May 2012 OAM SHOULD leverage existing OAM technologies, where practical. 4.6. Performance Monitoring 4.7. Packet Loss In this document, term loss of a packet is used as defined in [RFC2680] (see Section 2.4 of RFC2680). OAM SHOULD provide the ability to measure packet loss for a specified flow between any two RBridges. OAM SHOULD provide the ability to measure packet loss over a segment, for a specified flow between any two RBridges. OAM SHOULD provide the ability to measure packet loss between any two RBridges over all least cost paths. An RBridge SHOULD be able to perform the above packet loss measurement functions either proactively or on-demand. 4.8. Packet Delay There are two types of packet delays -- one-way delay and two-way delay (Round Trip Delay). One-way delay is defined in [RFC2679] as the time elapsed from the start of transmission of the first bit of a packet by an RBridge until the reception of the last bit of the packet by the destination RBridge. Two-way delay is also referred to as Round Trip Delay is defined similar to [RFC2681]; i.e. the time elapsed from the start of transmission of the first bit of a packet by an RBridge until the reception of the last bit of the packet by the same RBridge. OAM MUST provide functions to measure one-way delay between two RBridges for a specified flow. OAM MUST provide functions to measure one-way delay between two RBridges for a specified flow over a specific section. OAM MAY provide functions to measure two-way delay between two RBridges for a specified flow. OAM MAY provide functions to measure two-way delay between two RBridges for a specified flow over a specific section. Senevirathne Expires November 4, 2012 [Page 8] Internet-Draft TRILL OAM Requirements May 2012 4.9. ECMP Utilization OAM MAY provide functionality to monitor utilization of ECMP paths to specified destination RBridges. Utilization is defined as distribution of traffic over different ECMP members. 4.10. Security and Operational considerations Methods MUST be provided to protect against exploitation of OAM framework for security and denial of service attacks. Methods SHOULD be provided to prevent OAM messages causing congestion in the networks. Periodically generated messages with high frequencies MAY lead to congestion, hence methods such as shaping or rate limiting SHOULD be utilized. 4.11. Fault Indications The term Fault refers to an inability to perform a required action, e.g., an unsuccessful attempt to deliver a packet [OAMOVER]. The unsuccessful attempt may be due to Hop Count expiry, invalid nickname, etc. OAM MUST provide a Fault Indication framework to notify faults to the ingress RBRidge of the flow or other interested parties (such as syslog servers). OAM MUST provide functions to selectively enable or disable different types of Fault Indications. 4.12. Defect Indications [OAMOVER] defines "The term Defect refers to an interruption in the normal operation, such as a consecutive period of time where no packets are delivered successfully." OAM SHOULD provide a framework for Defect Detection and Indication. OAM implementations that provide Defect Indication MUST provide methods to selectively enable or disable Defect Detection per defect type. OAM implementations that provide Defect Indication MUST provide methods to configure Defect Detection thresholds per different types of defects. Senevirathne Expires November 4, 2012 [Page 9] Internet-Draft TRILL OAM Requirements May 2012 OAM implementations that provide Defect Indication facilities MUST provide methods to log defect indications to a locally defined archive such as log buffer or SNMP traps. OAM implementations that provide Defect Indication facilities SHOULD provide a Remote Defect Indication framework that facilitates notifying the originator/owner of the flow experiencing the defect, which is the ingress RBridge. Remote Defect Indication MAY be either in-band or out-of-band. 4.13. Live Traffic monitoring OAM implementations MAY provide methods to utilize live traffic for troubleshooting and performance monitoring. Implementations MAY leverage Data Driven CFM [802.1Q] or IPFIX [RFC5101] for the purpose of performance monitoring. 5. General Format of TRILL OAM Messages The TRILL forwarding paradigm allows an implementation to select a path from a set of equal cost paths to forward a packet. Selection of the path of choice is implementation dependent. However, it is a common practice to utilize Layer 2 through Layer 4 information in the inner payload for path selection. As specified above, OAM Messages are required to follow the exact path as the data packets. The proposed frame format for TRILL OAM messages is as follows. Senevirathne Expires November 4, 2012 [Page 10] Internet-Draft TRILL OAM Requirements May 2012 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | . Outer Header . (variable) | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + TRILL Header + 8 bytes | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | . Flow Entropy . 128 bytes . . | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | . OAM Message Channel . Variable . . | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 1 Frame format of OAM Messages Outer Header: Media dependent header. For Ethernet this included Destination MAC, Source MAC, VLAN (optional) and EtherType fields. TRILL Header: Minimum of 8 bytes when the Extended Header is not included [RFC6325] Flow Entropy: This is a 128 byte Fixed size opaque field. The field MUST be padded with zeros when the flow entropy is less than 128 bytes. Flow entropy emulates the forwarding behavior of the desired data packets. OAM Message Channel: This is a variable size section that carries OAM related information. Reusing existing OAM message definitions such as [RFC4379] and [802.1ag] will be explored. 5.1. Requirements of OAM Message channel The OAM Message channel header MUST contain a version number Senevirathne Expires November 4, 2012 [Page 11] Internet-Draft TRILL OAM Requirements May 2012 The OAM Message channel header MUST contain flags that facilitate hardware level processing. (e.g. indicate this is two-way delay measurement probe) The OAM Message channel MUST contain time stamping in fixed locations to facilitate hardware level performance monitoring. (e.g. delay measurements). The OAM Message channel MUST have the ability to be extensible to include future capabilities without requiring a change to the version of the message header. (e.g. include TLV structures) The OAM Message header MUST contain a unique marker that allows for identifying the presence of the OAM channel. This marker MUST provide equal or better uniqueness compared to the IP checksum define in [RFC791] The OAM Message SHOULD provide methods to include arbitrary data to test functions such as MTU testing. 6. Security Considerations Security Requirements are specified in section 4.10. 7. IANA Considerations 8. References 8.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [OAMOVER] Mizrahi, T, et.al., "An Overview of Operations, Administration, and Maintenance (OAM) Mechanisms", draft- ietf-opsawg-oam-overview-06, Work in Progress, March 2012. [RFC5860] Vigoureux, M., et.al., "Requirements for Operations, Administration and Maintenance (OAM) in MPLS Transport Networks", RFC5860, May 2010. [RFC4377] Nadeau, T., et.al., "Operations and Management (OAM) Requirements for Multi-Protocol Label Switched (MPLS) Networks", RFC 4377, February 2006. Senevirathne Expires November 4, 2012 [Page 12] Internet-Draft TRILL OAM Requirements May 2012 8.2. Informative References [RFC6325] Perlman, R., et.al., "Routing Bridges (RBridges): Base Protocol Specification", RFC 6325, July 2011. [RFC5101] Claise, B., "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information", RFC5101, January 2008. [RFC2680] Almes, G., et.al. "A One-way Packet Loss Metric for IPPM", RFC 2680, September 1999. [RFC2679] Almes, G., et.al. "A One-way Delay Metric for IPPM", RFC 2679, September 1999. [RFC2681] Almes, G., et.al. "A Round-trip Delay Metric for IPPM", RFC 2681, September 1999. [RFC6291] Anderson, L., et.al. "Guidelines for the Use of the "OAM" Acronym in the IETF", RFC 6291, June 2011. [8021ag] IEEE, "Virtual Bridged Local Area Networks Amendment 5: Connectivity Fault Management", 802.1ag, 2007. [8021Q] IEEE, "Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks", IEEE Std 802.Q-2011, August 31 , 2011. [RFC791] "Internet Protocol", RFC 791, September 1981. 9. Acknowledgments This document was prepared using 2-Word-v2.0.template.dot. Authors' Addresses Tissa Senevirathne CISCO Systems 375 East Tasman Drive San Jose, CA 95134 USA. Phone: +1-408-853-2291 Email: tsenevir@cisco.com Senevirathne Expires November 4, 2012 [Page 13] Internet-Draft TRILL OAM Requirements May 2012 David Bond IBM 2051 Mission College Blvd Santa Clara, CA 95054 USA Phone: +1-603-339-7575 Email: mokon@mokon.net Sam Aldrin Huawei Technologies 2330 Central Express Way Santa Clara, CA 95951 USA Email: aldrin.ietf@gmail.com Yizhou Li Huawei Technologies 101 Software Avenue, Nanjing 210012 China Phone: +86-25-56625375 Email: liyizhou@huawei.com Rohit Watve CISCO Systems 375 East Tasman Drive San Jose, CA 95134 USA. Phone: +1-408-424-2091 Email: rwatve@cisco.com Senevirathne Expires November 4, 2012 [Page 14] Internet-Draft TRILL OAM Requirements May 2012 Anoop Ghanwani DELL 350 Holger Way San Jose, CA 95134 USA. Phone: +1-408-571-3500 Email: Anoop@duke.alumni.duke.edu John Hudson Brocade 120 Holger Way San Jose, CA 95134 USA. Email: jon.hudson@gmail.com Naveen Nimmu Broadcom 9th Floor, Building no 9, Raheja Mind space Hi-Tec City, Madhapur, Hyderabad - 500 081, INDIA Phone: +1-408-218-8893 Email: naveen@broadcom.com Radia Perlman Intel Labs 2700 156th Ave NE, Suite 300, Bellevue, WA 98007 USA. Phone: +1-425-881-4824 Email: radia.perlman@intel.com Tal Mizrahi Marvell 6 Hamada St. Yokneam, 20692 Israel Email: talmi@marvell.com Senevirathne Expires November 4, 2012 [Page 15]