Neil Harrison           Internet Draft
Peter Willis            Document: draft-harrison-mpls-oam-00.txt
British Telecom         Expires: August 2001

Shahram Davari
PMC-Sierra

Ben Mack-Crane
Tellabs

Hiroshi Ohta
NTT

                                                     February 2001


                 OAM Functionality for MPLS Networks


Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright(C) The Internet Society (2001). All Rights Reserved.

Abstract

This Internet draft provides requirements and mechanisms for OAM (Operation and Maintenance) for the user-plane in MPLS networks.

A connectivity verification "CV" OAM packet is defined, which is transmitted periodically from LSP source to LSP sink. The CV flow could be used to detect defects related to misrouting of LSPs as well as link and nodal failure, and if required to trigger protection switching to the protection path.

Harrison et.al Expires August 2001 Page 1 OAM Functionality for MPLS Networks February 2001

A forward defect indicator "FDI" and a backward defect indicator "BDI" are defined, which carry the defect type and location to the near end and far end respectively. At every LSP terminating node, the FDI is mapped from server layer to client layer.
By doing so FDI could suppress the alarm storm, and let the appropriate layer take control of protection switching. BDI is used by the LSP source to start or stop the QoS aggregation, depending on whether the LSP is in the available or unavailable state. The criteria for entry to and exit from the available and unavailable states are also defined in this document.

Table of Contents

1.     Introduction
2.     Definitions
3.     Symbols and Abbreviations
4.     Requirements for MPLS OAM
5.     Principles of OAM Function
5.1    Client/Server Recursion-Layering
5.2    OAM Functionality and Layer Independence
5.3    Defects
5.4    Availability
5.5    Decoupling of User behavior from Connectivity Assessment
5.6    Forward and Backward Defect Indicators
5.7    Connectivity Verification
5.8    Customers Should not be Used as Defect Detectors
5.9    The Reliability of OAM Functionality Under Fault Conditions
6.     Mechanisms of MPLS OAM
6.1    Special MPLS Label Values
6.2    Handling of Errored OAM Packets
6.3    Label Stack Overhead Encoding Rules for OAM Packets
6.3.1  For CV OAM Packets
6.3.2  For P OAM Packets
6.3.3  For FDI and BDI OAM Packets
6.3.4  MPLS OAM Function Types for the OAM Alert Label
6.4    MPLS OAM Packets
6.4.1  Connectivity Verification (CV) Packets
6.4.2  Performance "P" Packets
6.4.3  Forward Defect Indicator "FDI" packets
6.4.4  Backward Defect Indicator "BDI"
6.5    Defect Types and their Entry/Exit Criteria
6.5.1  Defect Type Codepoints
6.5.2  dLOCV Entry Criteria
6.5.3  dTTSI Entry Criteria
6.5.4  dLoop Entry Criteria
6.5.5  dLOCV, dTTSI and dLoop exit criteria
6.6    Available and unavailable state processing
6.6.1  Short Break definition
6.6.2  Available/Unavailable State Definition
6.6.3  Near-end and Far-end Measurements of Availability
6.6.4  Near-End State Processing Flow-chart
6.6.5  Far-End State Processing Flow-chart
6.6.6  A pictorial view of near-end and far-end state processing
7.     Security Considerations
8.     References
9.     Author's Addresses

Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [1].

1. Introduction

This Internet draft provides requirements and mechanisms for OAM (Operation and Maintenance) for the user-plane in MPLS networks.

It is recognized that OAM functionality is important in public networks for ease of network operation, for verifying network performance and for reducing operational costs. OAM functionality is especially important for networks that are required to deliver (and hence be measurable against) QoS (Quality of Service) and availability performance parameters/objectives.

A connectivity verification "CV" OAM packet is defined in this document, which is transmitted periodically from LSP source to LSP sink. The CV flow could be used to detect defects related to misrouting of LSPs as well as link and nodal failure, and if required to trigger protection switching to the protection path.

A forward defect indicator "FDI" and a backward defect indicator "BDI" are defined, which carry the defect type and location to the near end and far end respectively. At every LSP terminating node, the FDI is mapped from server layer to client layer. By doing so FDI could suppress the alarm storm, and let the appropriate layer take control of protection switching. BDI is used by the LSP source to start or stop the QoS aggregation, depending on whether the LSP is in the available or unavailable state. The criteria for entry to and exit from the available and unavailable states are also defined in this document.

The OAM functionality defined herein is limited to point-to-point LSP tunnels. OAM functionality for multipoint-to-point and point-to-multipoint LSP tunnels is FFS.

2. Definitions

This document introduces some new terminology, which is required to discuss the functional network components associated with OAM.

Functional Architecture    Meaning
Term
-----------------------    ------------------------------------------
Client/server              A term referring to the transparent
(relationship between      transport of a client (i.e. higher) layer
layer networks)            link connection by a server (i.e. lower)
                           layer network trail.

Link connection            A partition of a layer N trail that exists
                           between two logically adjacent switching
                           points within the layer N network.

LSP Tunnel                 An LSP Tunnel is an LSP with well-defined
                           source (ingress point) and sink (egress
                           point).

Subnetwork                 A subnetwork is a contiguous topological
                           region of a network delimited by its set
                           of peripheral access points, and is
                           characterized by the possible routing
                           across the subnetwork between those access
                           points. A network is the largest
                           subnetwork and a node is the smallest
                           subnetwork (at least in practical physical
                           terms, though there are smaller
                           sub-networks within nodes).

Trail                      A generic transport entity at layer N
                           which is composed of a client payload
                           (which can be a packet from a client at
                           higher layer N-1) with specific overhead
                           added at layer N to ensure the forwarding
                           integrity of the server transport entity
                           at layer N.

Trail termination point    A source or sink point of a trail at layer
                           N, at which the trail overhead is added or
                           removed respectively. A trail termination
                           point must have a unique means of
                           identification within the layer network.

3. Symbols and Abbreviations

This list is not exhaustive of all the abbreviations used in this draft. In particular, those in common usage within the MPLS community (like 'MPLS' itself) have been excluded.

Abbreviation      Meaning
---------------   ------------------------------------
AIS               Alarm Indication Signal
BDI               Backward Defect Indication
CV Packet         Connectivity Verification Packet
FDI               Forward Defect Indication
FFS               For Further Study
OAM               Operations and Maintenance
P Packets         Performance Packets
QoS               Quality of Service
SLA               Service Level Agreement
TTSI              Trail Termination Source Identifier

4. Requirements for MPLS OAM

MPLS layer OAM functionality is not a substitute for physical or server layer OAM (e.g., SDH/SONET) or client layer OAM (e.g., IP). MPLS LSPs create layer networks in their own right, and will have defects that are only relevant to the MPLS LSP layer networks.

OAM functionality is useful because:

1) It allows the Operator to verify whether Quality of Service guarantees given in SLAs (Service Level Agreements) are in fact being met by the connection.

2) It allows the Operator to reduce the network's operating costs, by allowing more efficient detection and handling of defects. Long-term statistics show that the costs of operating a public network are higher than the initial installation costs.

3) It gives support for improved accounting/billing procedures.

4) It helps provide security for customer traffic by the detection of traffic mis-connections (which may otherwise be undetectable).

The following functions are required:

1) Connectivity Verification of LSPs to confirm that defects do not exist on the target LSPs.

2) Fast and efficient defect detection, notification and localization.

3) Measurement of availability performance.

The necessity of additional functions is for further study. In particular, the need for in-service measurement of LSP QoS performance (measurement of packet losses, spurious packets, errored packets, delay and delay variation) is for further study. Note that an LSP needs to be in the available state for QoS assessment to be valid.

Defects include the following cases:

1) Simple loss of LSP connectivity (due to a server layer failure or a failure within the MPLS layer network);

2) Swapped LSP trails;

3) Unintended LSP mismerging (of 2 or more LSP trails);

4) Unintended replication of LSP packets (of the same LSP trail, for example due to routing loops).

5. Principles of OAM Function

The following principles can, for the most part, be applied to any layer network, i.e. not just MPLS. This recommendation defines specific embodiments of these principles, as functional OAM entities, for MPLS layer networks.

Although it is recommended that all the OAM functional entities are deployed network-wide, operators are free to choose whether to apply all or only some of these OAM functional entities (e.g. CV flows but not P flows), and whether deployment is network-wide or limited in scope to LSPs of certain types, e.g. only important LSPs such as those supporting VPNs. Where the deployment or scope of OAM functional entities is limited, operators should be aware that there could be deficiencies in their ability to detect/handle certain defect cases.

5.1 Client/Server Recursion-Layering

A very important functional architecture feature of layer networks is client/server recursion (also known as layering). That is, a client layer link connection (i.e. a partition of a longer client layer trail between two logically adjacent client layer nodes) is created by a server layer trail. This is the basis of client layer topology construction. This recursion principle extends between various client/server layer relationships and ultimately 'to the duct'. Note also that client layer link connections can be multiple in number, i.e. a single server layer trail entity can support multiple client layer link connections.

The key points to note here are:

(1) The client and server layer trail termination points will generally not be congruent. And since the trail termination points are associated with the addressable access points of a layer network, it follows that the addressing of the two layers will also generally not be congruent.

(2) The 'duct' (or more precisely the environment of physical occupancy and connectivity) is the lowest layer network. The degree of connectivity in this layer effectively defines the degree of independent connectivity in all client layers. Put another way, the availability performance of any client layer network design is determined by (and inherited from) the physical infrastructure. This means that if one cannot state which link connections have a common lower server layer trail, then one cannot say anything with certainty about the resilience design of a client layer network.

5.2 OAM Functionality and Layer Independence

The OAM functionality of a layer network must not be dependent on any specific server or client layer technology. This is critical to ensure that layer networks can evolve (or new/old layer networks be added/removed) without impacting other layer networks.

The control-plane of a given layer network must also have its own OAM. [Note - Control-plane OAM is outside the scope of this draft.]

5.3 Defects

All the major defect conditions must be identified with in-service measurable entry and exit criteria, and all consequent actions must be specified.

The entry and exit criteria of various defects should be temporally harmonized as far as possible to simplify trail defect-state processing. Attention should be paid to relating the defect entry/exit criteria to 'short-breaks', which are generally accepted by many operators as 3-9s periods of gross signal disturbance from which the network may self-recover.
An event lasting >=10s reaches the normally accepted threshold for entering the unavailable state (see also Section 5.4).

5.4 Availability

The most important performance metric of a trail (or a subnetwork partition thereof) is availability. This means that the entry and exit criteria for the available state must be defined. It is also important to understand how unavailable/available state transitions relate to the stopping/starting of the aggregation of available state QoS metrics. From pragmatic considerations, the stopping of aggregation may be applied at an earlier point to preserve the integrity of the available state metrics, e.g. after 3s, which marks the onset of (at least) a short-break and which, from operational experience, is a good practical rule-of-thumb for the point beyond which a network is unlikely to self-recover.

5.5 Decoupling of User behavior from Connectivity Assessment

User traffic behavior must not be a factor in connectivity status assessment. In practical terms, this means decoupling user traffic behavior from all defects and (the dependent) available state entry/exit criteria.

5.6 Forward and Backward Defect Indicators

The node in the layer network that first detects a defect (sourced from within that layer) should apply a well-known 'Forward Defect Indication' (FDI) signal in the downstream direction. In the majority of current transport network technologies such a signal has been termed AIS (Alarm Indication Signal).

At the trail termination point where the appropriate FDI signal is generated:

(1) There should be a complementary Backward Defect Indication (BDI) signal (which is removed at the upstream trail termination point) and

(2) There must be a mapping of the FDI signal from the server layer to the appropriate FDI signal of the client layer(s) as part of the server->client adaptation process.

The primary purpose of the FDI signal is to suppress client layer alarms (which would otherwise create an 'alarm storm' in places which could be geographically and organizationally far removed from the originating defect source location). Three secondary purposes of FDI (and in some cases BDI) are:

(1) To allow correct processing of available state performance metrics.

(2) To inform applications that the connection is no longer functioning correctly so that they can take appropriate action, e.g. perhaps invoke a 're-connect' action, or in the case of voice perhaps mute the speech path.

(3) To inform client layer trails (e.g. nested LSPs in the case of MPLS) that a defect has occurred in a lower server layer trail, and hence to provide some indication that protection-switching in the affected client layer trails could be postponed to give the server layer trail an opportunity to effect protection switching.

FDI/BDI signals should also provide information on the defect location and type. Such information is very useful to the lead operator in a co-operating domain scenario, and can also differentiate failures that are internal or external to public and private domains.

Note that, if being used, the BDI signal must be generated (in the backward direction) in response to detecting a defect at a trail sink termination point (in the forward direction) and not from some intermediate point, such as where the defect might be actually located. The reasons for this are that:

(1) In the case of bi-directional trails and unidirectional defects, each trail direction might not be congruently routed.

(2) In the case of unidirectional trails the BDI signal may be provided out-of-band, e.g. perhaps via a control-plane or management-plane mechanism.
[Note: The exact means for providing the BDI functionality in this case is FFS]

The above requirements mean that the FDI/BDI architecture is valid for all routing cases.

5.7 Connectivity Verification

An essential characteristic of the trails in a layer network is that their trail termination points must have a unique identifier (at least within that layer network). However, on link connections between nodes within the layer network, relative identifiers are commonly used for traffic forwarding. These relative identifiers only have to be unique per interface, e.g. the VPI/VCI of ATM, the DLCI of FR, the 'label' of MPLS.

When relative identifiers are used for traffic forwarding there is a possibility of trail misconnectivity due to defects. These cover a variety of connectivity failure modes, including:

1) Simple loss of continuity (due to a server layer failure or a failure within the layer network considered);

2) Swapped connections;

3) Unintended mismerging (of 2 or more trails);

4) Unintended replication (of the same trail due, for example, to routing loops).

Although some of these defects may be rare in practice, unless detected/corrected their consequences can be very severe for an operator, ranging from simple availability/QoS SLA violations through to more serious security, censorship and mis-billing implications. It is therefore required that a unique trail source identifier be periodically transmitted from the trail source to the trail sink to detect these types of defect.

5.8 Customers Should not be Used as Defect Detectors

The OAM tools provided should ensure (as far as reasonably practicable) that customers do not have to act as failure detectors for the operator.

5.9 The Reliability of OAM Functionality Under Fault Conditions

Under fault conditions a layer network cannot, by definition, be expected to behave in a predictable manner.
Therefore care should be exercised when specifying and using OAM functions that require a layer network to function in a reliable and predictable manner for fault diagnosis.

6. Mechanisms of MPLS OAM

6.1 Special MPLS Label Values

The label structure defined in [1] indicates a single label field of 20 bits. Label field values 0-3 have already been reserved for special functions. A special label, the 'OAM Alert Label', is defined as follows:

Table 1: OAM Alert Label

Label value (Decimal)        Meaning
-------------------------    ---------------------------------------
4                            OAM Alert Label. This indicates that
[Note: this value is         the first octet following the OAM Alert
yet to be officially         Label in the OAM payload (i.e. octet 5)
assigned by IANA]            is an OAM Function Type field whose
                             value defines the type of defect
                             handling OAM function (i.e. CV, P, FDI
                             or BDI), which follows in the payload
                             area.

All OAM packets must have a minimum payload length of 40 octets to facilitate ease of processing. This is achieved by padding with all 0s when necessary. All padding bits are reserved for future operator defined usage.

6.2 Handling of Errored OAM Packets

Each OAM packet uses a BIP16 (in the last two octets of the OAM payload area) to detect errors. The BIP16 is computed over all the fields of the OAM payload, including the initial octet, which specifies the Function Type, and the BIP16 bit positions (which are all pre-set to zero for initial calculation purposes).

BIP16 processing must be performed on all OAM packets before their payload can be reliably passed on for further processing. Any OAM packets that show a BIP16 violation upon reception processing should be discarded.

In the case of the CV packet flow, persistent BIP16 violations will cause a Loss of Connectivity Verification; this defect is defined later, but for now we can note that it would occur after nominally 3s.
This behavior is consistent with the nature of the defect. However, it is recommended that at a local equipment level some notification is given to the Network Management System to indicate that BIP16 discards are occurring.

In the case of the other OAM packet types, i.e. the FDI, BDI and P packets (these are defined later), it is again recommended that at a local equipment level some indication is given to the Network Management System that BIP16 discards are occurring. The threshold to be used for recording/reporting such BIP16 discard activity for these OAM packets should be programmable, and is outside the scope of this Recommendation.

6.3 Label Stack Overhead Encoding Rules for OAM Packets

6.3.1 For CV OAM Packets

CV OAM packets are differentiated from normal user-plane traffic by an increase of one in the label stack depth at the LSP level at which they are inserted. They therefore maintain this label stack difference of one (from normal user-plane traffic) as they traverse any lower layer server LSPs.

The OAM Alert Labeled header is added before (i.e. below) the normal user-plane forwarding labeled header at the LSP trail source point.

The S bit is set only in the OAM Alert Label.

The CV OAM packet can be used on both E-LSPs and L-LSPs. However, the coding of the EXP field is different in the two cases.

In the case of L-LSPs, the coding of the EXP field should be set to all 0s in both the OAM Alert Labeled header and the preceding normal user-plane forwarding header. This is to ensure the CV OAM packets have a Per Hop Behavior (PHB) that ensures the lowest drop probability [2].

In the case of E-LSPs, the coding of the EXP field should be set to all 0s in the OAM Alert Labeled header and to whatever is the 'minimum loss-probability PHB' in the preceding normal user-plane forwarding header for that E-LSP. This is again to ensure the CV OAM packets have a PHB that ensures the lowest drop probability [2].
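
As an illustration of the encoding rules in this section, the following is a minimal sketch of the two label stack entries that would precede a CV payload. It assumes the conventional 32-bit label stack entry layout (20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL) and uses the OAM Alert Label value of 4 from Table 1, together with the TTL of 1 that this section specifies for the OAM Alert Labeled header. The function names, and the forwarding label and TTL values passed in, are hypothetical and chosen only for the example.

```python
import struct

OAM_ALERT_LABEL = 4  # value from Table 1; the draft notes it is not yet IANA-assigned


def encode_label_entry(label: int, exp: int, s_bit: int, ttl: int) -> bytes:
    """Pack one 32-bit label stack entry: label(20) | EXP(3) | S(1) | TTL(8)."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    if not 0 <= exp < 8:
        raise ValueError("EXP must fit in 3 bits")
    if s_bit not in (0, 1):
        raise ValueError("S bit must be 0 or 1")
    if not 0 <= ttl < 256:
        raise ValueError("TTL must fit in 8 bits")
    return struct.pack("!I", (label << 12) | (exp << 9) | (s_bit << 8) | ttl)


def cv_label_stack(forwarding_label: int, forwarding_ttl: int,
                   forwarding_exp: int = 0) -> bytes:
    """Return the forwarding entry followed by the OAM Alert entry.

    forwarding_exp stays 0 for an L-LSP; for an E-LSP the caller passes the
    EXP value of that E-LSP's minimum loss-probability PHB (an assumption
    left to the caller, since the draft does not fix the value).
    """
    # Normal user-plane forwarding header: S bit clear, caller-chosen TTL.
    forwarding = encode_label_entry(forwarding_label, forwarding_exp, 0, forwarding_ttl)
    # OAM Alert Labeled header: EXP all 0s, S bit set, TTL of 1.
    oam_alert = encode_label_entry(OAM_ALERT_LABEL, 0, 1, 1)
    return forwarding + oam_alert
```

For example, `cv_label_stack(1000, 64)` yields eight octets: the forwarding entry for the hypothetical label 1000, then the OAM Alert entry at the bottom of the stack.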

The TTL field should be set to 1 in the OAM Alert Labeled header. The reasons for this are:

- CV OAM packets should never travel beyond the LSP trail termination sink point at the LSP level they were originally generated (noting that they are not examined by intermediate label-swapping LSRs, and are only observed at LSP sink points), and

- The TTL of the immediately prior normal user-plane forwarding header is used to mitigate against damage from looping packets.

6.3.2 For P OAM Packets

The label stack overhead encoding rules of performance P OAM packets are FFS.

6.3.3 For FDI and BDI OAM Packets

FDI and BDI OAM packets are invoked, on a nominal 1 per second basis, when defects are detected. The FDI packet traces forward and upward through any nested LSP stack. The BDI packet is sent backwards towards its peer-level LSP trail termination sink point in the reverse direction (assuming a bi-directional in-band LSP exists) for each LSP at and above the level of the defect.

The OAM Alert labeled header is inserted before (i.e. below) a normal user-plane forwarding labeled header, and a label stack of 2 is only ever required for either the FDI or BDI packet at their origin.

Note that in the case of FDI, it is assumed that the server->client LSP adaptation mappings that were in existence prior to the failure are recursively used to ensure correct FDI forwarding. It is therefore important that the LSP sink point remembers any server->client LSP label mappings that were in existence prior to the failure. Although the exact means for achieving this are outside the scope of this Recommendation, some examples of how these server->client layer label mappings could be configured are as follows:

- Manually, e.g. via the NMS;

- Automatically on LSP set-up via extensions to LDP/RSVP signaling;

- By an automatic 'learning process', i.e. if, during the establishment of the client LSPs, the signaling is tunneled through the server layer, then the server trail terminating node could keep the information about the established LSPs in memory as they occur.

When server->client layer LSP relationships are changed (e.g. an existing client layer LSP is removed, or a new client LSP is added), it is important that the server->client label mappings are also updated to reflect the new relationships.

The S bit is set only in the OAM Alert Labeled header.

The FDI OAM packet is recursively mapped upwards, through a client/server adaptation process at LSP trail termination sink points, into any further affected higher client layer LSPs. When this arrives at the top LSP it needs to be mapped into an equivalent FDI for whatever client layer is then being carried. In the case of IP (or indeed any other client layer), this is outside the scope of this document.

Note that higher level LSPs will also see failures (as a result of corruption of their own CV flow) but they will also see an incoming FDI OAM packet flow from the lowest level LSP where the failure originates. This dynamic behavior allows for correct identification of the true source of the defect and is explained in more detail later. But for now it is sufficient to note that the incoming FDI is needed to:

- Suppress unnecessary alarms in the affected higher layer LSPs.

- Give an indication to affected higher-level LSPs that they may need to hold off protection switching as the defect is at a lower level LSP.

- Allow the appropriate BDI coding at the affected higher layer.

It is assumed that when a BDI OAM packet is returned in-band it follows a bi-directional LSP and, like the CV and P OAM packets, that it should never travel beyond the LSP trail termination sink point (of the return LSP).

The coding of the EXP field associated with the OAM Alert Labeled header and the preceding normal user-plane forwarding labeled header at the LSP level at which the FDI or BDI is inserted is the same as that previously described for the CV OAM packet.

The TTL field should be set to 1 in the OAM Alert Labeled packet header. The reasons for this are:

- The FDI OAM packet is recursively regenerated at each LSP trail termination sink point into all affected client layer LSPs (if any); so the TTL field is recursively regenerated with a value of 1;

- The BDI OAM packet should never travel beyond the LSP trail termination sink point of the return LSP at the LSP level that it was originally generated;

- The TTL of the immediately prior normal user-plane forwarding header is used to mitigate against damage from looping packets.

6.3.4 MPLS OAM Function Types for the OAM Alert Label

The first octet of the OAM packet payload specifies the OAM Function Type as follows:

Table 2: OAM Function Types

OAM Function Type    First octet of OAM packet payload
codepoint (Hex)      Function Type Purpose
-----------------    ----------------------------------------------
00                   Reserved

01                   CV (Connectivity Verification). Used to
                     detect/diagnose all types of LSP connectivity
                     defect (sourced either from below or within
                     the MPLS network). This will be the main
                     in-service OAM defect detection tool.

02                   P (Performance). Used to measure user-plane
                     loss of packets and their aggregate octets.

03                   FDI (Forward Defect Indicator). This is
                     generated by an MPLS node detecting any defect
                     (defined later) and inserted into affected
                     client layers. Its primary purpose is to
                     suppress alarms being raised within affected
                     higher level client LSPs and (in turn) their
                     client layers. It includes fields to indicate
                     the nature of the defect and its location.

04                   BDI (Backward Defect Indicator). This is
                     generated at a return LSP trail termination
                     source point in response to a defect being
                     detected at a LSP trail termination sink point
                     in the other direction. The defect type and
                     location codepoints of the complementary FDI
                     are mapped into similar fields of the BDI. The
                     BDI may be realized either in the user-plane
                     if bi-directional LSPs are being used (the
                     case considered in this document) or
                     out-of-band (e.g. via management-plane
                     function) in the case of uni-directional LSPs.
                     The latter scenario is outside the scope of
                     this document.

All other OAM Function Type codepoints are reserved for possible future standardization.

6.4 MPLS OAM Packets

6.4.1 Connectivity Verification (CV) Packets

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Func Type (1) |                  (must be 0)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                                                               +
|                       Ingress Router ID                       |
+                                                               +
|                                                               |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            LSP ID                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
\\                    Reserved (0) 14 bytes                    \\
|                                                               |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |            BIP 16             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1: CV Payload Structure

The intention is that the CV OAM packet is transmitted from the LSP trail termination source point at a nominal rate of 1 CV per second. It is important that the rate of CV OAM packet generation is constant so that simple and deterministic defect processing can be carried out at the LSP trail termination sink point.

CV OAM packets within a given LSP are not synchronous to any other CV OAM packets in any other LSP (this includes all nested LSPs, and CV OAM packets from the remote end of an LSP at level N but in the other direction when bi-directional LSPs at level N are being used).
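
The CV payload of Figure 1 and the BIP16 handling of Section 6.2 can be sketched as follows. This is an illustration only: the draft does not spell out the BIP16 algorithm, so the sketch assumes the conventional even-parity definition used in SDH/SONET, i.e. the XOR of all 16-bit words of the payload with the BIP16 positions pre-set to zero. The function names are hypothetical, and the Router ID is assumed to be supplied as an IPv6 address string.

```python
import ipaddress
import struct

CV_FUNCTION_TYPE = 0x01   # Function Type codepoint for CV
OAM_PAYLOAD_LEN = 40      # payload length in octets, as in Figure 1


def bip16(data: bytes) -> int:
    """Even bit-interleaved parity: XOR of every 16-bit word (assumed definition)."""
    if len(data) % 2:
        raise ValueError("BIP16 needs an even number of octets")
    parity = 0
    for i in range(0, len(data), 2):
        parity ^= (data[i] << 8) | data[i + 1]
    return parity


def build_cv_payload(router_id: str, lsp_id: int) -> bytes:
    """Assemble the 40-octet CV payload laid out in Figure 1."""
    router = ipaddress.IPv6Address(router_id).packed      # 16 octets
    body = (bytes([CV_FUNCTION_TYPE]) + bytes(3)          # Func Type + must-be-0
            + router                                      # Ingress Router ID
            + struct.pack("!I", lsp_id)                   # 4-octet LSP ID
            + bytes(14))                                  # Reserved (0)
    # BIP16 positions are pre-set to zero for the calculation.
    return body + struct.pack("!H", bip16(body + bytes(2)))


def bip16_ok(payload: bytes) -> bool:
    """Receiver-side check; a packet failing this should be discarded."""
    if len(payload) != OAM_PAYLOAD_LEN:
        return False
    received = struct.unpack("!H", payload[-2:])[0]
    return bip16(payload[:-2] + bytes(2)) == received
```

A sink would call `bip16_ok` on each received payload before looking at any other field. A single flipped bit anywhere in the payload changes the recomputed parity, so the check fails and the packet is dropped.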
The LSP Trail Termination Source Identifier (TTSI) consists of a 16 octet Router ID (an IPv6 address) followed by a 4 octet LSP Tunnel ID [3]. Note that the first 2 octets of the LSP Tunnel ID are currently padded with all 0s to allow for any future increase in the Tunnel ID field. For nodes that do not support IPv6 addressing, an IPv4 address can be used for the Router ID using the format described in RFC 1884 [4]. That is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                              (0)                              |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |             (FF)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         IPv4 Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2: IPv6 Compatible IPv4 Address

On LSP establishment the LSP trail termination sink point should be configured with the expected TTSI (Ingress Router ID + LSP ID). Ideally this should be done automatically via LSP signaling at LSP set-up time (e.g. via a CR-LDP or RSVP control-plane mechanism), but it could also be configured manually. The mechanism for achieving this configuration is outside the scope of this document.

6.4.2 Performance "P" Packets

The structure of the P OAM packet is FFS.
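
As an illustration only, the TTSI and the CV payload of Figures 1 and 2 could be assembled along the following lines. This is a sketch under stated assumptions, not a normative encoding: the function names are hypothetical, the 40 octet payload length is inferred from the field widths drawn in Figure 1, the (FF) field is taken to mean two octets of FF as in the RFC 1884 mapped-address form, the Tunnel ID is assumed to occupy the last 2 octets of the LSP ID field, and the BIP 16 value is supplied by the caller rather than computed.

```python
import ipaddress
import struct

CV_FUNC_TYPE = 0x01   # Table 2: CV codepoint
PAYLOAD_LEN = 40      # assumption: inferred from the field widths in Figure 1


def router_id_octets(addr: str) -> bytes:
    """Return the 16 octet Router ID for an IPv6 or IPv4 address string."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # Figure 2: 10 octets of 0, 2 octets of FF, then the IPv4 address.
        return b"\x00" * 10 + b"\xff\xff" + ip.packed
    return ip.packed


def build_ttsi(router_id: str, tunnel_id: int) -> bytes:
    """TTSI = 16 octet Router ID + 4 octet LSP ID (first 2 octets all 0)."""
    if not 0 <= tunnel_id <= 0xFFFF:
        raise ValueError("tunnel_id must fit in 2 octets")
    return router_id_octets(router_id) + struct.pack("!HH", 0, tunnel_id)


def build_cv_payload(ttsi: bytes, bip16: int = 0) -> bytes:
    """Pack the CV payload of Figure 1; bip16 is supplied by the caller."""
    if len(ttsi) != 20:
        raise ValueError("TTSI must be 20 octets")
    header = struct.pack("!B3x", CV_FUNC_TYPE)   # Func Type + 3 zero octets
    reserved = b"\x00" * 14                      # Reserved (0), 14 bytes
    payload = header + ttsi + reserved + struct.pack("!H", bip16)
    assert len(payload) == PAYLOAD_LEN
    return payload


ttsi = build_ttsi("192.0.2.1", 7)
cv = build_cv_payload(ttsi)
```

At the sink, the expected TTSI configured at LSP set-up would be compared octet for octet with octets 4 to 23 of each received CV payload.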
6.4.3 Forward Defect Indicator "FDI" Packets

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Func Type (3) |  (must be 0)  |          Defect Type          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Defect Location                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   \\                    Reserved (0) 30 bytes                    \\
   |                                                               |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |            BIP 16             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3: FDI Payload Structure

The FDI is sent downstream from the first node detecting the defect. In the case of MPLS server layer failures (i.e. in a lower layer technology such as SDH) this would be the first MPLS node downstream of the server layer failure (as a consequence of the appropriate client/server adaptation of the server FDI signal). In the case of MPLS layer failures (i.e. failures within the MPLS fabric) this would be the first LSP trail termination sink point at the same LSP level as the failure.

The primary function of the FDI is to stop downstream client layer alarm storms and hence correctly focus the attention of operations personnel. However, FDI can also have an important role in:

- Facilitating correctly targeted nested LSP protection schemes, i.e. one would want a lower level (server) LSP to protection switch before a higher level (client) LSP if the fault was sourced from within the lower level LSP, and

- Identifying availability/short-break events and hence suspending up-state QoS metric aggregation.

The format of the Defect Location field and its handling at inter-domain NNI boundaries is FFS.

The Defect Type field is set at 2 octets here. This is currently considered sufficient, but it should be confirmed once all the Defect Types have been identified and fully specified.
A candidate set of Defect Types and their codepoints is given later. The handling of the Defect Type field at inter-domain NNI boundaries is FFS. However, 2 octets have been reserved for this function.

When an FDI is to be passed from a server layer LSP to its client layer LSP(s) (i.e. at the client/server adaptation function following the server layer LSP trail termination sink point), the Defect Location and Defect Type fields should be copied from the server layer LSP FDI into the client layer LSP(s) FDI. The mapping of MPLS layer sourced FDI from the highest-level LSP into its client layer (e.g. IP) is outside the scope of this document.

6.4.4 Backward Defect Indicator "BDI"

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Func Type (4) |  (must be 0)  |          Defect Type          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Defect Location                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   \\                    Reserved (0) 30 bytes                    \\
   |                                                               |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |            BIP 16             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4: BDI Payload Structure

For the case of bi-directional LSPs, the BDI is sent from the LSP trail source point of the return LSP as a mirror of the appropriate (see Note) FDI at the LSP trail sink point of the other direction. The Defect Location and Defect Type fields are a direct mapping of those set in the appropriate (see Note) FDI and have identical formats to those described previously for the FDI OAM packet.

Note - The word 'appropriate' here signifies that any incoming FDI (i.e. from a lower layer) takes precedence over any FDI that would have been generated at the layer being considered due to detecting defects at this layer (where these defects are only consequential as a result of a lower layer defect).
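
The FDI and BDI payloads of Figures 3 and 4, and the mirroring of an FDI into a BDI, might be sketched as follows. This is illustrative only: the function names are hypothetical, the single "must be 0" octet and the 40 octet total are inferred from the figures, the Defect Location is treated as an opaque 4 octet value because its format is FFS, and the BIP 16 value is supplied by the caller.

```python
import struct

FDI_FUNC_TYPE = 0x03   # Table 2: FDI codepoint
BDI_FUNC_TYPE = 0x04   # Table 2: BDI codepoint


def build_defect_payload(func_type: int, defect_type: int,
                         defect_location: bytes, bip16: int = 0) -> bytes:
    """Pack an FDI or BDI payload following Figures 3 and 4."""
    if func_type not in (FDI_FUNC_TYPE, BDI_FUNC_TYPE):
        raise ValueError("func_type must be FDI (03) or BDI (04)")
    if len(defect_location) != 4:
        raise ValueError("Defect Location is drawn as 4 octets")
    # Func Type (1 octet), must-be-0 (1 octet), Defect Type (2 octets).
    head = struct.pack("!BxH", func_type, defect_type)
    reserved = b"\x00" * 30
    return head + defect_location + reserved + struct.pack("!H", bip16)


def mirror_fdi_as_bdi(fdi: bytes, bip16: int = 0) -> bytes:
    """Build the BDI that carries the same DT and DL as the given FDI."""
    func_type, defect_type = struct.unpack("!BxH", fdi[:4])
    if func_type != FDI_FUNC_TYPE:
        raise ValueError("not an FDI payload")
    return build_defect_payload(BDI_FUNC_TYPE, defect_type, fdi[4:8], bip16)


# Hypothetical values: DT 02 01 (dLOCV) and an arbitrary 4 octet location.
fdi = build_defect_payload(FDI_FUNC_TYPE, 0x0201, b"\x0a\x00\x00\x01")
bdi = mirror_fdi_as_bdi(fdi)
```

The same copy of the Defect Type and Defect Location fields applies when a server layer FDI is passed into its client layer LSP(s).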
The BDI does not propagate beyond its return LSP trail termination sink point, and it is discarded at that point after any processing based on its observation is carried out, e.g. for single-ended short-break and/or availability measurements.

6.5 Defect Types and their Entry/Exit Criteria

6.5.1 Defect Type Codepoints

The following coding structure is proposed for the various defect types so far identified:

Table 3: Defect Types

   The DT code (Hex) is the value carried in FDI/BDI OAM packets. The
   first octet indicates the layer and the second octet indicates the
   defect.

   Defect      DT code
   Type        (Hex)     Meaning
   ---------   -------   ----------------------------------------------
   dServer     01 01     Any server layer defect arising below the MPLS
                         layer network. It is not suggested that these
                         are individually identified and defined for
                         each type of server layer, since this function
                         is only appropriate to the server layer itself.
                         Hence, we only need an indication that it is
                         the server layer and not the MPLS layer.

   dLOCV       02 01     Simple Loss of Connectivity Verification due to
                         missing CV OAM packets with expected TTSI. Note
                         that if the cause of dLOCV is the server layer
                         (i.e. there is also an incoming FDI signal from
                         the server layer) then the DT codepoint 01 01
                         (Hex) is used. The dLOCV codepoint 02 01 (Hex)
                         is used only for simple connectivity failures
                         within the MPLS layer.

   dTTSI       02 02     Trail Termination Source Identifier Mismatch
                         due to an unexpected TTSI observed in the
                         incoming CV OAM packets. This detects swapped
                         connections and unintended mismerging failures,
                         which can be differentiated by noting whether
                         an expected TTSI is also missing or present
                         respectively. Note that in the case of the
                         former (i.e. swapped connections), the dTTSI
                         defect condition takes priority over the dLOCV
                         defect condition, which is also present.

   dLoop       02 03     This detects an unintended replication or
                         looping defect from observation of an increased
                         rate of expected CV OAM packets above the
                         nominal 1/sec.
                         (Note: this defect is added for completeness,
                         but it is expected to be rare.)

   dUnknown    02 FF     Unknown defect detected in the MPLS layer. This
                         is expected to be used for MPLS nodal failures,
                         which are detected within the node (probably by
                         proprietary means) and affect user-plane
                         traffic.

   None        00 00     Reserved

   None        FF FF     Reserved

There are 3 MPLS layer user-plane defects, i.e. dLOCV, dTTSI and dLoop, which we now define in more detail.

6.5.2 dLOCV Entry Criteria

Entry to the dLOCV condition, and hence entry to the LSP Trail Sink Near-End Defect State, occurs when there are no expected CV OAM packets observed in any period of 3 consecutive seconds. In terms of consequent actions:

- If there is an incoming FDI signal from a server layer below the MPLS network, then this is mapped to the DT codepoint 01 01 (Hex) in the FDI OAM packets sent forwards and the BDI OAM packets sent backwards. The local DL codepoint is also inserted in these FDI and BDI OAM packets. There are no alarms associated with the MPLS layer itself but only with the server layer, which sourced the FDI signal.

Else:

- If there is an incoming FDI signal from a lower level LSP within the MPLS network, then that FDI signal's DL/DT codepoints are mapped into the FDI sent to any further client layers (i.e. this suppresses generation of FDI DL/DT codepoints from this point) and the BDI OAM packet sent backwards. There are no alarms generated regarding this LSP (the alarm will be associated with the lowest layer LSP within which the defect originated).

Else:

- If there is no FDI signal incoming from the server layer or a lower level LSP AND there are no CV OAM packets observed with an unexpected TTSI which give rise to the dTTSI defect, then the DT codepoint 02 01 (Hex) is inserted in the FDI OAM packets sent downstream and the BDI OAM packets sent upstream. The local DL codepoint is also inserted in these FDI and BDI OAM packets.
A local alarm is raised relevant to this defect condition.

Note:

(i) Since OAM packet flows are not synchronized in LSPs at different hierarchical levels (i.e. when LSPs are nested), there is a possibility that a client layer LSP detects a defect before its server layer LSP. This timing difference could be up to 1s due to CV packet arrival time differences, plus some additional uncertainty due to network delay effects. This could result in an error of judgment as to the type of defect that is present and hence which consequent actions are appropriate; especially whether the raising of a local alarm is appropriate and the correct setting of the DL and DT codepoints in FDI/BDI OAM packets. To mitigate this effect, it is recommended that the raising of an alarm is deferred for at least 2 seconds after a defect state is detected (the exact value is FFS). This will also allow the network to settle into a stable state as regards defect detection behavior.

(ii) The starting/stopping of aggregation of any LSP user-plane packet/octet loss metrics (e.g. if using the P OAM packet) is dependent on whether the LSP is in the available or unavailable state.

6.5.3 dTTSI Entry Criteria

Entry to the dTTSI condition, and hence entry to the LSP Trail Sink Near-End Defect State, occurs when there are >= 2 CV OAM packets observed in any period of 3 consecutive seconds each with an unexpected TTSI. Any expected CV OAM packets or any incoming FDI signals (from either the server layer or a lower level LSP) are ignored, and it should be noted that the dTTSI defect overrides the dLOCV defect if both are present (as would be the case, for example, with swapped LSPs).

The DT codepoint 02 02 (Hex) is inserted in the FDI OAM packets sent forwards and the BDI OAM packets sent backwards. The local DL codepoint is also inserted in these FDI and BDI OAM packets.
A local alarm is raised relevant to this defect condition and the unexpected TTSI is captured locally (this may also optionally be sent to the NMS as an exception report). The downstream traffic must also be suppressed.

Note:

(i) Since OAM packet flows are not synchronized in LSPs at different hierarchical levels (i.e. when LSPs are nested), there is a possibility that a client layer LSP detects a defect before its server layer LSP. This timing difference could be up to 1s due to CV packet arrival time differences, plus some additional uncertainty due to network delay effects. This could result in an error of judgment as to the type of defect that is present and hence which consequent actions are appropriate; especially whether the raising of a local alarm is appropriate and the correct setting of the DL and DT codepoints in FDI/BDI OAM packets. To mitigate this effect, it is recommended that the raising of an alarm is deferred for at least 2 seconds after a defect state is detected (the exact value is FFS). This will also allow the network to settle into a stable state as regards defect detection behavior.

(ii) The starting/stopping of aggregation of any LSP user-plane packet/octet loss metrics (e.g. if using the P OAM packet) is dependent on whether the LSP is in the available or unavailable state.

6.5.4 dLoop Entry Criteria

Entry to the dLoop condition, and hence entry to the LSP Trail Sink Near-End Defect State, occurs when there are >= 5 CV OAM packets observed in any period of 3 consecutive seconds each with an expected TTSI.

The DT codepoint 02 03 (Hex) is inserted in the FDI OAM packets sent forwards and the BDI OAM packets sent backwards. The local DL codepoint is also inserted in these FDI and BDI OAM packets. A local alarm is raised relevant to this defect condition.
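
The entry criteria and consequent actions for dLOCV, dTTSI and dLoop can be summarized in a short sketch. It is an illustration under stated assumptions, not part of the specification: the function names are hypothetical, the inputs are the CV OAM packet counts seen in one 3 second window, and the dTTSI test is evaluated first because the text says expected CV OAM packets are ignored for dTTSI. The insertion of DL codepoints and the deferral of alarms are not modelled.

```python
from typing import Optional, Tuple

DT_SERVER = 0x0101   # Table 3: dServer
DT_LOCV = 0x0201     # Table 3: dLOCV
DT_TTSI = 0x0202     # Table 3: dTTSI
DT_LOOP = 0x0203     # Table 3: dLoop


def detect_defect(expected_cv: int, unexpected_cv: int) -> Optional[str]:
    """Classify one 3 second window from its CV OAM packet counts."""
    if unexpected_cv >= 2:
        return "dTTSI"   # overrides dLOCV; expected CV packets are ignored
    if expected_cv == 0:
        return "dLOCV"
    if expected_cv >= 5:
        return "dLoop"
    return None


def consequent_action(defect: str, server_fdi: bool = False,
                      lower_lsp_fdi_dt: Optional[int] = None
                      ) -> Tuple[int, bool]:
    """Return (DT codepoint for FDI/BDI, whether a local alarm is raised)."""
    if defect == "dTTSI":
        return DT_TTSI, True              # incoming FDI is ignored for dTTSI
    if defect == "dLoop":
        return DT_LOOP, True
    if defect == "dLOCV":
        if server_fdi:
            return DT_SERVER, False       # alarm belongs to the server layer
        if lower_lsp_fdi_dt is not None:
            return lower_lsp_fdi_dt, False   # copy DT from the lower LSP FDI
        return DT_LOCV, True
    raise ValueError(f"unknown defect: {defect}")
```

For example, a window with no expected and no unexpected CV OAM packets, while a server layer FDI is present, yields dLOCV with DT 01 01 and no MPLS layer alarm.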
Note:

(i) Since OAM packet flows are not synchronized in LSPs at different hierarchical levels (i.e. when LSPs are nested), there is a possibility that a client layer LSP detects a defect before its server layer LSP. This timing difference could be up to 1s due to CV packet arrival time differences, plus some additional uncertainty due to network delay effects. This could result in an error of judgment as to the type of defect that is present and hence which consequent actions are appropriate; especially whether the raising of a local alarm is appropriate and the correct setting of the DL and DT codepoints in FDI/BDI OAM packets. To mitigate this effect, it is recommended that the raising of an alarm is deferred for at least 2 seconds after a defect state is detected (the exact value is FFS). This will also allow the network to settle into a stable state as regards defect detection behavior.

(ii) The starting/stopping of aggregation of any LSP user-plane packet/octet loss metrics (e.g. if using the P OAM packet) is dependent on whether the LSP is in the available or unavailable state.

6.5.5 dLOCV, dTTSI and dLoop Exit Criteria

Exit of the dLOCV, dTTSI or dLoop condition, and hence exit of the LSP Trail Sink Near-End Defect State, occurs when there are:

- >= 2 but <= 4 CV OAM packets observed each with an expected TTSI, AND

- No CV OAM packets observed with an unexpected TTSI

in any period of 3 consecutive seconds.

Note that the numbers of CV OAM packets observed each with an expected TTSI are suggested numbers. Whether these numbers are appropriate requires further study.

All the consequent actions invoked when entering the LSP Trail Sink Near-End Defect State (i.e. sending of FDI and BDI OAM packets, the raising of local alarms and, in the dTTSI case only, the suppression of traffic) are stopped when we exit the LSP Trail Sink Near-End Defect State.

Note - The starting/stopping of aggregation of any LSP user-plane packet/octet loss metrics (e.g.
if using the P OAM packet) is dependent on whether the LSP is in the available or unavailable state.

6.6 Available and Unavailable State Processing

The main purpose of defining harmonized defect entry/exit criteria as noted above is to significantly simplify:

- Near-end/far-end LSP Trail Sink Defect State processing;

- Near-end/far-end LSP Available State processing (which will shortly be discussed);

- The decision point at which any LSP user-plane traffic QoS metrics (if being collected) are stopped/started with respect to aggregation into long-term registers.

In all sections where the evaluation of events is described, the measurement technique is based on a sliding window with a 1 second granularity of advance. Note that the datum for the commencement of the sliding window is an arbitrary point in time decided by each node independently and is not synchronized to OAM packet arrival events on any LSPs. This is deemed acceptable to allow simpler nodal processing.

It should be noted that this document uses the traditional functional dependency relationship between QoS and availability. That is:

- QoS is a unidirectional metric, i.e. if QoS metrics are being measured then each direction is measured independently.

- Availability is a bi-directional metric in the case of bi-directional LSPs, in the sense that if any direction enters the unavailable state (defined later) then both directions are deemed to be unavailable. In the case of unidirectional LSPs, availability can only have unidirectional significance.

- QoS measurements must be suspended (as regards aggregation into long-term available state registers) if an LSP enters the unavailable state; noting that, from the definition of the availability metric above, this means the QoS measurements of both directions in the case of bi-directional LSPs.
However, it should also be noted that (for both pragmatic reasons and to preserve their statistical significance) QoS metric aggregation is actually suspended after detecting a short-break event.

6.6.1 Short Break Definition

We first define a short-break event. This is defined as a period where the entry and exit to any of the previously defined defect conditions both occur within 9s, i.e. the LSP Trail Sink Near-End Defect State lasts for <= 9s. The start of the short-break occurs at the beginning of the defect entry criteria and the end of the short-break occurs at the beginning of the defect exit criteria. Clearly this has a minimum period of 3s. Short-breaks are only defined to exist when the LSP is in the Available State.

Note - Short-breaks are more common than many people realize (in one operator's network a study of SES (Severely Errored Second) events showed that about 50% of these would have been classified as short-breaks). They can cause severe disruption to some applications and are therefore an important performance metric (perhaps second in importance after availability). Since they exist at the physical layers they will exist (by inheritance) in client layers, such as MPLS and IP.

An important property of the short-break, which we will exploit, is that it yields a pragmatic harmonized threshold for defect evaluation (across all defect types as noted previously) and the stopping/starting of QoS metric aggregation into long-term up-state performance registers.

6.6.2 Available/Unavailable State Definition

If the LSP Trail Sink Near-End Defect State exceeds 10 consecutive seconds in duration then the LSP enters the Unavailable State. The start point of the Unavailable State is deemed to be at the beginning of these 10 consecutive seconds. We therefore no longer have a short-break (and the event should not be registered as such).
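
As a rough illustration, a completed defect episode could be classified as follows. The sketch rests on assumptions that should be read with care: the function name is hypothetical, times are whole seconds, the boundary between a short-break and unavailability is taken as the defect state reaching 10s (as in the timer T1 test of Section 6.6.4), and the short-break start and end are backdated by the 3s evaluation windows as described in 6.6.1.

```python
from typing import Optional, Tuple

DETECTION_WINDOW_S = 3     # length of the entry/exit evaluation windows
UNAVAILABLE_AFTER_S = 10   # assumed threshold: defect state reaching 10 s


def classify_episode(entry_s: int, exit_s: int
                     ) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Classify a completed defect-state episode.

    entry_s and exit_s are the times (in seconds) at which the defect
    entry and exit criteria were met, i.e. the ends of their 3 s windows.
    """
    if exit_s < entry_s:
        raise ValueError("exit_s must not precede entry_s")
    if exit_s - entry_s >= UNAVAILABLE_AFTER_S:
        return "unavailable", None
    # Short-break: both ends are backdated to the start of their windows.
    start = entry_s - DETECTION_WINDOW_S
    end = exit_s - DETECTION_WINDOW_S
    return "short-break", (start, end)
```

For instance, a defect state entered at t = 103s and exited at t = 108s is a short-break running from 100s to 105s, whereas one exited at t = 113s is treated as unavailability.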
An LSP re-enters the Available State after first exiting the LSP Trail Sink Near-End Defect State and there has been an aggregate period of 10 consecutive seconds in which there have been:

- >= 9 and <= 11 CV OAM packets each with an expected TTSI, AND

- No CV OAM packets with an unexpected TTSI.

Note that the numbers of CV OAM packets observed each with an expected TTSI are suggested numbers. Whether these numbers are appropriate requires further study.

The start point of the Available State is deemed to be at the beginning of these 10 consecutive seconds.

6.6.3 Near-end and Far-end Measurements of Availability

All of the above discussion is strictly only relevant to the near-end processing when the LSP trail termination sink point is in the LSP Trail Sink Near-End Defect State as discussed previously. We can also measure the far-end availability behavior (useful when only a single end is accessible for measurement) by using the BDI signal (when bi-directional LSPs are being used), since this is a reflected upstream mirror of the duration over which FDI is sent downstream. We therefore define the LSP Trail Sink Far-End Defect State to be the period over which BDI OAM packets are observed subject to the following entry and exit criteria:

- Entry of the LSP Trail Sink Far-End Defect State occurs on the first BDI OAM packet observed.

- Exit of the LSP Trail Sink Far-End Defect State occurs after a period of 3 consecutive seconds in which no BDI OAM packets have been received.

Note that this 3s processing delay on exit is to cater for cases in which perhaps a single BDI is lost (say due to congestion or errors). Its effect must be catered for in the far-end processing state machine as discussed later.
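
The far-end entry and exit criteria above can be sketched as a small tracker advanced once per second. This is a minimal illustration, assuming a 1 second evaluation step in line with the sliding-window granularity described earlier; the class and method names are hypothetical.

```python
class FarEndDefectState:
    """Track the LSP Trail Sink Far-End Defect State from BDI arrivals."""

    EXIT_QUIET_SECONDS = 3   # consecutive BDI-free seconds needed to exit

    def __init__(self) -> None:
        self.in_defect = False
        self._quiet = 0      # consecutive BDI-free seconds while in defect

    def tick(self, bdi_seen: bool) -> bool:
        """Advance one second; bdi_seen tells whether any BDI arrived in it."""
        if bdi_seen:
            # Entry occurs on the first BDI OAM packet observed.
            self.in_defect = True
            self._quiet = 0
        elif self.in_defect:
            self._quiet += 1
            if self._quiet >= self.EXIT_QUIET_SECONDS:
                self.in_defect = False
                self._quiet = 0
        return self.in_defect


tracker = FarEndDefectState()
arrivals = [False, True, True, False, True, False, False, False, False]
states = [tracker.tick(seen) for seen in arrivals]
```

In this run a single missing BDI (the fourth second) does not end the defect state, which is the purpose of the 3s exit delay.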
Since we have fixed the temporal duration of the far-end state to be directly related to the near-end state (albeit with a +3s exit checking period) we can therefore measure both short-breaks and unavailability of both directions from a single end (on the assumption that bi-directional LSPs are being used).

6.6.4 Near-End State Processing Flow-chart

The following figure summarizes many of the key points regarding the near-end state-processing algorithm for a given LSP.

Figure 5: LSP Near-End State Processing Flow Chart

1. Assume we start in the available state in the box marked 'Start'. All timers (shown later) can conceptually be assumed reset at this point. If there are any QoS metrics being collected (e.g. packet/octet loss measurements from the P OAM packet) then this is assumed to be active at this time.

2. The first decision box is 'dLOCV, dTTSI or dLoop?'. These defects were defined previously. If none of these defects are present we keep checking for this condition and stay in the available state. However, if one of these defects is present we enter the Trail Sink Near-End Defect State.

3. The consequent actions now required depend on the nature of the defect observed, and whether there is any incoming FDI from a lower layer, and should follow the rules given previously. But note that any QoS metrics which are being collected are suppressed from aggregation into the long-term registers against available time. The registers are effectively backdated 3s to allow for the defect detection time (at this stage we cannot judge whether the event will be a Short-Break, and hence the LSP remains in the Available State, or whether the LSP will enter the Unavailable State).

4. We now start timer T1. This timer is used to determine the duration of the Trail Sink Near-End Defect State, and if this persists for a sufficient time (i.e. a further 10s) then this timer is used to branch the flow-chart into the Unavailable State processing region.

5.
Below (timer) T1, we loop round the decision boxes 'T1<10s?' and 'End dLOCV, dTTSI or dLoop?'. We can exit this loop if the defect state ends (in accordance with criteria given previously) before T1 reaches 10s. Since we are still in the available state, we restart any QoS metric aggregation into the long-term registers (noting the last 3s must be accounted for), we stop FDI/BDI OAM packet generation and capture the short-break event in the local registers. Additionally, if the event was due to a dTTSI, then we should also capture the TTSI of the offending LSP and cease the suppression of traffic. The timestamp of the event should be related to the onset of the defect which caused it. If however T1 reaches 10s we enter the Unavailable State. Note that it is not possible to enter the Unavailable State unless the Trail Sink Near-End Defect State has persisted for at least 10s in the Available State.

6. We now record a date/time-stamped Unavailable State entry event in the local registers together with information on the nature of the defect which caused it. Note that the date/timestamp must be backdated 13s. Optionally, we may also send an exception report to the NMS with the Unavailable State entry date/timestamp noted above, together with any other relevant information about the defect which caused it, e.g. in the case of dTTSI this should include the TTSI of the offending LSP. We now stop timer T1 and start timer T2, whose purpose is to record the duration of the Unavailable State. Note that when we enter the Unavailable State we also remain in the Trail Sink Near-End Defect State.

7. We now run round a decision box 'End dLOCV, dTTSI or dLoop?', which is just below the point where we started timer T2, which checks for the end of the defect state.
When the defect ends (in accordance with the criteria given previously) we stop FDI/BDI OAM packet generation and exit the Trail Sink Near-End Defect State. Any QoS metric aggregation is still inhibited.

8. We now run round the decision loop comprised of the two boxes '>=9 but <= 11 expected CV OAM packets in last 10s AND no unexpected CV OAM packets?' and 'dLOCV, dTTSI or dLoop?'. If a further defect occurs before we meet the exit criteria of the former decision box, we re-enter the Trail Sink Near-End Defect State and hence restart the generation of FDI/BDI OAM packets (with DL/DT codepoints and other consequent actions relevant to the specific defect observed). Any QoS metric aggregation continues to be inhibited. In this case we are back at point 7 above in the state processing and recommence checking for the end of the defect. Note that timer T2 continues to run.

9. To get out of the Unavailable State we must first have exited the Trail Sink Near-End Defect State as noted in 7 above, and then met the criteria of the decision box '>=9 but <= 11 expected CV OAM packets in last 10s AND no unexpected CV OAM packets?' as noted in 8 above. Note that the 'last 10s' referred to here includes the 3s interval required to check for the end of the Trail Sink Near-End Defect State as noted above in item 7.

10. We now stop timer T2 and record the duration of the unavailability event in the local registers. We recommence any QoS metric aggregation into the local registers and cease all consequent actions associated with the Unavailable State. Note that T2 will record an Unavailable State duration which is 3s less than the true unavailability event. Note also that the last 10s belong to the Available State and so any QoS metric aggregation will need to take these 10s into account.
Optionally, we may also send an exception report to the NMS with the Unavailable State exit date/timestamp suitably corrected as noted above.

11. This now takes us back to our starting point in the Available State.

6.6.5 Far-End State Processing Flow-chart

The following figure summarizes many of the key points regarding the far-end state-processing algorithm for a given LSP.

Figure 6: LSP Far-End State Processing Flow Chart

1. Assume we start in the available state at the box marked 'Start'. All timers shown later in the flow chart can conceptually be assumed to be reset at this point. If there is any backward QoS aggregation activated on the return direction LSP then this will be via a separate P OAM packet flow on the return LSP.

2. The first decision box is 'BDI OAM packet?'. If the answer is 'No', then we keep looping this check condition and stay in the Available State. If the answer is 'Yes', then this implies that the near-end processing at the other end of the (outgoing) LSP has entered the Trail Sink Near-End Defect State. Note that this also implies that the defect has already existed for 3s at the other end of this LSP.

3. We then enter the Trail Sink Far-End Defect State and inhibit any backward QoS metric aggregation. The QoS registers will need to be corrected for the previous 3s, which should not be aggregated into the long-term Available State counts.

4. We now start timer T3, and run round the loop composed of the decision boxes 'T3 <13s?' and '3s BDI-Free?'. T3 is used to check the duration of the Trail Sink Far-End Defect State. If T3 does not reach 13s and we get 3s which are BDI-Free, then we re-start any backward packet level metric aggregation. Note that the last 6s must be accounted for in any backward QoS metric aggregation registers.
This arises since it takes the near-end processing 3s to declare the end of the defect at the other end of the (outgoing) LSP, and a further 3s to declare the end of the Trail Sink Far-End Defect State at this end of the (return) LSP, and all this time should count towards the Available State at this end of the LSP to ensure correct QoS metric aggregation. A Short-Break date/time-stamped event should also be recorded in the local registers together with the DL/DT information of the defect as given in the BDI OAM packet. This Short-Break event must be date/time-stamped relative to 3s before the time at which the first BDI OAM packet was observed. This now takes us back to the initial start position. If however T3 reaches 13s we enter the far-end Unavailable State. Note that it is not possible to enter the Unavailable State unless the Trail Sink Far-End Defect State has effectively persisted for at least 13s in available time (which means that at the other end of the (outgoing) LSP the Trail Sink Near-End Defect State has persisted for at least 10s).

5. Optionally, we may now send a date/time-stamped unavailability entry exception report to the NMS, which includes the relevant BDI OAM packet DL/DT information. Note that the date/timestamp of any such exception report should be backdated by 16s (i.e. 3s prior to the first BDI OAM packet being observed for this event) to align the far-end processing with that of the near-end processing at the other end. We now stop timer T3 and start a timer T4, whose purpose is to record the duration of this unavailability event. Note that when we enter the Unavailable State we also remain in the Trail Sink Far-End Defect State.

6. We now run round a loop that checks for 3s which are BDI-Free. This is used to take us out of the Trail Sink Far-End Defect State.
Note that this is not strictly necessary: this check condition could have been omitted, showing only the following one, which checks for a continuous (i.e. overall) 10s of BDI-Free behavior. However, it has been shown like this to harmonize the 'look' of the near-end and far-end Trail Sink Defect State processing.

7. If we get 3s which are BDI-Free then we exit the Trail Sink Far-End Defect State and run a loop which checks if we have had an overall continuous period of 10s which are BDI-Free. If any further BDI OAM packets appear within this overall 10s checking period then we re-enter the Trail Sink Far-End Defect State and need to repeat the process from step 6 above. If, however, no further BDI OAM packets appear within the 10s checking period we exit the far-end Unavailable State.

8. We stop timer T4 and record the duration of the unavailability event. T4 will record a time which is 3s less than the true unavailability event. A date/time-stamped unavailability exit event, backdated 13s, together with the unavailability duration should now be recorded in the local registers. Optionally, this information may also be sent to the NMS as an exception report.

9. Any backward QoS metric aggregation can now be restarted, noting that the last 13s belong to available time and so the aggregate registers should be corrected accordingly.

6.6.6 A Pictorial View of Near-End and Far-End State Processing

The following figure is given to help clarify the temporal relationships between the near-end and far-end state processing given in the previous flow-charts for a short-break event and an unavailability event.

Figure 7: Near-End and Far-End Temporal Processing of a Short-Break and Unavailability event

7.
Security Considerations

The OAM function described in this document enhances the security of MPLS networks by detecting mis-connections, and therefore preventing customers' traffic from being exposed to other customers. The MPLS OAM functions as defined in this document do not raise any new security issues for MPLS networks.

8. References

[1] Rosen E, et al, RFC 3032, "MPLS label stack encoding".

[2] Le Faucheur et al, "MPLS support of Differentiated Services", draft-ietf-mpls-ext-08.txt, work in progress.

[3] Awduche et al, "RSVP-TE: Extensions to RSVP for LSP Tunnels", draft-ietf-mpls-rsvp-lsp-tunnel-05.txt, work in progress.

[4] Hinden and Deering, RFC 1884, "IP Version 6 Addressing Architecture".

9. Author's Addresses

Neil Harrison
British Telecom
Heath Bank
Iugby Road, Harleston
South Hampton, UK
Phone: 44-1604-845933
Email: neil.2.Harrison@bt.com

Peter Willis
British Telecom
BT, PP RSB10/PP3 B81
Adastrial Park
Martlesham, Ipswich, UK
Phone: 44-1473-645178
Email: peter.j.willis@bt.com

Shahram Davari
PMC-Sierra
411 Legget Drive
Kanata, ON, Canada
Phone: 1-613-271-4018
Email: Shahram_Davari@pmc-sierra.com

Ben Mack-Crane
Tellabs
4951 Indiana Ave
Lisle, IL, USA
Phone: 1-630-512-7255
Email: ben.mack-crane@tellabs.com

Hiroshi Ohta
NTT
Y-709A, 1-1 Hikarino'ka
Yokosuka-Shi
Kanagawa, Japan
Phone: 81-468-59-8840
Email: ohta.hiroshi@nslab.ntt.co.jp