Internet Draft                                      David Allan, editor 
       Document: draft-allan-mpls-oam-frmwk-01.txt                   Mina Azad
                                                               Nortel Networks
       Enrique Cuevas                                            Neil Harrison
       AT&T                                                    British Telecom
       Sanford Goldfless                                             Arun Punj           
       Lucent                                                          Marconi
       Marcus Brunner                                             Chou Lan Pok
       NEC                                                             SBC TRI
       Wesam Alanqar 
       Sprint                                                                                

                                                                 November 2001         

                           A Framework for MPLS User Plane OAM         

       Status of this Memo         

          This document is an Internet-Draft and is in full conformance with 
          all provisions of Section 10 of RFC2026. 
           
          Internet-Drafts are working documents of the Internet Engineering 
          Task Force (IETF), its areas, and its working groups.  Note that 
          other groups may also distribute working documents as Internet-
          Drafts. 
           
          Internet-Drafts are draft documents valid for a maximum of six 
          months and may be updated, replaced, or obsoleted by other documents 
          at any time.  It is inappropriate to use Internet-Drafts as 
          reference material or to cite them other than as "work in progress." 
           
          The list of current Internet-Drafts can be accessed at 
               http://www.ietf.org/ietf/1id-abstracts.txt 
          The list of Internet-Draft Shadow Directories can be accessed at 
               http://www.ietf.org/shadow.html.         

       Copyright Notice         

          Copyright(C) The Internet Society (2001). All Rights Reserved.         

       Abstract 
        
          This Internet draft discusses many of the issues associated with 
          user plane OAM for MPLS. The goal is to provide tools to perform 
          "in service" maintenance of LSPs. Included in this discussion are 
          some of the implications of the MPLS architecture for the ability 
          to support fault and performance management OAM applications, 
          potential solutions for distinguishing user plane OAM, and a 
          summary of what the authors believe can be achieved. 
           
          This framework is predicated on requirements described in [HARRISON-
          REQ].  
            
          Allan et.al           Expires January 2002                   Page 1 

                        A Framework for MPLS User Plane OAM    November 2001 
           

       Table of Contents 

        
       1.  Conventions used in this document...............................2 
       2.  Changes since the last version..................................2 
       3.  Motivations.....................................................3 
       4.  Requirements....................................................3 
       5.  Terminology.....................................................3 
       6.  Different deployment scenarios..................................4 
       7.  MPLS architecture implications for OAM..........................5 
          7.1 Topology variations within an MPLS level.....................5 
          7.1.1 Implications for fault management..........................7 
          7.1.2 Implications for performance management....................7 
          7.2 LSP Creation Method..........................................9 
          7.3 Lack of Fixed Hierarchy......................................9 
          7.4 Use of time to live (TTL)...................................10 
          7.5 Other design issues.........................................11 
       8.  OAM Applications...............................................11 
       9.  OAM Messaging..................................................12 
       10.  Distinguishing OAM user plane flows...........................13 
          10.1  Adding an LSP level with arbitrary label..................13 
          10.2  Adding an LSP level with a reserved label value...........14 
          10.3  Header modification.......................................14 
       11.  The OAM return path...........................................14 
       12.  Security Considerations.......................................16 
       13.  A summary of what can be achieved.............................17 
       14.  References....................................................17 
       15.  Author's Addresses............................................18 
        
       1. Conventions used in this document 
        
          The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
          "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in 
          this document are to be interpreted as described in RFC-2119 [1]. 
           
       2. Changes since the last version 
        
          1) Elaboration on implications of PHP and protocol multiplexing 
          mechanisms (e.g. explicit v4/v6 labels). 
           
          2) Some discussion on the use of per-platform label space. 
           
          3) Discussion of the implications of the ILM mapping to multiple 
          NHLFEs added. 
           
          4) Discussion of the impact of LSP creation technique on the ability 
          to audit an LSP's constituent components added. 

           
          5) Comparison of the concept of "operational domain" and the 
          concepts of horizontal and vertical hierarchy as defined in 
          [HIERARCHY]. 
           
          6) Enhanced discussion of TTL issues. 
           
       3. Motivations         

          MPLS OAM and survivability have been tackled in numerous Internet 
          drafts. However, all existing drafts focus either on single 
          provider solutions or on a single aspect of the MPLS architecture, 
          such as RSVP or LDP.  This leads to inconsistent and inefficient 
          applicability across the MPLS architecture, and/or requires 
          significant modifications to operational procedure in order to 
          provide OAM connectivity. As MPLS matures and relationships between 
          providers become more complex, there is a need to consider 
          deployments that span arbitrary networking arrangements and have a 
          broader and more uniform applicability to the MPLS architecture. 
           
       4. Requirements 

          MPLS user-plane OAM specific requirements, and a summary of 
          requirements that have appeared in numerous PPVPN, PWE3, and MPLS 
          documents, appear in [HARRISON-REQ]. This Internet draft discusses 
          the implications of extending OAM across the MPLS architecture, and 
          adds user-plane OAM requirements and capabilities for 
          managing multi-provider networks. This document also broadens the 
          scope of the requirements discussion in identifying where certain 
          OAM applications simply cannot be implemented without modifications 
          to current practice/architecture.         

       5. Terminology         

          MPLS introduces a richness in layering which renders traditional 
          definitions inadequate. In particular, it is noted that MPLS has no 
          fixed layered hierarchy (this is a unique property that no other 
          technology has offered before). 
           
          A provider may have MPLS peer providers, use MPLS transit from 
          serving providers (and require MPLS or non-MPLS client transport), 
          and offer MPLS transit to MPLS or non-MPLS clients. Further, the 
          same provider may use a hierarchy of LSPs within their own network. 
          Hence this Internet Draft defines the concept of an "Operational 
          Domain" (to cover OAM capabilities operated by a single provider) 
          that may only be a partition of the end-end LSP. Operational Domain 
          functions are an interdependent mix of control-plane, user-plane, 
          and management-plane functions. 
           
          An LSP "of level m" may span numerous Operational Domains 
          (contiguous user plane) while the control and management planes may 
          be disjoint. The goal is to provide OAM functionality for each LSP 
          "of level m" regardless of "m".           

            
          It is possible to have a hierarchy of operators (e.g. carriers of 
          carriers), where overlay Operational Domains are opaque to the 
          serving Domain. Therefore it is required that each LSP "of level m" 
          implement its own OAM functionality, and the OAM applications are 
          confined to the Operational Domains traversed at level "m". 
           
          Note that this concept has subtle differences with concepts of 
          horizontal and vertical hierarchy as defined in [HIERARCHY]. 
          Vertical hierarchy refers to networking layer boundaries 
          distinguished by technology. An operational domain may refer to an 
          operator specific subset of the LSP levels within the MPLS layer. 
          Similarly there is a loose mapping of the concept of operational 
          domain and horizontal hierarchy. An operational domain may be 
          hierarchically partitioned (e.g. OSPF "areas") yet operationally 
          integrated and contiguous.         

       6. Different deployment scenarios         

          At the present time there are a number of deployment scenarios 
          proposed for MPLS each with a number of subtleties from a user plane 
          OAM perspective. Each can be viewed as a characteristic of an 
          operational domain: 
           
          The sparse model: This can be in conjunction with a control plane 
          (e.g. MPLS based traffic engineering applied to an IP network) or 
          with simple provisioned LSPs (no control plane). The key feature 
          is that the MPLS operational domain will most likely not have 
          any-to-any connectivity at the MPLS layer within the operational 
          domain due to the sparse use of LSPs to augment the served layer 
          connectivity. This has operational and scalability implications as 
          OAM connectivity must be explicitly added to the model, or the 
          operator may be obliged to depend on "layer violations" embedded in 
          OAM mechanisms which are strictly only relevant to a higher layer 
          network (e.g. [ICMP]) to generate a return path. 
           
          The ubiquitous model: This model generally combines MPLS, integrated 
          routing and control to produce ubiquitous any-to-any connectivity 
          within an operational domain. This may be combined with a hierarchy 
          of LSPs to modify the topology presented to the client layer. This 
          offers providers the option of utilizing the resources inherent to 
          all planes of the Operational Domain in designing OAM functionality. 
           
          These two models of MPLS connectivity can be stacked or concatenated 
          to support numerous manners of peering and overlay networking 
          arrangements between providers and users. A direct inference is 
          that an operational domain will not necessarily have knowledge of 
          the domains above and below it, and in the general case far less 
          knowledge of (and certainly less control over) its peers. OAM 
          applications for LSPs of a specific level are confined to an 
          operational domain and its user plane peers. 
           
          More recently there is a tendency to overlay an L2 or L3 VPN service 
          level on the user plane of an operational domain, with its own 
          identifiers and addressing, while tunneling control information 
          across the control plane of the operational domain using BGP-4 
          [2547][KOMPELLA] or extended LDP adjacencies [MARTINI][HEINANEN]. 
          From a user plane OAM perspective, we would consider this to be a 
          separate operational domain, and anticipate that it is only a matter
          of time before such service levels evolve to span multiple 
          operational domains (for example, an L2 or L3 VPN that spans 
          multiple providers, or the introduction of tandem points at the user
          plane of the service level).         

       7. MPLS architecture implications for OAM         

       7.1 Topology variations within an MPLS level         

          There are a number of topology variations in the MPLS architecture 
          that have OAM implications. These are: 
           
          - Uni-directional and bi-directional LSPs. A uni-directional LSP 
          only provides connectivity in one direction, and if return path 
          connectivity exists, it is an attribute of the operational domain, 
          and not a unique attribute of the LSP. Bi-directional LSPs, or LSPs 
          with a specific return path (e.g. [CHANG]), have inherent 
          symmetrical connectivity as an attribute of the LSP. 
           
          - Multipoint-to-point (MP2P) LSPs, in which a single LSP uses 
          "merge" LSR transfer functions to provide connectivity between 
          multiple ingress LSRs and a single egress LSR. There are a number of 
          problems inherent to MP2P topological constructs that cannot be 
          addressed by traditional P2P mechanisms. One issue is that for 
          some OAM applications (e.g. user plane fault propagation) OAM flows 
          may require visibility at merge-points to limit the impact of 
          partial failures or congestion. 
           
          "Best effort" MP2P LSPs may have fairness issues with some packet 
          schedulers. This may complicate obtaining consistent measurements 
          under congestion conditions. Explicitly routed MP2P LSPs with 
          associated resource reservations are significantly more complex. The 
          resource reservations required will be cumulative at merge points, 
          and the ability to provide differentiated handling for specific 
          ingresses disappears. One opinion would be that the complexity and 
          difficulty of maintaining ER-MP2P LSPs significantly outweigh the 
          scalability considerations, and that such LSPs are unlikely to be 
          deployed. 
           
          - Penultimate hop popping (PHP), an optimization in the 
          architecture in which the last LSR prior to the egress removes the 
          (supposedly) redundant current MPLS label from the label stack. 
          Therefore the ability to infer LSP specific OAM context is lost 
          prior to reaching the final destination.  
           
          MPLS does not provide for protocol multiplexing via payload 
          identification (with the exception of the explicit IPv4 and IPv6 
          labels). PHP requires that the final hop have a common protocol 
          payload (typically IP) or be able to map to a lower layer protocol 
          multiplexing capability (e.g. PPP Protocol Field or Ethernet 
          EtherType) as the ability to infer payload from LSP label is lost. 
           
          When a specific queuing discipline is associated with the LSP, such 
          as reserved resources, the outgoing interface at the PHP hop must be 
          able to provide the differentiated packet handling. 
           
          Another scenario where PHP is employed is when the egress LSR is not 
          actually MPLS user plane capable. This has user plane OAM 
          implications in that MPLS specific flows may need to terminate at 
          the PHP LSR. This would include the potential requirement that the 
          PHP LSR proxies OAM functions on behalf of the egress LSR.  
           
          - E-LSPs [MPLSDIFF] in which a single LSP supports multiple queuing 
          disciplines to support multiple behavior aggregates. The ability to 
          use OAM probing on a "per behavior aggregate" basis is critical to 
          managing E-LSPs.  
           
          - Provisioned LSPs, vs. LSPs associated with a control plane. In 
          many scenarios associated with a control plane, the topology of the 
          LSP varies over time. This can be for many reasons, such as implicit 
          routing or the dynamic set up of local repair tunnels. 
           
          - The potential existence of multiple LSPs between an ingress and an 
          egress LSR. This can be for many reasons, such as L-LSPs or equal 
          cost multipath routing. 
           
          - The potential existence of multiple next hop label forwarding 
          entries (NHLFEs) for a single incoming label. This is the scenario 
          whereby the incoming label map (ILM) for an incoming label switched 
          hop (LSH) maps to an inverse multiplex of NHLFEs which may be 
          re-merged into a common egress or have multiple egress points. The 
          mechanism for selecting the NHLFE to use may be proprietary and is 
          performed on a packet by packet basis. Similarly such a construct 
          can partially degrade. 
           
          OAM tools not specifically aware of this construct may produce 
          random results (e.g. failures too infrequent to trigger 
          threshold detection), or pathologically may only test a portion of 
          the NHLFEs. Similarly performance monitoring is problematic as 
          packets in flight cannot accurately be accounted for. 
           
          - Use (and abuse) of per-platform label space. A per-platform label 
          has significance at a nodal level and not just an interface level. 
          One of the more interesting applications is the ability to 
          create unsignalled backup LSPs in "bypass tunnels" [SWALLOW]. 
          Traffic arriving on multiple interfaces and/or LSP tunnels may use a 
          common per-platform label and will have a common ILM and NHLFEs. 
          This can have implications similar to MP2P and PHP depending on how 
          it is used; packet origin information is not conserved when multiple 
          sources use a common label.            

            
          - P2MP and MP2MP LSPs (a.k.a. MPLS Multicast) are for further study. 
          At the present time, the placeholders that exist in the architecture 
          for multicast treat it as a separate protocol from "unicast" MPLS. 
           
          These topological variations introduce complexity when attempting to 
          instrument OAM applications such as performance management, fault 
          detection, fault isolation/diagnosis, fault handling (e.g. 
          consequent actions taken to avoid raising unnecessary alarms in 
          client layers) and fault notification.         
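
          As an illustration of the multiple-NHLFE case listed above, the 
          following Python sketch shows how probes that are spread across 
          NHLFEs can fail too infrequently to trip a consecutive-loss 
          threshold. The round-robin selector, the name run_probes and the 
          threshold value are assumptions made for illustration only; real 
          selection mechanisms may be proprietary and need not behave this 
          way. 

```python
from itertools import cycle


def run_probes(nhlfe_up, n_probes, threshold):
    """Send n_probes through an ILM that maps to several NHLFEs.

    nhlfe_up  -- list of booleans, one per NHLFE (True = forwarding)
    threshold -- consecutive lost probes needed to raise an alarm
    The round-robin selector is an assumption, not a defined mechanism.
    Returns (lost_probes, alarm_raised).
    """
    selector = cycle(range(len(nhlfe_up)))
    lost = 0
    consecutive = 0
    alarm = False
    for _ in range(n_probes):
        if nhlfe_up[next(selector)]:
            consecutive = 0          # probe delivered, reset the run
        else:
            lost += 1
            consecutive += 1
            if consecutive >= threshold:
                alarm = True
    return lost, alarm


# One of four NHLFEs has failed.
lost, alarm = run_probes([True, True, False, True], n_probes=100, threshold=3)
print(lost, alarm)
```

          In this run a quarter of the probes are lost, yet no three 
          consecutive probes fail, so the partial degradation goes 
          undetected by the threshold. 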

       7.1.1 Implications for fault management         

          MP2P, E-LSPs and PHP have implications for fault management, 
          specifically if an LSR is required to have knowledge of both the 
          ingress LSR and the specific LSP that an OAM message arrived on, or 
          is expected to have knowledge of, and maintain state about the set 
          of ingress LSRs for an LSP. OAM messaging needs to carry sufficient 
          information to distinguish both the ingress LSR and the specific 
          LSP. (This ability is expressed in these terms because LSPs are 
          typically not given globally unique identifiers; more frequently 
          some locally administered LSR-ID is used.)  
           
          Frequently it will not be possible to infer the ingress LSR and 
          specific LSP as such information is lost at merge points in MP2P 
          LSPs or due to PHP. This is true for both OAM messaging and 
          normal user plane payloads. There may be numerous reasons why an 
          ingress-egress pair may have a plurality of LSPs between them, so 
          the ability to distinguish the source and purpose of specific probes 
          beyond mere knowledge of the originating LSR is a hard requirement.         
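
          A minimal Python sketch of this requirement follows. It assumes a 
          hypothetical probe format (the OamProbe fields and the 
          EgressMonitor class are invented for illustration and are not 
          defined by any referenced draft) in which the probe itself carries 
          the ingress LSR and LSP identifiers, so that the egress does not 
          depend on the arriving label. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OamProbe:
    ingress_lsr: str  # hypothetical field: identifier of the originating LSR
    lsp_id: int       # hypothetical field: locally administered LSP identifier
    sequence: int


class EgressMonitor:
    """Tracks the last probe seen per (ingress LSR, LSP) pair."""

    def __init__(self):
        self.last_seq = {}

    def receive(self, arriving_label, probe):
        # arriving_label is deliberately ignored: after a merge point or
        # PHP it no longer identifies the source of the probe.
        key = (probe.ingress_lsr, probe.lsp_id)
        self.last_seq[key] = probe.sequence
        return key


monitor = EgressMonitor()
# Two ingress LSRs merged onto one MP2P LSP arrive with the same label.
monitor.receive(17, OamProbe("lsr-a", 7, 1))
monitor.receive(17, OamProbe("lsr-b", 7, 1))
print(sorted(monitor.last_seq))
```

          Both probes arrive with the same label, but the egress still keeps 
          separate state for each ingress. 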

       7.1.2 Implications for performance management         

          Many performance management functions can be performed by obtaining 
          and comparing measurements taken at different points in the network. 
          Comparing ingress and egress statistics is the simplest and most 
          obvious example (but is usually restricted to within a single 
          domain). The key issue is ensuring that "apples-to-apples" 
          comparison of measurements is possible. This means that all 
          measurement points need to be able to similarly classify what they 
          are measuring, and that the measurements are synchronized in time 
          and compensate for traffic in flight between the measurement points. 
           
          For example, a relatively simple technique for establishing key 
          performance metrics would be to compare what was sent with what was 
          received. In the PPP link quality monitoring (LQM) function, for 
          instance, the ingress periodically sends statistics to the egress 
          for comparison, subject to the same queuing discipline as the user 
          plane traffic, such that traffic in flight is properly accounted 
          for. (Note that re-ordering will introduce errors but is not 
          expected to be frequent; examples of re-ordering situations would 
          be routing changes (e.g. due to protection-switching), or E-LSPs 
          encountering congestion.) 
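
          The comparison described above can be sketched in Python as 
          follows. This is a simplified illustration, not the LQM procedure 
          itself: the function name interval_loss and the report layout are 
          assumptions. Each report pairs the ingress transmit counter carried 
          in an in-band message with the egress receive counter sampled when 
          that message arrives, which is what accounts for traffic in flight. 

```python
def interval_loss(prev_report, curr_report):
    """Compute loss between two successive in-band reports.

    Each report is a tuple (sent_by_ingress, received_by_egress), where the
    receive counter is sampled at the moment the report reaches the egress.
    Returns (lost_packets, loss_ratio) for the interval.
    """
    sent = curr_report[0] - prev_report[0]
    received = curr_report[1] - prev_report[1]
    lost = sent - received
    ratio = lost / sent if sent else 0.0
    return lost, ratio


lost, ratio = interval_loss((1000, 998), (2000, 1990))
print(lost, ratio)
```

          Working on counter differences between successive reports, rather 
          than on absolute counters, keeps earlier losses from being counted 
          again in later intervals. 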
          
            
          It is also important to distinguish, and be able to measure, what 
          constitutes the up and down states of an LSP.  This also needs to be 
          standardized so that there is unified treatment.  A key observation 
          here is that QoS metrics (like loss, errored packets, delay, 
          etc.) are strictly only relevant when the LSP is in the up-state, 
          and so any collection of QoS measurements is suspended when the LSP 
          enters the down-state.  This distinction is particularly important 
          to operators, since customers will be expecting operators to be 
          able to offer both QoS and availability SLAs, and so these must be 
          differentiated and uniquely measurable. 
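
          A small Python sketch of this separation is given below. It 
          assumes that some agreed criterion has already classified each 
          measurement interval as up or down (no such criterion is defined 
          here), and the function name summarize and the interval layout are 
          illustrative only. Availability is computed over all intervals, 
          while loss is computed over up-state intervals only. 

```python
def summarize(intervals):
    """Separate availability from QoS measurement.

    intervals -- list of (duration_seconds, lsp_up, sent, received)
    Returns (availability, loss_ratio); loss is only accumulated while
    the LSP is in the up-state.
    """
    total_time = sum(d for d, _, _, _ in intervals)
    up_time = sum(d for d, is_up, _, _ in intervals if is_up)
    sent = sum(s for _, is_up, s, _ in intervals if is_up)
    received = sum(r for _, is_up, _, r in intervals if is_up)
    availability = up_time / total_time if total_time else 0.0
    loss_ratio = (sent - received) / sent if sent else 0.0
    return availability, loss_ratio


availability, loss_ratio = summarize([
    (60, True, 1000, 999),   # up: counted for QoS
    (30, False, 500, 0),     # down: QoS collection suspended
    (10, True, 200, 200),    # up again
])
print(availability, loss_ratio)
```

          The 500 packets offered during the down interval affect the 
          availability figure but do not inflate the loss ratio. 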

          Returning to the measurement of QoS metrics, such a simple 
          ingress/egress comparison is not always possible, because there is 
          not necessarily the ability to similarly classify what is being 
          measured at the ingress and egress of an LSP. MP2P LSPs and PHP do 
          not have a 1:1 relationship between the ingress and the egress. 
          LSPs containing ILMs that map to multiple NHLFEs introduce 
          measurement inaccuracy, as not all packets share a common queuing 
          discipline, and where this results in multiple egress points from 
          the network, there is an inability to synchronize measurements.   
           
          So, in addition to having to define up/down-state transitions, for 
          successful PM the 1:1 relationship needs to be restored by either: 
           
          - Modeling the MP2P/PHP LSP as a collection of "ingress" LSPs for 
          measurement. This means that the egress needs to be able to maintain 
          statistics by ingress and appropriately classify traffic 
          measurements. In this case the measurement result for common LSP 
          segments could be misleading. 
           
          - Modeling the MP2P/PHP LSP as one LSP for measurement. This means 
          that measurements performed at ingress points need to be 
          synchronized and adjusted for common LSP segments such that the 
          results are all presented to the egress simultaneously (again 
          correcting for traffic in flight). A technique dependent on such a 
          high degree of synchronization would be impossible to perfect, and 
          hence prone to a degree of error. 
                      
          Neither of the above is achievable at the present time without 
          modifying existing operational procedures, such as overlaying p2p 
          connectivity on top of a merge/PHP based transport level. 
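
          The overlay approach can be sketched in Python as below. The 
          sketch assumes a two-level label stack in which the outer label 
          belongs to the merged transport LSP and the inner label to a 
          hypothetical P2P overlay LSP allocated per ingress; the label 
          values and the name classify_at_egress are illustrative only. 

```python
from collections import Counter


def classify_at_egress(packets):
    """Count packets per transport label and per overlay label.

    packets -- iterable of label stacks, outermost label first.
    The outer label is shared after a merge; the inner (overlay) label
    is assumed to be unique per ingress, restoring a 1:1 relationship.
    """
    per_transport = Counter()
    per_overlay = Counter()
    for stack in packets:
        per_transport[stack[0]] += 1
        if len(stack) > 1:
            per_overlay[stack[1]] += 1
    return per_transport, per_overlay


per_transport, per_overlay = classify_at_egress(
    [(100, 201), (100, 202), (100, 201)]
)
print(dict(per_transport), dict(per_overlay))
```

          The transport level sees a single aggregate, while the overlay 
          level yields a separate count for each ingress. 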
           
          The existence of E-LSPs adds a wrinkle to the problem of measurement 
          synchronization. An E-LSP may implement multiple diffserv PHBs and 
          incorporate multiple queuing disciplines. An aggregate measurement 
          for the entire LSP sent from ingress to egress would frequently have 
          a small margin of error when compared with an aggregate measurement 
          taken at the egress. Separate measurement comparisons for each 
          supported EXP code point would be required to eliminate the error. 
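
          The per-code-point bookkeeping might look like the following 
          Python sketch. It assumes that each report travels in the queue of 
          the EXP code point it describes, so that traffic in flight is 
          accounted for per class; the name per_exp_loss and the counter 
          values are illustrative. 

```python
def per_exp_loss(ingress_counts, egress_counts):
    """Return {exp: lost_packets} for every EXP code point sent.

    Both arguments map an EXP code point to a packet count for the same
    measurement interval.
    """
    return {
        exp: sent - egress_counts.get(exp, 0)
        for exp, sent in ingress_counts.items()
    }


ingress = {0: 1000, 5: 100}
egress = {0: 990, 5: 100}
per_class = per_exp_loss(ingress, egress)
aggregate = sum(ingress.values()) - sum(egress.values())
print(per_class, aggregate)
```

          The aggregate figure alone does not show which behavior aggregate 
          experienced the loss. 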
           
          The situation is slightly different for P2P LSPs containing ILMs 
          that map to multiple NHLFEs. If all the NHLFEs are merged back into 
          a single entity prior to the egress, then the upshot is that there 
          will inherently be a degree of measurement error that modifications 
          to operational procedure cannot correct. However there is no 
          guarantee that this will be the case, and any individual ingress 
          measurement may be compared with only one of several egress 
          measurement points (either random or pathological). 
        
       7.2 LSP Creation Method 
           
          The ability to usefully audit the constituent components of an LSP 
          is dependent on the technique used to create the LSP. Presently 
          defined are provisioning, LDP, CR-LDP and RSVP-TE. 
           
          LSP creation techniques that are currently defined fall at two 
          relative extremes: 
           
          At one extreme is an explicitly routed point-to-point connection 
          between fixed ingress and egress points in the network. Explicitly 
          routed (ER) LSPs  (today created via provisioning, CR-LDP or RSVP-
          TE) have a significant degree of testability as the path across the 
          network and the egress point is fixed and knowable to a testing 
          entity. Similarly explicit pairwise and stateful 
          testing/measurement relationships can be set up (e.g. connectivity 
          verification) and strict criteria for failure established. 
           
          At the other extreme is when LSP construction is topology driven  
          (such as dynamic "shortest path first" routing combined with LDP), 
          whereby the details of path construction between the ingress and 
          egress points in the network will vary over time and may involve 
          several stages of multiplexing with traffic from other sources. The 
          details of path construction at any given instant are not 
          necessarily knowable to an auditing entity so any attempt to 
          interpret the results of an audit may generate spurious results. 
           
          The connectivity instantiated in a specific LSP created by a 
          topology driven control plane will recover from many defects in the 
          network. Problems are detected by fate sharing with the constituent 
          physical links and routing adjacencies, and topology driven path 
          re-arrangement will restore the connectivity (with some 
          interruption and other side effects occurring between the initial 
          failure and re-convergence of the network). However the dependence 
          on fate sharing for failure detection means that LSP components may 
          have unique failure modes from which the network will not recover 
          and can only be diagnosed reactively. 
           
       7.3 Lack of Fixed Hierarchy 

          MPLS supports arbitrary hierarchy in the form of label stacking. 
          This is a facility that can be leveraged for OAM purposes. As an 
          example, the section on implications for performance management has 
          already outlined how p2p topology for PM can be overlaid on an 
          arbitrary merged topology to add manageability of services. 
          Similarly functions requiring sectionalization of an LSP or ability 
          to isolate partial failure of a complex construct can be achieved by 
          constructing the LSP as an overlay upon a concatenation of 
          operationally significant shorter LSPs. By operationally significant 
          we mean LSPs that span useful portions of the whole construct 
          (e.g. a branch of an MP2P LSP, or LSPs that bypass LSRs without 
          OAM capability). 
           
          This could simplify the instrumentation of level specific OAM by 
          ensuring only e2e functions were required (as opposed to functions 
          originating or terminating at arbitrary points in the network), 
          while driving up the complexity of LSP establishment due to the 
          resultant inter-level configuration issues when creating multi-level 
          constructs with the desired manageability. 
           
       7.4 Use of time to live (TTL) 
       
          Experience within the IP world has suggested that TTL was a 
          serendipitous feature that can most likely be similarly leveraged by 
          MPLS. 
           
          However in the MPLS world, TTL suffers from inconsistent 
          implementation depending on the link layer technology spanned by the 
          target LSP. The existence of non-TTL capable links (e.g. MPLS/ATM) 
          has an impact on the utility of using TTL to augment the MPLS OAM 
          toolkit. For example, use of TTL as an aid in fault sectionalization 
          can only isolate a fault to the granularity of a non-TTL capable 
          span of LSH or LSP segments. 
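
          The granularity limit can be illustrated with the following Python 
          sketch. It uses a deliberately simplified model in which each hop 
          either decrements TTL or does not, and in which a fault prevents 
          probes from reaching the faulty hop or anything beyond it. The 
          node names and the function sectionalize are hypothetical. 

```python
def sectionalize(path, faulty):
    """Bracket a fault using only hops where TTL can expire.

    path   -- list of (node_name, decrements_ttl) in forwarding order
    faulty -- name of the hop that probes cannot reach
    Returns (last_reachable, first_unreachable) among TTL-capable hops;
    the fault lies somewhere between the two.
    """
    names = [name for name, _ in path]
    fault_index = names.index(faulty)
    last_ok = None
    for index, (name, decrements_ttl) in enumerate(path):
        if not decrements_ttl:
            continue                 # invisible to TTL-based probing
        if index < fault_index:
            last_ok = name
        else:
            return last_ok, name
    return last_ok, None


path = [
    ("LSR1", True),
    ("ATM1", False),   # non-TTL capable span
    ("ATM2", False),
    ("LSR2", True),
    ("LSR3", True),
]
print(sectionalize(path, "ATM2"))
print(sectionalize(path, "LSR3"))
```

          A fault inside the non-TTL capable span can only be narrowed down 
          to the segment between LSR1 and LSR2, whereas a fault at LSR3 is 
          isolated to a single hop. 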
           
          There are other variations in TTL handling that suggest interpreting 
          results of TTL based tests may be problematic. As outlined in [TTL] 
          there are two models of TTL handling with different implications: 
           
          - the uniform model, in which decrement of TTL is independent of the 
          MPLS level. At the ingress point to an MPLS level, the current TTL 
          is copied into the new top label, and at egress is copied back to 
          the revealed top label. 
           
          - the pipe and short pipe models, whereby MPLS tunnels (aka LSPs) 
          are used to hide the intermediate MPLS nodes between LSP Ingress and 
          Egress from a TTL perspective.          

          The uniform model originates with preserving IP TTL semantics when 
          IP traffic transits an MPLS subnetwork. The uniform model will 
          reduce the resource consumption of routing loops, but in a correctly 
          operating network may lead to premature discard of packets outside 
          the operational domain they originated from (due to the existence of 
          an arbitrary number of serving MPLS levels). Similarly when a 
          routing loop occurs, diagnosing the MPLS level that is the source of 
          the problem will be difficult as there is no method to correlate it 
          with the level where the exhaustion event occurred. 
           
          The pipe model is more consistent with the operational domain model 
          in that TTL exhaustion will only occur at a specified level and the
          initial values used at LSP ingress are more likely to reflect what 
          would genuinely constitute a routing loop. 
           
          A reasonable expectation is that the uniform model would not be used 
          outside of an operational domain. 
           
          A separate issue is that it is also possible that an LSR may 
          decrement TTL by an amount other than one as a matter of policy. 
          This means that the results obtained via any tools that use TTL 
          exhaustion will require some interpretation.          
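
          The difference between the two models can be illustrated with a 
          short sketch. The following Python is only an illustration under 
          stated assumptions: the function names, the per-LSR decrement of 
          one, and the tunnel initial TTL of 255 are choices made for the 
          example and are not taken from [TTL]. The short pipe variant is 
          not modelled. 

```python
def uniform_model(client_ttl, tunnel_hops):
    """Client TTL after crossing a tunnel under the uniform model.

    The client TTL is copied into the new top label at ingress,
    decremented once per LSR hop, and copied back at egress.
    Returns None if the TTL is exhausted inside the tunnel.
    """
    ttl = client_ttl
    for _ in range(tunnel_hops):
        ttl -= 1
        if ttl <= 0:
            return None
    return ttl


def pipe_model(client_ttl, tunnel_hops, tunnel_initial_ttl=255):
    """Client TTL after crossing a tunnel under the pipe model.

    The tunnel label carries its own TTL (an assumed initial value),
    so the LSRs inside the tunnel are hidden and the client level
    sees the whole tunnel as a single hop.
    Returns None if either TTL is exhausted.
    """
    tunnel_ttl = tunnel_initial_ttl
    for _ in range(tunnel_hops):
        tunnel_ttl -= 1
        if tunnel_ttl <= 0:
            return None
    client_ttl -= 1
    return client_ttl if client_ttl > 0 else None


if __name__ == "__main__":
    print(uniform_model(64, 5))  # 59
    print(pipe_model(64, 5))     # 63
```

          With a client TTL of 3 and five tunnel hops, uniform_model 
          returns None (the packet is discarded inside the tunnel) while 
          pipe_model returns 2, which is the premature discard effect 
          described above. 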

       7.5 Other design issues         

          It is desirable to make the user plane OAM implementations 
          independent of LSP specifics. We do not want to have to define 
          separate transactions/protocols for p2p and mp2p LSPs, or for PHP 
          and no-PHP. Further, allowing this would create extremely complex 
          relationships (in terms of fault specification/handling) when a 
          fault spans more than one type of OAM.  The OAM application 
          originator should not need (as far as is practical) any knowledge 
          of the details of LSP construction. 
           
          PM may impose certain operational procedures, such as many OAM 
          applications only being implementable for p2p LSPs, and will most 
          likely only be possible for a select group of levels (e.g. overlaid 
          service labels as per [KOMPELLA] or [MARTINI]). 
           
          Fault management must be applicable across the spectrum of all label 
          levels and LSR transfer functions.  
           
          Finally, the possibility of re-ordering of OAM messaging must be 
          considered. The design of OAM applications and messaging must be 
          tolerant of out of order delivery. For some applications the 
          originator/termination will require a means to uniquely correlate 
          requests with probe responses (including responses to mis-directed 
          probes) or to verify in-sequence receipt. 
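
          One way an originator might meet this requirement is to tag each 
          probe with a unique identifier and keep a table of outstanding 
          probes. The sketch below is a minimal Python illustration; the 
          class name ProbeTracker, its methods and the string results are 
          hypothetical and do not correspond to any defined MPLS OAM message 
          format. 

```python
import itertools


class ProbeTracker:
    """Correlates probe responses with requests, in any arrival order."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._pending = {}  # probe id -> expected responder

    def send(self, expected_responder):
        """Record a new outstanding probe and return its identifier."""
        probe_id = next(self._next_id)
        self._pending[probe_id] = expected_responder
        return probe_id

    def receive(self, probe_id, responder):
        """Classify a response; each probe can be answered only once."""
        expected = self._pending.pop(probe_id, None)
        if expected is None:
            return "unknown"      # duplicate, late, or never sent
        if responder != expected:
            return "misdirected"  # answered by an unexpected LSR
        return "ok"

    def outstanding(self):
        """Identifiers of probes that have not been answered yet."""
        return sorted(self._pending)
```

          Because responses are matched by identifier rather than by arrival 
          order, a response to the second probe may be processed before the 
          response to the first without confusing the originator. 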

       8. OAM Applications 
       
          The purpose of having user plane LSP specific OAM transactions is to 
          support the OAM applications that operators require. Examples of 
          such applications include: 
           
          Fault management 
           
          - On demand verification: the ability to perform connectivity tests 
          that exercise the specific LSP and the provisioning at the ingress 
          and egress. On demand suggests that verification may be performed on 
          an ad-hoc basis. 
           
          - Fault detection: Operators cannot expect customers to act as fault 
          detectors, and so the ability to perform automated detection of the
          failure of a specific LSP is a "must have" feature (although when 
          one reviews the section on LSP creation above, one realizes it will 
          not be ubiquitously used). Some MPLS deployment scenarios may not 
          have a control plane or may have LSP processing components not in 
          common with the control plane, so fault detection procedures may 
          need to be augmented with LSP specific methods. 
           
          - Fault sectionalization: The ability to efficiently determine where 
          a failure has occurred in an LSP.  It must be possible to perform 
          sectionalization from an arbitrary LSR along the path of the LSP. 
           
          - Fault Propagation: specific MPLS deployment scenarios may not have 
          a control plane to propagate LSP failure information. Fault 
          propagation has numerous forms and there are variations depending on 
          whether the failure is in the serving layer/level or : 
          i)  Northbound from the failed level to the management plane.  
          ii)  Within the failed level. 
          iii) From the failed level to its clients. 
          iv)  Within the client level to the LSP ingress and egress either 
          via the user or control planes. 
          And in all cases it is the termination of a layer that performs the 
          function.  
           
          Performance management 
           
          - The ability to determine whether an LSP meets certain goals with 
          respect to latency, packet loss etc. 
          - The ability to collect information to facilitate network 
          engineering decisions. 
           
          Of the above applications, verification, detection and 
          sectionalization explicitly need to exercise all components of the 
          forwarding path of the target LSP, otherwise there will be failure 
          scenarios that cannot be detected or properly sectionalized. These 
          applications cannot be supported properly if there are differences 
          in handling between user traffic and OAM probes at intermediate 
          LSRs. 

       9. OAM Messaging 
       
          OAM should be decoupled from user behavior to ensure consistent OAM 
          functional behavior (under any traffic conditions) and avoid the use 
          of customers as guinea pigs. We consider it to be self evident that 
          providers will require a toolkit that includes some form of user 
          plane OAM messaging.  
           
          At the specific LSP level, support of OAM applications requires 
          messages that flow between three entities: the LSP ingress, the 
          intervening network and the LSP egress. As an LSP is unidirectional, 
          it should be self evident that OAM applications that require 
          feedback in the reverse direction will have such communication occur 
          either at the specific LSP level, or some user plane LSP level in 
          the operational domain, or one of the other planes (control or 
          management) of the operational domain. 
           
          The set of possible individual transactions (plus examples of their 
          utility) is as follows: 
           
          LSP specific user plane transactions: 
          - ingress to egress 
              Applicability: verification, fault detection, performance 
                             management. 
          - ingress to network 
              Message will terminate at an intermediate LSR traversed by 
              the LSP. 
              Applicability: sectionalization from source. 
          - network to egress 
              Message is inserted into the LSP at an intermediate node 
              and terminates at the LSP egress LSR. 
              Applicability: sectionalization from arbitrary point in an 
                             LSP. 
          - network to network 
              Applicability: sectionalization from arbitrary point in an 
                             LSP. 
           
          Feedback transactions: 
          - egress to ingress 
              Applicability: verification, fault detection. 
          - egress to network 
              Flow originates at the LSP egress and terminates at an 
              intermediate node traversed by the LSP. 
              Applicability: sectionalization from arbitrary point in an 
                             LSP. 
          - network to ingress 
              Flow will originate at an intermediate LSR traversed by 
              the LSP and terminate at the LSP source. 
              Applicability: sectionalization from ingress. 
          - network to network 
              Applicability: sectionalization from arbitrary point in an 
                             LSP. 
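
          The transaction set above can be captured as a small lookup table, 
          for example to check which transactions a given application relies 
          on. The Python below is a sketch only; the tuple layout and the 
          function name transactions_for are assumptions made for 
          illustration. 

```python
# (class, source, destination) -> applications listed in the text above.
TRANSACTIONS = {
    ("forward", "ingress", "egress"): {
        "verification", "fault detection", "performance management"},
    ("forward", "ingress", "network"): {"sectionalization"},
    ("forward", "network", "egress"): {"sectionalization"},
    ("forward", "network", "network"): {"sectionalization"},
    ("feedback", "egress", "ingress"): {"verification", "fault detection"},
    ("feedback", "egress", "network"): {"sectionalization"},
    ("feedback", "network", "ingress"): {"sectionalization"},
    ("feedback", "network", "network"): {"sectionalization"},
}


def transactions_for(application):
    """Return the transactions whose applicability includes `application`."""
    return [key for key, apps in TRANSACTIONS.items() if application in apps]
```

          For example, transactions_for("performance management") returns 
          only the forward ingress to egress transaction, since no feedback 
          transaction lists it. 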
               
       10. Distinguishing OAM user plane flows 
        
          MPLS does not currently provide for protocol multiplexing at a 
          specific LSP level. However a requirement still exists to 
          distinguish per-LSP OAM messaging from user payload.  
           
          The options for addressing the identification of OAM flows are: 
           
       10.1 Adding an LSP level with arbitrary label 
           
          OAM flows could be identified by adding an LSP level to the existing 
          LSP using an arbitrary label value that by convention (negotiated at 
          LSP establishment) carries OAM payload. Adding an arbitrarily 
          labeled LSP level to multiplex OAM flows will add complexity and 
          additional failure modes that make it an undesirable solution. 
           
       10.2 Adding an LSP level with a reserved label value 
           
          OAM flows could instead be identified by adding an LSP level to the 
          existing LSP via a stacked reserved label value that explicitly 
          identifies OAM flows. Note that this approach was proposed in 
          [HARRISON-MECH]. Adding an LSP level using a reserved label has a 
          number of virtues: 
           
          - LSP level OAM flows will explicitly exercise the components of 
            the forwarding path. 
          - LSP level OAM flows will not be erroneously forwarded outside 
            the LSP they have been inserted into. The LSP itself may be 
            defective or misrouted (but that is a separate issue) but we 
            have not introduced an additional LSP level for OAM with its 
            own set of possible defects. 
          - LSP level e2e OAM flows are transparent to non-compliant/legacy 
            equipment. 
           
          However hop-by-hop OAM flows are not possible  (although the 
          approach could be augmented via the use of MPLS TTL or the router 
          alert label to gain SOME of the benefits of hop-by-hop messaging). 
          By losing hop-by-hop, the ability to provide flows with preferential 
          treatment when required is also lost. 
           
       10.3 Header modification 
           
          Modifying the MPLS header or stealing a bit from the label space to 
          permit OAM payload to be uniquely distinguished by all LSRs 
          traversed at the specific LSP level similarly has a number of 
          advantages and disadvantages. We gain hop-by-hop visibility of 
          messaging but at the expense of: 
          - Losing backwards compatibility with legacy equipment. 
          - Losing the ability to ensure we are fully exercising the 
            forwarding path for verification and sectionalization. 
           
          Similarly stealing a label bit for PHP and E-LSPs becomes 
          problematic as the OAM identifier is lost when the label is popped 
          (in the original MPLS architecture, the label was considered to have 
          no informational value past the next to egress LSR). Adding special 
          handling for OAM packets to specifically avoid PHP would no longer 
          exercise all components of the forwarding path. 
           
          The use of a reserved label under the top label would appear to be 
          the approach that has the most utility and least impact on current 
          deployments. 
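
          As an illustration of this approach, the following Python sketch 
          builds a label stack in which a reserved label sits directly under 
          the top label, and shows how an egress could recognize the packet 
          as OAM. The 32-bit label stack entry layout (20-bit label, 3-bit 
          EXP, bottom-of-stack bit, 8-bit TTL) is the standard MPLS encoding; 
          the value 14 used for OAM_LABEL is purely a placeholder from the 
          reserved range 0-15, and the function names are assumptions made 
          for the example. 

```python
import struct

OAM_LABEL = 14  # placeholder reserved value, assumed for illustration


def encode_entry(label, exp, bottom, ttl):
    """Pack one label stack entry into 4 bytes, network byte order."""
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def decode_stack(data):
    """Return ([(label, exp, bottom, ttl), ...], payload).

    Raises struct.error if the data ends before a bottom-of-stack entry.
    """
    entries = []
    offset = 0
    while True:
        (word,) = struct.unpack_from("!I", data, offset)
        offset += 4
        bottom = bool((word >> 8) & 0x1)
        entries.append((word >> 12, (word >> 9) & 0x7, bottom, word & 0xFF))
        if bottom:
            return entries, data[offset:]


def is_oam(data):
    """True if the label directly under the top label is the OAM label."""
    entries, _ = decode_stack(data)
    return len(entries) >= 2 and entries[1][0] == OAM_LABEL


if __name__ == "__main__":
    oam_packet = (encode_entry(1000, 0, False, 64)
                  + encode_entry(OAM_LABEL, 0, True, 1)
                  + b"oam-payload")
    user_packet = encode_entry(1000, 0, True, 64) + b"user-payload"
    print(is_oam(oam_packet), is_oam(user_packet))  # True False
```

          Intermediate LSRs forward on the top label only, so in this sketch 
          the OAM packet and the user packet follow the same forwarding 
          path; only a node that inspects the second entry can tell them 
          apart. 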
        
       11. The OAM return path 
       

          The ability to use OAM applications such as single-ended monitoring 
          of both directions from one end, or to support applications such as 
          protection switching in a 1/N:M case, requires the existence of a 
          return path to the LSP ingress. This enhances the scalability and 
          reliability of some OAM applications, as initiation need only occur 
          at a single LSR; all further coordination of LSRs exercised by the 
          application is performed by user plane messaging inherent to the 
          OAM application. A specific example is the use of a loopback, where 
          the only place state and timing need be maintained is at the 
          loopback originator. 
           
          This requires a return path to complete the loop between the "target 
          LSP" and the OAM application originator. This will permit reliable 
          transaction flows to be implemented that impose minimal state on the 
          network.  
           
          For the few OAM applications that require a return path, the return 
          path can be tolerant of being topologically disjoint with the target 
          LSP (provided the differential delays are small, i.e. << 1s), 
          reachability of the application originator being the only hard 
          requirement. Similarly, different OAM applications will have 
          different return path requirements, and a hybrid of using all the 
          planes of the operational domain (according to the application) may 
          be significantly simpler and more operationally tractable than 
          significant modifications to current usage to fill in connectivity 
          gaps at the specific label level. 
           
          This is a key point: LSPs are currently by definition 
          uni-directional (bi-directional to date being a construct of 
          multiple uni-directional LSPs), so for any non-ubiquitous deployment 
          of MPLS connectivity, some modification of operational procedure to 
          provide for OAM messaging will be required for the few applications 
          that need it. Strict symmetry of connectivity at a specific label 
          level is not guaranteed. 
           
          In any type of sparse usage scenario (e.g. provisioned LSPs or use 
          exclusively for TE) there will not be an inherent any-to-any 
          connectivity in the user plane, and there may not be a control 
          plane. Additional artificial constructs such as a "reverse 
          notification tree" [CHANG] have been proposed to address this 
          although these introduce additional operational complexity and a 
          requirement for OAM for the OAM connectivity. 
           
          In an implicit MPLS topology (e.g. LDP DU), any to any connectivity 
          will typically exist, or will be easily available with minor 
          alterations to operational procedure (LSRs advertise selves as 
          FECs). This would continue to be true for an integrated model in 
          which TE and an implicit topology were combined. 
           
          In any type of multi-provider MPLS topology, the scenario is more 
          complex, as for numerous reasons a provider may not wish to 
          provision/advertise external connectivity to their LSRs. Similarly, 
          for security reasons, providers may wish to apply some degree of 
          policy or filtering of OAM traffic at operational domain boundaries. 
           
          User plane OAM messaging should be designed to leverage as much 
          "free connectivity" as can be obtained in the network, while 
          ensuring the constructs have sufficient extensibility to ensure the 
          corner cases are covered. 
           
          Within the operational domain of a single provider, it is relatively 
          easy to envision that a combination of user plane, and control plane 
          functionality will ensure that a user plane return path is 
          frequently available (although it may be topologically disjoint from 
          the target LSP). This is less so for inter provider scenarios. Here 
          there are a number of potential obstacles such as: 
          - disjoint control plane 
          - disjoint addressing plan 
          - requirements for policy enforcement and security 
          - impacts to scalability of ubiquitous visibility of individual LSRs 
          across multiple operational domains. 
           
          There are a number of approaches to providing inter-domain OAM 
          connectivity. The following is a brief commentary on each: 
           
          1) Reverse Notification Tree (a.k.a. using a bi-directional LSP) 
          In this method, each LSP has a dedicated reverse path, i.e. the 
          reverse path is established and associated with the LSP at LSP 
          setup time. This requires binding the reverse path to each LSR that 
          is traversed by the LSP. This method is not scalable, as it 
          requires doubling the number of LSPs in the network. Moreover each 
          reverse path requires its own OAM. 
           
          2) Global OAM capability 
          Similar to the IPv4 to IPv6 migration methodology, this method 
          proposes use of a global operations domain whose control-plane, 
          user-plane, and management-plane interact with the control-plane, 
          user-plane, and management-plane of individual operations domains. 
          This method requires commitment and buy-in from all network 
          operators. 
           
          3) Inter-domain OAM gateway 
          This method proposes use of gateway-like functions at LSRs that 
          are at operations domain boundaries. OAM gateway-like functions 
          include capabilities to correlate OAM information from one 
          operations domain to another and permit inter-carrier 
          sectionalization problems to be resolved. 
           
          Specification of inter-domain OAM gateway capability would appear to 
          be the most realistic solution.  
        
       12. Security Considerations 
        
          Support for intra-provider user plane OAM messaging does not 
          introduce any new security concerns to the MPLS architecture.  
          It does, however, address some concerns that already exist: through 
          rigorous defect handling, operators can offer their customers a 
          greater degree of assurance that their traffic will not be 
          misdelivered (for example by being able to detect LSP traffic 
          leaking from a VPN). 
           
          Support for inter-provider user plane OAM messaging introduces a 
          number of security concerns: by definition, portions of LSPs will 
          not be in trusted space, and the provider has no control over who 
          may inject traffic into the LSP. This creates opportunity for 
          malicious or poorly behaved users to disrupt network operations. 
          Attempts to introduce filtering on target LSP OAM flows may be 
          problematic if flows are not visible to intermediate LSRs. However 
          it may be possible to interdict flows on the return path between 
          providers (as faithfulness to the forwarding path is not a return 
          path requirement) to mitigate aspects of this vulnerability. 
        
       13. A summary of what can be achieved. 
        
          This draft identifies useful MPLS OAM capability that potentially 
          could be provided via user plane OAM functions, in particular with 
          respect to automatic fault detection and failure handling. This 
          draft suggests that it may be possible to provide this capability 
          for any level in the label stack and across the full set of 
          topological constructs available in the MPLS architecture. This "any 
          level"/"any construct" applicability is a key requirement. 
           
          This draft also identifies that many aspects of performance 
          management are problematic without modifications to operational 
          procedure. Any type of comparative measurement between the ingress 
          and egress requires a 1:1 cardinality, or the ability of the egress 
          to uniquely determine the ingress for each measured unit of 
          communication, something that LSP merge, PHP and possible use of per 
          platform label space at the measured LSP level undermine. Services 
          requiring performance management functionality will not be able to 
          utilize the full set of constructs in the MPLS architecture at the 
          service level.         
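
          The cardinality problem can be seen with simple counter arithmetic. 
          In the Python sketch below (the function name and the counter 
          values are illustrative assumptions), the egress holds a single 
          receive counter for the LSP. With one ingress the loss is 
          attributable; when several ingresses merge onto the LSP, only the 
          aggregate loss can be computed. 

```python
def loss_report(sent_by_ingress, received_at_egress):
    """Compare ingress send counters with a single egress receive counter.

    sent_by_ingress maps an ingress name to the packets it sent on the LSP.
    Returns (aggregate_loss, per_ingress_loss). per_ingress_loss is None
    when more than one ingress feeds the LSP, because the single egress
    counter cannot tell the sources apart.
    """
    aggregate = sum(sent_by_ingress.values()) - received_at_egress
    if len(sent_by_ingress) == 1:
        (name,) = sent_by_ingress
        return aggregate, {name: aggregate}
    return aggregate, None


if __name__ == "__main__":
    print(loss_report({"A": 100}, 97))             # (3, {'A': 3})
    print(loss_report({"A": 100, "B": 200}, 290))  # (10, None)
```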

       14. References 
        
          [CHANG] Owens et al., "A Path Protection/Restoration Mechanism 
            for MPLS Networks", draft-chang-mpls-path-protection-02.txt, 
            IETF Work in Progress, November 2000. 

          [HEINANEN] Heinanen, J., "Directory/LDP Based Ethernet VPNs", 
            draft-heinanen-dirldp-eth-vpns-01.txt, IETF Work in Progress, 
            November 2001. 

          [HIERARCHY] Lai et al., "Network Hierarchy and Multilayer 
            Survivability", draft-ietf-tewg-restore-hierarchy-00.txt, IETF 
            Work in Progress, September 2001. 
           

          [ICMP] Bonica et al., "ICMP Extensions for MultiProtocol Label 
            Switching", draft-ietf-mpls-icmp-02.txt, 
            IETF Work in Progress, August 2000. 

          [KOMPELLA] Kompella et al., "MPLS-based Layer 2 VPNs", 
            draft-kompella-mpls-l2vpn-02.txt, IETF Work in Progress, 
            December 2000. 
           
          [MARTINI] Martini et al., "Transport of Layer 2 Frames Over 
            MPLS", draft-martini-l2circuit-trans-mpls-06.txt, IETF Work 
            in Progress, May 2001. 
           
          [MPLSDIFF] Le Faucheur et al., "MPLS Support of Differentiated 
            Services", draft-ietf-mpls-diff-ext-09.txt, IETF Work in 
            Progress, April 2001. 
           
          [2547] Rosen, E. and Rekhter, Y., "BGP/MPLS VPNs", IETF RFC 2547, 
            March 1999. 
           
          [HARRISON-MECH] Harrison et al., "OAM Functionality for MPLS 
            Networks", draft-harrison-mpls-oam-00.txt, February 2001. 
           
          [HARRISON-REQ] Harrison et al., "Requirements for OAM in MPLS 
            Networks", draft-harrison-mpls-oam-req-01.txt, November 2001. 
           
          [SWALLOW] Swallow, G. and Goguen, R., "RSVP Label Allocation for 
            Backup Tunnels", draft-swallow-rsvp-bypass-label-01.txt, 
            November 2000. 
           
          [TTL] Agarwal, P. and Akyol, B., "TTL Processing in MPLS Networks", 
            draft-agarwal-mpls-ttl-01, October 2001. 

        
       15. Authors' Addresses 
        
          David Allan 
          Nortel Networks              Phone: 1-613-763-6362 
          3500 Carling Ave.            Email: dallan@nortelnetworks.com 
          Ottawa, Ontario, CANADA 
           
          Mina Azad 
          Nortel Networks              Phone: 1-613-763-2044 
          3500 Carling Ave.            Email: mazad@nortelnetworks.com 
          Ottawa, Ontario, CANADA 
           
          Enrique G. Cuevas 
          AT&T  
          Room D3-2B25                 Phone: +1 732 420 3252  
          200 S. Laurel Avenue         E-mail: ecuevas@att.com  
          Middletown, NJ 07748 USA 

           
          Neil Harrison  
          British Telecom              Phone: 44-1604-845933  
          Heath Bank                   Email: neil.2.Harrison@bt.com  
          Iugby Road, Harleston  
          South Hampton, UK 
           
          Sanford Goldfless
          Lucent Technologies
          55 Fairbanks Rd.
          Marlborough, MA 01752 USA    Email: sgoldfless@lucent.com

          Arun Punj
          Marconi Communications
          1000 Marconi Drive, 
          Warrandale - PA - 15086      Email: Arun.Punj@marconi.com

          Marcus Brunner
          Network Laboratories - NEC Europe Ltd.
          Adenauerplatz 6              Phone: +49 (0)6221/ 9051129
          D-69115 Heidelberg, Germany  Email: brunner@ccrle.nec.de

          Chou Lan Pok 
          SBC Technology Resources, Inc.                   
          4698 Willow Road,            Phone: 925-598-1229 
          Pleasanton, CA 94583         Email: pok@tri.sbc.com    

          Wesam Alanqar 
          Sprint 
          9300, Metcalf Ave,           Phone: +1-913-534-5623 
          Overland Park, KS 66212      Email: wesam.alanqar@mail.sprint.com

