                                                               S. Barkai
Internet Draft                                                  G. Stupp
draft-barkai-scmp-00                                      Sheer Networks
Expires: 1 July 2003                                    30 December 2002

                  Service Centric Management (SCM)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

With the proliferation of IP-based internetworking services to mass business and consumer markets, carriers face operational challenges on a scale never before experienced in LAN, MAN, or WAN environments. To accommodate these new internetworking environments, carrier workflow management applications for service Fulfillment, Assurance, and Billing (FAB) need to undergo a major transition. In essence, applications need to transition from being technology centric to being service centric, as the historical correspondence between technology and service no longer holds. Applications do not need to change in nature, since they still need to support inventory, CRM, order, and other workflows. However, applications need to include a service centric aspect as the basis for these workflows.

Barkai                                                          [Page 1]

Internet Draft                                             December 2002
Service Centric solutions can be shared by a wide range of management and workflow applications, rather than being re-invented by each one. Service Centric Management solutions must be distributed in order to cope with the massive scale and complexity of the underlying networks. The Service Centric Management Protocol (SCMP) and the associated distribution architecture should be standardized to facilitate multi-vendor interoperability in this emerging space, to allow for potential future embedding within network elements (NEs), and to allow Service Centric Management primitives to extend across Inter Carrier Interfaces (ICI) and intra-carrier regulatory boundaries.

1. Introduction

With the proliferation of IP-based internetworking services to mass business and consumer markets, carriers face operational challenges on a scale never before experienced in LAN, MAN, or WAN environments.

In order to deliver end-to-end services, carriers are extending millions of access lines (DOCSIS, DSL, or GPRS), aggregating them using ATM and Ethernet domains, and transporting them over WDM, SONET, and MPLS backbones. A wide range of tunnelling protocols such as PPP, L2TP, and IP-Sec are used to map packets in and out of access, edge, and core technology domains.

For example, millions of mobile wireless data services are implemented using BSC-based GPRS, connected to DSL multiplexers, connected to ATM switches, LAC-LNS BRAS aggregators, PE routers, P core routers, and optical backbones. In addition, SNDCP, GTP, and L2TP tunnels are used extensively across these domains, adding to the diversity and complexity of the network.

To accommodate these new internetworking environments, carrier workflow management applications for service Fulfillment, Assurance, and Billing (FAB) need to undergo a major transition. In essence, they need to transition from being technology centric to being service centric.
Typical carrier management and workflow applications include:

   o Inventory Management
   o Trouble Ticketing Management
   o Order Management
   o SLA Management
   o etc.

Since most services are becoming cross-domain and multi-technology, carrier applications can no longer rely on the historical correspondence between service and technology. Applications do not need to change in nature, since they still need to support inventory, CRM, order, and other workflows. However, applications need to include a service centric aspect as the basis for these workflows.

The Service Centric Management aspects that form this basis include:

   o Comprehensive End-To-End Service Resources Reconciliation
   o Consistent End-To-End Service Configuration and Activation
   o Correlated End-To-End Service Monitoring and Detection

The transition of carrier management and workflow applications from being technology centric to being service centric poses great challenges. It has proved exceedingly difficult to implement management systems that include service centric management aspects and also withstand the huge pressures of end-to-end network scale and complexity.

We claim that there is a set of challenges shared by most management applications. The absence of a general solution causes fragmentation, loose integration, performance compromises, technology stovepiping, fragile structures, and unprofitable carrier operations.

   [Figure 1 diagram: applications OMS, CRM, and TTM, each integrated separately with each of the technology domains TDM, ATM, and IP, forming an integration mesh]

   Figure 1: Technology centric integration mesh is expensive and does not reflect IP over ATM over TDM services.

2. Service Centric Management Fabric (SCMF)

In this paper we suggest that a comprehensive end-to-end service centric management basis over large, diverse networks, shared by a range of carrier management applications, is attainable.
We suggest that current compromises are not a necessary evil, and that standard Service Centric solutions can be shared by a wide range of management and workflow applications, rather than being re-invented by each one.

We also suggest that shared, standard Service Centric solutions need to be distributed. Just as we don't expect a single server to provide IP connectivity to millions of users (we distribute the task to switches and routers), comprehensive end-to-end service centric functions can only be performed by a distributed Service Centric Management Fabric (SCMF). The pressures of end-to-end complexity and scale are simply too great for any centralized entity.

The SCMF is therefore realized by multiple Service Centric Management Units (SCMUs), whose number is proportional to the size of the network. By employing the Service Centric Management Protocol (SCMP), the SCMUs form a scalable, shared SCMFabric.

The SCMF is not intended to be a FAB management application. Rather, it offers a set of primitives that can be used by a wide range of management applications. By offering these primitives, the SCMF decouples the overwhelming burden of network size and complexity from management applications.

In a reference SCMP implementation, existing management systems such as trouble ticketing, CRM, inventory, and order management systems were successfully integrated. Moreover, third parties also developed new and unanticipated applications, taking advantage of the same SCMF primitives.

Any standard, application oriented, northbound API such as TMF 608/513 can be used to access the SCMF primitives. This topic will be discussed in a separate paper. Southbound manager-agent oriented APIs are used between the SCMF and the network elements. These can be the standard SNMP and/or CLI commands used extensively today.

     OMS    CRM    TroubleTM    InventoryM    SLAM   ..
      |      |         |             |          |
   ____________________________________________________
  |                        SCMF                        |
  |____________________________________________________|
      |      |         |             |          |
    BBand   TDM       ATM            IP        VoXX

   Figure 2: Service Centric Management Fabric. North: client-server primitives API. South: manager-agent FCAPS API.

3. Service Centric Management Primitives

The SCMF offers a set of primitives that support a wide range of service management and FAB applications, both existing and new. Additional primitives are expected to evolve over time, as further application bottlenecks are factored into the distributed SCMF. To date we have identified the following primitives:

Reconciliation - The discovery and up-to-date knowledge of all existing network resources and services configured in the Network Elements (NEs). This includes:

   o All NE interfaces, such as Ethernet, SONET, ATM, IP, etc.
   o Interface stacking, such as IP over ATM over SONET over WDM over fiber, or IP over HDLC over fiber.
   o All forwarding entities, such as multiplexing, switching, and routing (dot1D, BGP, NNI, etc.), defined over the interface stacks. For example, bridging between 5 stacks, and routing between bridges and 2 other stacks.
   o All cross-NE relationships in the network, e.g., far-end interfaces and next-hop routing or bridging, in every interface and every layer. We call this generalization of the physical topology the interoperability topology of the network.

Reconciliation is a continuous process, as it needs to reflect hardware and software configuration changes in the network in near-real-time.

Activation - The design, assignment, and configuration of differentiated services in the network. Service profiles may specify any aspect, standard or proprietary, of any interface and any forwarding entity, in any of the NEs along the calculated path. The optimal path, subject to the service profile constraints, is calculated in near-real-time. Activation can be invoked in parallel at multiple locations in the SCMFabric.
Integrity must be maintained, especially where multiple activated paths cross asynchronously.

Detection - The sensing of network operating conditions; the root isolation of malfunction, disconnection, or overload failures; data path tracing from the fault to its side-effects, and their propagation to impacted services; and the suppression of side-effect alarms and bundling of ripple impacts with the root failure, without manual rule programming or the introduction of network-foreign computation at the SCMF level.

4. Service Centric Management Units (SCMUs)

SCMUs play a mostly administrative role in the SCM solution. Their main function is to host a set of SCMAgents, which communicate via SCMP within an SCMU or between SCMUs. The main task of the SCMU is to support the SCM hardware workstation platforms, provide administrative IP or IP-Sec connectivity, and allow for general SCMF administration.

In reference implementations, the SCMUnits are extremely important for maintaining the solution's software versions, security, density, and redundancy. These mechanisms will be discussed in a separate context.

In the reference SCMP implementation we bundle about 1000 SCMAgents per SCMUnit, covering about 100K ports per unit. In reference implementations we also find it best to limit direct IP or IP-Sec SCMU administrative connectivity to about 10 SCMUnits (about 10K NEs, or 1M ports). Beyond this number it is best to have "Core" SCMUs relay SCMP messages to "Edge" SCMUs. For example, for a 10M port solution it is best to mesh 10 core SCMUs, each connected to an edge group of 10 additional SCMUs, and so on.

                 North client-server interface
                               |
    ___|___        _______        ______________________
   |       |      |       |      |                      |
   |       |      |       |      |   SCMA        SCMA   |
   | SCMU  |      | SCMU  |      |         SCMU         |
   |_______|      |_______|      |______________________|
       |                                    |
                 South manager-agent interface

   Figure 3: SCMFabric

5. Service Centric Management Agents (SCMAs)

The SCMAgents are the logical building blocks of the SCMFabric. Each SCMAgent may cover a single network element, while legacy domain managers may function as an SCMA for an entire domain. SCMA coverage serves end-to-end service management functionality only, and does not need to cover the complete NE device management functionality. SCMAs are also referred to as autonomous agents since, like routing agents, they communicate primarily with other SCMAgents and not with a manager.

The SCMAgents have the following states while operating as part of the SCMFabric:

   o Initialization State - When the appropriate SCMAgent (as determined by the SCMU) initializes, it uses standard CLI and/or SNMP to learn the interfaces, stacking, and forwarding configuration of the NE. This step is repeated whenever a configuration change is detected. Each interface is assigned an SCMXID, which must be unique within the entire SCMFabric. In the reference SCMP implementation we use an algorithmic method to produce SCMXIDs, and a graph in the SCMAgent to represent the interface, stacking, and forwarding relationships of the NE. This implementation is optional.

     [Figure 4 diagram: a router entity (R) and a bridge entity (B) drawn over interface stacks; B bridges two stacks, and R routes between the bridge and a third stack]

     Figure 4: Example SCMA representation of bridging 2 stacks, and routing the bridge to a 3rd stack

   o Acquaintance State - After it initializes, the SCMA is part of the administrative network of the SCMFabric. However, in order to function and help perform most of the SCMF primitive functions, it also needs to become part of the interoperability network of the SCMFabric. The interoperability network of the SCMFabric follows the interoperability topology of the various managed network layers.
     To become part of the interoperability network, the SCMAgent computes a unique signature for every interface that may have a peer in another network element and, as a consequence, in a peer SCMA. There may be multiple signature types (SCMSigT) for any given interface type (as defined by IANA). The computation of signatures may involve proprietary intellectual property that needs to be shared by interoperating manufacturers of SCMUs. Trivial signatures, such as local and remote interface addresses carried by ILMI, BGP, or CDP, will be specified in a different context.

     After the SCMXID, SCMSigT, and SCMSigValue are published (on the administrative SCMF network) and discovered by a remote SCMA, the SCMAgent becomes topologically connected to the acknowledging remote SCMA and joins the interoperability SCMF network. This connection holds until network conditions change, or until interfaces in the NEs are disabled, removed, or disconnected.

     Each SCMA may be connected to one or more SCMAs via one or more interfaces and SCMXIDs.

     [Figure 5 diagram: three SCMAs whose matching interfaces are linked by dotted acquaintance lines]

     Figure 5: Example acquaintance (dotted) between 3 SCMAs: Brouter, Switch, Router

   o Flow State - In this state the SCMA reacts to messages received via the interoperability and administrative links from the agent to the rest of the fabric. The SCMA is expected to receive flow messages, infer their local significance and the local service management actions they require, and then terminate them or pass them on to one or more peer SCMAs.

          flow>>
          SCMA <--------------------------> SCMA
           |      Interoperability Network    |
           |                                  |
           |      Administrative Network      |
           |                                  |
     ......SCMF ........................... SCMF......

     Figure 6: SCMAs in the administrative and interoperability networks.

Activation, detection, and reconciliation queries generate multiple types of flows across the SCMFabric.
For example, an activation find-best-path flow may be received by multiple SCMA interfaces and routed to other SCMAs after a local calculation. Similarly, a detection flow may be received indicating a primary or secondary fault in a peer SCMA; the local SCMA calculates the further impact, then terminates the flow or routes it to other SCMAs. Specific flows, as well as the states that avoid loops or implement split horizon or other routing methods, are discussed separately.

Activation flows are typically initiated by the SCMAgents covering the end-points of a service. These flows run through the SCMF core along the topological links between SCMAgents. They typically include a find stage, followed by reserve and commit stages.

Detection flows are typically initiated by the SCMAs sensing a root fault, and are propagated up the stacks, across forwarding entities to peer agents, and finally to the end-points of the impacted services. They typically include verify, expand, and regress stages.

Flow states and message formats will be discussed in detail separately.

Experience with the reference SCMP implementation shows that activation flows, due to their parallel and non-blocking nature, have managed to reach the full capacity of the network to absorb configuration. Likewise, detection flows yielded suppression ratios on the order of thousands to one between root faults and the impacted services and side-effects. Again, due to their parallel nature, they have been able to absorb all the events the network can generate.
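The acquaintance step described under the Acquaintance State (publish a signature, match it against a remote publication, acknowledge) can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not the reference implementation. It assumes the trivial signature type mentioned above (local and remote interface addresses, e.g. as carried by ILMI, BGP, or CDP), made symmetric by sorting the address pair so that both ends of a link compute the same SCMSigValue. The names Publication, compute_signature, and match_acquaintances, the SCMXID strings, and the "addr-pair" signature type label are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """One (SCMXID, SCMSigT, SCMSigValue) record published on the administrative network."""
    scmxid: str
    sig_type: str
    sig_value: tuple


def compute_signature(local_addr: str, remote_addr: str) -> tuple:
    # Hypothetical address-pair signature: sorting makes it symmetric, so
    # both ends of a link derive the same value without coordinating.
    return tuple(sorted((local_addr, remote_addr)))


def match_acquaintances(pubs):
    """Return SCMXID pairs whose published signature type and value match."""
    by_sig = defaultdict(list)
    for p in pubs:
        by_sig[(p.sig_type, p.sig_value)].append(p.scmxid)
    # Exactly two publishers of one signature form an acquaintance; a lone
    # publisher is still waiting for its peer SCMA to publish.
    return sorted(tuple(sorted(ids)) for ids in by_sig.values() if len(ids) == 2)


if __name__ == "__main__":
    pubs = [
        Publication("ne1/if3", "addr-pair", compute_signature("10.0.0.1", "10.0.0.2")),
        Publication("ne2/if7", "addr-pair", compute_signature("10.0.0.2", "10.0.0.1")),
        Publication("ne2/if8", "addr-pair", compute_signature("10.0.1.1", "10.0.1.2")),
    ]
    print(match_acquaintances(pubs))  # [('ne1/if3', 'ne2/if7')]
```

Interface ne2/if8 remains unacquainted until an SCMA on the far end publishes the matching signature. This mirrors the statement that an acquaintance holds only while both sides' interfaces are present.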
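The find, reserve, and commit stages of an activation flow can be sketched as a message-passing simulation over the acquaintance graph. The draft defers flow states and loop avoidance to a separate document. Purely for illustration, this sketch assumes that each SCMA adds a hypothetical local cost to the Find message and forwards it only when it improves on the best cost that SCMA has already seen. This is a distance-vector style relaxation: it prevents loops and converges to the least-cost path. Reserve is modeled as an all-or-nothing capacity reservation along the found path. The functions find_best_path and reserve_and_commit, and the cost and capacity maps, are hypothetical.

```python
from collections import deque


def find_best_path(links, costs, src, dst):
    """Simulate a Find flow from SCMA src toward SCMA dst.

    links: SCMA -> list of acquainted peer SCMAs (hypothetical topology)
    costs: SCMA -> local cost that SCMA adds to the path metric (hypothetical)
    Returns (path, total_cost), or (None, None) if dst is unreachable.
    """
    best = {src: costs[src]}
    best_path = {src: [src]}
    queue = deque([(src, [src], costs[src])])
    while queue:
        node, path, cost = queue.popleft()
        if cost > best[node]:
            continue  # stale Find: a cheaper one already reached this SCMA
        for peer in links.get(node, []):
            new_cost = cost + costs[peer]
            # Forward only on strict improvement; this also stops loops.
            if new_cost < best.get(peer, float("inf")):
                best[peer] = new_cost
                best_path[peer] = path + [peer]
                queue.append((peer, path + [peer], new_cost))
    return best_path.get(dst), best.get(dst)


def reserve_and_commit(path, capacity, demand):
    """Reserve demand at every SCMA on the path; commit only if all succeed."""
    reserved = []
    for scma in path:
        if capacity[scma] < demand:
            for r in reserved:  # release partial reservations
                capacity[r] += demand
            return False
        capacity[scma] -= demand
        reserved.append(scma)
    # Commit stage: a real SCMA would now configure its local NE.
    return True


if __name__ == "__main__":
    links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
    costs = {"A": 1, "B": 5, "C": 2, "D": 1}
    path, cost = find_best_path(links, costs, "A", "D")
    print(path, cost)  # ['A', 'C', 'D'] 4
    print(reserve_and_commit(path, {"A": 10, "B": 10, "C": 10, "D": 10}, 3))  # True
```

The rollback in reserve_and_commit is one simple way to keep integrity when a reservation fails partway. How the reference implementation handles crossing activations is not specified here.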
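The verify and expand stages of a detection flow, and the resulting suppression of side-effect alarms, can be sketched as follows. In the SCMF this logic is distributed across SCMAs. For clarity, the sketch runs it centrally over a hypothetical dependency map in which each entity (interface layer, forwarding entity, or service end-point) lists the entities it rides on. An alarm is treated as a root only if nothing beneath it is also alarmed (verify). Everything that rides on a root is its impact (expand), and alarms within that impact are bundled with the root rather than reported separately. The sketch assumes an acyclic dependency map. correlate, _reach, and the entity names are hypothetical.

```python
from collections import defaultdict, deque


def _reach(start, edges):
    """Entities reachable from start by following edges (start excluded)."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def correlate(depends_on, alarms, services):
    """Map each root alarm to (suppressed side-effect alarms, impacted services).

    depends_on: entity -> entities it rides on (lower layer or upstream peer)
    """
    dependents = defaultdict(list)  # reverse edges, used by the expand stage
    for entity, lowers in depends_on.items():
        for low in lowers:
            dependents[low].append(entity)
    result = {}
    for alarm in alarms:
        # Verify: an alarm explained by an alarmed entity beneath it is a side-effect.
        if _reach(alarm, depends_on) & alarms:
            continue
        # Expand: everything riding on the root is impacted by it.
        impacted = _reach(alarm, dependents)
        result[alarm] = (impacted & alarms, impacted & services)
    return result


if __name__ == "__main__":
    depends_on = {
        "atm1": ["sonet1"], "ip1": ["atm1"], "ip2": ["atm1"],
        "svcA": ["ip1"], "svcB": ["ip2"], "ip3": ["sonet2"], "svcC": ["ip3"],
    }
    alarms = {"sonet1", "atm1", "ip1", "ip2"}
    print(correlate(depends_on, alarms, {"svcA", "svcB", "svcC"}))
```

Consider a SONET fault under an ATM interface that carries two IP interfaces. The four alarms collapse into a single root (sonet1) with two impacted services, and svcC, which rides on an unaffected stack, is not reported.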
   [Figure 7 diagram: SCMAs along a service path, with arrows tracing the detection flow through stacks, forwarding entities, and peer SCMAs to the service end-points]

   Figure 7: Example SCMA operation in the SCMF: detection flow

   [Figure 8 diagram: SCMAs along a service path, with arrows tracing the activation flow through stacks, forwarding entities, and peer SCMAs between the service end-points]

   Figure 8: Example SCMA operation in the SCMF: activation flow

6. Service Centric Management Protocol Messages

SCMAs receive SCMP messages from other SCMAs on the SCMF interoperability network formed by acquaintances. SCMAs also receive SCMP messages on the SCMF administrative network; the latter are typically queries and flows initiated by the SCMF application server adaptation.

So as not to limit the evolution of existing queries and flows or the creation of new ones, and not to constrain the autonomous behavior of each SCMA, only a relatively broad SCMP message frame is laid out at this stage:

   SCMP Message {
      SCMXID   source
      SCMXID   dest
      SCMXFT   flow type
      SCMXFT   flow subtype
      SCMXVB   variable bind list
   }

When flowing over the SCMF topological network, the source and dest fields are required to correspond to the SCMXIDs of the acquainted interfaces of the SCMAs. These are recognized by the SCMF transport and delivered to the right SCMAs.

So as not to impose the entire reference implementation of SCMP prior to further analysis, flow types are described only loosely here.

   o Flow type: Acquaintance
     Subtypes: Publish, Acknowledge, Hand-shake
     SCMA required to:
       - Publish local interface SCMXIDs and signatures.
       - Acknowledge a matching signature with the local interface SCMXID.
       - Hand-shake and acknowledge to establish the acquaintance.

   o Flow type: Reconciliation Query
     Subtypes: Relational expression
     SCMA required to:
       - Perform the local part of the query.
       - Adjust and augment the variable binding.
       - Pass the query to other acquainted SCMAs.

   o Flow type: Activation
     Subtypes: Find, Reserve, Commit
     SCMA required to:
       - Address the service profile and constraints in the variable bind list.
       - Forward augmented path metrics with added local costs.
       - Forward augmented path metrics with the local reservation.
       - Forward augmented path commits after local NE configuration.

   o Flow type: Detection
     Subtypes: Verify, Expand, Retract
     SCMA required to:
       - Verify with acquainted SCMAs that the sensed flow is not a side-effect.
       - Expand impact propagation to acquainted SCMAs.
       - Retract the flow from the end-points back to the root.

SCMP message syntax and semantics will be discussed further after additional standards analysis.

7. Service Centric Management Fabric Benefits

By supporting the above primitives in a distributed manner, the SCMF significantly improves performance and simplifies the integration of service management applications. Operations that are performed centrally today in sequential time (S-Time) are performed by the SCMF in parallel time (P-Time). The theoretical asymptotes for a network of size N, with proportional activation and fault rates, are:

   o Reconciliation: from S-Time N log N to P-Time log N.
   o Activation: from S-Time N log N to P-Time log N.
   o Detection: from S-Time N log N to P-Time log N.

In practice, the performance improvement is much greater than these asymptotes suggest. This is because of the high logarithm base (log N reflects the number of hops in the network, which is usually 4-8) and the structural constraints of existing centralized systems. In practice we get:

   o Reconciliation of N resources: S-Time N squared, fixed P-Time.
   o Activation of N services: S-Time N squared, fixed P-Time.
   o Detection of N impacts: S-Time N squared, fixed P-Time.

As an example, verifying a VPN with 2000 sites for connectivity and leak security (each peer test taking 1 sec.) is completed by the reference SCMF within minutes, while a naive centralized test may take days.

An additional benefit is the partitioning of mass complexity into a set of well-defined, contained, smaller problems with simpler solutions, from which a service-wide solution emerges.

8. Service Centric Summary

In today's networks, where services are not based on a single technology and their number rises to millions per network, a new standards-based protocol can alleviate significant operational pain points and enable a shared, distributed solution to problems that have so far remained unsolved.

The standardization process aims to build a detailed, industry-supported basis for a distributed service centric management methodology. The Service Centric Management Protocol (SCMP) should be standardized to facilitate multi-vendor interoperability in this emerging space, to allow for potential future embedding within NEs, and to allow Service Centric Management primitives to extend across Inter Carrier Interfaces (ICI) and intra-carrier regulatory boundaries.

The SCMP framework adopts scalable concepts from network control planes, such as routing "Hello" packets, "RSVP" activation packets, and OAM cell flows. SCMP applies these methodologies to generate a uniform fabric for end-to-end service management that acts, "thinks", and scales like a network. The outcome is therefore capable of dealing with the huge operational challenges that today's networks create.

9. AUTHOR ADDRESS

Sharon Barkai
Sheer Networks
Azrieli Center I
Round Tower
132 Petach Tikva rd.
Tel Aviv, 67021
ISRAEL
Tel. +972-3-6074111
Fax. +972-3-6074112
EMail: Sharon_Barkai@sheernetworks.com