Internet Draft
Andy Bierman
Cisco Systems, Inc.
21 June 2002

Network Management Observations

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [RFC2026].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Distribution of this document is unlimited. Please send comments to the SMIng WG mailing list.

1. Copyright Notice

Copyright (C) The Internet Society (2002). All Rights Reserved.

2. Abstract

This memo contains observations on the progress of standards based network management efforts within the IETF. In particular, it describes technical and process oriented deficiencies which have delayed the creation and adoption of standards for network device configuration, and offers recommendations for correcting these deficiencies.

3. Table of Contents

1 Copyright Notice
2 Abstract
3 Table of Contents
4 Overview
5 Observations
5.1 Why CLI is So Successful
5.1.1 Importance of ASCII
5.1.2 Effects of Console Management
5.1.3 Command Oriented Transaction Model
5.1.4 Scripting Tools
5.2 Why SNMP Isn't Working For Configuration
5.2.1 SNMP Set Isn't Simple
5.2.2 Management Data Definitions
5.2.3 Transport Protocol
5.3 How We Can Improve the NM Standards Process
5.3.1 Increase Customer Input
5.3.2 Recognize The Diverse Customer Base
5.3.3 Speed Up the Standards Process
5.3.4 Reduce Uncoordinated Standards Efforts
5.3.5 Increase Configuration Standards Awareness
5.4 Recommended Strategic Technology Initiatives
5.4.1 New Version of the SMI
5.4.2 Management Protocol Enhancements
5.4.3 Network Wide Management
5.4.4 More High Level Management Features
6 Acknowledgements
7 Normative References
8 Security Considerations
9 Author's Address
10 Full Copyright Statement

4. Overview

There has been a great deal of progress in the last 14 years in the area of standards based network monitoring. The Simple Network Management Protocol (SNMP) has matured and been widely deployed for this purpose. A significant number of managed objects have been added to the Management Information Base (MIB) over time as well.

Expires December 21, 2002

However, there has not been significant advancement in the development of standard management technologies for network device configuration over the same time period. Instead, network devices are predominantly configured via a proprietary command line interface (CLI).

There are several factors contributing to this situation, not all of them technical. It is not possible to identify a single problem area responsible for this lack of progress, and therefore not possible to identify a simple fix. It is also not useful to rank specific problems by their importance or by the damage they cause.

This memo contains observations on the positive and negative aspects of the standards based technology and processes developed within the IETF for network management, and offers some recommendations for improvement in this area. It focuses on CLI and SNMP technology, since these represent the vast majority of the deployed solutions for configuration management of network devices. It also focuses on network device configuration, since this is the most crucial aspect of network management that is poorly addressed by standards based solutions.

5. Observations

There are several contributing factors preventing the advancement of standards based configuration. These observations are divided into four sections:

   1) Why CLI is so successful
   2) Why SNMP isn't working for configuration
   3) How we can improve the NM standards process
   4) Recommended strategic technology initiatives

5.1. Why CLI is So Successful

The most prevalent technology used for the configuration of network devices is the proprietary CLI. The most obvious reason for this fact is "Because it's there". Operators need a consistent API that will always be available, which is located in the device, and which exists from the very first release of a feature. CLI has maintained its top position, even though it is proprietary.
It is used as a human interface and a programming interface, even though as an API it is relatively unstable. There is no consistent data model, no coherent change control process, no well structured documentation, and no consistent error codes. It lacks all of the features that a supposedly successful API (such as SNMP) offers, yet it is still the approach most widely used by operators. Obviously, the positive attributes must outweigh the deficiencies.

So what is CLI doing right? Some of the reasons include:

   - ASCII encoding
   - similar to Unix shell
   - session oriented (over TCP)
   - always available, especially out-of-band
   - relatively easy for vendors to implement
   - command-oriented, high-level transaction model

5.1.1. Importance of ASCII

The ASCII encoding of CLI commands is significant. CLI commands can be collected together and maintained in a simple text file. This provides a self-documenting, concise representation of the desired state of the networking device. This file (or portions of it) can be easily edited, emailed, searched, compared to other configuration files, cut-and-pasted, and transferred to and from the CLI parser.

An operator can perform all these functions with mature, easy-to-use, cheap (or even free), widely available software tools. This is important, especially since there is a natural tendency in the operator community to avoid learning new tools, buying expensive tools, trusting 'closed format' tools, and relying on external management devices which will not be available if the inband communication to the managed device fails.

5.1.2. Effects of Console Management

During initial device configuration and during network outages, an operator needs to examine device state and change device configuration via an out-of-band connection. This important task requires significant training and an advanced skill set.
Traditionally, operators connect a dumb terminal directly to the console port and interact with the device via a human friendly CLI. Because the training costs involved in maintaining network devices are significant, it is undesirable to train operators to use different toolsets for inband and out-of-band management. There is also a desire to limit reliance on complex tools beyond the simple ASCII processing software and hardware traditionally used to manage a device via the console port.

The comfort level with CLI, which is oriented toward human interaction, depends on the type of customer. The need for scalable and automated management tools increases as the tolerance for service disruption risk decreases, and as the number of managed devices increases.

5.1.3. Command Oriented Transaction Model

The CLI uses a command oriented transaction model, as opposed to the data oriented transaction model used by SNMP, and operators seem to prefer this higher level abstraction. A CLI parser acts upon one command at a time, and each command must be complete.

One downside is that there is no proper distinction between a 'create' command and an 'edit' command, so the notion of a complete command varies in relation to the current state of the managed device. For 'create' commands, all mandatory parameters must be present or the command will fail. However, the same command may also be used to edit existing state, in which case only the parameters present will be changed (as expected).

Vendors like this simple transaction model because there are relatively few corner cases to support, no complicated rollback to support, and no dribble-in input which creates temporary intermediate state to maintain.

While the command line represents the lowest level transaction, a higher layer abstraction is also used by operators.
It is recognized that one or more commands are required to complete the state change for an arbitrary feature. This ordered list of commands can be called a 'configlet'. One or more configlets make up a configuration file, and the configuration file represents the entire desired configuration state of the managed device. Usually, the configuration file contains a section for each network interface on the device. This transaction model and file structure is generally supported by all vendors, even though the command syntax varies for each vendor.

5.1.4. Scripting Tools

There is widespread use of scripting tools, which interact with the CLI, for the purpose of automating configuration management tasks. The problem with this approach is the inherent instability of the CLI, which is designed for humans. Change control for proprietary CLIs is not as rigorous as the change control applied to MIB definitions (either proprietary or standard). There is also no expectation that such scripts can be reused across vendors, or sometimes even across different products from the same vendor.

There are significant advantages to scripting tools, however. Once an operator learns a particular scripting toolset, such as Perl or Expect, there are few additional training costs beyond the knowledge of specific CLI dialects, which the network operator must know anyway in order to manage the device from the console. Even when a script breaks due to instability of the CLI from one version to the next, it is often a trivial matter to repair the script. Also, these scripts operate on the same commands whether inband or console port access is used, so they can be utilized even when the device cannot be reached inband.

It is desirable to somehow apply the rigorous change control found in MIBs, as well as the common semantics offered by MIBs, to the scripting tool environment.
Other common features, such as consistent error codes, consistent CLI parser behavior, and a core set of common parser commands would also greatly improve the usefulness of scripting tools.

5.2. Why SNMP Isn't Working For Configuration

Although SNMP is being used for configuration for some features and platforms, it is not widely used for general configuration of network devices. Even so, it is the most widely used technology for this task after proprietary CLI. There are a number of reasons why SNMP has not replaced CLI as the primary technology for configuration.

It has been said that the reason SNMP is not used for configuration is that it lacks security. Although this is a valid concern, security is only one of several prerequisites for widespread adoption, and it may not even be the most important one. It should be noted that until recently, CLI used telnet as an application protocol, and passwords were passed in cleartext, just as community strings are passed in all versions of SNMP except SNMPv3. Perhaps if SNMPv3 had been completed before the widespread adoption of SSH, its use would be more widespread, but that did not happen.

5.2.1. SNMP Set Isn't Simple

Contrary to the name, development and maintenance of SNMP agent and application technology is not simple, compared to CLI. The primary reason is that the protocol state machine for Set operations is much more complicated than for CLI. The protocol allows for arbitrarily complex transactions, in which multiple partial and unrelated rows may be passed to the agent, and the agent is expected to complete these transactions (as best it can) in an all-or-none fashion, and as if simultaneously. Instead of the command as the basic transaction unit, a single parameter to a command is the basic transaction unit. This unwarranted complexity makes the state machine much more difficult to design and test than comparable CLI code accomplishing the same task.
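
The difference between a command and a single parameter as the basic transaction unit can be sketched in a few lines of Python. This is an illustrative toy only, not an SNMP implementation. The table layout, the MANDATORY column set, and the functions cli_command and snmp_style_set are hypothetical names invented for this sketch, and RowStatus handling is deliberately left out. It assumes a row may exist only when all of its mandatory columns are present.

```python
from typing import Dict, List, Tuple

# Hypothetical schema (an assumption of this sketch): a row needs both columns.
MANDATORY = {"address", "mask"}

Table = Dict[str, Dict[str, str]]   # row index -> {column: value}
Varbind = Tuple[str, str, str]      # (row index, column, value)


def cli_command(table: Table, row: str, params: Dict[str, str]) -> bool:
    """Command oriented model: one complete command is applied or rejected."""
    # The same command creates a new row or edits an existing one.
    merged = {**table.get(row, {}), **params}
    if not MANDATORY.issubset(merged):
        return False                # a 'create' is missing mandatory parameters
    table[row] = merged
    return True


def snmp_style_set(table: Table, varbinds: List[Varbind]) -> bool:
    """Data oriented model: loose per-parameter varbinds, applied all-or-none."""
    # Step 1: regroup the individual parameters into rows on a scratch copy.
    staged: Table = {row: dict(cols) for row, cols in table.items()}
    for row, column, value in varbinds:
        staged.setdefault(row, {})[column] = value
    # Step 2: validate every touched row before changing anything.
    touched = {row for row, _, _ in varbinds}
    if any(not MANDATORY.issubset(staged[row]) for row in touched):
        return False                # one incomplete row rejects the whole request
    # Step 3: commit all rows together, as if simultaneously.
    table.clear()
    table.update(staged)
    return True
```

Even in this reduced form, the data oriented path needs staging, per-row validation, and a separate commit step, none of which the command oriented path requires.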

To make matters worse, there is no distinction in the protocol between create, edit, and delete operations. Instead these protocol operations are encoded as data, represented by a RowStatus MIB object in each row of a configurable table.

There are other reasons SNMP application development is more complicated than CLI scripting tools, such as a lack of high-level elements of procedure for configuration. MIBs do not consistently specify the creation order and other high level operational requirements to complete a given functional task. CLI documentation tends to be much more task oriented, and devices are expected to support a specific procedure to complete a given task.

5.2.2. Management Data Definitions

The Structure of Management Information (SMI) used within the IETF for MIB definitions is partly responsible for the high cost of SNMP development. The only mechanism for data reuse in the SMI is the refinement of base types (textual conventions). The data structure definition capability is limited to simple arrays of scalar data types. The expression of complex data structures is accomplished by creating associated arrays of scalar objects, connected by common index components.

This practice leads to data structures which are difficult to read and difficult to implement. This is especially harmful for configuration, because the additional complexity of multiple writable tables causes creation order dependencies.

5.2.3. Transport Protocol

SNMP messages are transported over UDP, which has traditionally limited the maximum message size to an unreasonably small number. This has impacted MIB and application design. The 'createAndWait' RowStatus value exists primarily to cope with this limitation, allowing configuration creation operations to be split across two or more transactions. MIBs are also designed with this size constraint in mind.
Individual objects are sometimes arbitrarily sized so a single instance will always fit in a 484 octet payload.

UDP does not guarantee that packets will be delivered in order, or that only one copy of the packet will be delivered. This adds unwarranted complexity to the design of MIBs, applications, and agents.

Application design is also more complicated because retrieval of large (high-level) objects, such as an entire routing table, is achieved with several GetNext or GetBulk transactions. It is costly and inefficient to handle fragmentation and reassembly in the application layer. If large messages were supported (over TCP), then application design would be simplified. New solution oriented protocol operations to handle retrieval of dynamic tables, such as a copy-then-transfer operation, would also simplify application design. This simplicity in the application comes at a cost in the agent, but the cost is worth the potential benefits.

5.3. How We Can Improve the NM Standards Process

There are several steps that can be taken to improve the effectiveness of network management standards.

5.3.1. Increase Customer Input

For a variety of reasons, the standards development process for network management has failed in recent years to adequately solicit and respond to the needs of customers, who include the engineers who develop networking products and the operators who use those products.

Software developers have complained that SNMP agents are too costly to develop and maintain, relative to CLI technology. Application developers have complained that the lack of a consistent data model and clear elements of procedure makes it difficult to write common code that works for multiple platforms. Network operators have simply stopped paying attention to SNMP standards for configuration, and few enable SNMP writes in their network.
The result is that SNMP based software tools for configuration are not widely available, nor are they widely deployed by network operators.

Over the years, operators continued to use proprietary CLI based tools, and SNMP standards writers proceeded to create technology without much input from these customers. Other standards efforts (besides SNMP) exist in this space, but none even approaches the deployment level of SNMP based tools for network configuration. It is possible that other standards for network management could be more successful than SNMP, but none have succeeded so far.

It is therefore most important that SNMP standards writers make a serious effort to get the network operators involved in the standards process, and be willing to re-think assumptions about product requirements formulated in the absence of customer input.

5.3.2. Recognize The Diverse Customer Base

There is a tendency to characterize the needs of network operators as if they were a single like-minded group, with a single set of priorities and requirements. This is far from the truth.

Network operators can roughly be divided into two groups, consisting of Enterprise and Service Provider customers. However, these classifications are less and less useful as enterprise networks get larger and service providers offer more application oriented services.

It may be more useful to classify members of this diverse group along a continuum related to their tolerance for risk. At one extreme are large Internet backbone operators who tend to utilize a relatively homogeneous set of products, and are very concerned about even the smallest network device outage, since a single device affects a large number of end users.
At the other extreme are small or medium size enterprise operators, who tend to utilize a relatively heterogeneous set of products and are less concerned about the outage of a single device, since a single device affects a relatively small number of users.

Another way customers can be differentiated is by the number of managed devices controlled within a single administrative domain, which affects their need for scalable management solutions. At one extreme there are backbone operators managing a relatively small number of core routers, and at the other extreme are broadband service providers managing thousands of similar devices, such as cable modems.

The impact of this diversity is that there are significant differences in their configuration requirements, and it is not optimal (or possible) to satisfy these requirements with a single one-size-fits-all solution. We should not expect that all management features will be utilized by all operators.

5.3.3. Speed Up the Standards Process

The process by which standard network management interfaces are created is non-optimal. There has been a great reluctance to create mechanisms for configuration (such as writable MIB objects). There are two main reasons for this phenomenon:

Standards are hard
   It is often very time consuming to get a working group to agree on the management model and then the manageable knobs that should exist for a given technology. Such efforts are often neglected altogether or deferred for a future release.

   Delays in development of standards mean that proprietary solutions will be created by vendors in the interim. Once a vendor (or customer) application developer has created tools which utilize this interim proprietary solution, there is little incentive to devote additional resources to replace the proprietary mechanisms with standard mechanisms later.
   This tends to defer the creation and deployment of the standard mechanisms forever.

Low expectations for standard solutions
   There is often little to no customer expectation that a standard management interface be created for a given technology. There is the perception that development of such an interface will delay products, and such development can be omitted or done later. However, delays in standard mechanisms often mean they will not be deployed (see above).

5.3.4. Reduce Uncoordinated Standards Efforts

One reason for the difficulty in creating management applications is the lack of an overall coordinated effort across working groups. This is partly due to the lack of reusability features in the SMI, and lack of a coherent data model, but it is also due to the lack of a mindset that stresses and enforces reusable definitions.

This attitude can be partially attributed to an understandable reluctance to complicate or expand a working group's specific charter to include more generalized solutions. It is also related to the reluctance of a working group to place documents on the standards track that contain normative references to other documents outside the direct control of the working group.

There are also no efficient mechanisms available to allow working groups to investigate and track documents in other working groups. It is difficult for individuals to follow the large number of Internet Drafts produced each year. There is no centralized authority that actively attempts to coordinate efforts which are related, or that rigorously requires working groups to leverage outside work.

In addition to technical improvements to increase reusability, the IETF needs to improve the standards process to better facilitate development of reusable management interfaces. Tools are needed which allow people to easily determine which existing MIB modules are relevant to a particular subject (e.g., a MIB module navigation tool).

5.3.5. Increase Configuration Standards Awareness

The most important change that can occur is for individual working groups to create configuration management interfaces (e.g., writable MIB objects) as early in the standards process as possible. This will require increased effort and dedication, but the end result will be an increase in the quantity and quality of management applications.

5.4. Recommended Strategic Technology Initiatives

The following technology initiatives are suggested to improve the quality of IETF standards for network configuration. They are intended to build on the strengths of CLI and SNMP-based network management.

5.4.1. New Version of the SMI

The SMIng working group is actively attempting to improve the capability, readability, and reusability of the SMI by introducing hierarchical object naming and aggregate object definitions.

For many years, the IETF has attempted to define MIBs in a manner that did not suggest any particular implementation strategy. This resulted in data definitions that are too abstract, and which do not correspond well to the data structures actually present in any implementation. This practice should end, and common data structure constructs, such as arrays, structures, and unions should be introduced into MIBs. Nesting of aggregate data structures is also important to increase the usefulness of MIBs.

It is also important that high level elements of procedure are carefully considered and explained in MIB documents. If possible, a single mandatory high-level procedure should be documented for each configurable feature. Agent implementations should not be prevented from supporting additional high-level procedures. (E.g., a mandatory procedure may call for a create-and-go type of operation, but this does not prevent an implementation from supporting a create-and-wait type of operation as well.)

5.4.2. Management Protocol Enhancements

A new management protocol is needed, which leverages as many features of SNMP as possible, while providing new capabilities. The following new features should be included:

TCP for the Transport Protocol
   A session-oriented transaction model and large message sizes can be more easily supported if UDP is replaced with TCP as the transport protocol.

Transfer of Aggregate Objects
   There is a need to move large aggregate objects, such as the aggregate data structures that will be possible with the new SMI improvements, in addition to the individual MIB objects that SNMP can manipulate.

SubTree Oriented Bulk Retrieval
   The current GetBulk protocol operation is not optimized for the retrieval of a particular MIB sub-tree. New operations to retrieve the specified subtree (GetSubTree) or the lexicographically next subtree (GetNextSubTree) are needed to allow an application to more easily transfer a large group of MIB objects. These special operations are also needed in a TCP environment, in which data is streamed back to the application, instead of waiting for a new 'start' condition after every packet.

XML Encoding
   The new protocol should utilize XML encoded messages, rather than ASN.1/BER encoded messages. XML can be easily parsed by humans and machines. XML tool support for a number of disciplines is growing rapidly, while ASN.1/BER is used by few applications other than SNMP.

Replace RowStatus with Protocol Operations
   The RowStatus mechanism should be replaced with explicit protocol operations for create, edit, and delete. This can simplify agent and application implementations. There is also a growing consensus that RowStatus is too complicated. It is common for agent developers to support a minimum set of RowStatus enumerations (e.g., createAndGo and destroy).
   As aggregate data objects are introduced into the SMI, use of RowStatus for configuration will get even more complicated. Error handling code may benefit from distinct create and edit operations.

PDU Chaining
   It would be beneficial to create complex protocol operations from two or more simple operations, which can be thought of as 'PDU chaining' or 'chunk' operations. This would allow for a transaction model similar to that used in a CLI. Instead of requiring a series of packets, or requiring the agent to process a large number of varbinds at once, configuration operations can be treated as an ordered sequence of distinct, but related commands. Rollback and corner case logic would be easier to design if the transaction model were enhanced in this manner.

PDU Chaining Options for All-or-none, Stop-on-error
   Complex operations such as 'delete, then recreate' could be constructed in a single packet, using chained PDUs with certain options. An all-or-none option would require the agent to validate all chained protocol operations before performing any of them (this is a best-effort, not foolproof validation). Another option to stop (or continue) on any chained protocol operation error would also be useful to emulate CLI transaction behavior.

Query Transaction
   A protocol operation which provides an SQL SELECT type of operation is needed. Simple GetNext or GetBulk operations require that all possible search criteria be accounted for in advance, and all criteria must be present in INDEX components. More complex search criteria (e.g., return ifInOctets and ifOutOctets for all interfaces that have ifOperStatus=='up(1)' and ifType=='fastEther(62)') are needed to provide faster retrieval, especially for managed devices that contain a large number of interfaces or other common monitored attributes.

5.4.3. Network Wide Management

Tools are needed which allow an operator to manage groups of devices as a single logical entity or manage a network wide service in a single operation, instead of a series of steps on multiple forwarding devices.

There may be some standards work needed to facilitate this type of technology, but an important first step is to reduce the complexity and diversity of the management operations for an arbitrary device or service. Concurrently, steps should be taken immediately to identify the requirements for network-wide management, so support for such operations can be built into the standard mechanisms used to manage a single device.

5.4.4. More High Level Management Features

Additional high level features that will make network management easier include:

Configuration Templates
   The primary motivation for introducing template capabilities in some fashion is the same for configuration management as it is for software development -- information hiding, controlled reusability, and tiered access control to specific parameters. Additional capabilities, such as the ability to upload, download, and modify templates may be needed to support this feature.

Wildcard Operations
   Wildcarding allows a single set of commands to be applied to multiple instances (e.g., rows) in a single high-level protocol operation. This reduces the bandwidth required, and allows configuration to be applied without explicitly specifying every instance that needs to be affected. Additional error codes may be needed to support this feature.

Named Configurations
   Many network devices support the ability to save and restore more than a single configuration file. At a minimum, it is common to differentiate between the running configuration and the configuration that will be loaded on the next reboot. The SNMP management framework should support this common feature.
   It is not clear what protocol enhancements are needed for this support.

Better Integration With CLI Security
   Better integration with CLI oriented access control and authentication technology is needed. Operators are reluctant to deploy SNMP Security because it requires separate security administration. This reluctance would be reduced if investments in existing tools and technology for CLI security could be leveraged.

6. Acknowledgements

Some comments in this memo have been previously articulated by various people in various forums, such as IETF meetings, and the dilbert-vision@snmp.com mailing list.

7. Normative References

[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", RFC 2026, Harvard University, October 1996.

8. Security Considerations

This memo discusses current network management trends and does not introduce any new security threats.

9. Author's Address

Andy Bierman
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA USA 95134
Phone: +1 408-527-3711
Email: abierman@cisco.com

10. Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.