INTERNET-DRAFT
Supratik Bhattacharyya
Gianluca Iannaccone
Sprint ATL
Christophe Diot
Intel
June 1 2003

Deployment of inter-operable and cost-effective monitoring infrastructure in ISP networks

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC 2119].

Abstract

This document identifies issues and concerns in monitoring ISP networks. It outlines the components of a monitoring infrastructure designed to support ISP requirements. It discusses deployment and inter-operability issues. Related IETF working groups are identified. The goal of this document is to open a discussion on whether there should be a BOF addressing this issue at the next IETF (Vienna, July 2003).

1. Introduction

As the Internet continues to grow rapidly in size and complexity, it has become increasingly clear that its evolution is closely tied to a detailed understanding of network traffic.
Network traffic measurements are invaluable for a wide range of tasks such as network capacity planning, traffic engineering and fault diagnosis. IP networks are designed with the goal of providing high availability and low delay/loss while keeping operational complexity low. Meeting these goals is a highly challenging task and can only be achieved through a detailed understanding of the network.

Monitoring and measuring traffic in IP networks is difficult for a number of reasons. First, the designers of IP networks have traditionally attached less importance to network monitoring and resource accounting than to issues such as distributed management, robustness to failure and support for diverse services and protocols [1]. Thus IP network elements (routers and end-hosts) have not been designed to retain detailed information about the traffic flowing through them and IP protocols typically do not provide detailed information about the state of the underlying network. This poses the problem of adding enhanced monitoring and measurement capabilities to existing equipment and/or integrating special purpose monitoring equipment into existing networks.

In addition, IP protocols have been designed to automatically respond to congestion (e.g., TCP) and failures (e.g., routing protocols such as IS-IS/OSPF). This makes it hard for a network administrator to track down the cause of a network failure or congestion before the network itself takes corrective action.

Finally, the Internet is organized as a set of loosely connected networks (Autonomous Systems) that are administered independently. Hence the operator of a single network has no control over events occurring in other networks it exchanges traffic with. However, a network operator can gain some knowledge of the state and problems of other networks by studying the traffic exchanged with those networks.

This document highlights some concerns and issues in monitoring ISP networks.
The goal is to foster discussions about the work needed in the IETF to address these issues, and how this work should be organized under the aegis of different working groups. The major question raised by this document is whether we need a new working group in the Operations and Management area of the IETF to work on the deployment of an inter-operable monitoring infrastructure for ISPs.

2. Challenges posed by Operational Monitoring

Various types of measurement data need to be collected to support the monitoring applications [1]. We classify them into two broad categories: (i) aggregate information that needs to be collected at coarse time-scales and reported on a regular basis (e.g., SNMP, flows, routing tables); (ii) packet-level traces needed to analyze and understand a specific phenomenon.

There are a number of implementation challenges in capturing, processing, summarizing and exporting data at the required level of granularity at the time that it is needed. Some of these problems are being addressed in different IETF working groups whereas others are not.

The question we ask here is whether a new working group is needed to undertake the following activities: (i) define a framework for the monitoring needed to support day-to-day operations in IP networks, (ii) identify existing and on-going work in the IETF on various aspects of the framework and ensure that this work guarantees inter-operability among ISPs, and (iii) provide clear guidelines to equipment vendors on what infrastructure is needed to support monitoring in ISP networks.

3. Related IETF Working Groups

The IP Performance Metrics (IPPM) Working Group [3] has been chartered to develop a set of standard metrics that can be applied to the quality, performance, and reliability of Internet data delivery services. The focus of this group has been on defining metrics based on active measurements such as one way delay, round-trip delay, link bandwidth capacity, etc.
However, active measurements alone are insufficient to address the scale and complexity involved in continuously monitoring large ISP networks. Also, with the evolution of the technology and of the understanding of IP networks, new metrics have emerged.

The Packet Sampling (PSAMP) working group [4] is chartered to define a standard set of capabilities for network elements to sample subsets of packets by statistical and other methods. The capabilities should be simple enough that they can be implemented ubiquitously at any link rate. They should be rich enough to support a range of existing and emerging measurement-based applications, and other IETF working groups where appropriate.

While the work in PSAMP addresses a critical aspect of an operational monitoring framework, it can benefit from the definition of a set of metrics to be derived from the sampled and filtered packet data. It is very likely that different metrics will require different sampling techniques. Defining a set of metrics that are of common interest to many ISPs will ensure that PSAMP-capable monitoring systems (routers and/or special-purpose systems) have the capability to support the derivation of these metrics from collected data.

The IP Flow Information Export (IPFIX) Working Group [5] has defined an architecture for flow information export. This includes flow definitions, a metering process at the observation point with sampling/filtering capabilities, an export process to export the data, and an export protocol for communication between the observation points and the collection stations. The two-level monitoring system envisaged in Section 2 fits well within the scope of the IPFIX architecture. However, ISPs need to develop a better understanding of their own monitoring needs and provide feedback to the IPFIX Working Group in order to ensure that the IPFIX architecture and flow export protocol meet their needs.
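
To make the roles of the metering and export processes more concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions and is not the IPFIX architecture or protocol itself: the flow key (a 5-tuple), the deterministic 1-in-N sampler, and all class, field and method names are hypothetical choices made for this example.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical packet summary; field names are illustrative only.
Packet = namedtuple("Packet", "src dst sport dport proto size")


@dataclass
class FlowRecord:
    """Per-flow counters kept by the metering process."""
    packets: int = 0
    octets: int = 0


class MeteringProcess:
    """Counts sampled packets per flow at a single observation point."""

    def __init__(self, sample_every=1):
        # Assumption: deterministic 1-in-N sampling (every N-th packet).
        self.sample_every = sample_every
        self.seen = 0
        self.flows = {}

    def observe(self, pkt):
        self.seen += 1
        if self.seen % self.sample_every != 0:
            return  # packet not selected by the sampler
        # Assumption: a flow is identified by the classic 5-tuple.
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        rec = self.flows.setdefault(key, FlowRecord())
        rec.packets += 1
        rec.octets += pkt.size

    def export(self):
        """Hand the current flow records to a collector and reset state."""
        records, self.flows = self.flows, {}
        return records


# Ten packets of one flow, sampled 1-in-2, then exported once.
meter = MeteringProcess(sample_every=2)
for i in range(10):
    meter.observe(Packet("10.0.0.1", "10.0.0.2", 1234, 80, 6, 100 + i))
records = meter.export()
```

In this sketch the counters reflect only the sampled packets. A collection station that wants estimates for the full traffic would have to know the sampling rate, which is one example of the information that observation points and collection stations need to exchange.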
There are several open issues, e.g., what are some commonly useful metrics, what is the volume of information that can be exported in practice, what features are needed for the protocol that controls the interaction between the observation points and the collection stations, etc. ISPs need to work toward answering these questions to ensure that systems based on the IPFIX architecture meet their operational monitoring needs.

4. Discussion

There are several challenges that ISPs face in order to use monitoring to ease the management of their network. Some of these, such as packet sampling/filtering and IP flow information export, are being addressed by IETF working groups. However, the work in these groups would greatly benefit from knowledge about real-world experience in monitoring ISP networks. In addition, there are a number of open issues:

(i) Inter-operability

The extent to which the monitoring infrastructures of different ISPs need to inter-operate must be understood. This will involve communication among ISPs to specify requirements for monitoring data exchange, define metrics of common interest, etc.

(ii) Storage, analysis and aging

The storage and analysis of exported information presents a significant challenge for ISPs. This includes designing large storage systems (e.g., storage area networks) and harnessing processing power to analyze the collected information on a continuous basis. Moreover, ISPs need to determine how to age historical data that is retained for long-term planning.

(iii) Control Plane

Given the diverse information needs of ISPs and the wide range of tasks to be supported by monitoring, there needs to be a sophisticated control protocol between the observation points and collection stations. This protocol is primarily required for dynamically configuring the sampling/filtering/summarization processes at the observation points.
It may also be used to coordinate communication between multiple observation points and collection stations, or between collection stations themselves. ISPs need to converge on a set of requirements on which the design of such a control plane can be based.

ISPs will clearly benefit from a process that facilitates the sharing of their experiences and requirements, leading to a faster deployment of ubiquitous monitoring and management infrastructure. This process could also benefit equipment vendors by specifying what is needed to support the monitoring needs of ISPs.

We propose to organize a BOF meeting at the 57th IETF (Vienna, July 2003) to discuss the need for a new working group whose charter would be to (i) identify the missing parts in an operational monitoring framework, (ii) define what needs to be standardized in order to guarantee inter-operability among multiple ISPs, and (iii) provide guidelines to routing or monitoring equipment vendors to help them meet the requirements of the monitoring infrastructure. We believe that this approach is essential to ease and expedite the design of a standardized and comprehensive monitoring infrastructure.

5. References:

[1] S. Bhattacharyya et al. "Network Measurement and Monitoring: A Sprint Perspective". Internet draft draft-bhattacharyya-monitoring-sprint-01. Work in Progress.

[2] G. Iannaccone et al. "Monitoring very high speed links". In Proceedings of First ACM Sigcomm Internet Measurement Workshop (IMW), November 2001.

[3] IP Performance Metrics. http://www.ietf.org/html.charters/ippm-charter.html

[4] Packet Sampling. http://www.ietf.org/html.charters/psamp-charter.html

[5] IP Flow Information Export. http://www.ietf.org/html.charters/ipfix-charter.html

6. Authors' Address:

Supratik Bhattacharyya
Gianluca Iannaccone
Sprint Advanced Technology Labs
1 Adrian Court
Burlingame CA 94010
USA
{supratik,gianluca}@sprintlabs.com

Christophe Diot
Intel
15 JJ Thomson Avenue
Cambridge CB3 0FD
UK
christophe.diot@intel.com