INTERNET-DRAFT                                   Supratik Bhattacharyya
Expires: September 1, 2003                          Gianluca Iannaccone
                                                                Sue Moon
                                                               Nina Taft
                                                         Christophe Diot
                                                              Sprint ATL
                                                           March 1, 2003

        Network Measurement and Monitoring: A Sprint Perspective

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC 2119].

Abstract

This document provides an overview of network measurement and monitoring from the perspective of Sprint Advanced Technology Labs. It starts with a detailed discussion of the benefits of monitoring for an ISP's backbone and the types of measurements required for various tasks. It describes the tools and techniques currently in use, and then outlines what else is needed to meet the challenges of monitoring an operational network.

1. Introduction

As the Internet continues to grow rapidly in size and complexity, it has become increasingly clear that its evolution is closely tied to a detailed understanding of network traffic. Network traffic measurements are invaluable for a wide range of tasks such as network capacity planning, traffic engineering and fault diagnosis. The Sprint IP backbone is designed with the goal of providing high availability and low delay/loss while keeping operational complexity low. Meeting these goals is a highly challenging task and can only be achieved through a detailed understanding of the network.

Monitoring and measuring traffic in IP networks is difficult for a number of reasons. First, the designers of IP networks have traditionally attached less importance to network monitoring and resource accounting than to issues such as distributed management, robustness to failure and support for diverse services and protocols [1]. Thus IP network elements (routers and end-hosts) have not been designed to retain detailed information about the traffic flowing through them, and IP protocols typically do not provide detailed information about the state of the underlying network. This poses the problem of adding enhanced monitoring and measurement capabilities to existing equipment and/or integrating special purpose monitoring equipment into existing networks. In addition, IP protocols have been designed to respond automatically to congestion (e.g., TCP) and failures (e.g., routing protocols such as IS-IS/OSPF). This makes it hard for a network administrator to track down the cause of a network failure or congestion before the network itself takes corrective action. Finally, the Internet is organized as a set of loosely connected networks (Autonomous Systems) that are administered independently.
Hence the operator of a single network has no control over events occurring in other networks it exchanges traffic with. However, a network operator can gain some knowledge of the state and problems of other networks by studying the traffic exchanged with those networks.

This draft presents Sprint ATL's perspective on the measurement and monitoring of IP networks. Sprint operates one of the largest IP backbones in the world, consisting of Points-of-Presence (PoPs) in several continents connected by long-haul links at speeds of OC-48 and OC-192. The underlying technology of the backbone is IP over DWDM. Sprint's monitoring requirements are diverse, and range from network design and capacity planning to traffic engineering and customer feedback. Currently, some of these requirements are fulfilled using standard tools and techniques such as ping, traceroute and SNMP, supplemented by a few commercial and proprietary tools. However, the problem of building a comprehensive and integrated monitoring infrastructure to address all of Sprint's needs is far from solved. Several questions remain about the tasks that can benefit from monitoring, suitable time-scales, the granularity of collected information, router-level support, the design of specialized monitoring infrastructure, etc.

This draft articulates Sprint's monitoring needs and challenges in order to provide a basis for the systematic development of monitoring tools, techniques and infrastructure. Although most of the discussion is inspired by experiences with Sprint's IP network, we believe that a significant portion can be generalized to other IP backbones of similar size and design.

We start with a classification of the tasks involved in operating an ISP backbone and how they may benefit from monitoring and traffic measurements. Based on this, we discuss current tools and capabilities and how they can aid in some of these tasks. This allows us to identify additional requirements and open challenges. The draft reaches two main conclusions. First, a two-tier monitoring system is needed to comprehensively address the monitoring needs of an ISP. The first level consists of continuous coarse-grained monitoring on a network-wide basis. The second level consists of on-demand fine-grained monitoring at certain points in the network. The second conclusion is that the biggest challenge in building such a monitoring system is to develop the sophisticated infrastructure and sampling techniques needed for packet-level measurements.

2. Benefits of Monitoring

In this section we discuss the importance of monitoring for IP backbones in terms of the various types of tasks that can benefit from an accurate knowledge of the network and the flow of traffic across it.

2.1. Topology Design

Some of the important goals in designing the topology of an ISP network are predictable performance, stable behavior and protection against failures. Knowledge about network traffic and the performance of equipment (failures, interoperability, etc.) is essential for meeting these goals.

ISP backbones typically consist of a set of PoPs connected by high-speed links. The connectivity between a given set of PoPs (and hence the PoP-to-PoP topology) should be based on the traffic exchanged between every pair of PoPs. For example, there is little justification for adding a direct link between two PoPs that exchange a negligible amount of traffic.
On the other hand, adding a direct link between two PoPs that exchange large volumes of traffic may actually improve end-to-end latency. Information about traffic flows between PoPs can be obtained by building PoP-level traffic matrices. A traffic matrix is a representation of traffic flow between an ingress point and an egress point (or a set of egress points) in a network. It is built by matching traffic demands entering the network at different ingress points with routing information. Statistical inference techniques based on the traffic volumes of individual network links and routing information may also be used to build traffic matrices [2]. At several points in this document, we will point out the importance of traffic matrices at various levels of aggregation and time granularity [3].
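
As a concrete (if simplified) illustration of the first construction, the following Python sketch aggregates flow records into a PoP-to-PoP byte matrix. The record fields and the egress lookup are hypothetical stand-ins for real flow-collection output and BGP/IS-IS state, not a description of any deployed tool:

   # Sketch: building a PoP-level traffic matrix from flow records.
   # Record fields and the egress lookup are hypothetical; in practice
   # the egress PoP is derived from routing (BGP/IS-IS) information.
   from collections import defaultdict

   def egress_pop(dst_prefix, routing):
       """Map a destination prefix to the PoP where traffic exits."""
       return routing[dst_prefix]

   def build_traffic_matrix(flow_records, routing):
       matrix = defaultdict(int)  # (ingress PoP, egress PoP) -> bytes
       for rec in flow_records:
           ingress = rec["ingress_pop"]
           egress = egress_pop(rec["dst_prefix"], routing)
           matrix[(ingress, egress)] += rec["bytes"]
       return matrix
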
Today's IP backbones need to be designed to meet very stringent loss, delay and availability requirements. Fine-grained analyses of packet delay and loss across individual routers can help in determining the router capabilities needed to meet the delay/loss requirements across a given network path. Similarly, an understanding of the burstiness of traffic and of a router's ability to handle traffic bursts is essential for dimensioning buffer capacities. Although router performance can be evaluated in a laboratory testing environment, measurement data from a router in an operational network at the granularity of packets (or flows) provides a far more exact basis for network design decisions.

Network providers attempt to provide predictable performance to customers even when faced with equipment failures and software misconfigurations. Large-scale outages, involving multiple links and resulting in the loss of a significant portion of network capacity, are not uncommon. However, an understanding of network dynamics during such failures (e.g., which links fail at the same time, how routing protocols shift the flow of traffic) is valuable for providing better protection with the same level of resource provisioning. In addition, the network must be well-protected against shorter-duration failures that arise due to problems such as router crashes, software bugs and misconfigurations. The first step in doing so is to characterize the frequency of such events and their impact (e.g., routing protocol reconvergence times). Such information can be gathered by collecting router logs, network-wide routing update messages, etc. Subsequent analysis of this monitoring data can identify weaknesses in the design of networking equipment and hence lead to design improvements.

2.2. Capacity Planning and Forecasting

Traffic measurements and routing information are key to effective capacity planning. Measurements can identify traffic bottlenecks, which may then be removed by upgrading the capacity of some links and/or creating a new path by adding links or routers. If an ISP can successfully predict the links that would be affected by adding a new customer or a peering link, it can plan ahead and upgrade those links and/or adjust routing configurations to avoid congestion. Successful capacity planning can be achieved only by accurately forecasting the growth of traffic in the network. Inaccurate forecasts can cause a network to oscillate between excessive and inadequate capacity, which in turn affects the predictability of performance. Statistical techniques that forecast traffic growth from past traffic data are a central component of the forecasting process. The granularity at which measurements have to be collected for forecasting purposes depends on how far into the future we want to forecast. For example, traffic summaries on the time-scale of weeks or months may be appropriate for monthly or bi-monthly forecasts. On the other hand, monthly or bi-annual measurements may be appropriate for yearly forecasts. The ability to forecast traffic itself depends on an understanding of dominant trends (e.g., daily, weekly, etc.), and this understanding can only be gained through analysis of measurement data. Various statistical techniques such as ARIMA models and wavelet analysis can then be used to obtain fairly accurate forecasts [4].
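
As a toy illustration of trend-based forecasting, the sketch below fits a least-squares linear trend to past traffic summaries and extrapolates. This is deliberately simpler than the ARIMA and wavelet models of [4], and the monthly volumes are invented:

   # Toy forecast: fit y = a + b*t to past traffic summaries by least
   # squares and extrapolate.  Real forecasting would use the richer
   # models cited above; the monthly volumes here are invented.
   def forecast(history, steps):
       n = len(history)
       t_mean = (n - 1) / 2.0
       y_mean = sum(history) / n
       b = sum((t - t_mean) * (y - y_mean)
               for t, y in enumerate(history))
       b /= sum((t - t_mean) ** 2 for t in range(n))
       a = y_mean - b * t_mean
       return [a + b * (n - 1 + s) for s in range(1, steps + 1)]

   monthly_tbytes = [410, 425, 452, 470, 488, 510]  # hypothetical
   print(forecast(monthly_tbytes, steps=3))         # next 3 months
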
2.3. Operations and Management

The daily operation and management of an ISP backbone is performed by a small group of human operators. Their tasks can be divided into two broad categories:

2.3.1. Traffic Engineering:

The goal of traffic engineering is to transport traffic across a network in such a way as to optimize resource utilization and achieve high performance. There is an intrinsic coupling between the traffic that a network has to carry and the routing policies that determine how this traffic is carried across the network. For example, consider an ISP that uses a link-state protocol such as IS-IS/OSPF for intra-domain routing. Knowledge of the traffic matrix, i.e. the traffic demands between pairs of ingress and egress points in the network, is crucial for setting IS-IS link weights and for evaluating the suitability or performance of any routing protocol. Also, traffic measurements inform the network operator about how changes in link weights affect the flow of traffic. Proper configuration of routing protocols also requires knowledge about traffic flows. For example, when a network operator configures BGP policies to select one out of many possible exit points for traffic to a remote network, it helps to know how much traffic is headed towards that network.

Network measurements can assist an operator in blocking off unwanted traffic. For example, an ISP is usually unwilling to be the transit path between two other ISPs of the same size (e.g., tier-1 ISPs). By classifying traffic according to ingress/egress points and/or BGP prefixes, an operator may be able to quickly identify and block such unwanted transit traffic.

Traffic measurements can provide input for designing traffic engineering approaches to cope with sudden instabilities and changes. Some of these changes may simply be day-to-day variations in traffic patterns. A network operator can exert proactive control in this case by studying the pattern of these variations and planning ahead for them (e.g., assigning different sets of link weights during day and night). On the other hand, unexpected changes can happen due to factors beyond the control of the operator, such as a flash crowd due to an extraordinary Internet-wide event, or excessive instability in the global routing infrastructure. In this case, the network operator can only exert reactive control, e.g., diverting traffic to lightly loaded paths if a path suddenly gets congested.

2.3.2. Fault Diagnosis and Troubleshooting:

A significant part of a network operator's duties is to identify, diagnose and fix network problems that arise on a day-to-day basis. Detailed and timely feedback about the state of the network is invaluable for detecting problems quickly, identifying the causes and taking corrective action. Note that a large part of the troubleshooting done by an operator in today's network is reactive, i.e. based on customer complaints. Obtaining feedback from network elements (routers, etc.) and specialized monitoring equipment can not only speed up the response time for these complaints (thereby improving customer satisfaction), but can also enable the operator to be more "pro-active" in preventing problems before customers notice them.

For example, the ability to track the path of packets across a network, or to trace a packet back to its point of entry into the network, is important in tracing and potentially stopping denial-of-service attacks. Although this capability is non-existent in traditional IP protocols, techniques have been proposed recently to do this [6,7]. Moreover, router and routing protocol misconfigurations may be identified by tracing the trajectory of a packet across the network and checking whether the packet conforms to its expected path.

In addition to data traffic, routing updates (such as BGP and IS-IS messages) are a valuable source of information for studying the behaviour of routing protocols, tracking link/router failures, understanding routing protocol convergence times, and also for troubleshooting. Moreover, correlating routing updates with traffic data is essential for gaining insights into various problems such as the origin of routing loops, the extent to which equipment failure disrupts packet delivery, sudden link overload or a sudden spike in a router's CPU utilization.

2.4. Customer-Driven Activities

An ISP should be able to provide the following services to a customer:

2.4.1. Service-Level-Agreements:

Most of today's SLAs are based on three metrics - loss, delay and port availability. Loss and delay are computed as network-wide averages over a fairly long time period (e.g., a month). The term "port" refers to the point at which a customer attaches to the provider's network (typically an interface on a router). Availability is measured in terms of the fraction of time that this port is operational. One of the reasons for the lack of more sophisticated SLAs is the lack of proper infrastructure to measure traffic and provide concrete evidence about adherence to SLAs. Many providers circumvent this problem by offering pay-outs to customers who complain about lack of adherence to SLAs. Other providers attempt to distinguish their SLAs in terms of how they handle SLA violations, e.g., by proactively notifying customers or offering monetary compensation. While compensating customers reduces an ISP's revenues, it certainly improves customer satisfaction. It is therefore imperative for an ISP to (i) engineer the network so that SLAs are met and (ii) determine SLA breaches accurately via continuous and detailed network measurements.

Currently, operators measure loss and delay in their networks using standard tools such as ping and traceroute or some custom active probing tool. However, such a simple approach is insufficient for introducing more complex metrics such as service availability, or for producing detailed statistics such as averages on different time-scales, measures of variance and median values. Moreover, it is envisioned that in the near future an operator may be required by customers to provide delay, loss or service availability statistics between specific points in its network (e.g., delays between every pair of PoPs). This will certainly require more sophisticated tools and more extensive measurements. Passive techniques such as trajectory sampling [6] may help in that case to reduce the volume of active probes needed.
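
A minimal sketch of the kind of per-path statistics mentioned above, computed from a list of one-way delay samples; the sample values and loss counts are invented for illustration:

   # Sketch: per-path SLA statistics from delay samples (milliseconds).
   import statistics

   def sla_report(delays_ms, lost, sent):
       d = sorted(delays_ms)
       return {
           "mean_ms":   statistics.mean(d),
           "median_ms": statistics.median(d),
           "stdev_ms":  statistics.stdev(d),
           "p99_ms":    d[min(len(d) - 1, int(0.99 * len(d)))],
           "loss_rate": lost / sent,
       }

   samples = [22.1, 23.4, 22.8, 30.9, 22.5, 24.0, 22.7, 25.2]
   print(sla_report(samples, lost=3, sent=1000))
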
2.4.2. Blocking Attacks:

Customers often request their providers to block denial-of-service attacks launched at their networks, and to identify the source of the attacks. There are legal issues involved in this problem, which we will not address in this draft. However, the outcome of these issues is that an ISP is restricted to detecting and blocking attacks based only on transport-level information (e.g., TCP and IP headers). Network operators today have few tools at their disposal to provide even this limited level of service. Typically an operator, on receiving a customer complaint about a possible attack, blocks ALL of the traffic destined to the customer's network. It then attempts to determine the source(s) of the attack by doing off-line analysis of the customer traffic. The ready availability of monitoring infrastructure and supporting analysis tools [6,7] would greatly help an operator in performing early detection (e.g., identifying spoofed source addresses) and would greatly reduce the extent to which a customer's network availability is disrupted. For example, if an operator has the ability to capture traffic flow volumes between pairs of BGP prefixes or pairs of links, then anomalous changes in the volumes of one or more of these flows could serve as an alarm for possible attacks.
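
A minimal sketch of such an alarm, assuming per-interval byte counts are already being collected for a prefix pair; the window length and threshold are illustrative choices, not recommendations:

   # Sketch: flag a prefix-pair flow whose current volume is far above
   # its recent history (a simple standard-deviation test).
   import statistics

   def volume_alarm(history, current, threshold=3.0):
       mean = statistics.mean(history)
       stdev = statistics.stdev(history)
       return stdev > 0 and (current - mean) / stdev > threshold

   history = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3]  # Gbytes per interval
   print(volume_alarm(history, current=9.5))  # True: possible attack
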
2.4.3. Customer Traffic Engineering:

A customer may request information about how an ISP carries its traffic to and from other networks, such as the hop count and delays on the routes used by the customer's traffic, the egress point from the ISP's network, etc. This enables the customer to verify that packets are not being routed along network paths with large latencies. A multi-homed customer may be interested in the amount of traffic that it exchanges with the provider at every access point, in order to determine the appropriate amount of bandwidth it needs to buy from the provider at each access point. A customer may want to know the breakup of incoming traffic by BGP prefixes so that it can set BGP policies to improve load balancing across its multiple access links. If a customer uses its provider's network to set up a Virtual Private Network (VPN) among multiple sites, it may want to know the total traffic exchanged between every pair of sites. Many of these statistics can be collected by customers themselves by installing monitoring equipment in their networks or by injecting active probes into the provider's network. However, a provider could increase customer satisfaction by supplying this information. To do so, it has to monitor its own network extensively.

3. Requirements for Operational Monitoring

In this section, we first discuss the types of measurement data that an ISP needs to collect and which of them are needed for each of the task categories from the previous section. Based on this, we identify a two-tier monitoring system for an ISP backbone. We conclude with a discussion of current capabilities and the extent to which they can fulfill the monitoring needs of an ISP.

Measurement data can be broadly classified as follows:

a. Packet-level: This captures information at the granularity of IP-level packets. Packet-level traces can be collected at links and/or routers. Each such trace consists of all (or a subset of) the packets that traverse a link or a router. In some cases, only a fixed portion of every packet may be captured. In any case, it is crucial to attach a very accurate and fine-grained (e.g., microsecond) time-stamp to each packet at the time of collection.

b. Flow-level: Loosely speaking, a flow consists of a set of packets at an observation point (e.g., a link) grouped together on the basis of a common property. For example, packets could be grouped into flows based on the five-tuple - source IP address, destination IP address, source port, destination port and protocol number. Flow-level measurements consist of information that is common to the packets in a flow (e.g., destination address, destination port, etc.) or some property derived from the packets (e.g., total volume, flow duration, etc.); a sketch of this kind of aggregation appears after this list. Flow-level information is inherently more aggregated than packet-level information and does not yield certain kinds of details such as inter-packet spacing, delay or jitter. However, its advantage lies in the ease of storage and manipulation due to the much smaller volume of data collected.

c. Routing: This consists of routing protocol messages exchanged between routers and snapshots of routing tables collected from routers. The routing information may be internal (e.g., IS-IS, OSPF, I-BGP) or external (typically BGP).

d. Path-Level: This is information about the path traversed by a packet (or a set of packets) through an ISP network, and includes information on the path itself (i.e., router hops) and the quality of the path (e.g., loss, delay, jitter, etc.). The standard technique for gathering path traces is traceroute [8], but other techniques such as IP traceback [7] and trajectory sampling [6] have been proposed recently for the same purpose.

e. Network Element-Specific: This covers a wide array of information that is specific to an individual piece of network equipment, e.g., the utilization of a single link, the configuration of a router, router CPU utilization, etc. The most common way of collecting a variety of such information is SNMP. Such data, when collected on a network-wide basis, can provide valuable information about the overall state of the network.
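
The sketch referred to in item (b): grouping packet records into five-tuple flows and keeping per-flow summaries. The packet-record fields are hypothetical, chosen only for illustration:

   # Sketch: five-tuple flow aggregation over packet records.
   from collections import defaultdict

   def aggregate_flows(packets):
       flows = defaultdict(lambda: {"packets": 0, "bytes": 0,
                                    "first": None, "last": None})
       for p in packets:
           key = (p["src_ip"], p["dst_ip"],
                  p["src_port"], p["dst_port"], p["proto"])
           f = flows[key]
           f["packets"] += 1
           f["bytes"] += p["length"]
           if f["first"] is None:
               f["first"] = p["timestamp"]
           f["last"] = p["timestamp"]  # flow duration = last - first
       return flows
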
Table 1 specifies the minimal data requirements for each of the task categories identified in Section 2.

   -------------------------------------------------------------------
                         Packet   Flow   Routing   Path   Network
                         Trace                            Element-
                                                          Specific
   -------------------------------------------------------------------
   Topology Design          N       Y       N       N        Y
   -------------------------------------------------------------------
   Capacity Planning /      N       Y       N       N        Y
   Forecasting
   -------------------------------------------------------------------
   Traffic Engineering      Y       Y       Y       N        Y
   -------------------------------------------------------------------
   Fault Diagnosis /        Y       Y       Y       Y        Y
   Troubleshooting
   -------------------------------------------------------------------
   SLAs                     Y       Y       N       N        N
   -------------------------------------------------------------------
   Blocking Attacks         N       Y       Y       Y        N
   -------------------------------------------------------------------
   Customer TE              Y       Y       Y       Y        N
   -------------------------------------------------------------------

        Table 1: Minimal data requirements per task category

Note that the above table does not capture the time granularity of the information to be collected. In general, tasks such as network design and capacity planning require data on coarser time-scales such as days, weeks or months, while other tasks such as traffic engineering, fault tolerance and SLAs may require information on a much finer time-scale, e.g., hours, minutes or seconds.

To summarize, we observe two broad kinds of needs: aggregate information on relatively coarse time-scales, and fine-grained (e.g., packet-level) information on finer time-scales that needs to be collected only at certain times. Accordingly, we identify the need for a two-tier monitoring system to comprehensively address all the monitoring needs of an ISP:

a. Continuously collect aggregated statistics on relatively long time-scales (e.g., minutes or hours) from the entire network. This includes information about network elements, such as average link utilization and router CPU loads, and routing information, such as periodic dumps of routing tables. In addition, on-line processing needs to be performed on packet-level data to extract and summarize various types of information. For example, the size of each packet can be extracted from the IP header to periodically generate packet size distributions. Another possibility is to compress packet-level data into flow records. [21] has shown that a flow trace is between 3 and 4 times smaller than the equivalent packet trace. These flow records may be exported periodically to a central collection station. The monitoring system must also be capable of generating various statistics based on the flow records. Since the volume of flow statistics is relatively small, they can be exported frequently to a central collection station and archived there for a long time. This information is sufficient for monitoring network-wide performance under normal conditions and for detecting anomalies or abnormal behavior. Moreover, frequent export of this information from the monitoring systems will provide a network operator with an up-to-date view of the network.

b. Collect fine-grained (i.e., packet-level) data on demand from specific network elements. This requires packet-level data collection at line speeds for a finite duration. Since it is impossible to predict when such fine-grained information will be necessary, a monitoring system must continuously capture packet-level data but only retain a limited amount of past data on a local disk, as sketched after this list. Additionally, there should be a mechanism to export this data to a remote repository on demand. At all times, the local disk should retain all packet records for at least one full day. This will enable a network operator to download and analyze fine-grained packet data in case abnormalities are detected, or in the aftermath of an exceptional event such as a large-scale attack or widespread outage. This data can also be downloaded occasionally for analyzing network characteristics such as single-hop delay [22] that rely on packet-level information.
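
The retention discipline in item (b) can be sketched as follows: keep a bounded window of recent packet records, dropping the oldest as new ones arrive, and export a slice on demand. A real implementation would manage trace files on disk at line rate; the in-memory queue below merely illustrates the policy:

   # Sketch: bounded retention of packet records with on-demand export.
   from collections import deque

   class PacketHistory:
       def __init__(self, max_records):
           # Oldest records are dropped automatically when full.
           self.buf = deque(maxlen=max_records)

       def record(self, pkt):
           self.buf.append(pkt)

       def export(self, since_ts):
           """On-demand export of records newer than 'since_ts'."""
           return [r for r in self.buf if r["timestamp"] >= since_ts]
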
4. Current Measurement Capabilities

In this section we describe current network measurement capabilities in order to identify the challenges in building the two-level monitoring infrastructure proposed in the previous section.

4.1. Flow-level Information

State-of-the-art routers provide support for gathering information about the traffic flowing through them at the level of flows [11,13]. As mentioned before, flow-level data yields more aggregated information than packet-level data, but is easier to store and manipulate. However, a number of issues remain open in collecting flow-level information on routers. Information related to a flow is typically maintained in a memory cache for the lifetime of the flow, and the cache is flushed on termination of the flow. A high-speed link with speeds of OC-12/48 or higher may carry millions of flows per minute, which may stress the memory available on a router interface. Another problem lies in the current implementations of flow-based monitoring facilities on state-of-the-art routers. While such flow-level monitoring is offered as an optional feature by most router vendors, the implications of turning these features on are poorly understood. Anecdotal evidence indicates that flow-level monitoring can severely impact the essential functions of a router (packet forwarding, updating routing information, etc.) by increasing CPU loads, and may even cause routers to crash. While such problems may be addressed through better implementations, the memory and CPU requirements of monitoring high-speed interfaces (OC-48 and above) are still open questions. For example, it was observed that initial implementations of Netflow [11] significantly impacted router performance during TCP SYN attacks. All of the above problems have resulted in very limited use of existing flow-based monitoring capabilities by operators of large networks. These facilities are usually turned on for short periods of time on a few routers under special circumstances.

4.2. Routing Information

State-of-the-art routers also log a variety of events such as interface failures, software reboots, routing updates, etc. These logs provide useful hints to operators for day-to-day troubleshooting. However, it is difficult to collect such logs on a continuous basis. In practice, operators collect router logs for diagnosing specific problems, such as why an interface is failing repeatedly, or why there is a sudden change in the route that a customer's packets are taking. However, other tools have been developed to passively collect routing updates that are propagated network-wide. Such logs are invaluable for capturing routing dynamics and for troubleshooting.

4.3. Path-Level Information

Ping [8] is a widely used facility based on the ability of an Internet node to send an "ICMP Echo Reply" when it receives an "ICMP Echo Request" packet from another host. Its only use is in determining reachability from one node to another and in getting a rough estimate of the round-trip time between the two nodes. It can be used by an operator to determine whether a router in the network (or one of its interfaces) is "alive". Traceroute [8] is also well-known and widely used. It is somewhat more powerful than ping in that it reveals the route that a packet takes from one Internet host to another. Traceroute also provides a rough estimate of the round-trip time to every network node along the path. An operator may use traceroute to trace the path between any two given points in a network.

Ping and traceroute are certainly useful in getting a ready and basic sense of routing and reachability across a backbone. However, they provide a very limited set of abilities and are clearly insufficient to meet all the monitoring requirements of an ISP. Moreover, ping and traceroute are essentially tools to measure reachability, not performance. While they may be used to obtain rough estimates of latencies and loss rates along certain paths in a backbone, such estimates may not be statistically significant. First, when active probes are used to measure the quality (e.g., delay/loss) of a network path, the interval between successive probes has to be set carefully in order to allow for statistically meaningful interpretation of the results. Routers usually throttle the rate at which they respond to ICMP echo requests, and this can interfere with a measurement method that is based on a series of pings or traceroutes. Moreover, traceroute does not provide any information about the reverse path back to the source. Given that Internet routing is largely asymmetric, there is no guarantee that successive traceroutes out from the same host traverse the same round-trip path back to the host. This makes it hard to interpret the measurements obtained. Further, routers and end-hosts are often configured to rate-limit (or not respond at all to) ICMP echo requests. Finally, network operators often respond unfavorably to a large number of pings/traceroutes directed at hosts in their network, since it could be the indication of a denial-of-service attack.
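
For completeness, a rough RTT estimate of the kind discussed above can be scripted around the system ping utility, subject to all the ICMP caveats just mentioned. The parsing assumes the common Unix "min/avg/max" summary line, and the target is a documentation-prefix placeholder address:

   # Rough RTT estimate via the system ping utility (Unix-style output).
   import re
   import subprocess

   def avg_rtt_ms(host, count=5):
       out = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True).stdout
       m = re.search(r"min/avg/max[^=]*= *[\d.]+/([\d.]+)/", out)
       return float(m.group(1)) if m else None

   print(avg_rtt_ms("192.0.2.1"))  # placeholder address
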
4.4. Network Element-Specific Information

The Simple Network Management Protocol (SNMP) is the de facto standard in the Internet for collecting a wide range of statistics (e.g., faults, accounting, performance, etc.) from individual network elements at collection stations [9,10]. The biggest advantage of SNMP is that it is widely deployed, and it enables network operators to generate a quick snapshot of the overall state of their network in terms of variables such as link utilizations, packet losses, router CPU utilizations, etc. A centralized repository of SNMP information can be used as the back-end system for visual tools that an operator can use to continuously monitor a network. Other tools can be designed on top of SNMP data to sound alarms and draw the attention of operators in the event of faults or anomalies. Moreover, SNMP data collected over months or years (such as link utilization) can be used as input to network forecasting tools.

One of the disadvantages of SNMP lies in the time-scale of the information - it can be obtained only on the time-scale of minutes or longer. Moreover, SNMP does not provide any information about the traffic through the network, e.g., point-to-point demands, the mix of protocols and applications, the breakup of traffic on a per-BGP-prefix basis, the variation in packet delay across the network, etc. As discussed in Section 2, such information is essential for a variety of networking tasks. In summary, SNMP (or its enhancements) can play an important role in the coarse-grained component of our two-level monitoring infrastructure.
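
As an illustration of this coarse-grained role, the sketch below turns two successive ifInOctets counter readings into a link utilization figure, allowing for 32-bit counter wrap. How the counters are fetched (an SNMP library or poller) is left out, and the values are invented:

   # Sketch: link utilization from successive SNMP ifInOctets samples.
   def utilization(octets_t0, octets_t1, interval_s, link_bps,
                   counter_max=2**32):
       delta = (octets_t1 - octets_t0) % counter_max  # wrap-safe
       return (delta * 8) / (interval_s * link_bps)

   # 5-minute poll on an OC-48 (~2.488 Gbps) link, invented counters:
   print(utilization(1200000000, 3900000000,
                     interval_s=300, link_bps=2488000000))
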
4.5. Packet-level Information

Measuring traffic at the granularity of individual packets yields very fine-grained information. However, it also introduces the challenges of capturing large volumes of data at very high speeds and then storing and manipulating the collected data. In particular, collecting every packet on every interface of a modern high-speed router is a daunting task. Two approaches have been used so far to address this issue.

The first is to use "port-mirroring" [13], where every packet on an incoming interface can be written out to a "monitoring" interface, in addition to being routed to the output interface. However, this approach potentially requires one additional monitoring interface per router interface - a prohibitively expensive option, since it would constrain an ISP to using no more than half of the available interfaces on a router for connecting to customers and peer networks. If instead a single monitoring interface were added for every group of, say, N interfaces, then the monitoring interface would have to support a packet forwarding speed N times that of each interface. The problem becomes tractable only if a subset of the packets on each interface is captured. This introduces the requirement for sampling, which is discussed later in this draft.

The second approach is exemplified by [14], where special-purpose monitoring equipment is used to tap the optical signal on a link and capture a fixed part of every packet. Each packet is stamped with a highly accurate GPS-synchronized time-stamp at the time of capture. While there are several benefits to the information that is thus captured, the greatest challenges are the infrastructural cost and the dynamic nature of operational networks. These monitoring systems have to be installed inside PoPs, where space is extremely limited and prohibitively expensive. Furthermore, extreme care has to be taken when installing and operating these systems so as not to accidentally disrupt network operations. Finally, operational networks are in constant evolution, with links and routers being reconfigured, commissioned or decommissioned. This makes the maintenance and management of the monitoring systems an operational nightmare.

5. Implementation Issues for a Two-level Monitoring Infrastructure

The discussion in the previous section shows that at present there are only limited capabilities for collecting each of the five broad classes of monitoring information. We also observed that the biggest challenge lies in collecting, storing and exporting packet-level data. In this section, we examine two key implementation issues for a two-level monitoring system.

5.1. Packet Capture and Processing at Line Speeds

There are several challenges in performing information processing on packet records and exporting the results. First, the computation must be simple - at OC-192 speeds (10 Gbps), a new packet arrives every 240 ns on average (assuming 300-byte packets). This allows only 360 instructions per packet on the fastest processors available today. The number of instructions per packet drops to 90 for OC-768 links. Second, given the space and power constraints in a backbone PoP, it is impossible to have multiple systems performing specialized monitoring tasks for the same link. Instead we need a single system that is highly configurable and can collect information for a wide range of network operational tasks. Finally, the system must be robust to denial-of-service attacks. A system that summarizes packet data into flow records needs to keep track of all active flows at any given instant. During an attack, the number of active flows may increase by orders of magnitude, thereby overwhelming the memory and processing capabilities of the system. To guard against this, the system must either sound an alarm or start filtering data intelligently when there is a surge in the number of active flows.
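
The per-packet budget figures above follow from simple arithmetic, reproduced below under the stated assumptions (300-byte packets, and one instruction per cycle on a 1.5 GHz processor, which matches the numbers in the text):

   # Arithmetic behind Section 5.1's per-packet processing budget.
   def per_packet_budget(line_rate_bps, pkt_bytes=300, cpu_hz=1.5e9):
       pkt_time_s = (pkt_bytes * 8) / line_rate_bps
       return pkt_time_s * 1e9, pkt_time_s * cpu_hz  # ns, instructions

   print(per_packet_budget(10e9))  # OC-192: 240 ns, ~360 instructions
   print(per_packet_budget(40e9))  # OC-768:  60 ns,  ~90 instructions
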
5.2. Location of Packet-Capture Functionality

The second challenge of packet-level capture lies in determining the most appropriate location for the packet-capture functionality. There are two possibilities - adding this functionality to a router, or building a special-purpose monitoring system such as [14].

Packet capture on routers is preferable for a number of reasons. First, if we are interested in how long packets are queued inside a router, it is difficult to use an external monitor. Second, if we are interested in the paths taken by packets, we would need to know the ingress and egress ports on a router. Again, this is difficult to do with an external monitoring system unless an up-to-date snapshot of the router's routing table is available. Third, it is invasive to splice a link and insert a passive monitor. On the other hand, if we wish to add monitoring capabilities inside a router, we will encounter problems of scalability in terms of data rate, storage and processing. For example, aggregate data rates of backplanes are as high as a few hundred gigabits per second today, and will soon exceed a terabit per second. This creates enormous challenges for processing and storage elements. Furthermore, routers are generally built to have the highest capacity for a given size and power consumption. Monitoring equipment can be bulky and can consume a lot of power. In addition, newer routers use switched backplanes rather than shared backplanes. As a result, not all packets are seen at any single point of the backplane.

Routers should possess the capability of capturing all packets on a given interface, even if this capability is used only at certain interfaces on a demand basis. A possible approach is to replicate each packet on the router interface card and hold this copy temporarily in a memory specifically allocated for this purpose. While a packet is held on the interface card, different kinds of processing could be performed on it. For example, certain fields in the packet header could be extracted. Or the packet could be assigned to a "flow" based on some property that it shares with other packets (e.g., source or destination address), and the per-flow information (e.g., total byte count) may be updated. The packet may also be correlated with the routing information already available on the router interface.

While the above approach is attractive, there are three serious limitations on its feasibility at present. First, there are several open questions about the complexity of the operations to be performed on a packet while it is being held on an interface card - the number of processor cycles available per packet, the length of time for which it has to be stored, the size of memory required, etc. The second concern is about the export of the collected information [18,19]. As pointed out in Section 4.5, capturing every packet on every interface of a router has inherent scaling problems, and can be prohibitively expensive. As line speeds increase, capturing even a portion of every packet can tax the hardware capabilities of today's routers. Finally, the volume of data potentially generated for high-speed links (e.g., OC-48 and above) raises serious concerns about the feasibility of widespread deployment.

On the other hand, collecting packet traces at high-speed links with special-purpose monitoring systems is also difficult.
PCI bus throughput is already challenged at OC-48 [14], and the PCI bus is crossed twice for any data transfer - once from the capture board to main memory, and a second time from main memory to the hard disk. Memory access speeds do not increase as quickly as link speeds. In fact, memory access speeds have not increased much in the past five years, and we cannot rely on technology improvements in this area for the next generation of monitoring tools. Finally, disk array speeds cannot keep up with link bandwidth. At OC-192 link speed, a packet-level trace would require a disk bandwidth of roughly 250 Mbytes/sec (assuming 300-byte packets and 64-byte packet records).

However, it is unlikely that routers will include extensive monitoring functions anytime in the near future. Therefore it is important to consider the development of passive monitoring infrastructure external to routers. The passive monitoring systems designed and deployed by Sprint in its IP backbone [14] demonstrate the feasibility of packet capture at very high speeds (OC-3 to OC-192). However, they must be enhanced to satisfy all the requirements of the proposed two-level infrastructure (Section 3).

6. The Role of Sampling

The use of sampling techniques can bring network-wide packet capture into the realm of feasibility by thinning the stream of packets to be captured, partially alleviating the problems associated with packet capture at very high link speeds [5]. As link speeds continue to increase, a very limited number of instruction cycles is available per packet on routers (as discussed in Section 5.1). Performing some limited computations on a subset of the packets captured on a link may be feasible, but doing so for every packet may not be. Sampling reduces the memory/processor requirements on a router as well as the amount of information to be exported from a router interface to an off-board collection station. This in turn reduces the scalability requirements for the systems that store and analyze the collected data. Given the size of an ISP backbone, the collection of every packet on every router interface and link is simply infeasible. However, with a reasonably low sampling rate, an ISP may be able to collect a subset of packets (or information derived from them) on a network-wide basis. While sampling is not essential for enabling packet-level monitoring, there are severe limitations on packet capture and processing at line speeds in its absence. Therefore sampling is a key facilitator for the proposed two-level monitoring infrastructure.

Previous work on sampling showed that packet-triggered sampling is better suited to estimating packet size distributions and inter-packet times than time-triggered sampling [15]. More recent work by Choi et al. [17] takes the variability of packet sizes into consideration, and proposes an adaptive sampling technique with bounded error for byte counts. Charging based on usage, where customers are charged based on their connection time or bandwidth consumption, may soon become a common practice in the Internet. Duffield et al. [16] have designed an algorithm to preferentially sample dominant traffic contributors and to use the sampled data for charging.

Most routers implement a very simple packet-triggered sampling technique. The basic idea is to take only every n-th packet and use it in flow-based accounting such as NetFlow. As the number of packets accounted for decreases, the overhead of maintaining flow states decreases. Router vendors often recommend sampling rates of 1-in-100 to 1-in-1000, an indication of how severe the performance degradation caused by this overhead can be.
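
The two basic variants can be sketched in a few lines: the deterministic 1-in-n scheme just described, and a probabilistic version that avoids locking onto periodic patterns in the traffic:

   # Sketch: deterministic and probabilistic 1-in-n packet sampling.
   import random

   def deterministic_sample(packets, n):
       return [p for i, p in enumerate(packets) if i % n == 0]

   def probabilistic_sample(packets, n):
       return [p for p in packets if random.random() < 1.0 / n]
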
Very little work has been done on identifying suitable sampling techniques for operational networks. It is very likely that different performance metrics will require different sampling techniques. Also, as network characteristics evolve, sampling techniques will require constant revision. The first step towards designing suitable sampling techniques is to identify the important network tasks that require monitoring data, and the type and granularity of data required for each. Next, different sampling techniques need to be evaluated for each metric of interest. A representative set of complete packet traces (where all packets are collected) is required to evaluate the relative performance of the techniques. Finally, the feasibility of implementing these techniques on routers has to be examined.

As an example, consider the design of sampling techniques for estimating the 99th percentile of the delay distribution between any two routers in a network. Assume that prior knowledge of the delay distribution is available through the collection and analysis of entire packet traces [22]. The problem is to determine the number of random packet samples that are needed in order to obtain the 99th percentile of the delay distribution with a desired accuracy. [23] has determined this sample size as a function of the delay distribution, desired accuracy, etc.

Once a suitable random sampling rate has been computed, the actual measurement can be performed in one of two ways - passive or active. For passive measurements, monitoring systems are installed at the two ends of the network path being measured. At each end, packets are sampled randomly using the computed sample size. Let us refer to these two points as A and B, where packets travel from A to B. Only a subset of the packets captured at observation point A will travel downstream to B, while the rest will travel on different paths to other destinations. Therefore, the packets captured at points A and B have to be matched to determine the exact set of packets that traverse the path being measured. A delay distribution is then computed based on these packets.

There are three serious difficulties with the passive measurement approach. First, monitoring systems with synchronized clocks have to be installed at the two ends of the path(s) to capture packets. Second, there may be very little or no match between the packets sampled at the two ends. Therefore the degree of packet matching has to be factored into the computation of a suitable random sample size. Furthermore, the captured packets have to be transmitted to a central collection station to match identical packets from the two measurement points and to compute the corresponding delays.

An active measurement approach avoids the above problems. In addition, it can be easily configured to measure round-trip delay or one-way delay. In the latter case, probe packets are injected at point A and collected at point B. The rate of generation of these probe packets is based on the random sample size computed [23]. The 99th percentile of the delay distribution is then computed based on the delay experienced by the probes. [23] shows that an active measurement approach yields a 99th percentile delay estimate that is comparable to that obtained from a passive approach with random sampling.
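
As a simple illustration of sample sizing for a quantile estimate - a standard normal approximation for order statistics, not the method of [23] - placing the empirical p-quantile within +/- delta quantile units of the true one with roughly 95% confidence requires about z^2 p(1-p)/delta^2 samples, with z = 1.96:

   # Illustrative sample size for estimating the p-th delay quantile.
   # Standard order-statistics approximation; NOT the method of [23].
   def quantile_sample_size(p=0.99, delta=0.005, z=1.96):
       return int(z * z * p * (1 - p) / (delta * delta)) + 1

   print(quantile_sample_size())  # ~1522 samples for the 99th pctile
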
However, the active approach also has drawbacks. First, the injection of the active probes should not be so invasive as to alter the delay characteristics of the traffic flowing along the measurement path. Moreover, the probe characteristics need to be representative of all packets on the measurement path. Therefore the probe generation process has to carefully account for factors that influence delay characteristics, such as packet size and the existence of multiple routes between two end points. Finally, probes need to be generated and captured either by specialized monitoring systems or by routers. Specialized systems involve overheads such as space and power consumption, maintenance, etc., as in the passive monitoring case. When probes are generated by routers using a facility such as Cisco's Service Assurance Agent (SAA) [24], care must be taken not to overburden the router's CPU with probe generation and processing. Otherwise active probing will interfere with important tasks that a router's CPU normally handles (e.g., processing routing updates).

7. Conclusions

In this draft, we have discussed the requirements for monitoring as part of the design, management and operation of an IP backbone network. We have identified the need for a two-tier monitoring infrastructure consisting of a coarse-grained component for continuous network-wide monitoring and a fine-grained component for on-demand monitoring. We have identified some key implementation issues for the proposed two-level monitoring system. These include support for monitoring functionality on routers and the design of special monitoring systems capable of capturing and analyzing packet-level information. We have also discussed the importance of sampling techniques in facilitating network-wide monitoring. We believe that the issues of hardware design and sampling techniques must be addressed in depth for monitoring to become an integral part of backbone design, management and operations.

8. References

[1] D.D. Clark. "The Design Philosophy of the DARPA Internet Protocols". In Proc. ACM SIGCOMM, August 1988.

[2] A. Medina et al. "Traffic Matrix Estimation: Existing Techniques and New Directions". In Proc. ACM SIGCOMM, August 2002.

[3] A. Medina et al. "A Taxonomy of Traffic Matrices". In Proc. SPIE ITCOM Conference on Scalability and Traffic Control in IP Networks.

[4] M. Grossglauser and J. Rexford. "Passive Traffic Measurements for IP Operations". To appear in INFORMS 2002.

[5] N. Duffield et al. "A Framework for Passive Packet Measurement". Internet Draft draft-duffield-framework-papame-01.

[6] N. Duffield and M. Grossglauser. "Trajectory Sampling for Direct Traffic Observation". In Proc. ACM SIGCOMM, September 2000.

[7] S. Savage et al. "Practical Network Support for IP Traceback". In Proc. ACM SIGCOMM, September 2000.

[8] W. R. Stevens. "TCP/IP Illustrated, Volume 1". Addison-Wesley, 1994.

[9] J. Case et al. "A Simple Network Management Protocol (SNMP)". IETF Request for Comments 1098.

[10] W. Stallings. "SNMP, SNMPv2, SNMPv3, and RMON 1 and 2". 3rd edition. Addison-Wesley, 1999.

[11] "Cisco Netflow". http://www.cisco.com/warp/public/732/netflow/index.html

[12] "Sampled Netflow". http://www.cisco.com/univercd/cc/td/doc/product/software/ios120/120newft/120limit/120s/120s11/12s_sanf.htm

[13] "Traffic Sampling and Forwarding Overview".
http://www.juniper.net/techpubs/software/junos53/swconfig53-policy/html/sampling-overview.html

[14] C. Fraleigh et al. "Design and Deployment of a Passive Monitoring Infrastructure". In Proc. Passive and Active Measurement Workshop 2001, April 2001, Amsterdam.

[15] K. C. Claffy et al. "Application of Sampling Methodologies to Network Traffic Characterization". In Proc. ACM SIGCOMM, 1993.

[16] N. Duffield et al. "Charging from Sampled Network Usage". In Proc. ACM SIGCOMM Internet Measurement Workshop, November 2001.

[17] B. Choi et al. "Adaptive Random Sampling for Load Change Detection". Extended abstract in Proc. ACM SIGMETRICS, June 2002.

[18] N. Brownlee, C. Mills and G. Ruth. "Traffic Flow Measurement: Architecture". RFC 2722.

[19] IP Flow Information Export. http://www.ietf.org/html.charters/ipfix-charter.html

[20] J. Quittek et al. "Requirements for IP Flow Information Export". Internet Draft draft-ietf-ipfix-reqs-03. Expires December 2002.

[21] G. Iannaccone et al. "Monitoring very high speed links". In Proc. First ACM SIGCOMM Internet Measurement Workshop (IMW), November 2001.

[22] D. Papagiannaki et al. "Analysis of Measured Single-Hop Delay from an Operational Backbone Network". In Proc. IEEE INFOCOM 2002.

[23] B.Y. Choi et al. "Practical Delay Measurements in an IP Network". Sprint ATL Technical Report TR03-ATL-022810.

[24] "Cisco Service Assurance Agent (SAA) User Guide". http://www.cisco.com/warp/public/732/Tech/nmp/saa/saatech.shtml

9. Authors' Address

IP Group
Sprint Advanced Technology Labs
One Adrian Court
Burlingame, CA 94010
USA

ipgroup@sprintlabs.com
http://www.sprintlabs.com