ARMD BOF                                                        M. Karir
Internet Draft                                                   J. Rees
Intended status: Informational                        Merit Network Inc.
Expires: January 2012                                      July 10, 2011


                     Address Resolution Statistics
                  draft-karir-armd-statistics-01.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 10, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Abstract

   As large-scale data centers continue to grow with an ever-increasing
   number of virtual and physical servers, there is a need to
   re-evaluate performance at the network edge.  Performance is often
   critical for large-scale data center applications, and it is
   important to minimize any unnecessary latency or load in order to
   streamline the operation of services at such large scales.  To
   extract maximum performance from these applications it is important
   to optimize and tune all the layers in the data center stack.  One
   critical area that requires particular attention is the link-layer
   address resolution protocol that maps an IP address to the specific
   hardware address at the edge of the network.  The goal of this
   document is to characterize this problem space in detail in order to
   better understand the scale of the problem, as well as to identify
   particular scenarios where address resolution might have a greater
   adverse impact on performance.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1. Introduction
   2. Terminology
   3. Factors That Might Impact ARP/ND Performance
      3.1. Number of Hosts
      3.2. Traffic Patterns
      3.3. Network Events
      3.4. Address Resolution Implementations
      3.5. Layer 2 Network Topology
   4. Experiments and Measurements
      4.1. Experiment Architecture
      4.2. Impact of Number of Hosts
      4.3. Impact of Traffic Patterns
      4.4. Impact of Network Events
      4.5. Implementation Issues
      4.6. Experiment Limitations
   5. Scaling Up: Emulating Address Resolution Behavior on Larger
      Scales
   6. Conclusion and Recommendation
   7. Manageability Considerations
   8. Security Considerations
   9. IANA Considerations
   10. Acknowledgments
   11. References
   Authors' Addresses
   Intellectual Property Statement
   Disclaimer of Validity

1. Introduction

   Data centers are a key part of delivering Internet-scale
   applications.  Performance at such large scales is critical, as even
   a few milliseconds or microseconds of additional latency can result
   in loss of customer traffic.  Data center design and network
   architecture are a key part of the overall service delivery plan.
   This includes not only determining the scale of physical and virtual
   servers, but also optimizing the entire data center stack, in
   particular the layer 3 and layer 2 architectures.

   One aspect of data center design that has received close attention
   is link-layer address resolution protocols such as the Address
   Resolution Protocol (ARP, for IPv4) and Neighbor Discovery (ND, for
   IPv6).  The goal of these protocols is to map the IP address of a
   destination node to the hardware address of the network interface
   for that node.  This address resolution occurs at the edge of the
   network.  In general, both ARP and ND are query/response protocols.

   In order to maximize performance it is important to understand the
   behavior of these protocols at large scales.  In particular, we need
   to understand their performance implications in terms of the number
   of additional messages that they generate, as well as the resulting
   load on the devices on the network that must then process these
   messages.

2. Terminology

   ARP: Address Resolution Protocol

   ND: Neighbor Discovery

   ToR: Top of Rack Switch

   VM: Virtual Machine

3. Factors That Might Impact ARP/ND Performance

3.1. Number of Hosts

   Every host on the network that attempts to send or receive traffic
   will produce some base level of ARP/ND traffic, so the overall
   amount of ARP/ND traffic on the network will vary with the number of
   hosts.  In the case of ARP, all address resolution request messages
   are broadcast, and these will be received and processed by all nodes
   on the network.  In the case of ND, address resolution messages are
   sent via multicast and therefore may have a lower overall impact on
   the network, even though the number of messages exchanged is the
   same.
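   As a rough illustration of this scaling, the following back-of-
   envelope sketch estimates the network-wide ARP request rate for a
   given number of hosts.  The model and its parameters (peers per
   host, cache timeout) are our own illustrative assumptions, not
   measurements from this document.

      # Illustrative model: each host re-resolves each of its peers
      # once per cache-timeout interval, and every request is
      # broadcast to (and processed by) all hosts.  All parameter
      # values are assumptions for illustration.

      def arp_request_rate(num_hosts, peers_per_host=10,
                           cache_timeout_s=60.0):
          """Estimate network-wide ARP requests per second."""
          return num_hosts * peers_per_host / cache_timeout_s

      for n in (100, 200, 300, 400, 500):
          rate = arp_request_rate(n)
          # Broadcast means each of the n hosts sees every request.
          print(f"{n} hosts: ~{rate:.0f} requests/sec, "
                f"each processed by all {n} hosts")

   Under these assumptions both the request rate and the per-host
   processing load grow linearly with the number of hosts, which is
   consistent with the measurements reported in Section 4.2.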
3.2. Traffic Patterns

   The traffic pattern can have a significant impact on the level of
   ARP/ND traffic in the network.  The traffic mix determines how many
   other nodes a given node needs to communicate with and how
   frequently, and both of these directly influence address discovery
   traffic on the network.  We would therefore expect ARP/ND traffic
   patterns to vary significantly based on the data center design as
   well as the application mix.

3.3. Network Events

   Several specific network events can have a significant impact on
   ARP/ND traffic.  One example of such an event is machine failure.
   If a host that is frequently accessed fails, it can result in much
   higher ARP/ND traffic as other hosts in the network continue to try
   to reach it by repeatedly sending out additional address resolution
   messages.  Another example is Virtual Machine migration.  If a VM is
   migrated to a system on a different switch, VLAN, or even a
   geographically different data center, it can cause a significant
   shift in overall traffic patterns as well as in ARP/ND traffic.

   Another particularly well-known network event that causes address
   resolution traffic spikes is a network scan.  In a network scan, one
   or more hosts internal or external to the edge network attempt to
   connect to a large number of internal hosts in a very short period
   of time.  This results in a sudden increase in the amount of address
   resolution traffic in the network.

3.4. Address Resolution Implementations

   As with any other protocol, the activity of address resolution
   protocols such as ARP/ND can vary significantly with specific
   implementations as well as with the default settings for various
   protocol parameters.  The ARP cache timeout is a common parameter
   that has a direct impact on the amount of address resolution
   traffic.  Older versions of Microsoft Windows used a default value
   of 2 minutes for this parameter; Windows Vista and Windows 2008
   changed this to a random value between 15 seconds and 45 seconds.
   This parameter defaults to 60 seconds for Linux and 20 minutes for
   FreeBSD.  The default value for Cisco routers and switches is 4
   hours.  For ND, one relevant parameter is the prefix stale time,
   which determines when old entries can be aged out.  This value is
   30 days for Cisco and 60 seconds for Linux.  The overall address
   resolution traffic in a data center will vary based on the mix of
   the various ARP implementations that are present.
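   On Linux, the neighbor-cache timing parameters discussed above are
   exposed through sysctl.  The short sketch below reads a few of them;
   the paths are standard Linux locations, but the values returned will
   of course vary by distribution and configuration.

      # Read the Linux neighbor-cache timing knobs relevant to ARP/ND
      # traffic volume.  Values differ across distributions.

      from pathlib import Path

      PARAMS = [
          # ARP: re-validation interval and stale-entry timeout
          "/proc/sys/net/ipv4/neigh/default/base_reachable_time_ms",
          "/proc/sys/net/ipv4/neigh/default/gc_stale_time",
          # ND equivalent for IPv6
          "/proc/sys/net/ipv6/neigh/default/base_reachable_time_ms",
      ]

      for name in PARAMS:
          path = Path(name)
          if path.exists():
              print(f"{name} = {path.read_text().strip()}")
          else:
              print(f"{name}: not present on this system")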
3.5. Layer 2 Network Topology

   The layer 2 network topology within a data center can also influence
   the impact of the various address resolution protocols.  While ARP
   traffic is broadcast and must be processed by all nodes within the
   broadcast domain, a well-designed layer 2 topology can limit the
   size of the broadcast domain and hence the amount of address
   resolution traffic.  ND traffic, on the other hand, is multicast and
   might potentially increase the load on the directly connected layer
   2 switch if the traffic pattern spans broadcast domains.

4. Experiments and Measurements

4.1. Experiment Architecture

   In an attempt to quantify address resolution issues in a data center
   environment, we have run experiments in our own data center, which
   is used for production services.  We were able to leverage unused
   capacity for our experiments.

   The data center topology is fairly simple.  A pair of redundant
   access switches passes traffic to and from the data center.  These
   switches connect to the top-of-rack switches, which in turn connect
   to the blade switches in our Dell blade chassis.  The entire
   hardware platform is managed via VMware's vCloud Director.  In total
   we have access to 8 blades of resources on a single chassis, which
   is roughly 3 TB of disk, 200 GB of RAM, and 100 GHz of CPU.  The
   network available to us is a /22 block of IPv4 space and a /64 of
   IPv6 address space in a flat topology.

   Using this resource pool we create a 500-node testbed based on
   CentOS 5.5.  We use custom command and control software to issue
   commands to all nodes to start/stop services and traffic generation
   scripts.  We also use a custom traffic generator agent to generate
   both internal and external traffic via wget commands to various
   hosts.  The command and control software uses UDP broadcast messages
   for communication, so that no additional address resolution messages
   are generated that might affect our measurements.

   Each of the 500 nodes is given a list of other nodes that it must
   contact at the beginning of an experiment.  This is used to set the
   traffic pattern for a given experiment.  In addition, each
   experiment determines the traffic rate by specifying the inter-
   communication delay between attempts to contact other nodes; the
   shorter the delay, the more traffic is generated.  The nodes all run
   dual IPv4/IPv6 stacks.

   A packet tap attached to a monitor port on the access switch allows
   us to monitor the arrival rate of ARP and ND requests and replies.
   We also monitor the CPU load on the access switch at two-second
   intervals via SNMP queries [STUDY].  Figure 1 shows our experimental
   setup.

   Figure 1: Experimental setup.  Two redundant Cisco Catalyst 4900M
   aggregation switches (Data_Agg_1 and Data_Agg_2), with a packet tap
   on a monitor port, carry external traffic and connect to Cisco
   Catalyst 3130 blade switches, four per Dell enclosure, each serving
   blades 1 through 8, across three enclosures.
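   The switch CPU measurement is a simple polling loop.  The sketch
   below shows one way to implement it with the net-snmp command line
   tools; the switch address, community string, and the Cisco
   cpmCPUTotal5secRev OID instance are assumptions to be adapted to
   the actual device being monitored.

      # Poll the access switch CPU load every two seconds via SNMP.
      # Requires the net-snmp "snmpget" utility on the monitor host.

      import subprocess
      import time

      SWITCH = "data-agg-1.example.net"   # placeholder switch address
      COMMUNITY = "public"                # placeholder community
      # CISCO-PROCESS-MIB cpmCPUTotal5secRev, instance 1 (assumed)
      CPU_OID = "1.3.6.1.4.1.9.9.109.1.1.1.1.6.1"

      def poll_cpu():
          out = subprocess.run(
              ["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv",
               SWITCH, CPU_OID],
              capture_output=True, text=True, check=True)
          return int(out.stdout.strip())

      while True:
          print(f"{time.time():.0f} cpu={poll_cpu()}%")
          time.sleep(2)   # two-second interval, as in our measurements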
4.2. Impact of Number of Hosts

   One of the simplest experiments is to determine the overall baseline
   load that is generated on a given network segment when a varying
   number of hosts are active.  While the absolute numbers depend on a
   large number of factors, what we are interested in here is how the
   traffic scales as different numbers of hosts are brought online,
   with all other factors held constant.  Our experiment therefore
   simply varies the number of active hosts from one run to the next
   while we measure address resolution traffic on the network.  The
   number of hosts is increased from 100 to 500 in steps of 100.

   The results indicate that address resolution traffic scales in a
   linear fashion with the number of hosts in the network.  This linear
   scaling applies to both ARP and ND traffic, though the raw ARP
   traffic rate was considerably higher than the ND traffic rate.  For
   our parameters, the ARP rate varied from 100 pps to 250 pps and the
   ND rate from 25 pps to 200 pps.  There is a clear spike in CPU load
   on the access switch at the beginning of each experiment, which can
   reach almost 40 percent.  We were not able to discern any increase
   in this spike across experiments.

4.3. Impact of Traffic Patterns

   Traffic patterns can have a significant impact on the amount of
   address resolution traffic in the network.  In order to study this
   in detail we constructed two distinct experiments: the first simply
   increased the rate at which nodes were attempting to communicate
   with each other, while the second controlled the number of active
   versus inactive nodes in the traffic exchange matrix.

   The first experiment uses all 500 nodes and increases the traffic
   load for each run by reducing the wait time between communication
   events.  The wait time is reduced from 50 seconds to 1 second over a
   series of 6 runs by roughly halving it for each run.  All other
   parameters remain the same across experiment runs.  Therefore the
   only factor we are varying is the total number of nodes a single
   node will attempt to communicate with in a given interval of time.
   Once again we observe linear scaling in ARP traffic volume, ranging
   from 200 pps for the slowest experiment to almost 1800 pps for the
   most aggressive experiment.  The linear trend also holds for ND
   traffic, which increases from 50 pps to 1400 pps across the
   different runs.

   The goal of the second experiment is to determine the impact of
   active versus inactive hosts in the network.  An inactive host in
   this context means one for which an IP address has been assigned,
   but there is nothing at that address, so ARP requests and all other
   packets go unanswered.  All 500 hosts are involved in traffic
   initiation.  The pool of targets for this traffic starts out being
   the same 500 hosts that are initiating.  In subsequent runs we vary
   the ratio of active to inactive target hosts, from 500/0 to 400/100
   in steps of 100.  This experiment showed roughly a 60% increase
   (220-360 pps) in traffic for the IPv4 (ARP) case and about an 80%
   increase (160-290 pps) for the IPv6 case.

   In a slight variation on the second experiment, all 500 nodes
   attempt to contact all other hosts plus an additional varying number
   of inactive hosts, in steps of 100 up to a maximum of 400.  In this
   experiment we see a slight linear increase for both ARP and ND as
   the total number of nodes in the traffic matrix increases.

   We ran these experiments for IPv4 only, IPv6 only, and simultaneous
   IPv4 and IPv6.  ARP and ND traffic seemed to be independent of each
   other.  That is, the ARP and ND traffic rates and switch CPU load
   depend on the presented traffic load, not on the presence of other
   traffic on the network.
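   To make these traffic-pattern parameters concrete, the sketch below
   shows one way the per-node target lists for such an experiment could
   be generated, with a configurable active/inactive split and inter-
   communication delay.  The schedule format and all parameter values
   are our own illustration; the actual control software used in our
   testbed is not reproduced here.

      # Build a per-node traffic schedule: each initiating node gets
      # a shuffled list of targets drawn from an "active" pool (real
      # hosts) and an "inactive" pool (assigned addresses with no
      # host behind them), plus a fixed inter-communication delay.

      import ipaddress
      import random

      def build_schedule(active=400, inactive=100, initiators=500,
                         base_net="10.0.0.0/22", delay_s=10):
          hosts = list(ipaddress.ip_network(base_net).hosts())
          active_ips = [str(h) for h in hosts[:active]]
          inactive_ips = [str(h)
                          for h in hosts[active:active + inactive]]
          targets = active_ips + inactive_ips
          schedule = {}
          for node in range(initiators):
              order = targets[:]
              random.shuffle(order)
              schedule[node] = {"delay_s": delay_s, "targets": order}
          return schedule

      sched = build_schedule()
      print(len(sched), "nodes,", len(sched[0]["targets"]),
            "targets each")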
   One final experiment attempted to determine the maximum additional
   load of ARP/ND traffic possible in our setup.  For this purpose we
   configured all 500 nodes to communicate with every other node, one
   at a time, as fast as possible.  We observed an ARP traffic peak of
   up to 4000 pps and a maximum CPU load of 65% on the access switch.

4.4. Impact of Network Events

   Network scanning is commonly understood to cause significant address
   resolution activity at the edge of the network.  Using our
   experimental setup we repeatedly scanned our network, both from the
   outside and from within.  In each case we were able to generate ARP
   traffic spikes of up to 1400 pps and ND traffic spikes of 1000 pps.
   These were accompanied by a corresponding spike in CPU load at the
   access switch.
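   Such a scan is easy to reproduce.  The sketch below sweeps a prefix
   with short-timeout TCP connection attempts, which forces the network
   to attempt address resolution for every target address.  The prefix
   and port are placeholders, and such a sweep should of course only be
   run against a network one is authorized to test.

      # Minimal scan sketch: a connection attempt to each address in
      # the block triggers ARP (or ND) for every on-link target,
      # producing the address resolution spike described above.

      import ipaddress
      import socket

      def scan(prefix="10.0.0.0/22", port=80, timeout_s=0.05):
          for addr in ipaddress.ip_network(prefix).hosts():
              s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
              s.settimeout(timeout_s)
              try:
                  s.connect((str(addr), port))
              except OSError:
                  pass      # unanswered targets simply time out
              finally:
                  s.close()

      scan()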
   Node failures in a network can also significantly impact address
   resolution traffic.  This effect depends on the particular traffic
   pattern and the number of other hosts that are attempting to
   communicate with the failed node.  All nodes will repeatedly attempt
   to perform address resolution for the failed node, and this can lead
   to a significant increase in ARP/ND traffic.  We were able to show
   this via a simple experiment that creates 400 active nodes which all
   attempt to communicate with nodes in a separate group of 80 nodes.
   For each experiment run we then shut down hosts in the target group
   of 80 nodes in batches of 10.  We were able to demonstrate that ARP
   traffic actually increases in this scenario, from an overall rate of
   200 pps to 300 pps.

   Another network event that might result in significant changes in
   address resolution traffic is the migration of VMs in a data center.
   We attempted to replicate this scenario in our somewhat limited
   environment by placing one of our 8 blades in maintenance mode,
   which forced all 36 VMs on that blade to migrate to other blades.
   However, as our entire experimental infrastructure is located within
   a single rack, we did not notice any changes in ARP traffic during
   this event.

   Many hypervisors remove the problem of virtual machine migration by
   assigning a MAC address to each VM and having a kernel switching
   module handle all address resolution, accepting and sending packets
   for the MAC addresses of all of its virtual machines through a
   designated host interface.  In other words, the hypervisor responds
   to the appropriate traffic for the VMs it contains, behaving as a
   router for the layer 2 traffic it is exposed to.

4.5. Implementation Issues

   Protocol implementations and default parameter values can also have
   a significant impact on the behavior of address resolution traffic
   in the network.  Parameters such as cache timeout values in
   particular determine when cached entries are removed or must be
   re-validated to ensure they are not stale.  Though these parameters
   are unlikely to be modified by operators, their variation across
   systems can impact ARP/ND traffic when different systems are present
   on a given network in varying numbers.  Our experimental setup did
   not explore this issue of mixed environments, or the sensitivity of
   ARP/ND traffic to the various protocol parameters.

4.6. Experiment Limitations

   Our experimental environment, though fairly typical in its hardware
   and software, probably represents only a very limited, small data
   center configuration.  It is difficult to thoroughly instrument very
   large environments, and even smaller experimental environments in a
   lab might not be very representative.  We believe our architecture
   is fairly representative and provides us with useful insights
   regarding the scale and trends of address resolution traffic in a
   data center.

   One very significant limitation that we came across in our
   experiments was the problem of using all 500 nodes in a high-load
   scenario.  When all 500 nodes were active simultaneously, our
   architecture would run into a bottleneck while accessing disk
   storage.  This limitation prevented us from attempting to scale our
   experiments beyond 500 nodes and limited which experiments we could
   run at the maximum possible load.

   Our experimental testbed shared infrastructure, including the
   network access switches, with production equipment.  This limited
   our ability to stress the network to failure and our ability to try
   changes in switch configuration.

5. Scaling Up: Emulating Address Resolution Behavior on Larger Scales

   Based on the data collected from our experiments we have built an
   ARP/ND traffic emulator that can generate varying amounts of address
   resolution traffic on a network over varying address ranges.  This
   gives us the ability to scale beyond the 500 VM nodes in our
   experiments.  Our software emulator can be used to directly test the
   impact of such traffic on nodes and switches in the network at much
   larger scales.

   Preliminary results show a good match between the testbed and the
   emulator for both traffic rates and switch load over a wide range of
   presented traffic load.  We have calibrated the emulator from the
   testbed data and will use the emulator to run experiments at scales
   that would otherwise be impractical in the real network available to
   us.
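   The core of such an emulator is simple: emit ARP requests for
   addresses in a target range at a controlled rate.  The sketch below
   illustrates the idea using the scapy packet library; the interface
   name, prefix, and rate are placeholders, the loop requires root
   privileges, and this is our own illustration rather than the actual
   emulator code.

      # Emit broadcast ARP who-has requests for addresses in a prefix
      # at a configurable rate (a simplified emulator core).

      import ipaddress
      import itertools
      import time

      from scapy.all import ARP, Ether, sendp   # pip install scapy

      def emulate_arp(prefix="10.0.0.0/22", rate_pps=500,
                      iface="eth0"):
          targets = itertools.cycle(
              ipaddress.ip_network(prefix).hosts())
          interval = 1.0 / rate_pps   # sleep-based pacing; approximate
          while True:
              pkt = (Ether(dst="ff:ff:ff:ff:ff:ff") /
                     ARP(op=1, pdst=str(next(targets))))   # who-has
              sendp(pkt, iface=iface, verbose=False)
              time.sleep(interval)

      emulate_arp(rate_pps=100)   # modest rate for a calibration run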
6. Conclusion and Recommendation

   In this document we have described some of our experiments in
   determining the actual amount of address resolution traffic on the
   network under a variety of conditions for a simple, small data
   center topology.  We were able to show that ARP/ND traffic scales
   linearly with the number of hosts in the network as well as with the
   traffic interconnection matrix.  In addition, we studied the impact
   of network events such as scanning, machine failure, and VM
   migration on address resolution traffic.  We were able to show that
   even in a small data center with only 8 blades and 500 virtual
   hosts, ARP/ND traffic can reach rates of thousands of packets per
   second, and switch CPU loads can reach 65% or more.

   We were able to use the data from our experiments to build a
   software-based ARP/ND traffic emulation engine that can generate
   address resolution traffic at even larger scales.  The goal of this
   emulation engine is to allow us to study the impact of this traffic
   on the network for large data centers.

7. Manageability Considerations

   This document does not add additional manageability considerations.

8. Security Considerations

   This document has no additional requirements for security.

9. IANA Considerations

   None.

10. Acknowledgments

   We want to acknowledge the following people for their valuable
   discussions related to this draft: Igor Gashinsky, Kyle Creyts, and
   Warren Kumari.

   This document was prepared using 2-Word-v2.0.template.dot.

11. References

   [ARP]     Plummer, D., "An Ethernet Address Resolution Protocol",
             RFC 826, November 1982.

   [ND]      Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
             "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
             September 2007.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [STUDY]   Rees, J. and M. Karir, "ARP Traffic Study", NANOG 52,
             June 2011,
             http://www.nanog.org/meetings/nanog52/presentations/Tuesday/Karir-4-ARP-Study-Merit Network.pdf

Authors' Addresses

   Manish Karir
   Merit Network Inc.
   1000 Oakbrook Dr, Suite 200
   Ann Arbor, MI 48104, USA
   Phone: 734-527-5750
   Email: mkarir@merit.edu

   Jim Rees
   Merit Network Inc.
   100 Oakbrook Dr, Suite 200
   Ann Arbor, MI 48104, USA
   Phone: 734-527-5751
   Email: rees@merit.edu

Intellectual Property Statement

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers or
   users of this specification can be obtained from the IETF on-line
   IPR repository at http://www.ietf.org/ipr

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document.  Please
   address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   All IETF Documents and the information contained therein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
   HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
   THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.