Network Working Group                                         E. Marocco
Internet-Draft                                            Telecom Italia
Intended status: Informational                                V. Gurbani
Expires: May 6, 2009                   Bell Laboratories, Alcatel-Lucent
                                                        November 2, 2008

Application-Layer Traffic Optimization (ALTO) Problem Statement
draft-marocco-alto-problem-statement-03

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on May 6, 2009.

Copyright Notice

Copyright (C) The IETF Trust (2008).

Abstract

A significant part of the Internet traffic today is generated by peer-to-peer applications used, for example, for file sharing, realtime communications and live media streaming. Such applications often deal with large amounts of data in direct peer-to-peer connections, but they usually have little knowledge of the underlying network topology. As a result, they may choose their peers based on measurements and statistics which, in some situations, may lead to suboptimal choices.

Marocco & Gurbani Expires May 6, 2009 [Page 1]
Internet-Draft ALTO Problem Statement November 2008
This document describes the problem related to optimizing traffic generated by peer-to-peer applications through the use of network-layer information, provides a representative set of use cases that may exhibit this problem, and outlines considerations that have to be taken into account when arriving at equitable solutions.

Table of Contents

1. Introduction
  1.1. Research or Engineering?
2. Definitions
3. The Problem
4. Use Cases
  4.1. File sharing
  4.2. Cache/Mirror Selection
  4.3. Live Media Streaming
  4.4. Realtime Communications
  4.5. Distributed Hash Tables
5. Solution Considerations
  5.1. ALTO Service Providers
  5.2. Discovery of ALTO servers
  5.3. User Privacy
  5.4. Topology Hiding
  5.5. Coexistence with Caching
6. Security Considerations
7. Acknowledgments
8. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements

1.
Introduction

A significant part of the Internet traffic today is generated by peer-to-peer (P2P) applications used, for example, for file sharing, realtime communications and live media streaming [WWW.cachelogic.picture] [WWW.wired.fuel]. Unlike client/server applications, P2P applications access resources (e.g. files or media relays) distributed across the Internet and exchange large amounts of data in connections that they establish directly with the nodes sharing such resources.

One advantage of P2P systems arises from the fact that the resources such systems offer are often made available through multiple replicas. Yet applications generally do not have reliable information about the underlying network and thus have to select among the available instances based on information they deduce from empirical measurements which, in some situations, lead to suboptimal choices. For example, popular metrics based on round trip time estimation, sometimes used for initial source selection (i.e. before actual data transmissions begin, when goodput values are unknown), perform quite badly for file sharing applications, as they tend to ignore the bandwidth and reliability of the underlying links, which have much more influence than delay on file transfers.

Many of the existing P2P systems are based on an overlay network consisting of direct connections that peers establish among themselves; such connections do not account for the underlying network topology. In addition to simply achieving suboptimal performance, such networks can lead to congestion and cause serious inefficiencies. As shown in [ACM.fear], traffic generated by popular P2P applications often crosses network boundaries multiple times, overloading links which are frequently subject to congestion [ACM.bottleneck].
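The observation above about round-trip-time-based source selection can be made concrete with a back-of-the-envelope calculation. The sketch below uses a deliberately crude model (transfer time is roughly one round trip plus file size divided by bandwidth) and two invented candidates; the model, the names and all figures are assumptions made for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    rtt_s: float          # measured round trip time, in seconds
    bandwidth_bps: float  # usable bandwidth from this source, in bits per second


def estimated_transfer_time(c: Candidate, size_bytes: int) -> float:
    """Crude estimate: one round trip plus the time to push the bits."""
    return c.rtt_s + (size_bytes * 8) / c.bandwidth_bps


# Two hypothetical sources of the same file.
candidates = [
    Candidate("peer-a", rtt_s=0.020, bandwidth_bps=1_000_000),   # close, slow link
    Candidate("peer-b", rtt_s=0.120, bandwidth_bps=20_000_000),  # far, fast link
]
file_size = 100 * 1000 * 1000  # 100 MB, hypothetical

# Ranking by delay alone versus ranking by estimated completion time.
best_by_rtt = min(candidates, key=lambda c: c.rtt_s)
best_by_time = min(candidates, key=lambda c: estimated_transfer_time(c, file_size))
```

With these figures, ranking by round trip time selects peer-a, whose estimated transfer takes about 800 seconds, while peer-b would complete in about 40 seconds.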
Recent studies [ACM.ispp2p] [WWW.p4p.overview] [ACM.ono] have shown that if Internet Service Providers (ISPs), network operators or third parties in general provide reliable network information such as topology and/or bandwidth to P2P applications, it would be possible to greatly increase application performance, reduce congestion and optimize the overall traffic across different networks.

This document gives the problem statement of optimizing traffic generated by P2P applications using information provided by a separate party.

The rest of this document is structured as follows. Section 3 introduces the problem more formally. Section 4 describes some use cases where both P2P applications and network operators would benefit from a solution to such a problem. Section 5 describes the main issues to consider when designing such a solution.

1.1. Research or Engineering?

At the time of writing, several solutions have been proposed to address the problem described in this document, both inside and outside the IETF, accompanied by encouraging simulation and field test results [I-D.bonaventure-informed-path-selection] [ACM.ispp2p] [WWW.p4p.overview]. Such solutions have been proposed independently, but all consist of two essential parts:

o a discovery mechanism which can be used by a P2P application to find a reliable information source;

o a protocol used by P2P applications to query such sources in order to retrieve the information needed to perform better-than-random selection of the endpoints providing a desired resource.

It is not easy to foresee how such solutions would perform in the Internet, and a more accurate evaluation would require representative data collected from real systems by a critical mass of users. However, wide adoption will probably never happen without an agreement on a common solution based on an open standard.

2.
Definitions

The following terms have special meaning in the definition of the Application-Layer Traffic Optimization (ALTO) problem.

Application: A distributed communication system (e.g., file sharing) that uses the ALTO service to improve its performance (or quality of experience) while reducing resource consumption in the underlying network infrastructure. Applications may use the P2P model to organize themselves, or they can be simple client-server based, or even a hybrid of both.

Peer: A specific participant in an application. Colloquially, a peer refers to a participant in a P2P network or system, and this definition does not violate that assumption. If the application is based on a client-server or hybrid model, then the usage of the terms "client" and "server" provides enough context for disambiguation.

Resource: A piece of content (e.g. a file or a chunk of a file) or a server process (e.g. for relaying a media stream or for performing a computation) which can be accessed by applications. In the ALTO context a resource is often available in several equivalent replicas, shared by different peers.

Resource Identifier: An application layer identifier used to identify a resource, no matter how many replicas thereof exist.

Resource Provider: For P2P applications, a resource provider is a specific peer that provides some resources. For client-server or hybrid applications, a provider is a server that hosts a resource.

Resource Consumer: For P2P applications, a resource consumer is a specific peer that needs to access resources. For client-server or hybrid applications, a consumer is a client that needs to access resources.

Transport Address: All address information that is needed by a resource consumer to access the desired resource at a specific resource provider.
This information usually consists of the resource provider's IP address, and it may include other information, such as a transport protocol identifier or port numbers.

Overlay Network: A virtual network consisting of direct connections on top of another network, established by a group of peers. This logical structure, which can be used to implement distributed applications, may benefit from guidance from the ALTO service.

Resource Directory: An entity which is separate from the resource consumer, and which assists a resource consumer to identify a set of resource providers. In P2P applications, the resource directory may be referred to as a P2P tracker. Some applications do not use this concept and do the address mapping directly in the resource consumer.

Host Location Attribute: Information which is related to the location of a host in the network topology. The ALTO service gives recommendations based on this information. A host location attribute may consist, for example, of an IP address, an address prefix or address range that contains the host, an autonomous system (AS) number, or any other localization attribute. These different options may provide different levels of detail. Depending on the system architecture, this may have implications on the quality of the recommendations ALTO is able to provide, on whether recommendations can be aggregated, and on how much privacy-sensitive information about users might be disclosed to additional parties.

ALTO Service: If several resource providers are able to provide the same resource, the ALTO service gives guidance to a resource consumer or resource directory, on which resource provider(s) to select, in order to optimize its performance (or quality of experience) while minimizing resource consumption in the underlying network infrastructure.

ALTO Server: A logical entity that provides interfaces that can be used to query the ALTO service.
ALTO Client: The logical entity that sends ALTO queries. Depending on the architecture of the application it may be embedded in the resource consumer or in the resource directory.

ALTO Query: A message sent from an ALTO client to an ALTO server, which requests guidance from the ALTO Service.

ALTO Reply: A message sent from an ALTO server to an ALTO client, which contains guiding information from the ALTO service.

ALTO Transaction: An ALTO transaction consists of an ALTO query and the corresponding ALTO reply.

Local Traffic: Internet traffic which stays within the network infrastructure of one Internet Service Provider (ISP). This type of traffic usually incurs the lowest cost for the ISP.

Peering Traffic: Internet traffic exchanged by two Internet Service Providers whose networks are directly connected. Apart from infrastructure and operational costs, peering traffic is usually free, within the terms of a peering agreement.

Transit Traffic: Internet traffic exchanged on the basis of economic agreements between Internet Service Providers (ISPs). An ISP generally pays a transit provider for the delivery of traffic flowing between its network and networks that are not directly connected.

3. The Problem

Network engineers have been facing the problem of traffic optimization for a long time now and have already designed mechanisms like MPLS [RFC3031] and DiffServ [RFC3260] to deal with it. The problem these mechanisms address consists in finding (or setting) optimal routes for packets traveling between specific source and destination addresses, based on requirements such as low latency, high reliability, and priority. Such solutions are usually implemented at the link and network layers, and tend to be almost transparent. At best, applications can only "mark" the traffic they generate with the corresponding properties.
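As a rough illustration of what such "marking" can look like from the application side, the sketch below sets a DiffServ code point on a UDP socket through the standard IP_TOS socket option. The helper names (dscp_to_tos, open_marked_socket) and the choice of the Expedited Forwarding code point (46) are assumptions made for this example; whether any router honors the marking depends entirely on operator policy.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point, used here only as an example


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2  # the two low-order bits (ECN field) are left at zero


def open_marked_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Return a UDP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Note that the marking only describes how packets should be treated once a destination has been chosen; it gives the application no help in deciding which destination to choose.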
However, P2P applications, which today pose serious challenges to Internet infrastructures, do not benefit much from the above techniques; by "cooperating" with external services that are aware of the network topology, they could greatly optimize the traffic they generate. In fact, when a P2P application needs to establish a connection, the logical target is not a host, but rather a resource (e.g. a file or a media relay) generally available in multiple instances on different peers; selecting the closest one -- or, in general, the best one in terms of overlay topological proximity -- has much more impact on the overall traffic than the route followed by its packets to reach the endpoint.

Optimization of the peer selection is particularly important in the initial phase of the process. Consider a P2P protocol such as BitTorrent, where a querying peer receives a list of candidate destinations where a resource resides. From this list, the peer will derive a smaller set of candidates to connect to and exchange information with. In another example, a streaming video client may be provided with a list of destinations from which it can download content. In both cases, the use of topology information at an early stage will allow applications to improve their performance and will help ISPs make better use of their network resources (in particular, by reducing the transit traffic on interdomain links).

Addressing the Application-Layer Traffic Optimization (ALTO) problem means, on the one hand, deploying an ALTO service to provide applications with information regarding the underlying network and, on the other hand, enhancing applications in order to use such information to perform better-than-random selection of the endpoints they establish connections with.

4. Use Cases

4.1. File sharing

File sharing applications allow users to search for content shared by other users and download it.
Typically, search results consist of many instances of the same file (or chunk of a file) available from multiple sources; the goal of an ALTO solution would be to help peers find the best ones according to the underlying networks.

On the application side, integration of ALTO functionalities may happen at different levels. For example, while in the completely decentralized Gnutella network selection of the best sources is totally up to the user, in systems like BitTorrent and eDonkey, central elements (i.e. trackers or servers) act as mediators. Therefore, in the former case, optimization would require modification in the applications, while in the latter it could just be implemented in some central elements.

4.2. Cache/Mirror Selection

Providers of popular content like media and software repositories usually resort to geographically distributed caches and mirrors for load balancing. Today, selection of the proper mirror or cache for a given user is based on inaccurate geolocation data or on proprietary network location systems, or is often delegated to the user. An ALTO solution could easily be adopted to automate such a selection.

4.3. Live Media Streaming

P2P applications for live streaming allow users to receive multimedia content produced by one source and targeted to multiple destinations, in a realtime or near-realtime way, without resorting to multicast. Such applications typically participate in the distribution of the content, acting as both receivers and senders; the goal of an ALTO solution would be to help peers find the best sources and the best destinations for the media flows they receive and relay.

4.4. Realtime Communications

P2P realtime communications allow users to establish direct media flows, usually to place audio and video calls, or to have text chats.
In the basic case, media would flow directly between the two endpoints; however, in the general case, a significant portion of communications between users with limited access to the Internet (e.g. users behind NATs, firewalls or HTTP proxies) needs to be relayed by other elements. Such media relays are distributed over the Internet -- in some cases co-located with applications that have a public address; the goal of an ALTO solution would be to help peers find the best relays.

4.5. Distributed Hash Tables

Distributed hash tables (DHT) are a class of overlay algorithms used to implement lookup functionalities in popular P2P systems, without resorting to centralized elements. In such systems, peers maintain addresses of other peers participating in the same DHT in a routing table, sorted according to specific criteria. An ALTO solution would provide valuable information for DHT algorithms which, in order to reduce path latency of distributed queries, include round trip time estimations among such criteria [SIGCOMM.resprox].

5. Solution Considerations

This section introduces some aspects to keep in consideration when designing an ALTO service to provide applications with information they can use to perform better-than-random peer selection.

5.1. ALTO Service Providers

ALTO services can be provided by at least three different kinds of entities:

1. Network operators: usually have full knowledge of the network they administer and are aware of the topology and policies that transit and peering traffic are subject to;

2. Third parties: entities different from the network operators, but which may have collected network information. Examples of such entities are content delivery networks (like Akamai) which control wide and highly distributed infrastructures, or companies providing an ALTO service on behalf of ISPs (and thus acquiring the information from the ISPs themselves);

3.
User communities: running distributed algorithms, for example for estimating the topology of the Internet.

5.2. Discovery of ALTO servers

As a direct consequence of the totally decentralized architecture of the Internet, it seems almost impossible to centralize all the information P2P applications may need to optimize the traffic they generate. Therefore, any solution for the ALTO problem will need to specify a mechanism for applications to find a proper ALTO server to query.

It is important to note that, depending on the implementation of the ALTO service, an ALTO server could be a centralized entity (for example, one deployed by the network operator) as well as a volatile node participating in a distributed algorithm.

5.3. User Privacy

Information provided by the ALTO client querying the ALTO server could help increase the level of accuracy in the replies. For example, if the querying client indicates what kind of application it is using (e.g. realtime communications or bulk data transfer), the server will be able to indicate priorities in its replies accommodating the requirements of the traffic the application will generate. However, it is important that an application not be required to disclose information it may consider sensitive in order to use an ALTO service.

5.4. Topology Hiding

Operators can play an important role in addressing the ALTO problem, but they generally consider the network information they own to be confidential; therefore, in order to succeed and achieve wide adoption, any solution should provide a method to help P2P applications in peer selection without explicitly disclosing the topology of the underlying network.

5.5. Coexistence with Caching

A common approach to optimizing traffic generated by applications which require large data transfers is based on caching techniques.
In some cases, such techniques have proven to be extremely effective in both enhancing user experience and saving network resources; however, they have two main limitations with respect to the solutions based on provision of topology information:

1. Application specificity: since a cache is meant to replace the source of the content being accessed -- either explicitly or transparently -- it must be able to speak the same protocol as the querying peer. For this reason, caching solutions can be reasonably adopted only for the most popular applications (e.g. HTTP and BitTorrent).

2. Content awareness: since caches need to actually store the content being delivered, they are subject to legal threats whenever the user does not have the right to access or distribute such content. This limitation makes caching approaches unusable in today's popular file sharing systems.

In general, solutions based on provision of topology information need not interfere with caching; on the contrary, if the ALTO service used by applications is aware of the presence of caches, it can point them out in its replies with higher priorities and thus achieve greater optimization.

6. Security Considerations

The approach proposed in this document requires P2P applications to delegate a portion of their routing capability to third parties, giving them a significant role in systems from which they would otherwise be excluded. In the case where an ALTO solution is deployed by the network operator, it is conceivable that the P2P community would consider it hostile because the operator could, for example:

o redirect applications to corrupted mediators providing malicious content;

o track connections to perform content inspection;

o apply policies based on criteria other than network efficiency (for example, to avoid peering points regulated by inconvenient economic agreements).
However, ALTO is completely optional for P2P applications and its purpose is to help improve performance of such applications. If, for some reason, it fails to achieve this purpose, it would simply fail to gain popularity and would not be used.

Even in cases where the ALTO service provider would decide to maliciously alter results returned by queries only after the solution has gained popularity (i.e. it behaves well for a while to become popular and then starts misbehaving), it would be fairly easy for P2P application maintainers and users to revert to solutions that are not using it. After all, it would all come down to changing some application settings in cases where the protocol is implemented inside the client, and to upgrading centralized elements in architectures like BitTorrent and eDonkey.

7. Acknowledgments

Vinay Aggarwal and the P4P working group conducted the research work done outside the IETF. Emil Ivov, Rohan Mahy, Anthony Bryan, Stanislav Shalunov, Laird Popkin, Stefano Previdi, Reinaldo Penno, Dimitri Papadimitriou, Sebastian Kiesel, and many others provided insightful discussions, specific comments and much needed corrections. Thanks in particular to Richard Yang for several reviews.

8. Informative References

[ACM.bottleneck] Akella, A., Seshan, S., and A. Shaikh, "An Empirical Evaluation of Wide-Area Internet Bottlenecks", Proceedings of ACM SIGCOMM, October 2003.

[ACM.fear] Karagiannis, T., Rodriguez, P., and K. Papagiannaki, "Should ISPs fear Peer-Assisted Content Distribution?", In ACM USENIX IMC, Berkeley 2005.

[ACM.ispp2p] Aggarwal, V., Feldmann, A., and C. Scheideler, "Can ISPs and P2P systems co-operate for improved performance?", In ACM SIGCOMM Computer Communications Review (CCR), 37:3, pp. 29-40.

[ACM.ono] Choffnes, D. and F.
Bustamante, "Taming the Torrent: A practical approach to reducing cross-ISP traffic in P2P systems", Proceedings of ACM SIGCOMM, August 2008.

[I-D.bonaventure-informed-path-selection] Saucez, D. and B. Donnet, "The case for an informed path selection service", draft-bonaventure-informed-path-selection-00 (work in progress), February 2008.

[RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol Label Switching Architecture", RFC 3031, January 2001.

[RFC3260] Grossman, D., "New Terminology and Clarifications for Diffserv", RFC 3260, April 2002.

[SIGCOMM.resprox] Gummadi, K., Gummadi, R., Ratnasamy, S., Gribble, S., Shenker, S., and I. Stoica, "The impact of DHT routing geometry on resilience and proximity", Proceedings of ACM SIGCOMM, August 2003.

[WWW.cachelogic.picture] Parker, A., "The true picture of peer-to-peer filesharing", .

[WWW.p4p.overview] Xie, H., Krishnamurthy, A., Silberschatz, A., and R. Yang, "P4P: Explicit Communications for Cooperative Control Between P2P and Network Providers", .

[WWW.wired.fuel] Glasner, J., "P2P fuels global bandwidth binge", .

Authors' Addresses

Enrico Marocco
Telecom Italia
Via G. Reiss Romoli, 274
Turin 10148
Italy

Email: enrico.marocco@telecomitalia.it

Vijay K. Gurbani
Bell Laboratories, Alcatel-Lucent
1960 Lucent Lane
Naperville, IL 60566
USA

Email: vkg@alcatel-lucent.com

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).