Network Working Group Young Lee (Huawei) Internet Draft Dave McDysan (Verizon) Intended Status: Informational Ning So (UTD) Greg Bernstein (Grotto) Tae Yeon Kim (ETRI) Kohei Shiomoto (NTT) Oscar Gonzalez de Dios (Telefonica) October 20, 2010 Problem Statement for Network Stratum Query draft-lee-network-stratum-query-problem-01.txt Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on April 20, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of Lee, et al. Expires April 20, 2011 [Page 1] Internet-Draft Network Stratum Query October 2010 publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Abstract This document describes the general problem of network stratum query for application optimization. Network Stratum query is an ability to query the network from an application controller such as those used in Data Centers so that application controller decisions such as server assignment or virtual machine instantiation/migration can be performed with better knowledge of the underlying network conditions. As application servers are distributed geographically across Data Centers, many application-related decisions such as which server to assign a new client or where to instantiate/migrate virtual machines will suffer from sub-optimality unless the underlying network conditions are factored in the decision process. The lack of network awareness may result in not meeting the end-user service objective for some key applications like video gaming/conferencing that require stringent latency and bandwidth requirement. Table of Contents 1. Introduction...................................................3 2. Network and Application Contexts...............................4 3. Problem Statement..............................................7 3.1. Limitation of existing probing schemes....................7 3.2. Lack of vertical query schemes............................8 3.3. Limitation of SNMP MIB network monitoring techniques......8 3.4. Lack of abstraction mechanisms............................8 4. High-level requirements........................................9 4.1. Application Profile.......................................9 4.2. Network Load Data to be queried..........................10 4.3. A Whole Network Query capability.........................10 4.4. Data Synchronization Mechanism...........................10 4.5. Responses to NS Query from network to application........11 5. Security Considerations.......................................11 6. IANA Considerations...........................................11 7. References....................................................12 7.1. Informative References...................................12 Author's Addresses...............................................13 Intellectual Property Statement..................................13 Disclaimer of Validity...........................................14 Lee Expires April 20, 2011 [Page 2] Internet-Draft Network Stratum Query October 2010 1. Introduction Cross Stratum Optimization is a joint optimization effort in allocating resources to end-users that involves both the Application Stratum and Network Stratum. The application stratum is the functional block which manages and controls application resources and provides application resources to a variety of clients/end-users. Application resources are non-network resources critical to achieving the application service functionality. Examples include: application specific servers, storage, content, large data sets, and computing power. Data Centers are regarded as a tangible realization of the application stratum architecture. The network stratum is the functional block which manages and controls network resources and provides transport of data between clients/end-users to and among application resources. Network Resources are resources of layer 3 or below (L1/L2/L3) such as bandwidth, links, paths, path processing (creation, deletion, and management), network databases, path computation, admission control, and resource reservation capability. Application services by their very nature utilize application resources (e.g., servers, storage, memory, etc...) in Data Centers, and the underlying network resources provided by LANs, MANs, and carrier's transport networks. By "data center" is any location in the network where applications resources, such as servers or storage, are aggregated. We include in this definition extremely large data centers with 10,000 or more servers, all the way down to small points of presence co-located in network carrier facilities. As the application servers are distributed geographically across many Data Centers, decisions such as server assignment or new virtual machine instantiation/migration will suffer from sub-optimality unless the underlying network conditions are factored in the decision process. The lack of network awareness may result in not meeting the end-user service objective for some key applications like video gaming/conferencing that require stringent latency and bandwidth requirement. This document describes the general problem of network stratum query (NS Query) in Data Center environments. Network Stratum query is an ability to query the network from application controllers in Data Centers so that application server assignment or virtual machine instantiation/migration decision would be jointly performed based on both the application resource/load status and the network resource/load status. Lee Expires April 20, 2011 [Page 3] Internet-Draft Network Stratum Query October 2010 The NS query is different from what is called "horizontal network queries" performed as part of network management. These horizontal queries are carried out by an entity (network management systems) within the network and would have fairly complete access to network information. NS Query can be thought of as a two-stage query that consists of: . First Stage: A vertical query from an application entity (i.e., the Application Control Gateway (ACG) in Data Center) to an entity representing the network (i.e., the Network Control Gateway (NCG))for highly summarized and abstracted network related information; and . Second Stage: Internal "horizontal queries" at the network work layer along with summarization and abstraction of the network information in a form that preserves network confidentiality and significantly reduces the amount of information that needs to be transferred. The raw information needed to perform this summarization/abstraction is defined in existing and emerging network management standards and protocols (SNMP, Netflow, sFlow, IPPM, IGP, RIB, etc...). NS Query would not necessarily standardize how the above "internal horizontal queries" and summarization would be performed but would illustrate how such processes can be accomplished via standards, emerging standards or common commercial practices. 2. Network and Application Contexts Figure 1 shows a typical application architecture where an end-user (the point of consuming resource) needs to be connected for its application (e.g., gaming) to a server located in one of the geographically distributed data centers. Lee Expires April 20, 2011 [Page 4] Internet-Draft Network Stratum Query October 2010 ,-----. --------------- ---------- / App \ | DC 1 | | End-user |. . .>( Control ) | o o o | | | \ / | \|/ | ---------- `-----' | O | | ----- --|------ | | | | | --------------------------|-- | / PE1 | \ | / ...................O \ -------------- | | . | | o o o DC 2 | | | PE4 . PE2 | | \|/ | ----|---O.........................O---|---|---O | | . | | | | . PE3 | -------------- \ ..........O Carrier / \ | Network / ---------------|------------- | --------|------ | O | | /|\ | | o o o | | DC 3 | --------------- Figure 1. Data Center Architecture Figure 1 shows that the user application can be served by any of the servers in DC1, DC2 or DC3. When the initial request arrives to the application controller, the controller (aka, a global load balancer) would ideally assign an "optimal" server based on both server resource/load status and the network resources/load status. This server assignment decision today, however, is limited due to the lack of network awareness in this decision making process in the application. For example, the application controller needs to find a good server that can serve the client. Assume that this particular application requires x amount of minimum bandwidth guarantee and with less than y ms of latency limit. The route that serves Data Center 1 traffic to the end-user (PE1 - PE4) may not have enough capacity at a moment of service instantiation and therefore the service objective of the end- user may not be satisfied had such route been taken. Lee Expires April 20, 2011 [Page 5] Internet-Draft Network Stratum Query October 2010 On the other hand, there may be good servers available in Data Centers 2 and 3 and their routes (PE2-PE4 and PE3-PE4) may have enough capacity to meet the service requirement. This example illustrates the benefit of and the need for the joint optimization across the application and network strata. NS Query is the ability to query the network from an application controller to collect a certain level of network information. No such mechanisms exist in the today's Internet Protocol technologies. Figure 2 shows the context of NS Query in a more detail within the overarching data center architecture shown in Figure 1. +--------------------------------------------+ | +-------------+ | | | Application | | | | Controller | Application Overlay | | +------|------+ (Data Centers) | | | | ---------- | ------|------- -------------- | | End-User | | | Application |. . . .| Application | | | |. . . >| | Control | | Processes | | ---------- | | Gateway (ACG)| -------------- | | | | -------------- | | ------------- . . . . | Application | | | /\ | Related Data | | | || -------------- | ----------||-------------------------------- || || Network Stratum Query (First Stage) || +----------||--------------------------------+ | \/ Network Underlay | | | | -------------- ---------------- | | | Network |. . . | Network | | | | Control | | Processes | | | | Gateway (NCG)| ---------------- | | | | ---------------- | | ------------- | Network | | | |------------->| Related Data | | | (Second Stage) ---------------- | +--------------------------------------------+ Figure 2. NS Query Architecture Figure 2 shows key architectural components that enable NS Query capability. The application controller, e.g., global load balancer, Lee Expires April 20, 2011 [Page 6] Internet-Draft Network Stratum Query October 2010 utilizes the Application Control Gateway (ACG) to interface with the network and generate queries to network. The ACG can query various metric values that may contribute to meeting the overall service objective of an application. This is a vertical query (Stage 1). In the network stratum, the Network Control Gateway (NCG) serves the interface to the network stratum. The NCG receives the query request from the ACG, makes use of current and/or historical network information (IPPM, IGP, MIB, TED, etc...), abstracts and summaries this information implemented in the network stratum. This is the horizontal query and summarization stage (Stage 2). Further, the NCG provides the responses to the original query sent from the ACG. The data collected by the NCG needs to be abstracted. This abstraction is needed on two grounds. First, the network does not usually reveal its details to the outside entity. Although the Data Center providers and the carriers are business partners in providing application services to the end- users and to the application providers (e.g., gaming providers), detail network data may not be leaked to the Data Centers, and vice versa. Secondly, detail network data may not be understood by the application. Link or node level data in and of themselves may not help the application to process the detail data. For instance, latency or bandwidth on a link level is too detail for application to handle. Instead, latency or bandwidth on a route level (i.e., PE1 - PE4 in Figure 1) will help the application make its server selection/instantiation decision. The abstraction function needs to be provided by the NCG. Note that NCG plays gate keeper role to information concerning the network. a Within the network the NCG collects and or probes network performance/management data (e.g., IPPM, MIB, etc.) or routing data [MRT] (e.g., LSDB, TED, BGP-RIB, etc.) and others. Once the basic data is collected, the NCG will need to abstract/summary before it sends to the application. 3. Problem Statement 3.1. Limitation of existing probing schemes The current state-of-the art application network awareness schemes for an entity external to the network are based on ping, trace route, or vendor specific probing mechanisms based on the assumption that Lee Expires April 20, 2011 [Page 7] Internet-Draft Network Stratum Query October 2010 the underlying transport network is L3 network and that the routing is simple IP forwarding. In reality, the carrier's routing schemes are likely to include IP tunneling or MPLS tunneling on top of or in place of IP forwarding. In some cases, the actual network may be VPN, MPLS-TE or GMPLS-TE networks where trace route does not work. This implies that network status estimation technique made from application stratum may have limited accuracy. Thus, application resource allocation to end-users can suffer sub-optimality and fail to meet performance objective for the application. 3.2. Lack of vertical query schemes There is no standard "vertical" query scheme that allows an application control gateway in a Data Center to query network stratum in a way suitable for a third party (i.e. an entity "outside" the network). Due to the lack of standard vertical query scheme, there is a limitation on exchanging information between application and network that would increase efficiency of joint optimization across application to network. For instance, the ability to exchange the application profile information (defined in Section 4.1) or network capability information between application and network would increase efficiency of resource allocation across application to network. 3.3. Limitation of SNMP MIB network monitoring techniques SNMP MIB Network Monitoring lacks a whole network query capability. A whole network query is a query to gather information across many boxes simultaneously under the control of a single administration domain (AD) as defined in RFC 1136. A single AD means the single AS or multiple ASes under the control of a single AD. 3.4. Lack of abstraction mechanisms Most of the information needed to provide NS Query is currently available from the network; however, it is not aggregated into a form suitable for use by the application stratum. For example from commonly monitored SNMP based link statistics and current routing tables one can easily compute average available bandwidth and many other statistical performance measures such as packet loss, latency, etc. However, neither the raw SNMP nor routing table data should be delivered to the application stratum since (a) this reveals too much Lee Expires April 20, 2011 [Page 8] Internet-Draft Network Stratum Query October 2010 information concerning the carriers network, (b) presents too much information to transfer to each application. This warrants some work on abstraction from network side to preserve the privacy of network stratum details from the application stratum. 4. High-level requirements This section discusses high-level requirements to support NS Query in the Data Center environments. The ACG plays the key role functioning as an application gateway to network and runs the NS Query. The ACG has access to the end-user profile for the application and the candidate servers' locations locally and remotely located. How the ACG access these information is beyond the scope of this work. 4.1. Application Profile The application Stratum needs to provide the application profile to network as part of a query. Example service profile information that can be useful to network to understand is as follows: . End user IP address; . User access router IP address; . Authentication Profile: Authentication Key; . Bandwidth Profile: Minimum bandwidth required for the application; . Connectivity Profile: P-P, P-MP, Anycast (Multi-destination); . Directionality of the connectivity: unidirectional, bi- directional; . Path Estimation Objective Function: Min latency, etc. Additional profile information can be added depending on the network capability. Lee Expires April 20, 2011 [Page 9] Internet-Draft Network Stratum Query October 2010 4.2. Network Load Data to be queried For a given location mapping information (i.e., from the server location to end-user location), the query from an application can ask the following network load data: . Type of networks and the technical capabilities of the networks; . Bandwidth capabilities and availability; . latency; . jitter; . packet loss; . And other Network Performance Objective (NPO) as defined in section 5 of [ITU-T Y.1541]. Note that this can be asked in a different way. For example, the query can simply ask: . Can you give me a route with x amount of b/w (from server to end-user) within y ms of latency? . Can you give me a route with x amount of b/w (from server to end-user) with no packet loss? 4.3. A Whole Network Query capability Upon the request from application (specifically, the ACG in Figure 2), the network (specifically the NCG in Figure 2) should perform "a whole network query" of information. A whole network query is a query to gather information across many boxes simultaneously under the control of a single administration domain (AD) as defined in RFC 1136. A single AD means the single AS or multiple ASes under the control of a single AD. The scope of a whole network query can include the topology of the network, the bandwidth availability for the routes of interest, the capabilities and congestion of links and routes, and an indication of the contribution to delay and jitter that each link and route will contribute and so on. 4.4. Data Synchronization Mechanism When querying the network there are two general categories of information that are of interest (a) current data, and (b) historical Lee Expires April 20, 2011 [Page 10] Internet-Draft Network Stratum Query October 2010 data. Current data information tells us information about the current state of the network, this data represent current network "state" and over time is subject to change. Application servers would use abstracted versions of this information to make decisions regarding current application activities. Historical data can be used by the application controller to schedule application events at future times. Such historical data must be time stamped so that inferences can be drawn from the data. For example, consider the backup and synchronization of large application databases between two data centers. Not only would we like to know how much bandwidth is available, and importantly when is the best time to perform such synchronization (given there is flexibility on when to perform the backup). 4.5. Responses to NS Query from network to application Given the network query from application, the network should provide the following mechanisms: - For a given location mapping information from application (i.e., from the server location to end-user location) and the gathered information by the second stage query discussed in section 4.3., the network needs to present the requested information in a standard format and respond to the application. The actual abstraction mechanism is beyond the scope of this document. 5. Security Considerations TBD 6. IANA Considerations This informational document does not make any requests for IANA action. Lee Expires April 20, 2011 [Page 11] Internet-Draft Network Stratum Query October 2010 7. References 7.1. Informative References [RFC2261] D. Harrington, et al., "An Architecture for Describing SNMP Management Frameworks," January, 1998. [RFC2265] B. Wijnen, et al., "View-based Access Control Model (VACM) for the Simple Network Management Protocol (SNMP)," January, 1998. [Y.1541] Network performance objectives for IP-based services, February, 2002. [Y.2011] General principles and general reference model for Next Generation Networks, October, 2004. [Y.2012] Functional Requirements and architecture of the NGN, April, 2010. [MRT] L. Blunk, M. Karir, and C. Labovitz, "MRT routing information export format," draft-ietf-grow-mrt, work in progress. Lee Expires April 20, 2011 [Page 12] Internet-Draft Network Stratum Query October 2010 Author's Addresses Young Lee Huawei Technologies 1700 Alma Drive, Suite 500 Plano, TX 75075 USA Phone: (972) 509-5599 Email: ylee@huawei.com Ning So Univerity of Texas at Dallas Email: ningso@yahoo.com Dave McDysan Verizon Business Email: dave.mcdysan@verizon.com Greg M. Bernstein Grotto Networking Fremont California, USA Phone: (510) 573-2237 Email: gregb@grotto-networking.com Tae Yeon Kim ETRI tykim@etri.or.kr Kohei Shiomoto NTT Email : shiomoto.kohei@lab.ntt.co.jp Oscar Gonzalez de Dios Telefonica Email : ogondio@tid.es Intellectual Property Statement The IETF Trust takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in any IETF Document or the extent to which any license under such rights might or might not be available; nor does it Lee Expires April 20, 2011 [Page 13] Internet-Draft Network Stratum Query October 2010 represent that it has made any independent effort to identify any such rights. Copies of Intellectual Property disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement any standard or specification contained in an IETF Document. Please address the information to the IETF at ietf-ipr@ietf.org. Disclaimer of Validity All IETF Documents and the information contained therein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgment Funding for the RFC Editor function is currently provided by the Internet Society. Lee Expires April 20, 2011 [Page 14]