IRCAT Working Group                                              Y. Yu
Internet Draft                                                  Huawei
Intended status: Standards Track                               T. Tsou
Expires: December 2011                                          Huawei
                                                                 R. Li
                                                                Huawei
                                                         June 10, 2011


      Inter-router Cooperative and Teaming for massive scalability
                draft-yu-ircat-problem-statement-00.txt


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 10, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Yu&Tsou&Li             Expires December 10, 2011               [Page 1]

Internet-Draft          IRCAT Problem Statement               June 2011


Abstract

   With new applications emerging, such as provider-provisioned cloud
   services and VPN-oriented cloud services, provider network equipment
   faces new requirements for massive scalability. This document
   describes the challenges that arise as VPN-oriented cloud services
   emerge. A cooperative router group solution is recommended to meet
   the new scaling requirements. The cooperative router group calls for
   new mechanisms that remain to be developed and studied further.

Table of Contents

   1. Introduction
      1.1. Challenges of Provider Data Center Service Solution
      1.2. Warehouse Data Center
      1.3. Provider Cloud Site PE
      1.4. Provider Cloud Site CE
      1.5. Scaled Network Virtualization
   2. Terminology
   3. Inter-Router Cooperative Group
      3.1. Control Plane and Forwarding Plane Separation
         3.1.1. Control Plane and Forwarding Plane Separation is
                necessary
         3.1.2. Unified Router Group Control Plane is necessary
      3.2. Router Discovery Mechanism
      3.3. Topology of Router Group
         3.3.1. An Example of interior router inter-connection network
      3.4. Load Migration
         3.4.1. Router Capability Exchange Mechanism is needed
         3.4.2. Local resource load monitoring
         3.4.3. Load Migration
         3.4.4. Calculate Target Router of Load Migration
      3.5. Always In Service
   4. Conclusions and Recommendations
   5. Security Considerations
   6. IANA Considerations
   7. Acknowledgments
   8. References

1. Introduction

   Enterprise customers no longer need to invest in local IT resources;
   instead, they can lease cheaper, on-demand resources from a
   provider. Providers, in turn, enter a new business by offering data
   center services. AT&T published its CloudNet proposal [CloudNET] for
   its provider data center service solution. This proposal introduces
   VPNs into data center cloud service. Enterprise customers use a VPN
   to access the provider's data center. A VPN-oriented provider data
   center service not only provides secure access over the VPN but also
   uses the VPN to separate customer resources within the provider data
   center.

1.1. Challenges of Provider Data Center Service Solution

   A VPN-oriented provider cloud service can provide flexible resource
   allocation schemes that respond to dynamic changes in the cloud and
   the network. VPNs also provide enterprise customers with transparent
   connections and provide isolation within clouds. However, this
   approach also raises requirements that current network equipment
   struggles to meet. The challenges can be summarized as extremely
   high scalability and highly scaled virtualization.

1.2. Warehouse Data Center

   Today, warehouse data centers can accommodate over 120,000 physical
   servers. Each server may have multi-core CPUs, and each core may be
   hyper-threaded. With decent virtualization, each server can support
   up to 20 VMs, so a warehouse data center can accommodate up to
   2,400,000 VMs (120,000 x 20). The control plane must therefore scale
   to hold 2.4M VMs: the network gains 2.4M MAC addresses plus 2.4M
   IPv4 and IPv6 addresses. If the data center is layer 3, 2.4M IP host
   routes are added as well.
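
   The warehouse data center figures can be checked with a few lines of
   arithmetic. The sketch below uses the server and VM counts from the
   text; the function name and the assumption of exactly one MAC, one
   IPv4 address, one IPv6 address, and one host route per VM are
   illustrative and not part of the draft.

```python
# Back-of-the-envelope control plane sizing for a warehouse data center.
# SERVERS and VMS_PER_SERVER come from the text; the per-VM entry counts
# below are simplifying assumptions made for this sketch.
SERVERS = 120_000
VMS_PER_SERVER = 20


def estimate_entries(servers: int, vms_per_server: int) -> dict:
    """Return the number of entries the control plane would have to hold."""
    vms = servers * vms_per_server
    return {
        "vms": vms,
        "mac_addresses": vms,    # assumed: one MAC per VM
        "ipv4_addresses": vms,   # assumed: one IPv4 address per VM
        "ipv6_addresses": vms,   # assumed: one IPv6 address per VM
        "l3_host_routes": vms,   # only relevant if the data center is layer 3
    }


for name, count in estimate_entries(SERVERS, VMS_PER_SERVER).items():
    print(f"{name}: {count:,}")
```

   Changing either constant shows how quickly the table sizes grow with
   server count or VM density.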

1.3. Provider Cloud Site PE

   The provider's cloud service is delivered through one VPN per
   customer. As a result, almost all cloud service customer VPNs
   aggregate at the provider's cloud-side PE, which creates an
   extremely high scaling requirement for the PE. For example, if cloud
   service is provided to 500K enterprise customers, 500K VPNs must be
   configured. With 10K routes in each VPN, the PE must support at
   least 500K VRFs and 5 billion VRF routes (500K x 10K). This is far
   beyond the capability of any PE router on the market.

1.4. Provider Cloud Site CE

   Provider cloud service requires the logical CE to hold one virtual
   instance per customer VPN. 500K VPN customers mean 500K virtual
   routing instances in the logical CE. Although routers from all major
   vendors offer logical router and virtual router features, their
   scalability cannot meet the requirements of a provider data center.

1.5. Scaled Network Virtualization

   In a provider cloud data center, servers are virtualized into many
   VM slices. VMs belong to different customers but share the same
   physical network. Network virtualization determines how a packet is
   forwarded to the right place in a provider data center, which is a
   highly scaled, interleaved, intermixed, fate-sharing network. This
   makes network virtualization very challenging in a provider cloud
   data center.

2. Terminology

   VPN        Virtual Private Network

   VRF        Virtual Routing and Forwarding

   VPC        Virtual Private Cloud

   CloudNet   A VPN-oriented cloud service proposed by AT&T

   DE         Provider Data Center Edge

   VM         Virtual Machine

3. Inter-Router Cooperative Group

   A single-chassis or single-entity router struggles to meet the
   massive scalability required by newly emerging applications. Even
   though CPU and memory speeds are growing quickly, building a giant
   router is still expensive. A single giant router also lacks the
   flexibility to scale seamlessly from low to high capacity. It
   therefore becomes necessary to unite a group of routers into a
   single router.
   All the routers in the router group share the same router ID and act
   as a single router. The rest of the network views the router group
   as a single network node. Routers in the group may be heterogeneous
   or homogeneous; they are not required to be the same type of router.

   Compared with a single giant router, the router group approach is
   practical, much cheaper, and easy to deploy. A router group can
   scale up and down seamlessly without disrupting service. If the load
   on the router group is low at first, the group can start with fewer
   routers, and more routers can be added as the load goes up.

3.1. Control Plane and Forwarding Plane Separation

3.1.1. Control Plane and Forwarding Plane Separation is necessary

   By the definition of a cooperative router group, routers can be
   added or removed freely as the load on the group changes. In a
   router group, a router's control plane need not manage only the data
   plane in its own chassis. This gives the router group more
   flexibility to use the resources of every router in the group.

3.1.2. Unified Router Group Control Plane is necessary

   Coordinating many independent routers in a router group is
   difficult. To fully utilize and manage the control plane, it becomes
   necessary to introduce virtualization that unifies the router
   control planes. The router group acts as, and is viewed by the whole
   network as, a single network node. The unified router group control
   plane presents a unified router ID and unified addresses to the rest
   of the network. Each router's control plane is a slice of the
   resources of this unified group control plane. Adding or removing a
   router simply adds or removes a slice of resources from the global
   resource pool.

3.2. Router Discovery Mechanism

   When a router joins or leaves the router group, the other routers in
   the group must become aware of the change. An automatic router
   discovery mechanism is needed.

3.3. Topology of Router Group

   The interior interconnection network among routers can be either
   layer 2 or layer 3. The interior MAC addresses (for a layer 2
   network) or IP addresses (for a layer 3 network) are used privately,
   for internal communication only. Network nodes outside the router
   group do not see those addresses. At the same time, the router group
   maintains a public router ID and a set of public IP addresses. This
   public information is published to other network nodes.

   The interior router interconnection network serves the following
   functions:

   1. Internal router communication

      Internal router communication means unicast or multicast
      communication between (or among) group routers. This works well
      with current protocols.

   2. External packet forwarding

      Ingress local packets are all destined to the public IP address.
      An additional mechanism is needed in this scenario. The next
      section gives an example of this case.

3.3.1. An Example of interior router inter-connection network

      +----------+       +-----------+         +---------+
      |    A     |       |     B     |         |    C    |
      |  (BGP 1) | ----- |  (BGP 2)  | ------- |  (OSPF) |
      +----------+       +-----------+         +---------+
      |    LC    |       |    LC     |         |   LC    |
      +----------+       +-----------+         +---------+

          Figure 1 Interior router inter-connection example

   There are three routers, A, B, and C, in the router group. Their
   interior IP addresses are 10.1.1.1, 10.1.1.2, and 10.1.1.3,
   respectively. The router group's public address is 1.1.1.1. Two BGP
   instances run on routers A and B, and an OSPF instance runs on
   router C. BGP on routers A and B is load balanced: half of the BGP
   peers are handled by A and the other half by B. All BGP peers peer
   with the public IP address (1.1.1.1). When a BGP packet is punted up
   via a line card, its destination IP address is 1.1.1.1. If the
   packet comes from a peer handled by A, it should be forwarded to A.
   If an OSPF packet is punted up, it should be forwarded to C. New
   mechanisms need to be developed to forward ingress and egress
   packets to the right location.

3.4. Load Migration

   Router load migration is a major advantage of the cooperative router
   group. Before a router runs out of resources, part of its load can
   be migrated to other routers that have enough resources to
   accommodate it. To achieve automatic, dynamic load migration in a
   cooperative router group, a series of new mechanisms needs to be
   developed.

3.4.1. Router Capability Exchange Mechanism is needed

   When a router joins the router group, it should advertise its
   capability. Capability here means the router's used and available
   CPU, memory capacity, and so on. The router group must maintain a
   capability database covering all routers.

   Router capability changes constantly, so the capability database is
   updated from time to time. The capability database is used in
   migration target calculation.

3.4.2. Local resource load monitoring

   A router should monitor the status of its local resources and
   trigger the load migration procedure before those resources run out.

3.4.3. Load Migration

   For load migration, a router's local workload can be classified as:

   - Migratable or non-migratable

   Migratable workload can be further classified as:

   - Splittable or non-splittable

   A splittable workload can be partially migrated to a target router;
   a non-splittable workload cannot.

   Load migration can be triggered by automatic load monitoring or
   manually by an operator.

3.4.4. Calculate Target Router of Load Migration

   If a router is required to migrate part or all of its load, it
   decides on the best target routers and the amount of workload to
   migrate to each.

   Target router calculation can be based on the router capability
   database, local load status, and pre-configured policy.
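
   The draft does not specify how a target router is chosen. The sketch
   below shows one possible greedy heuristic, assuming a capability
   database that records free CPU and memory per router in comparable,
   normalized units and a policy expressed as a minimum reserve that
   must remain on the target. The names Capability, Load, pick_target,
   and reserve are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Capability:
    """One capability-database entry (hypothetical layout)."""
    router: str
    free_cpu: float  # available CPU, normalized units (assumption)
    free_mem: float  # available memory, same normalized units


@dataclass
class Load:
    """A unit of workload that a router may want to move."""
    name: str
    cpu: float
    mem: float
    migratable: bool = True


def pick_target(load: Load, database: Iterable[Capability],
                source: str, reserve: float = 0.0) -> Optional[str]:
    """Return the router left with the most headroom, or None if none fits."""
    if not load.migratable:
        return None
    best_name, best_score = None, -1.0
    for cap in database:
        if cap.router == source:
            continue  # never migrate a load onto the router it came from
        cpu_left = cap.free_cpu - load.cpu
        mem_left = cap.free_mem - load.mem
        # Policy check: the target must keep `reserve` spare on both resources.
        if cpu_left < reserve or mem_left < reserve:
            continue
        score = min(cpu_left, mem_left)  # the scarcer resource decides
        if score > best_score:
            best_name, best_score = cap.router, score
    return best_name


db = [Capability("A", 10, 10), Capability("B", 50, 40), Capability("C", 30, 80)]
target = pick_target(Load("bgp-peers", cpu=20, mem=20), db, source="A", reserve=5)
print(target)
```

   A greedy choice like this is cheap to compute but considers one load
   at a time, so it does not guarantee an optimal placement across the
   whole group.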

   Migration target calculation is a complex process if optimal results
   are wanted. An open question is how to simplify the calculation in
   exchange for a slightly sub-optimal result.

3.5. Always In Service

   Availability is a significant factor for the cooperative router
   group. Internal operations of the router group, such as load
   migration and data synchronization, should not disrupt service to
   network nodes outside the group.

4. Conclusions and Recommendations

   Given the massive scaling requirements of provider cloud service, we
   conclude and recommend the following:

   - The cooperative router group solution relies on new mechanisms
     that need to be developed and studied further.

   - We recommend that the IETF develop the new mechanisms required by
     the cooperative router group.

5. Security Considerations

   This document does not introduce additional security considerations.

6. IANA Considerations

   This document does not introduce additional IANA considerations.

7. Acknowledgments

   This document was prepared using 2-Word-v2.0.template.dot.

8. References

   [1] [CloudNET] Wood, T., "The Case for Enterprise-ready Virtual
       Private Clouds".

Authors' Addresses

   Yang Yu
   Huawei Technologies
   2300 Central Expressway,
   Santa Clara, CA 95050, USA

   Email: yyu@huawei.com


   Tina Tsou
   Huawei Technologies
   2300 Central Expressway,
   Santa Clara, CA 95050, USA

   Email: tena@huawei.com


   Renwei Li
   Huawei Technologies
   2300 Central Expressway,
   Santa Clara, CA 95050, USA

   Email: renwei.li@huawei.com