Internet Engineering Task Force Y. Yang Internet-Draft T. Tsou Intended status: Standards Track Huawei Technologies Expires: January 5, 2012 July 4, 2011 Inter-Router Cooperative and Teaming Protocols for Massive Scalability draft-yu-ircat-problem-statement-02.txt Abstract To achieve the extremely high scalability, the cooperative router team is a reasonable approach. This document reviews problems that the cooperative router approach might solve. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on January 5, 2012. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Yang & Tsou Expires January 5, 2012 [Page 1] Internet-Draft IRCAT Problem Statement July 2011 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Inter Cooperative Router Team . . . . . . . . . . . . . . . . 3 3.1. Examples of the Router topologies . . . . . . . . . . . . 4 4. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 7 4.1. Router Team Membership . . . . . . . . . . . . . . . . . . 7 4.2. Router Capability Dynamic Exchange . . . . . . . . . . . . 7 4.3. Cooperative Router Group Topology . . . . . . . . . . . . 7 4.3.1. An Example of Interior Router Team Network . . . . . . 8 4.4. Cooperative Router Team Task Collaboration . . . . . . . . 9 4.4.1. Local Resources Monitoring . . . . . . . . . . . . . . 9 4.4.2. Migration Source Selection . . . . . . . . . . . . . . 9 4.4.3. Migration Targets Selection . . . . . . . . . . . . . 9 4.5. Unified Router Team Control Plane . . . . . . . . . . . . 10 4.6. Router Control Plane and Forwarding Plane Separation . . . 10 5. Conclusions and Recommendations . . . . . . . . . . . . . . . 11 6. Security Considerations . . . . . . . . . . . . . . . . . . . 12 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 12 9. Informative References . . . . . . . . . . . . . . . . . . . . 12 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 Yang & Tsou Expires January 5, 2012 [Page 2] Internet-Draft IRCAT Problem Statement July 2011 1. Introduction To achieve massive scalability, routers are being evolved and growing larger and larger in size. One way to achieve very high scalability is to design and deploy tightly-coupled homogeneous hardware cluster router. Currently, the tightly-coupled cluster router with homogeneous hardware requires the same type of routers from the same vendor. As an alternative, using different routers (maybe from different vendors) to construct a loosely-coupled heterogeneous cluster router by running one or more protocols (based on IETF standards) is a reasonable approach. The key point of such a technology is about Cooperation and Teaming among routers. The cooperation includes, but not limited to, such a scenario in which one router may handle some control plane work on behalf of another router. From the outside, the cooperative router team is required to meet the following minimal requirements: o The group is viewed as a single virtualized router with a router ID and operated as a single network node. o The router is always available and some redundancy and backups are built between routers joining the same router team. o The router is extremely highly scalable as a new router can be inserted to join such a group when needed. 2. Terminology IRCAT:Inter Router Cooperative and Teaming 3. Inter Cooperative Router Team To meet the massive scalability requirements from the newly emerging applications, the single chassis router faces tough time. Even though CPU speed and memory capacity continue to grow at a fast pace, building a giant route is still expensive. The single giant router loses the flexibility of seamlessly scaling from low to high. An alternate paradigm is to create giant routers out of multiple heterogeneous routers in a loosely coupled group. The router can be Yang & Tsou Expires January 5, 2012 [Page 3] Internet-Draft IRCAT Problem Statement July 2011 heterogeneous in the CPU, the type of line cards, their functionality or their origin. For example, some routers in the group can have line cards with 100 Gig cards and others can have 10 Gig cards. The routers participate in a team of routers much like the US football team or a South American football team (soccer). Each of the team members in the router team tell other members about their special capabilities. Just like on the US football team, the team members share about their capabilities (punting or blocking) so the team can win the football game. The team members join/leave the team. This function can be via multicast protocols or other dynamic group joining protocols. The team members also cooperate to provide the service in a distributive fashion. Again, to use the football analogy the team members collaborate to do the task of moving the ball down the field. The team members also tell each other about the dynamic changes in topology. For the purpose of discussion we denote the grouping multiple routers inter router cooperative group. In the cooperative router team, the routers are not required to be of the same type. The Router Member can be heterogeneous or homogeneous routers. 3.1. Examples of the Router topologies The router topologies can be in any type of topology. The next three figures show three examples: ring, partial mesh, and full mesh. These examples are shown in a flat topology, but any type of topology (hierarchy or flat or mixed) should be able to be supported. In each example, the router as a group has an ID and exterior addresses. View from the outside, the rest of the world sees the router team as a single identity. Viewed from the inside, the routers in the router team see each other's Team ID and member ID within the team. Again an analogy from South American Football (soccer) may help. The team is the Red soccer team. The forwarder Pascal has a 14 on his jersey. Figure 1 is the illustration of router team. The router team has a single router ID and a set of exterior addresses as public information. The group router members have private interior router ID and interior addresses for internal interconnection. Yang & Tsou Expires January 5, 2012 [Page 4] Internet-Draft IRCAT Problem Statement July 2011 +--------------------------------------------------------+ | Router Team (The Router ID and Exterior Addresses) | | +---------+ +---------+ | | | | | | | | | RM 1 | ------------------------- | RM 3 | | | | | | | | | | | | | | | +---------+ +---------+ | | |(Interior Address) (Interior| Address)| | | | | | +---------+ +---------+ | | | | | | | | | RM 2 | ------------------------- | RM 4 | | | | | | | | | | | | | | | +---------+ +---------+ | | | +--------------------------------------------------------+ Figure 1: Cooperative Router Team with Ring Topology +--------------------------------------------------------+ | Router Team (The Router ID and Exterior Addresses) | | +---------+ +---------+ | | | | | | | | | RM 1 | ------------------------- | RM 3 | | | | | | | | | | | \ // | | | | +---------+ \ +---------+ | |(Interior| Address) \ // (Interior| Address)| | | // \ | | | +---------+ =========// \---------+---------+ | | | | | | | | | RM 2 | ------------------------- | RM 4 | | | | | | | | | | | | | | | +---------+ +---------+ | | | +--------------------------------------------------------+ Figure 2: Cooperative Router Team with Full Mesh Topology Yang & Tsou Expires January 5, 2012 [Page 5] Internet-Draft IRCAT Problem Statement July 2011 +--------------------------------------------------------+ | Router Team (The Router ID and Exterior Addresses) | | +---------+ +---------+ | | | | | | | | | RM 1 | ------------------------- | RM 3 | | | | | / | | | | | | / | | | | +---------+ / +---------+ | |(Interior| Address) / (Interior| Address)| | | / | | | +---------+ / +---------+ | | | | ------------/ | | | | | RM 2 | ------------------------- | RM 4 | | | | | | | | | | | | | | | +---------+ +---------+ | | | +--------------------------------------------------------+ Figure 3: Cooperative Router Team with Partial Mesh Topology The connections between the router members can be at a single layer (layer 3 or layer 2) or multiple layers (1-4). Cooperative router team is truly cooperative. The Router Members collaboratively agree to work on some set of tasks. The tasks may have characteristics like CPU, memory, and time of day. Or the tasks may have specific goals like NAT or firewall. Let's stretch the football analogy a little further. Sometimes a US football team is on the defensive (like a firewall). Sometimes, a US team is marching down the field toward a goal (sending data). Another technical example of a cooperative router task is to spread the BGP load across multiple routers in a router team. The load of BGP instances are dynamically distributed among Team Members, who have the same router ID and same set of exterior IP addresses. The external BGP peers are all connected to the exterior IP address. A cooperative router team can seamlessly scale up and down without service disruptions. New routers can be added into or removed from the cooperative router team in response to the load of the router team. This describes the vision of the winning team router. However, there are a series of challenges to make a router team cooperative. Some of these problems can be solved by good existing mechanisms. This is similar to a football team's basics of blocking and tackling. Yang & Tsou Expires January 5, 2012 [Page 6] Internet-Draft IRCAT Problem Statement July 2011 The team router will need to develop some new mechanisms to provide new services in the Internet. Like a football team, the new mechanisms create ways to avoid security attacks. 4. Problem Statement 4.1. Router Team Membership When a new router joins a router team, the necessary procedures are automatic bi-directional team discovery, negotiation, and authentication. After becoming a member of the router team, its resources are added into the router team global resources. If a router leaves the router team, the router team must be able to detect its departure and remove its resources from the global resources. A router may make available only some of its resources into the router team. There are many routing protocols such as OSPF, ISIS, have router membership mechanisms, but none of them support router team. 4.2. Router Capability Dynamic Exchange The router team available resources can be changed dynamically all the time. Especially when a router joins or leaves the router team, the router team needs to be aware of the change. The router capability exchange mechanism is needed. Capability here refers to the router's available hardware and software resources. 4.3. Cooperative Router Group Topology There's a lot internal communication among the router members if they are collaboratively working for the same tasks. Each router member has its interior addresses for internal communication. Those addresses are concealed from routers outside of the cooperative router team. A mechanism is needed to learn the interior network topology from other routers. This mechanism needs to be extremely light weight when compared to existing protocols such OSPF and ISIS. It needs to converge extremely fast. Meanwhile, a cooperative router team keeps its public information --- router ID and exterior addresses. The outside routers only know the public information. The packets from outside routers have destination to router team exterior addresses. A new mechanism is needed to forward those packets to proper router members. Section 4.3.1 gives an example of the scenarios. Yang & Tsou Expires January 5, 2012 [Page 7] Internet-Draft IRCAT Problem Statement July 2011 This new mechanism may face the following listed problemsGBPo o The exterior addresses are always configured in the applications of the router members. It is possible that the same exterior IP addresses are on multiple member routers. For example, BGP may have multiple instances which are running on multiple routers under router team scenario. Those BGP instances have the same router ID and the same exterior address to outside peers. The forwarding mechanism should forward those BGP packets to proper router members even though all BGP packets are destined to same exterior address. o The forwarding can be changed dynamically. Because working load can be migrated from one to another router member, the packet forwarding should be updated in corresponding to the migration. 4.3.1. An Example of Interior Router Team Network 10.1.1.1 10.1.1.2 10.1.1.3 +-----------+ +-----------+ +-----------+ | router A | | router B | | router C | | (BGP 1) | ----- | (BGP 2) | ----- | (OSPF) | (|100.1.1.1)| |(100.1.1.1)| |(100.1.1.1 | +-----------+ +-----------+ +-----------+ | LC | | LC | | LC | +-----------+ +-----------+ +-----------+ | | | | | | | | | Figure 4: Interior Router Team Network Router member A, B and C are constructed as a cooperative router team in figure 2. Their interior IP addresses are 10.1.1.1, 10.1.1.2, 10.1.1.3 respectively. The router team has exterior address 100.1.1.1. Two BGP instances are running on the router member A and B. An OSPF instance is running on router member C. Half of the BGP peers are handled by router member A and the other half are handled by router member B. All BGP peers connect to the exterior address 100.1.1.1, and the load are shared by the cooperative router. The packet forwarding needs to address the following scenarios: o BGP packets with a destination address of 100.1.1.1 handled by the BGP instance on router member A is forwarded to router A; o BGP packets with a destination address of 100.1.1.1 and handled by BGP instance on router member B is forwarded to router member B. Yang & Tsou Expires January 5, 2012 [Page 8] Internet-Draft IRCAT Problem Statement July 2011 o OSPF packets are forwarded to router member C. These cooperative functions are dynamic so that the load can be shifted during different processing periods for each protocol. 4.4. Cooperative Router Team Task Collaboration The working load collaboration in router team provides the following capabilities: o Multiple router members can be working with same tasks collaboratively; and o router members who enter resources panic mode (i.e. running out of resources) can migrate a part or all of their working load to other router member(s). It should be possible for the cooperative router team load migration to be automatic and dynamic based on pre-determined policies. 4.4.1. Local Resources Monitoring It is too late if a router member completely runs out of resources and then realizes it can move some working load out. The router member should monitor its local resource status. If it reaches alarm zone, the router member starts the working load migration. 4.4.2. Migration Source Selection TBD. 4.4.3. Migration Targets Selection If a router member needs to offload its work load, it must know where the targets are. Why is this? Like a football team, the cooperative router must know where the members are so they can pass the ball. There are two ways this off load can occur. Way 1: The router member always collects the other member's capability before the work load migration. When the work load migration starts, the router member already has enough knowledge to know the migration targets. Way 2: The router member does not have the other router member's capability. It asks for help by broadcast messages and then waits for the other router members' response. The router member knows the migration targets based on responded information. Yang & Tsou Expires January 5, 2012 [Page 9] Internet-Draft IRCAT Problem Statement July 2011 How do you select where the load goes to? Internally the selection mechanism of the router team selects the member to offload its traffic to. This is like the quarterback selecting the person to pass the football to. A negotiation mechanism is also needed after migration targets are decided. Somehow, the target must know they are being passed the task. 4.5. Unified Router Team Control Plane The reasons for unified router team control plane are: o The members of the cooperative router team each have one or more processors that can be leveraged into a virtual control plane; o The virtual control plane can distribute the computation load as network load changes; o The members can join or leave the virtual control plane dynamically and seamlessly to add/delete resources as needed (for load sharing or resiliency) In this way, the unified router team control plane has a global control plane resource pool that controls the forwarding plane. Just like a football team member, when you join a team your body becomes one of the football team's resources. Your ability to pass things forward becomes the team's ability. 4.6. Router Control Plane and Forwarding Plane Separation This problem is related to the router team control plane virtualization. It is natural in a single router that a control plane only manages its local forwarding hardware. But in cooperative router team, it is possible that a control plane on another router member can manage the local forwarding hardware on behalf of the local router control plane. The forwarding plane can no longer find out its specific control plane, but face the unified router team control plane. Yang & Tsou Expires January 5, 2012 [Page 10] Internet-Draft IRCAT Problem Statement July 2011 ----------------------------------------------- / Control Plane \ / \ / +-----+ +-----+ +-----+ +-----+ \ / | | | | | | | | \ | +------| --- |---| --- |-----| --- |---| --- |-----+ | | | BGP | |B| | | |B| | | |B| | | |B| | | | | +------| --- |---| --- |-----| --- |---| --- |-----+ | | | | | | | | | | | | +------| --- |---| --- |-+ +-| --- |---| --- |-----+ | | | RIB | |R| | | |R| | | | | |i| | | |i| | IFO | | | +------| --- |---| --- |-+ +-| --- |---| --- |-----+ | | | | | | | | | | | | +------| --- |---| --- |-+ +-| --- |---| --- |-----+ | | | OSPF | |O| | | |O| | | | | |I| | | |I| | ISIS| | | +------| --- |---| --- |-+ +-| --- |---| --- |-----+ | | | | | | | | | | | \ | CE1 | | CE2 | | CE3 | | CE4 | / \ +-----+ +-----+ +-----+ +-----+ / \ / \ ---------------------------------------- / --/----------------------------------------\-- <--- / \ ---> / \ / \ | | | Forwarding Plane | \ / ---> \ / <--- \ / ------------------------------------------ Legend: --->, <--- IP traffic --- --- |B| |R| etc. Processes in a Computing Element (CE) --- --- Continuous bar ( Figure 5: Virtualization 5. Conclusions and Recommendations To meet the extremely high scaling requirements from service provider, we have concluded and recommend that a cooperative router Yang & Tsou Expires January 5, 2012 [Page 11] Internet-Draft IRCAT Problem Statement July 2011 team approach is practical and effective approach to meet the extremely high scaling requirements. We denote this approach as Inter Router Cooperative and Teaming (IRCAT). IRCAT forms a new architecture paradigm running on heterogeneous routers to let them construct as a cooperative router team. The draft is to start the discussion of such an architecture and mechanisms which might be possible to achieve it. 6. Security Considerations The security considerations on this architecture paradigm are placed here for discussion. This is an architecture, and IETF deals in mechanism, interfaces and protocols. In the IRCAT model, security can go from "none" to the highest level of security at every task. For the "no security" level, all members of the cooperative router team are fully trusted. A malicious router member can easily bring down the whole router team. The cooperative may face possible DOS attacks from outside network. For the full secure level, all peers, tasks, and capabilities are individually secured. 7. IANA Considerations This document is an architecture document. 8. Acknowledgments The list of people who aided for helpful comments is Susan Hares, Richard Li, Steve Dong, Hong-Tao Yin, and Robert Tao. Cathy Zhou helped to translate the format of this document. 9. Informative References [CloudNET] Wood, "The Case for Enterprise-ready Virtual Private Clouds", 2009, . [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006. Authors' Addresses Yang Yu Huawei Technologies 2300 Central Expressway, Santa Clara CA 95050 USA Phone: Email: yyu@huawei.com Tina Tsou Huawei Technologies 2300 Central Expressway, Santa Clara CA 95050 USA Phone: Email: tena@huawei.com Yang & Tsou Expires January 5, 2012 [Page 13]