INTERNET DRAFT Mikkel Thorup Category: Informational AT&T Labs-Research Title: draft-thorup-ospf-harmful-00.txt Date: April 2003 OSPF Areas Considered Harmful Status of this Memo This document is an Internet-Draft and is NOT offered in accordance with Section 10 of RFC2026, and the author does not provide the IETF with any rights other than to publish as an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at: http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at: http://www.ietf.org/shadow.html. Abstract We point out that forcing an area decomposition onto an IP backbone can easily have a detrimental effect on the routing and lead to a substantial increase of information to be flooded in connection with link-failures. 1.0 Introduction Currently it is recommended that a larger IP backbone running OSPF should be divided into areas. We point out here that with the current definition of areas, this can easily have a detrimental effect on routing, increase the amount of information in the network, and cause many more changes to be flooded in connection with link-failures. More precisely, our concerns are (1) the special area routing and (2) the fact that area border routers communicate distances into an area rather than just exposing the area topology. These choices appear to be made with the intent of reducing the amount of information flooded and speeding up route computation, but we will argue that they are more likely to have quite the opposite effect. Based on the potential problems with areas, we contend that it is counter productive that network operators by default try to get OSPF to scale using an arsenal of different types of areas, i.e., areas, stub areas, and not-so-stubby areas, all adding to the complexity of configuring OSPF. If there really are scaling problems with plain OSPF without areas, the router vendors could work on more efficient implementations of plain OSPF, e.g., using incremental shortest paths to deal with link-failures. We note that the similar IS-IS protocol is commonly used single-level, suggesting that plain OSPF is feasible as well. 2.0 Plain OSPF routing In Open Shortest Path First (OSPF) routing [Moy98], each link has a weight, and packets follow shortest paths to their destinations. The weights may either have default values inversely proportional to their capacity or be set more carefully by the network operator. Many works [RFT02,FT00,RR01,LW93,BGW98] have demonstrated that this kind of plain shortest path routing works surprisingly well for distributing the traffic in a network. For OSPF routers to maintain their forwarding tables, each router needs to know the whole network topology, including link weights. The routers can then locally compute shortest paths via each outgoing link to all destinations, and thereby decide on next hops for the forwarding table. In order to provide the topology information for all routers, the links and their weights are flooded in the network, both on a regular basis and in connection with changes. In this draft, we do not qualify the difference between routers, networks, and hosts as they are all treated the same from the perspective of routing. Abusing terminology, we refer to all of these network nodes as routers. 3.0 Division into OSPF areas For large networks, concerns about scalability have lead to the introduction of areas that, among other things, aim at reducing the amount of information and computation. Below, we describe these areas on a high level, pointing out that they are likely to have quite the opposite effect, increasing rather than decreasing the amount of information and computation. In an OSPF network, we can have areas 0,1,.... The area 0 is called the backbone area and has a special role. Routers may belong to one or more areas. Routers belonging to more than one area are called area border routers, whereas routers belonging to a single non-backbone area are called internal routers. All area border routers should belong to the backbone area. For the backbone area one may define virtual links over other areas, implemented as paths over these areas. Each area should be contiguous (the backbone area has to be contiguous, and if one of the other areas is not contiguous, each of its connected components is viewed as a separate area). 4.0 Area routing When we have areas, we do no longer just follow shortest paths. Instead we apply the following rules: 1. A router in the same area as the destination only considers paths within the area. 2. A backbone router that is not in the destination area considers paths with a first segment in the backbone and a second segment in the destination area (if the destination is in the backbone, the last segment is empty). 3. An internal router not in the destination area considers paths with a first segment to the backbone, a second segment in the backbone, and a last segment in the destination area. -u- / \ Area 2 / \ s====a====b====c====d=== Area 0 |\ /| | v--w--x | Area 1 | | y----z----t Figure 1: Area routing 4.1 Funny area routing Figure 1 illustrates some of the peculiarities of area routing. We have backbone area 0 with high-speed high-capacity links, and some areas on either side with much smaller links. Each of these non-backbone areas has two border routers on the backbone. Following the default of setting weights inversely proportional to their capacity, plain shortest path routing will follow the backbone as much as possible, and this is what we want. For example, to get from source s to destination t, with plain OSPF, we will follow the route (1) s-a-b-c-t However, consider the area routing from s to t. The source s is not in the destination area, so we start with rule 3, and from there, we do consider the above shortest path. However, router a is a border router for Area 1 which contains our destination t. Hence, when we get to router a, we switch to rule 1 and divert on a local detour over y to t. Thus, with area routing, we get the route (2) s-a-y-z-t. This route is considered worse than the one in (1) because it uses local links of smaller capacity. To make matters even worse, suppose the local link between a and y fails. This would not affect our plain shortest path route in (1), but with area routing, we would now go from a to v, hence using the route (3) s-a-v-w-x-c-t. In order to avoid funny area routing like above, each area A should be convex in the sense for any source s in A and destination t in A, the area A should contain all shortest paths from s to t in the full network. In the example of Figure 1, this means that each of the non-backbone areas should be extended with the segment of the backbone between its current entry points. This implies that we need a-b-c in Area 1 and b-c-d in Area 2. In particular, we end up with a logical copy of b-c in each of Area 0, 1, and 2. Even if this sharing is possible, it all adds to the complexity of the configuration. Also, getting b and c as new border routers for Area 1 and 2, respectively, adds to the information flooded in the network, as discussed in the next section. 5.0 Distances into areas One of the main ideas behind areas is that the internal topology is not revealed to the rest of the network. Each router in the area computes shortest paths into the area, producing a distance table. The area border routers communicate their distance tables to the rest of the network, which then doesn't need to know the internal topology of the area. 5.1 Areas are likely to increase the information With distance tables, for each internal router r in an area A, the rest of the networks sees a distance to r from each area border router of A. Thus, with distance tables, we get more information on r if the number of area border routers for A is bigger than the degree (number of incident links) of r. Often the average router has limited degree, say 4-6, and then it is limited how much we can win switching from topology to distances (areas with a single border router are kind of special, and will be discussed further in Section 9.1). On the other hand, we may lose a lot. Take, for example, a grid topology as illustrated in Figure 2. | | | | | | | | | -+-+-+-+-+-+-+-+-+- | | | | | | | | | -+-+-+-#-#-#-#-+-+- | | | | | | | | | -+-#-#-#-0-0-#-+-+- | | | | | | | | | -+-#-0-0-0-0-#-#-+- | | | | | | | | | -+-#-0-0-0-0-0-#-+- | | | | | | | | | -+-#-0-0-0-0-0-#-+- | | | | | | | | | -+-#-0-0-0-0-0-#-+- | | | | | | | | | -+-#-0-0-0-#-#-#-+- | | | | | | | | | -+-#-#-#-#-#-+-+-+- | | | | | | | | | -+-+-+-+-+-+-+-+-+- | | | | | | | | | Figure 2: Area strictly inside large grid with border routers # and internal routers 0. In a standard grid, any area with 50 routers strictly inside has at least 26 area border routers. On the other hand, in the grid, each router only has degree 4, so we are multiplying the amount of information by more than a factor 5. More generally, if an area strictly inside a grid has n routers, it has at least 4*sqrt(n)-4 area border routers, the best case being if n is a perfect square. Even then, the amount of information grows with nearly a factor sqrt(n). For more random-like networks, the situation is even worse due to their expander properties [Bol85]. For example, if a random graph has degree 3 or more, we expect that all possible areas have most routers on the boundary (this excludes giant areas subsuming most of the routers). In other words, we should not expect a topology to admit any useful decomposition into areas unless the topology is designed to admit areas with few, say 1-2, area border routers. We note that such a design fits a two-level topology with areas being PoPs connected to a more highly connected backbone area. In that case, we should not expect to be able to further subdivide the backbone area. Also, we might later want to add another connection from a PoP, e.g., to make it more robust against failures, or just because a new connection is the best solution for a congestion problem. In that case, with areas, we get a new border router, hence a new distance to each router in the area. Without areas, we are just adding a link. Another problematic case for areas is if an ISP buys up other networks, merging them with its own. Then it is very likely that the higher connectivity will prevent a useful area decomposition. In short, insisting on areas limits the design of networks, and in particular, it limits how well-connected the networks can be. Conversely, since most networks are sparse with few links per router, even if we do succeed in identifying areas with few border routers, it is very limited what we can win with the areas communicating distances instead of internal links. 6.0 The shortest path computation It may seem that having area border routers offering distances rather than internal links should be an advantage for the single source shortest path computation that each router has to perform to fill its forwarding table. However, a simple Dijkstra shortest path algorithm with a good heap implementation spends so little time per link that we cannot gain much by processing distances instead of links. In fact, the speed of the shortest path computation should be a non-issue: for a large network with a 1000 routers and 5000 links, a standard single source shortest path computation takes in the order of a millisecond on today's computers whereas the convergence time for OSPF in connection with link-failures is in the order of seconds [AJY00]. Interestingly, in [AJY00], the shortest path computation is reported to be in the order of hundred times slower on some high-end routers, and curve fitting even indicated a quadratic time shortest path computation. This illustrates the kind of problems one gets when making OSPF more complicated with a hack like areas. Without areas, vendors could just have used one of the many near-liner time Dijkstra implementations that have circulated for more than 30 years. However, now they had to make their own specialized solution, allowing them to come up with an unscalable quadratic implementation. When vendors can make such mistakes on expensive routers, it is troubling to think what kind of mistakes network operators might do with areas in the daily running of a network. 7.0 Link-failures starting avalanches of distance changes One of the more scary features of distance tables is that if a link fails, then all distances using this link may get affected. For example, in Figure 1, if we get any link-failure on the path a-v-w-x-c, then each router in the path gets a new distance from one of the area border routers a and c. One could imagine much worse situations with a bottle-neck link that affected distances from many border routers to many internal destinations. Thus, a simple link-failure may lead to an avalanche of distance changes flooding the network. One may think of it as an advantage that we do not communicate a link-failure outside an area A if it does not affect any distances into the area. However, without areas, such a link-failure is easily dealt with if we employ an incremental shortest path algorithm, such as the one in [RR96], from each outgoing link. If no distances into A are affected (without areas, we just think of A as a region of the network), then for any router outside A, no distance via any outgoing link is changed. Such an innocent link-failures is detected in a few instructions, that is, in nanoseconds, and it does not entail any changes to the routing tables. Thus, areas may turn simple link-failures into big problems (avalanche of distance changes), and the potential savings (not reporting an innocent link-failure) are miniscule. As reported in [AJY00,FT00], the use of incremental shortest paths algorithms [RR96] gives a large speed-up (factor 10-100) for local changes such as link-failures --- also if these do affect some distances. This is a clear advantage for plain OSPF. With areas, we can also make limited use of incremental shortest paths, but when an area creates an avalanche of distance changes, outside routers will still have to check for each distance change how it affects the choice of outgoing ports. This is hence a good example showing how the simplicity of plain OSPF allows more scalable implementation. 8.0 Summarization and infinite loops One of the big potential wins of areas is that one can identify interior destinations in the area with a prefix mask, an approach called "summarization". In the distance tables of the area border router, the distance to the mask will be the maximal distance to a destination represented by the mask. Obviously such masks can lead to very substantial savings depending on how many routers they capture. We note that summarization requires rule 1 from Section 4.0, i.e., that we cannot leave the destination area. The point is that the distance to a destination may be perceived as shorter inside an area than outside an area, and any such inconsistency can lead to infinite loops if packets are allowed to move in and out of areas. u / \ | | s====a====c=====d====b=== Area 0 | | x---------------y Area 1 Figure 3: Summarization example. It may seem pretty safe to use summarization on routers close to each other. However, such summarization can lead to unintended infinite loops in connection with link-failures. Consider the example of Area 1 in Figure 3. Without summarization, no single link-failure can disconnect x from y in the overall network. However, suppose we summarize the internal routers x and y in a common prefix mask m. Assuming unit weights, both area border routers a and b will export a distance of 2 to m. Now, if the link from x to y fails, it disconnects Area 1 into a part X containing a and x and a part Y containing b and y. From a, the longest distance to an address matching m in X is the distance of 1 to x. Similarly, the the distance from b to m in Y is 1. Thus, both a and b will export a distance of 1 to m. Now, consider a packet p at c destined for y. Matching y with the mask m, router c will think the distance to y is 2 via a, and forward p to a. However, a is in X and knows that y is not there, so a will send p back to c, thus creating an infinite loop. For contrast, if link x-y fails without summarization, a would only export a distance of 1 to x, and b would only export a distance of 1 to y, and then p would just follow the correct shortest path to x via d and b. It is a very appealing quality of plain shortest path routing that no matter how many links fail, if the remaining network has a path from one point to another, then such a path will be selected by the protocol. The infinite loop example above shows that this quality is lost with summarization. 9.0 Stub areas The basic idea in stub areas is a kind of inverse summarization, where we summarize everything outside the ISP with the empty mask, matching everything that is not matched inside the ISP. The advantage to this is that the stub area does not need to know about any routes leaving the ISP. This allows for some less powerful routers inside a stub area. The fact that the stub area does not care where we are going if it is outside the ISP can lead to more funny routing, so as with everything else related to areas, they have to be used with caution. Some not-so-stubby areas have been introduced that care about some but not all of the outside world. This can be used to give less funny routing, but it also adds to the complexity of configuring OSPF. 9.1 Leaf areas Given all the above warnings against area routing, we note that there is one safe kind of areas; namely areas with a single border router. For such a "leaf area", we can never have any problems with the area routing. Also, for these, all we need to know from the outside is what IP addresses are inside, and vice versa. We note that potential leaf areas can be identified in linear time. More precisely, we are looking for single routers whose removal disconnect the network. These are called articulation points in graph theory and all articulation points can be identified in linear time (c.f. [CLRS01,HT73]). Having identified the articulation points, automatically, the OSPF implementation could exploit whatever advantages there are to leaf areas without the need for configuration by the network operator. 10.0 Which area to chose If convinced that plain OSPF routings is best, we still have some decisions to make concerning which type of area to use. The most natural choice would be to put all routers in Area 0. However, a popular feature of stub areas is that routers inside a stub area cannot export BGP routes into OSPF. To exploit the above stub area feature with plain OSPF, we can put all routers in one big stub area. If we want some of the routers to behave as backbone routers, we can also put these in Area 0. The backbone routers can be connected with virtual links, but these will never be considered for routing since all routing happens within the same stub area (c.f. rule 1 in Section 4.0). This way we can freely partition routers into stub routers and backbone routers without having to deal with the pitfalls of area routing. We note that the above is only a hack to benefit from features of different types of areas without having to deal with area routing. A cleaner solution is to use routers where the desired features can be configured independent of area types, e.g., with a maximum on the number of BGP routes that can be exported into OSPF. 11.0 Concluding remarks We have argued that OSPF areas are dangerous and often of very limited use. It is therefore important not to count on them for solving scaling problems in OSPF. Rather one should focus on the scaling of plain OSPF without areas, exploiting the simplicity in faster and cleaner solutions. 12.0 Acknowledgments I would like to thank Jay Borkenhagen, Dave Katz, Carsten Lund, Jennifer Rexford, and Aman Shaikh for many good discussions on the theory and practice of OSPF areas. 13.0 References [AJY00] C. Alaettinogly, V. Jacobson, and H. Yu. Towards milli-second IGP convergence. Expired internet draft. Now available as http://www.packetdesign.com/pubs_003.html, 2000. [BGW98] A. Bley, M. Grotchel, and R. Wessaly. Design of broadband virtual private networks: Model and heuristics for the B-WiN. In Proc. DIMACS Workshop on Robust Communication Networks and Survivability, AMS-DIMACS Series 53, pages 1--16, 1998. [Bol85] B. Bollobas. Random Graphs. Academic Press, 1985. [CLRS01] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, McGraw-Hill, 2nd edition, 2001. [RFT02] B. Fortz, J. Rexford, and M. Thorup. Traffic engineering with traditional IP routing protocols. IEEE Communications Magazine, 40(10):118--124, October 2002. [FT00] B. Fortz and M. Thorup. Internet traffic engineering by optimizing OSPF weights. In Proc. 19th IEEE Conf. on Computer Communications (INFOCOM), pages 519--528, 2000. [HT73] M.L. Fredman and R.E. Tarjan. Algorithm 447: efficient algorithms for graph manipulation. Communications of the ACM, 16(6):372 -- 378, 1973. [LW93] F.Y.S. Lin and J.L. Wang. Minimax open shortest path first routing algorithms in networks supporing the SMDS services. In Proc. IEEE International Conference on Communications (ICC), volume 2, pages 666--670, 1993. [Moy98] J. T. Moy. OSPF version 2. Network Working Group, Request for Comments, http://www.ietf.org/rfc/rfc2328.txt, April 1998. [RR01] K.G. Ramakrishnan and M. A. Rodrigues. Optimal routing in shortest-path data networks. Bell Labs Technical Journal, 6(1), 2001. [RR96] G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the shortest-path problem. Jounal of Algorithms, 21(2):267--305, 1996. 14.0 Author's Address Questions about this memo can be directed to: Mikkel Thorup AT&T Labs-Research Florham Park, NJ 07932 USA Phone: +1 (973) 360 8904 Fax: +1 (973) 360 8178 E-mail: mthorup@research.att.com