Internet Draft                                            I. van Beijnum
Document: draft-van-beijnum-multi6-isp-int-aggr-00.txt      October 2002
Expires: April 2003

Provider-Internal Aggregation based on Geography to Support Multihoming in IPv6

1 Mandatory Statements

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

2 Abstract

Current 6bone backbone routing guidelines prohibit traditional multihoming in IPv6, because current IPv4-style multihoming doesn't scale. This stands in the way of successful adoption of IPv6. The solution outlined in this memo proposes aggregating the routing information for multihomed destinations inside service provider networks based on geography to accomplish scalable multihoming in IPv6 using current protocols and implementations. This solution does not require network operators to increase the density of interconnection; nor does it require significant cooperation or simultaneous adoption.

3 Introduction

Current IPv4 and IPv6 interdomain routing operational practices depend heavily on aggregation in order to reach the necessary scalability. Current aggregation is exclusively service provider based: ISPs (Internet Service Providers) obtain blocks of address space from the Regional Internet Registries (RIRs) and assign their customers addresses from these blocks.
Then they announce a single route for each block to other networks. This aggregation makes it possible for millions of organizations to be connected to the internet while limiting the global routing table to only slightly more than a hundred thousand destination prefixes.

Unfortunately, provider-based aggregation doesn't work for networks connected to the internet over more than one connection ("multi-homed" networks). In the current IPv4 internet, multihoming is typically done by announcing a route for an independent address block to two or more ISPs. The address block may actually be part of a larger PA (provider aggregatable) block, but it must be visible in the global routing table independently from possible aggregates to make multihoming work under all circumstances. This makes it impossible for many millions of networks to multihome: the global routing table would grow beyond what routers can handle.

There are efforts underway to provide in IPv6 the failover and load balancing functionality present in current "IPv4 style" multihoming in different ways that wouldn't increase the size of the global routing table. However, all these new multihoming solutions are still on the drawing board and need changes to protocols and implementations. In the meantime, the current 6Bone backbone routing guidelines [RFC2772] don't allow non-aggregated routes in the IPv6 global routing table and thereby make IPv4-style multihoming impossible.

This draft proposes new operational practices that will allow networks to handle a much larger global routing table, so multihoming in IPv6 can be made possible within a very short time frame. However, it is very important to note this isn't a perfect "one size fits all" solution that scales to huge numbers of multihomed networks without any pain or effort. (See the Limitations section later in this document.)
But at least this mechanism makes multihoming possible almost immediately, without having to wait for protocols and implementations to be changed or even for network operators to reconfigure their networks. The latter can be done later, and on a per-network basis, as the size of the global routing table becomes problematic for individual networks. The idea is to make multihoming possible now, while providing networks with the means to control the size of the routing table in their routers later as necessary.

After implementing the necessary filtering mechanisms, growth to several million multihomed networks world wide should be possible without much trouble. In theory, this mechanism can support many hundreds of millions of multihomed networks, but this will be hard to accomplish in practice, so work on more advanced multihoming solutions should continue. There is at least one multihoming solution under development [MHAP] that can use the same addressing mechanism as is needed for the solution proposed here, so there is potential for gradual replacement by a more permanent solution. This addressing mechanism is explained in a separate document [GAPI].

4 How It Works

To make multihoming (as we know it today) possible, individual routes must be present in the global routing table. But in order to fit the routing table into a router, there must be aggregation. These requirements seem at odds with each other. This is because there is an unspoken assumption: the full global routing table must be present in all routers that are part of the default-free zone. Dropping this requirement makes everything much more complex, but it is possible. The global routing table can then be split into several parts, where individual routers all handle one (or a few) of those parts.
This works as long as traffic for a certain subset of the destination networks present in the global routing table is always sent to a router containing that part of the global routing table. The obvious way to accomplish this is for each router to announce an aggregate covering the part of the global routing table it serves.

For instance, if a network has four routers and wants to divide routing information for the IPv6 global unicast address space over those routers, it could have router A handle 2000::/5, router B 2800::/5, router C 3000::/5 and router D 3800::/5. So if this network peers with another network that announces 2200:abc::/35 and 3ffe:def::/35, all routers except router A filter out the first route, and all routers except router D filter out the second route. When router C then has a packet for 2200:abc:1:2::1, it sends the packet to router A (because router A announces the 2000::/5 aggregate) and router A delivers the packet to the right peer. Note that this behavior is completely hidden from the peer: the aggregates are only used within the local network, they are not announced to peers. To avoid confusion with regular provider aggregatable routes, the term "pilot routes" will be used for this type of private aggregates.

This practice scales relatively well: by adding more routers, it is possible to accommodate a global routing table of arbitrary size. (These extra routers must be "border routers" that interconnect with other networks.) However, there is a major problem: traffic for certain address ranges must always first be transported to the location of the router handling this address range. So if two end-users in Europe want to communicate, but the address range for one of them is handled in North America by the other's ISP, and the other's address range is handled by a router in Japan, traffic that could have stayed within the region has to circle the globe.
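The four-router example above can be traced with a short Python sketch. It is only an illustration under stated assumptions: the helper names (build_table, lookup, forward) are hypothetical, each router is modeled as holding the pilot routes of the other routers plus the peer routes that fall inside its own pilot, and a router's own pilot is left out of its table so that a packet without a matching more specific is dropped there.

```python
import ipaddress

# Pilot routes from the example: each router serves one /5 of 2000::/3.
PILOTS = {
    "A": ipaddress.ip_network("2000::/5"),
    "B": ipaddress.ip_network("2800::/5"),
    "C": ipaddress.ip_network("3000::/5"),
    "D": ipaddress.ip_network("3800::/5"),
}

# More specific routes announced by the peer in the example.
PEER_ROUTES = [
    ipaddress.ip_network("2200:abc::/35"),
    ipaddress.ip_network("3ffe:def::/35"),
]


def build_table(router):
    """Table of one router: the pilots of the other routers, plus the
    peer routes inside its own pilot (all other peer routes are filtered)."""
    table = {
        pilot: ("router", name)
        for name, pilot in PILOTS.items()
        if name != router
    }
    for route in PEER_ROUTES:
        if route.subnet_of(PILOTS[router]):
            table[route] = ("peer", "peer")
    return table


def lookup(table, destination):
    """Longest-prefix match; returns None when no route covers the address."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def forward(start, destination):
    """Follow a packet hop by hop and return the list of hops."""
    hops = [start]
    current = start
    while True:
        hit = lookup(build_table(current), destination)
        if hit is None:
            hops.append("drop")
            return hops
        kind, target = hit
        hops.append(target)
        if kind == "peer":
            return hops
        current = target


print(forward("C", "2200:abc:1:2::1"))
```

Router C has no more specific route for the destination, so the packet follows the 2000::/5 pilot to router A, which holds 2200:abc::/35 and hands the packet to the peer.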
This "scenic routing" can be avoided by assigning address space to multihomers in a geographically aggregatable manner. This way, networks can have a range of addresses be handled by a router in the region where the addresses are used. However, this is not a strict requirement. For instance, a network that only has a presence in the US doesn't necessarily have to interconnect with other networks in Europe or Asia. In practice, it will have routers at the US East Coast (where many European networks are present) handle the European address ranges, and routers at the US West Coast (where many Asian networks are present) handle the Asian address ranges.

5 Operational Details

First of all: more specific routes from customers are usually not filtered. They are announced to peers at all interconnect locations. It is up to the network receiving the routes to filter them. Outbound filtering of more specific multihomed routes is only used when two networks agree on where to exchange routing information for certain geographic aggregates.

The aggregation scheme works as follows. The network is divided into zones. The exact way in which this is done depends on the particular topology of the network, and doesn't have to match the layout of other networks. Static pilot routes for all address ranges used within the zone are configured on at least two routers (for redundancy) in that zone (or as close to the zone as is practical). Then both EBGP and IBGP filters are configured per peer. The IBGP filters are applied to all sessions with routers in other zones (not to sessions with other routers within the zone) and filter out the more specific routes falling within the address ranges used in the zone. The EBGP filters do the opposite and allow only more specific routes for destinations within the region.
This makes sure more specific multihomed routes are allowed in the routing table within the zone, but aren't announced over IBGP to other zones.

5.1 Interconnection

Since interconnection is not an exact science, there may not be adequate interconnection within the zone with some peer networks. When this is the rule rather than the exception, this indicates the zones are too small. Increasing the zone or merging several zones will make sure there is interconnection with most peer networks within the zone itself. For the few networks for which interconnection within the zone isn't possible, EBGP filters that always allow all more specific routes are used. Also, these routes are tagged with an internal community that prevents them from being filtered in IBGP. As a result, there is no aggregation for these peers, but there is still full connectivity. It should be possible to limit this de-aggregation to a small number of zones rather than the entire network with more sophisticated filtering.

5.2 Zone Partitioning

It is important that zones are never partitioned, because when this happens, packets for certain destinations will loop. The router inside the zone will route them outside the zone because of the more specifics pointing to the other partition of the zone over a router that isn't part of the zone, and the first router outside the zone will route the packets back into the zone to the closest router announcing the pilot route.
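The filter rules of sections 5 and 5.1 can be summarized in a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the geographic block (taken from the 2001:db8::/32 documentation prefix), the zone range, the community name "no-zone-filter" and the function names are hypothetical, and routes outside the geographic block are assumed to pass unfiltered.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical address plan, for illustration only.
GEO_BLOCK = ipaddress.ip_network("2001:db8::/32")            # geographic space
ZONE_RANGES = [ipaddress.ip_network("2001:db8:1000::/36")]   # used in this zone
EXEMPT = "no-zone-filter"                                    # internal community


@dataclass
class Route:
    prefix: ipaddress.IPv6Network
    communities: frozenset = field(default_factory=frozenset)


def is_more_specific(route):
    """True for multihomed more specifics inside the geographic block."""
    return route.prefix.subnet_of(GEO_BLOCK)


def in_zone(route):
    """True when the route falls within an address range used in this zone."""
    return any(route.prefix.subnet_of(r) for r in ZONE_RANGES)


def ebgp_import(route, peer_interconnects_in_zone=True):
    """Return the accepted route, or None when the route is filtered."""
    if not is_more_specific(route):
        return route
    if not peer_interconnects_in_zone:
        # Section 5.1: accept all more specifics and tag them.
        return Route(route.prefix, route.communities | {EXEMPT})
    return route if in_zone(route) else None


def ibgp_export_to_other_zone(route):
    """True when the route may be sent to a router in another zone."""
    if EXEMPT in route.communities:
        return True
    return not (is_more_specific(route) and in_zone(route))


local = Route(ipaddress.ip_network("2001:db8:1001::/48"))
remote = Route(ipaddress.ip_network("2001:db8:2002::/48"))
print(ebgp_import(local), ebgp_import(remote))
print(ibgp_export_to_other_zone(local))
```

With these filters a more specific inside the zone's range is accepted from peers but kept out of IBGP sessions to other zones, while a route tagged under the section 5.1 exception bypasses the IBGP filter.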
5.3 Example Picture

The following picture represents an AS with four routers and eight peers, divided into two zones that each handle routing for three regions:

[Picture: peers S, T, U and V announce individual routes over EBGP to the routers in zone 1 (regions A, B, C); peers W, X, Y and Z announce individual routes over EBGP to the routers in zone 2 (regions E, F, G). The two zones are connected over IBGP, which carries the pilot routes.]

[S], [T], [U], [V], [W], [X], [Y], [Z]   Peer EBGP routers
RTR 1, RTR 2                             Routers in zone 1
RTR 3, RTR 4                             Routers in zone 2
A, B, C, E, F, G                         Pilot (aggregate) routes
a, b, c, e, f, g                         Individual /48 routes for end-user networks
<, >, ^, v                               The direction of the routing information flow

6 Migration

Migration from a regular, non-aggregated setup to full geographical aggregation doesn't have to be immediate. The process can be carried out in several steps:

1. The border router handling most of the traffic to a specific geographical destination or aggregate of several destinations is promoted to "designated router" for the matching address range. The designated router is configured to announce a pilot route over IBGP and with filters that don't allow more specifics for the destinations covered by the pilot route to be announced over IBGP to non-border routers. Now only border routers have the more specific routes.

2. Border routers are configured with EBGP filters to filter out incoming more specific routes covered by pilot routes announced by far away designated routers. (For instance, routers in Europe are configured to filter out American more specifics for which an American router announces a pilot route.)
The designated router is configured to no longer send these more specifics over IBGP to the routers that now filter those same routes on EBGP sessions. (For the American routers, their European IBGP neighbors now essentially become part of the group "non-border routers".) Now each border router only has a subset of all multihomed more specifics in its routing table.

Step 1 can be implemented on individual routers one at a time, and, barring configuration mistakes, doesn't pose any risks. There is only one pilot route, and only more specific routes announced by the same router as the pilot route are suppressed. Since both the new pilot route and the now suppressed more specific routes point to the same border router, the way packets are routed through the network is completely identical and there is no risk of loops. If a different router than the designated router has the preferred external route for a more specific, this more specific route will be announced as before, since only the designated router is configured to filter out these more specifics.

When the designated router is the one holding the best external route, non-border routers won't see any more specific routes for this destination. The designated router has a filter, and the other border routers don't announce the route over IBGP because they aren't the ones holding the best route. To aid aggregation, the designated router can be configured to increase the IBGP Local Preference attribute for the more specifics it acts as designated router for. This way, the route over the designated router is always preferred, even if another router has a matching more specific with a shorter AS path or better Multi Exit Discriminator metric. When the designated router becomes unreachable or loses its external routes, there will be automatic de-aggregation: more specific routes are announced by other routers.

Step 2 can also be implemented one router at a time.
The new EBGP filters should be installed first, after which the designated router can be configured to no longer announce more specifics to the border routers with the new EBGP filters. If this is done the other way around, more specifics will leak over IBGP and there will be non-optimal routing. Without step 2, there is no aggregation in border routers: they need to hear the designated router announce a "better" more specific, or they will start to announce their own over IBGP.

Introducing step 2 introduces the risk that certain destinations become unreachable when there is an outage. For instance, when European routers no longer see American more specifics, and the European and American parts of the network become partitioned, it is no longer possible for the European routers to send traffic to American destinations, even if there is peering in Europe that would have made this possible before. This step should only be taken if the risk of network partitions is negligible.

7 Address Allocation Requirements

In order for the practices described here to work, a new address allocation architecture must be implemented. Since several architectures are possible, and it would be beneficial to share such an architecture with other proposed IPv6 multihoming solutions, this document doesn't specify an address allocation architecture, but rather lists the requirements such an architecture must meet in order to be usable for geographic aggregation as outlined in this document.

To allow coexistence between regular provider aggregatable address space which is already extensively used in IPv6 and the addresses assigned in accordance with any new allocation architecture, it must be possible to identify the type of address space easily.

The level of aggregation used by network operators will very likely change radically over the next five, ten or twenty years.
At present, there are only a few networks in IPv6 that can be called "multihomed" in the IPv4 sense, and there are fewer than 20000 multihomed networks in IPv4. When geographic aggregation becomes necessary because of the growth in the number of multihomed networks, aggregation at the continent level will probably suffice at first. (Meaning all the more specific routes for a continent are present in routers throughout this continent.) As the number of multihomed networks continues to grow, it will become necessary to aggregate at the country level for small to medium sized countries (such as those in Europe) or at the state or province level for large countries (such as the United States, India or China). Eventually, the aggregation needs will reach the city/metropolitan area level. At each stage of aggregation, the number of prefixes that must be listed to identify a geographic area should be as low as possible.

8 Limitations

Since this scheme depends on geography for aggregation, it only works well for organizations that connect to the internet in locations that are close together. An organization with a network spanning multiple countries and connecting to the internet in all those countries isn't geographically aggregatable, and neither is an organization connecting to ISPs very far away, for instance by means of a satellite circuit. These types of organizations must choose address space falling within a geographic area that doesn't (fully) fit if they elect to use the type of address space this aggregation scheme uses. This choice will have consequences on routing efficiency, and when the infrastructure changes, the organization may need to adopt a new address range to minimize the routing inefficiencies created by the change.
9 Route Visibility for Customers

In order to be able to do traffic engineering for outbound traffic, multihomed customers need to receive a consistent view of the global routing table from all their ISPs. If the aggregation levels of the different ISPs used by a multihomed customer don't match, the longest match first rule will cause most of the traffic to flow over the ISP doing the least aggregation. To avoid this, ISPs are strongly encouraged to provide their customers with a full, unaggregated view of the global routing table. If an ISP aggregates internally, such a view could be obtained by the customer by having an EBGP (multihop if necessary) session with one or more route servers, in addition to the regular EBGP session to the next hop router.

ISPs should also provide their customers with pilot routes at all aggregation levels, even if the ISPs themselves don't (yet) aggregate. This makes it possible for customers to filter out more specifics and still maintain a consistent view of the global routing table. If an ISP can't do this immediately (adding a large number of pilot routes is a lot of work) the ISP should establish a time frame for implementing the necessary pilot routes and communicate this to existing and potential customers. A reasonable time frame would be six months to implement continent/country/province/state level pilot routes for the whole world, a year to implement metropolitan area pilot routes for the regions in which the ISP is active, and 18 months to implement world wide metropolitan area pilot routes, starting from the moment a geographically aggregatable address allocation mechanism is implemented.

10 Traffic Flow

Larger ISP and ISP-like networks that interconnect with other networks in more than one location must have a policy on how to select the interconnect location used for traffic to those other networks.
At present, the most widely adopted policy is "early exit" or "hot potato": packets are routed to the closest interconnect location where the other network is present and delivered to the destination network there. As a result, packets travel most of the way over the destination network. If both networks use the early exit policy, traffic in one direction will travel most of the way over one network, and traffic in the other direction most of the way over the other network, so the policy is "fair" as long as the traffic volumes are fairly equal in both directions. This policy is implemented by not changing the default behavior for the most widely available BGP implementations.

Since the aggregation scheme described in this document requires traffic to be transported to a location where more specific routing information is known, and this location is presumably close to the destination of the packet, adoption of this scheme leads to a "late exit" routing policy for multihomed traffic. Assuming early exit is still used for single homed traffic, there are four possible permutations for the traffic flow between any two hosts:

1. Hosts A and B both single homed: both early exit = "fair"

2. Host A single homed, host B multihomed: traffic is exchanged close to host B = host A's network does most of the work

3. Host A multihomed, host B single homed: traffic is exchanged close to host A = host B's network does most of the work

4. Hosts A and B both multihomed: both late exit = "fair"

Since networks can control the level of late exit routing by (selectively) de-aggregating and many interconnection (peering) agreements call for equal traffic volumes in both directions, the potential for changes in the flow of traffic should not adversely affect existing networks.
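The four permutations in this section follow from one rule, which the Python sketch below spells out: traffic toward a single homed host is handed over near the source (early exit), so the destination network carries it most of the way, while traffic toward a multihomed host is handed over near the destination (late exit), so the source network carries it. The function names are hypothetical and the model ignores traffic volumes.

```python
from itertools import product


def long_haul_network(src, dst, dst_multihomed):
    """Network that carries a packet most of the way from src to dst.

    Late exit (multihomed destination): the source network carries it.
    Early exit (single homed destination): the destination network does.
    """
    return src if dst_multihomed else dst


def burden(a_multihomed, b_multihomed):
    """Return "fair", or the name of the network doing most of the work."""
    carriers = {
        long_haul_network("A", "B", b_multihomed),  # traffic from A to B
        long_haul_network("B", "A", a_multihomed),  # traffic from B to A
    }
    return "fair" if len(carriers) == 2 else carriers.pop()


for a_mh, b_mh in product([False, True], repeat=2):
    print(f"A multihomed={a_mh}, B multihomed={b_mh}: {burden(a_mh, b_mh)}")
```

The loop prints one line per permutation, in the same order as the numbered list above.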
11 IANA Considerations

The Regional Internet Registries should take the requester's geographic location into consideration when assigning address space.

If this scheme is adopted, the number of networks requiring an Autonomous System number will rise beyond what can be accommodated using the current 16-bit AS number space. There is a draft proposing the use of 32-bit AS numbers [32bitAS]. Since having a universally recognized AS number is less important for a multihomed "leaf" network than for a transit network, it is recommended that the 32-bit AS number capability be implemented as soon as possible. All multihomed networks requesting an AS number that are capable of using a 32-bit AS number should be assigned an AS number higher than 65535, so 16-bit compatible AS numbers remain available for transit networks.

12 Security Considerations

This aggregation scheme doesn't propose any changes to protocols or implementations, so it doesn't introduce any new protocol or implementation risks.

However, there is one problem: since routing information is removed from large parts of the network, it is no longer possible to use the routing table to do ingress filtering [RFC2267] using the "unicast RPF" feature implemented by several router vendors. The alternative, having statically configured filter lists, doesn't scale. This leaves networks implementing this aggregation scheme with no protection against incoming packets with falsified source addresses, so it is highly recommended that network operators make sure they don't generate or accept from customers packets with falsified source addresses and that vendors implement mechanisms to trace back the source of these falsified packets.

13 Document and Author Information

This document expires April, 2003. The latest version of this document will always be available at http://www.muada.com/drafts/.
Comments are welcome at:

Iljitsch van Beijnum
Karel Roosstraat 95
2571 BG Den Haag
Netherlands

Email: iljitsch@muada.com

14 References

[RFC2267] RFC 2267, "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing"

[RFC2772] RFC 2772, "6Bone Backbone Routing Guidelines"

[32bitAS] "BGP support for four-octet AS number space", work in progress

[GAPI] "A Geographically Aggregatable Provider Independent Address Space to Support Multihoming in IPv6", work in progress

[MHAP] MHAP draft, work in progress