INTERNET-DRAFT                                             H. Berkowitz
                                                               Geotrain
Expiration Date: May 1998                                 November 1997


             To Be Multihomed: Requirements & Definitions
                   draft-berkowitz-multirqmt-00.txt


1. Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

2. Abstract

As organizations find their Internet connectivity increasingly critical to their mission, they seek ways of making that connectivity more robust. The term "multi-homing" often is used to describe means of fault-tolerant connection. Unfortunately, this term covers a variety of mechanisms, including naming/directory services, routing, and physical connectivity. This memorandum presents a systematic way to define the requirement for resilience, and a taxonomy for describing mechanisms to achieve it. Multiple mechanisms, including DNS, BGP, and others, may be appropriate for specific situations.

3. Introduction

As the Internet becomes more ubiquitous, more and more enterprises connect to it. Some of those enterprises, such as Web software vendors, have no effective business if their connectivity fails. Other enterprises do not have mission-critical Internet applications, but become so dependent on routine email, news, web, and similar access that a loss of connectivity becomes a crisis. As this Internet dependence becomes more critical, prudent management suggests there be no single point of failure that can break all Internet connectivity.

The term "multihoming" has come into vogue to describe various means of enterprise-to-service-provider connectivity that avoid a single point of failure. Multihoming also can describe connectivity between Internet Service Providers and "upstream" Network Service Providers.

There are other motivations for complex connectivity from enterprises to the Internet. Mergers and acquisitions, where the joined enterprises each had their own Internet access, often mean complex connectivity, at least for a transition period. Consolidation of separate divisional networks also creates this situation. A frequent case arises when a large enterprise decides that Internet access should be available corporate-wide, but its research labs have had Internet access for years -- and it works, as opposed to the new corporate connection that at best is untried.

Many discussions of multihoming focus on the details of implementation, using such techniques as the Border Gateway Protocol (BGP) [RFC number of the Applicability Statement], multiple DNS entries for a server, etc. This document suggests that it is wise to look systematically at the requirements before selecting a means of resilient connectivity. One implementation technique is not appropriate for all requirements.
There are special issues in implementing solutions in the general Internet, because poor implementations can jeopardize the proper function of global routing or DNS. An incorrect BGP route advertisement injected into the global routing system is a problem whether it originates in an ISP or in an enterprise.

4. Goals

Requirements tend to be driven by one or more of several major goals for server availability and performance. Availability goals are realized with resiliency mechanisms, which avoid user-perceived failures caused by single failures in servers, routing systems, or media. Performance goals are realized by mechanisms that distribute the workload among multiple machines such that the load is equalized.

Like multi-homing, the terms load-balancing and load-sharing have many definitions. Paul Ferguson defines load-balancing as "a true '50/50' sharing of equal paths. This can be done by either (a) round-robin per-packet transmission, (b) binding pipes at the lower layers such that bits are either 'bit-striped' across all parallel paths (like the etherchannel stuff), or binding pipes so that SAR functions are done in a method such as multilink PPP. These are fundamentally the same.

"Load-sharing is quite different. It simply implies that no link is sitting idle -- that at least all links get utilized in some fashion, usually in closest-exit routing. The equity of utilization may be massively skewed. It may also resemble something along the lines of 60/40, which is reasonable."

In defining requirements, the servers themselves may either share or balance the load, there may be load-sharing or load-balancing routing paths to them, or the routed traffic may be carried over load-shared or load-balanced media. The servers of interest may be inside the enterprise, or outside it.

In this document, intranet servers are inside the enterprise and intended primarily for enterprise use. Multinet servers are inside the enterprise, but there is pre-authorized access by external partners. Internet servers are operated by the enterprise but intended to be accessible to the general Internet. Intranet clients have access only to machines on the intranet. Internet clients have general Internet access that may be mediated by a firewall.

In the terminology of RFC1775, "To Be 'On' the Internet," servers described here have "full" access or a subset of client access. Client servers may not directly respond to a specific IP packet from an arbitrary host, but a system such as a firewall MUST respond for them unless a security policy precludes that. Some valid security policies, for example, suppress ICMP Destination Administratively Prohibited responses, because such a response would reveal that there is an information resource being protected.

RFC1775 defines full access as "a permanent (full-time) Internet attachment running TCP/IP, primarily appropriate for allowing the Internet community to access application servers, operated by Internet service providers. Machines with Full access are directly visible to others attached to the Internet, such as through the Internet Protocol's ICMP Echo (ping) facility. The core of the Internet comprises those machines with Full access."

This definition is extended here to allow firewalls or screening routers always to be present. If a proxy or address translation service exists between the real machine and the Internet, if this service is available on a full-time basis, and if it consistently responds to requests sent to a DNS name of the server, the server is considered to have full-time access.
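Purely as an illustrative sketch, and not part of the RFC1775 definition, the fragment below shows one way such an empirical probe might be made when ICMP Echo is administratively filtered: the server is treated as answering at its DNS name if anything -- the server itself, a proxy, or an address translator -- accepts a transport connection there. The host name and port used are hypothetical placeholders.

   # Illustrative sketch only; the host name and port are hypothetical.
   import socket

   def answers_at_name(name, port=80, timeout=5.0):
       """True if something accepts a TCP connection at the DNS name,
       even where ICMP Echo is filtered by policy."""
       try:
           # create_connection() resolves the name and tries each address
           # it returns until one accepts the connection.
           socket.create_connection((name, port), timeout=timeout).close()
           return True
       except OSError:
           return False

   # answers_at_name("www.example.com")  -> True or False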
In this discussion, we generalize the definition beyond machines primarily appropriate for the Internet community as a whole, to include in-house and authorized partner machines that use the Internet for connectivity.

RFC1775 also defines "client machines," on which the user runs applications that employ Internet application protocols directly on their own computer platform, but which might not be running the underlying Internet protocols (TCP/IP), might not have full-time access (such as through dial-up), or might have constrained access (such as through a firewall). When active, Client users might be visible to the general Internet, but such visibility cannot be predicted. For example, this means that most Client access users will not be detected during an empirical probing of systems "on" the Internet at any given moment, such as through the ICMP Echo facility.

4.1 Specific server availability

The first goal involves well-defined applications that run on specific servers visible to the Internet at large. This will be termed "endpoint multihoming", emphasizing the need for resilience of connectivity to well-defined endpoints. Solutions here often involve DNS mechanisms.

There are both availability and performance goals here. Availability goals arise when there are multiple routing paths that can reach the server, protecting it from single routing failures. Other availability goals involve replicated servers, so that the client will reach a server regardless of single server failures.

Performance goals include balancing client requests over multiple servers, so that one or more servers do not become overloaded and provide poor service. Requests can be distributed among servers in a round-robin fashion, or more sophisticated distribution mechanisms can be employed. Such mechanisms can consider actual real-time workload on the server, the routing metric from the client to the server, known server capacity, etc.

4.2 General Internet connectivity from the enterprise

The second goal is high availability of general Internet connectivity for arbitrary enterprise users to the outside. This will be called "internetwork multihoming". Solutions here tend to involve routing mechanisms.

4.3 Use of Internet services to interconnect "intranet" enterprise campuses

The third goal involves the growing number of situations where Internet services are used to interconnect parts of an enterprise. This is "intranetwork multihoming". It will usually involve dedicated or virtual circuits, or some sort of tunneling mechanism.

4.4 Use of Internet services to connect to "multinet" partners

A fourth category involves use of the Internet to connect with strategic partners. True, this does deal with endpoints, but the emphasis is different than in the first case. In the first case, the emphasis is on connectivity from arbitrary points outside the enterprise to points within it. This case deals with pairs of well-known endpoints. These endpoints may be linked with dedicated or virtual circuits defined at the physical or data link layer. Tunneling or other virtual private networks may be relevant here as well. There will be coordination issues that do not exist for the third case, where all resources are under common control.

5. Planning and Budgeting

In each of these scenarios, organization managers need to assign some economic cost to outages. Typically, there will be an incident cost and an incremental cost based on the length or scope of the connectivity loss. Ideally, this cost is then weighted by the probability of outage.

A weighted exposure cost results when the outage cost is multiplied by the probability of the outage. Resiliency measures reduce the probability, but increase the cost of operation. Operational costs obviously include the costs of the redundant mechanisms themselves (i.e., the additional multihomed paths), but also the incremental costs of the personnel needed to administer the more complex mechanisms -- their training and salaries.
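A small, hedged example may make this arithmetic concrete. Every figure below is invented purely for illustration; real values must come from the organization's own outage history, tariffs, and staffing costs.

   # Illustrative only; all figures are hypothetical.
   incident_cost     = 10000.0   # fixed cost per outage (lost sales, staff time)
   hourly_cost       = 2000.0    # incremental cost per hour of lost connectivity
   expected_duration = 4.0       # hours, mean time to repair
   outages_per_year  = 3.0       # expected outages without added resiliency

   cost_per_outage   = incident_cost + hourly_cost * expected_duration
   exposure_per_year = cost_per_outage * outages_per_year

   # A resiliency measure (e.g., a second access path) reduces the expected
   # number of user-visible outages but adds recurring cost.
   residual_outages  = 0.5        # expected outages per year with the measure
   measure_cost      = 24000.0    # circuits, equipment, and staff per year

   benefit = (outages_per_year - residual_outages) * cost_per_outage
   print("Annual exposure without measure:", exposure_per_year)
   print("Annual benefit of measure:      ", benefit)
   print("Measure worthwhile?             ", benefit > measure_cost)

The comparison in the last line is exactly the weighting decision described above: the reduction in weighted exposure is set against the recurring cost of the resiliency measure.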
6. Issues

6.1 Performance vs. Robustness: the Cache Conundrum

The goals of many forms of "multi-homing" conflict with the goals of improving local performance. For example, DNS queries normally are cached in DNS servers, and in the requesting host. From the performance standpoint, this is a perfectly reasonable thing to do, reducing the need to send out queries. From the multihoming standpoint, it is far less desirable, as application-level multihoming may be based on rapid changes of the DNS master files. The binding of a given IP address to a DNS name can change rapidly.

6.2 Symmetry

Global Internet routing is not necessarily optimized for best end-to-end routing, but for efficient handling in the Autonomous Systems along the path. Many service providers use "closest exit" routing, where they hand traffic to the next-hop AS at the exit point closest from their own perspective. The return path, however, is not necessarily a mirror image of the path from the original source to the destination.

Especially when the enterprise network has multiple points of attachment to the Internet, either to a single ISP AS or to multiple ISPs, it becomes likely that the response to a given packet will not come back through the same point at which the packet left the enterprise. This is probably not avoidable, and troubleshooting procedures and traffic engineering have to consider this characteristic of multi-exit routing.

6.3 Security

ISPs may be reluctant to let user routing advertisements or DNS zone information flow directly into their routing or naming systems. Users should understand that BGP is not intended to be a plug-and-play mechanism; manual configuration often is considered an important part of maintaining integrity. Supplemental mechanisms may be used for additional control, such as registering policies in a registry [RPS, RA documents] or egress/ingress filtering [Ferguson draft].

Challenges may arise when client security mechanisms interact with fault tolerance mechanisms associated with servers. For example, if a server address changes to that of a backup server, a stateful packet screening firewall might not accept a valid return. Similarly, unless servers back one another up in a full mirroring mode, if one end of a TCP-based application connection fails, the user will need to reconnect. As long as another server is ready to accept that connection, there may not be major user impact, and the goal of high availability is realized. High availability and user-transparent high availability are not synonymous.

7. Application/Transport/Name Multihoming

[****Folks -- I am not a DNS expert. I need help and/or a coauthor here. Alternatively, may I suggest someone might want to write a detailed DNS multihoming RFC that parallels Tony & Yakov's document on BGP multihoming?]

While many people look at the multihoming problem as one of routing, various solutions may be based more on DNS than on routing.

The basic idea here is that arbitrary clients will first request access to a resource by its DNS name, and certain DNS servers will resolve the same name to different addresses based on conditions of which the DNS is aware, or using some statistical load-distribution mechanism.
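As a deliberately simplified sketch of that idea, the fragment below rotates a configured answer list on each query, in the spirit of round-robin DNS. It is not a DNS implementation, and the name and addresses are hypothetical documentation values.

   # Toy stand-in for a DNS server that rotates its answer set per query.
   class RoundRobinResolver:
       def __init__(self, zone):
           # zone maps a name to the list of addresses of its replica servers
           self._zone = {name: list(addrs) for name, addrs in zone.items()}
           self._next = {name: 0 for name in zone}

       def resolve(self, name):
           # Return every address for the name, rotated so that successive
           # queries see a different address first.
           addrs = self._zone[name]
           i = self._next[name]
           self._next[name] = (i + 1) % len(addrs)
           return addrs[i:] + addrs[:i]

   zone = {"www.example.com": ["192.0.2.1", "198.51.100.1", "203.0.113.1"]}
   resolver = RoundRobinResolver(zone)
   print(resolver.resolve("www.example.com"))  # ['192.0.2.1', '198.51.100.1', '203.0.113.1']
   print(resolver.resolve("www.example.com"))  # ['198.51.100.1', '203.0.113.1', '192.0.2.1']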
There are some general DNS issues here. DNS was not really designed to do this. A key issue is that of DNS caching. Caching and frequent changes in name resolution are opposing goals. Traditional DNS schemes emphasize performance over resiliency. [RFC1034]:

"The meaning of the TTL field is a time limit on how long an RR can be kept in a cache. This limit does not apply to authoritative data in zones; it is also timed out, but by the refreshing policies for the zone. The TTL is assigned by the administrator for the zone where the data originates. While short TTLs can be used to minimize caching, and a zero TTL prohibits caching, the realities of Internet performance suggest that these times should be on the order of days for the typical host. If a change can be anticipated, the TTL can be reduced prior to the change to minimize inconsistency during the change, and then increased back to its former value following the change."

[discuss limitations/behavior of basic round robin]

Dynamic DNS may be a long-term solution here. In the short term, setting very short TTL values may be appropriate. Remember that the name normally is resolved when an application session first is established, and the decisions are made over a longer time base than per-packet routing decisions.

7.1 Servers in Multiple Address Spaces

[Kent England] Have you ever had a case where a multi-homed site used address overlays, one set of addresses from within ISP#1's CIDR block and another set of addresses from within ISP#2's CIDR block? I would call this application-level multi-homing as opposed to network-level multihoming, with a single set of servers (web, mail, ftp) with overlay addresses using redundant access paths, controlled via DNS. Seems to me it should be workable without BGP and allow finer-grained load sharing (or balancing?) than BGP?

[Paul Vixie] If you want to load balance, you can use multiple A records, and it works until one of the providers goes down. Then, only some requests get through (unless the client is bright about trying all addresses, which some are).
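The following hedged sketch shows the "bright client" behavior noted above: the client obtains every address bound to a name and tries each in turn, so a dead provider path costs it one connection timeout rather than the whole session. The host name and port in the example are hypothetical.

   # Sketch of a client that tries every address a name resolves to;
   # illustrative only.
   import socket

   def connect_any(name, port, timeout=5.0):
       """Try each address for the name in resolver order; return the
       first connected socket, or raise the last error seen."""
       last_error = None
       for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
               name, port, type=socket.SOCK_STREAM):
           s = socket.socket(family, socktype, proto)
           s.settimeout(timeout)
           try:
               s.connect(sockaddr)
               return s                  # this address answered; use it
           except OSError as err:
               last_error = err
               s.close()                 # this path is down; try the next one
       raise last_error or OSError("no usable address for " + name)

   # Hypothetical use:
   # sock = connect_any("www.example.com", 80)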
7.2 Coordinated DNS

[This is the Cisco Distributed Director strategy]

7.3 Other methods/software?

8. Network/Routing Multihoming

A common concern of enterprise financial managers is that multihoming strategies involve expensive links to ISPs, but, in some of these scenarios, the alternate links are used only as backups and are idle much of the time. Detailed analysis may reveal, however, that the cost of forcing these links to be used at all times exceeds the potential savings.

The intention here is to focus on requirements rather than on specifics of the routing implementation, several approaches to which are discussed in RFC1998 and draft-bates-multihoming-01.txt. Operational as well as technical considerations apply here. While the Border Gateway Protocol could convey certain information between user and provider, many ISPs will be unwilling to risk the operational integrity of their global routing by making the user network part of their internal BGP routing systems. ISPs may also be reluctant to accept BGP advertisements from organizations that do not have frequent operational experience with this complex protocol.

8.1 Single-homed (R1)

The enterprise generally does not have its own ASN; all its advertisements are made through its ISP. The enterprise uses default routes to the ISP. The customer is primarily concerned with protecting against link or router failures, rather than failures in the ISP routing system.

8.1.1 Single-homed, single-link (R1.1)

There is a single active data link between the customer and provider. Variations could include switched backup over analog or ISDN services. Another alternative might be use of alternate frame relay or other PVCs to an alternate ISP POP.

8.1.2 Single-homed, balanced link (R1.2)

In this configuration, multiple parallel data links exist from a single customer router to a single ISP router. There is protection against link failures. The single-customer-router constraint allows this router to do round-robin packet-level load balancing across the multiple links, for resiliency and possibly additional bandwidth. The ability of a router to do such load balancing is implementation-specific, and may be a significant drain on the router's processor.

8.1.3 Single-homed, multi-link (R1.3)

Here, we have separate paths from multiple customer routers to multiple ISP routers at different POPs. Default routes generated at each of the customer gateways are injected into the enterprise routing system, and the combination of internal and external metrics is considered by internal routers in selecting the external gateway. This often is attractive for enterprises that want resiliency but wish to avoid the complexity of BGP.

8.1.4 Special Cases

While the customer in this configuration is still single-homed, an AS upstream from the ISP has a routing policy that makes it necessary to distinguish routes originating in the customer from those originating in the ISP. In such cases, the enterprise may need to run BGP, or have the ISP run it on its behalf, to generate advertisements of the needed specificity. Since the same basic topologies discussed above apply, we can qualify them as R1.1B, R1.2B, and R1.3B.

It MAY be possible for the customer to avoid using BGP, if its adjacent ISP will set a BGP community attribute, understood by the upstream AS, on the customer prefixes [RFC1998]. Doing so results in the cases R1.1C, R1.2C, and R1.3C.

8.2 Multi-homed Routing

The enterprise connects to more than one ISP, and desires to protect against problems in the ISP routing system. It will accept additional complexity and router requirements to get this. The enterprise may also have differing service agreements for Internet access for different divisions.

8.2.1 Multi-homed, primary/backup, single link (R2.1)

The enterprise connects to two or more ISPs from a single router, but has a strict policy that only one ISP at a time will be used for default. In an OSPF environment, this would be done by advertising defaults toward both ISPs, but with different Type 2 external metrics. The default toward the primary ISP would have the lower metric. BGP is not necessary in this case. This easily can be extended to multi-link.

8.2.2 Multi-homed, differing internal policies (R2.2)

In this example, assume OSPF interior routing. The main default for the enterprise comes from one or more ASBRs in Area 0, all routing to the same ISP. One or more organizations brought into the corporate network have pre-existing Internet access agreements with an ISP other than the corporate ISP, and wish to continue using this for their "divisional" Internet access.
This is frequent when a corporation decides to have general Internet access, but its research arm has long had its own Internet connectivity. Mergers and acquisitions also produce this case.

In this situation, one or more additional ASBRs are placed in the OSPF area(s) associated with the special case, and these ASBRs advertise default. Filters at the Area Border Router block the divisional ASBR's default from being advertised into Area 0, and the corporate default from being advertised into the division. Note that these filters do not block OSPF LSAs, but instead block the local propagation of selected default and external routes into the Routing Information Base (i.e., the main routing table) of a specific router.

8.2.3 Multi-homed, "load shared" with primary/backup (R2.3)

[Thanks to Paul Ferguson for the distinction between load balancing and load sharing.]

While there still is a primary/backup policy, there is an attempt to make active use of both the primary and backup providers. The enterprise runs BGP, but does not take full Internet routing. It takes partial routing from the backup provider, and prefers the backup provider's path for destinations in the backup provider's AS, and perhaps for destinations directly connected to that AS. For all other destinations, the primary provider is the preferred default. A less preferred default is defined to the second ISP, but this default is generally advertised only if connectivity is lost to the primary ISP.

8.2.4 Multi-homed, global routing aware (R2.4)

Multiple customer routers receive a full routing table and, using appropriate filtering and aggregation, advertise different destinations (i.e., not just default) internally. This requires BGP and, unless dealing with a limited number of special cases, requires significantly more resources inside the organization.

8.3 Transit

While we usually think of transit in terms of ISPs, some enterprises may provide Internet connectivity to strategic partners. They do not offer Internet connectivity on a general basis.

8.3.1 Full iBGP mesh (R3.1)

Connectivity and performance requirements are such that a full iBGP mesh is practical.

8.3.2 Scalable iBGP required (R3.2)

The limits of the iBGP full mesh have been reached, and confederations, route reflectors, etc., are needed for growth.

9. Addressing Refinements and Issues

It is arguable that addressing used to support multihoming is a routing deployment issue, beyond the scope of this document. The rationale for including it here is that addressing MAY affect application behavior. If the enterprise runs applications that embed network layer addresses in higher-level data fields, solutions that employ address translation, at the packet or virtual connection level, MAY NOT work. Use of such applications inherently is a requirement on the eventual multihoming solution.

Consideration also needs to be given to application caches in addition to those of DNS. Firewall proxy servers are a good example where multiple addresses associated with a given destination may not be supported.

Internal addressing alternatives include:

   - RFC1918 internal, NAT
   - RFC1918 internal, PAT
   - Registered internal, Provider Assigned (PA)
   - Registered internal, Provider Independent (PI)

10. Transmission Considerations in Multihoming

"Multihoming" is not logically complete until all single points of failure are considered. With the current emphasis on routing and naming solutions, the lowly physical layer often is ignored, until a physical layer failure dooms a lovely and sophisticated routing system. Physical layer diversity can involve significant cost and delay.
Nevertheless, it should be considered for mission-critical connectivity. The principal transmission impairment, the backhoe, can be viewed at http://www.cat.com/products/equip/bhl/bhl.htm

10.1 Local Loop

From a typical server room, analog and digital signals physically flow to a wiring closet, where they join a riser cable. The riser cable joins with other riser cables in a cable vault, from which a cable leaves the building and goes to the end switching office of the local telecommunications provider. Most buildings have a single cable vault, possibly with multiple cables following a single physical route back to the end office. A single error by construction excavators can cut multiple cables on a single path. A failure in carrier systems can isolate a single end office. Highly robust systems have physical connectivity to two or more POPs reached through two or more end offices.

Alternatives here can become creative. On a campus, it can be feasible to use some type of existing ductwork to run additional cables to another building that has a physically diverse path to the end office. Direct wire burial, fiber optic cables run in the air between buildings, etc., are all possible. In a non-campus environment, it is possible, in many urban areas, to find alternate means of running physical media to other buildings with alternate paths to end offices. Electrical power utilities may have empty ducts which they will lease, and through which privately owned fiber can be run.

10.2 Provider Core

As demonstrated by a rash of fiber cuts in early 1997, carriers lease bandwidth from one another, so a cut to one carrier-owned facility may affect connectivity in several carriers. This reality makes some traditional diverse-media strategies questionable. Many organizations consciously obtain WAN connectivity from multiple carriers, with the notion that a failure in one carrier will not affect another. This is not a valid assumption.

If the goal is to obtain diversity/resiliency among WAN circuits, it may be best to deal with a single service provider. The contract with this provider should require physical diversity among facilities, so the provider's engineering staff will be aware of requirements not to put multiple circuits into the same physical facility, whether owned by the carrier or leased from other carriers.

11. Security Considerations

12. Acknowledgments

13. References

[RFC1775] Crocker, D., "To Be 'On' the Internet", RFC 1775, March 1995.

[RFC1930] Hawkinson, J., and T. Bates, "Guidelines for creation, selection, and registration of an Autonomous System (AS)", BCP 6, RFC 1930, March 1996.

[RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities", STD 13, RFC 1034, November 1987.

[RFC----] BGP-4 Applicability Statement.

[RFC1998] Chen, E., and T. Bates, "An Application of the BGP Community Attribute in Multi-home Routing", RFC 1998, August 1996.

[RFC2071] Ferguson, P., and H. Berkowitz, "Network Renumbering Overview: Why would I want it and what is it anyway?", RFC 2071, January 1997.

[RFC2050] Hubbard, K., Kosters, M., Conrad, D., Karrenberg, D., and J. Postel, "INTERNET REGISTRY IP ALLOCATION GUIDELINES", BCP 12, RFC 2050, November 1996.

[RFC1631] Egevang, K., and P. Francis, "The IP Network Address Translator (NAT)", RFC 1631, May 1994.

[RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., de Groot, G-J., and E. Lear, "Address Allocation for Private Internets", RFC 1918, February 1996.

[RFC1900] Carpenter, B., and Y. Rekhter, "Renumbering Needs Work", RFC 1900, February 1996.

[RPS] Alaettinoglu, C., Bates, T., Gerich, E., Terpstra, M., and C. Villamizar, "Routing Policy Specification Language", Work in Progress.

[RFC1812] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, June 1995.

14. Author's Address

Howard C. Berkowitz
Geotrain Corporation (formerly Protocol Interface & PSC International)
1600 Spring Hill Road, Suite 310
Vienna VA 22182

Phone: +1 703 998 5819
EMail: hcb@clark.net