Network Working Group                                         R. Whittle
Internet-Draft                                          First Principles
Intended status: Experimental                              July 15, 2007
Expires: January 16, 2008

         Ivip (Internet Vastly Improved Plumbing) Architecture
                     draft-whittle-ivip-arch-00.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 16, 2008.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Whittle                 Expires January 16, 2008                [Page 1]
Internet-Draft              Ivip Architecture                  July 2007

Abstract

Ivip (Internet Vastly Improved Plumbing) is a proposed global system of routers and a database, or collection of databases, which controls the tunneling behavior of some of these routers. Database changes affect all Ingress Tunnel Routers (ITRs) within a few seconds, controlling which Egress Tunnel Router (ETR) they tunnel each packet to, depending on the packet's destination address. The ETR used by a host with an Ivip-mapped address is typically located in the same network as this destination host. The ETR decapsulates packets and forwards them to the destination host.
A second type of ETR known as a Translating Tunnel Router (TTR) is used for mobile-IP, with the mobile node creating two-way tunnels to one or more nearby TTRs.

Ivip enables a subset of IPv4 and IPv6 address space to be portable (used via any ISP which has an ETR) and to be suitable for multihoming (connection to the Net via two or more ISPs) - without involving BGP and without requiring any changes to host operating systems or applications. This is a form of "locator-ID separation" and is based on some principles derived from LISP (Locator/ID Separation Protocol).

IP addresses in the subset of address space which is subject to being tunneled by ITRs are known as Destination Identifiers (DIDs). ITRs and ETRs are located on ordinary BGP Reachable IP (BRIP) addresses. The databases and ITRs map DID addresses to an ETR's BRIP address with a granularity of a single IPv4 address or a /64 prefix for IPv6. These two granularities are 256 and 64k times finer than is typically possible with BGP.

This proposal is intended to resolve many of the problems discussed in the October 2006 Amsterdam IAB Routing and Addressing Workshop (RAWS). Ivip's primary goals include the more efficient utilisation of IPv4 space and enabling millions of end-users to achieve portability and multihoming without involving BGP, without fuelling the growth of the global BGP routing table, and without requiring these end users to have ASNs or to acquire conventional prefixes of PI (Provider Independent) BGP reachable address space.

Table of Contents

1.  Introduction
  1.1.  Brainstorming phase
  1.2.  Postal redirection analogy
  1.3.  LISP and ID/LOC separation
  1.4.  One way tunnels to a single ETR
  1.5.  Anycast ITRs
  1.6.  Types of ETR
  1.7.  Types of ITR
    1.7.1.  ITRD - full database (push)
    1.7.2.  ITRC - query, cache (pull) and notify
    1.7.3.  ITFH - Ingress Tunnel Function in Host
  1.8.  Initial deployment
    1.8.1.  Paths taken by packets
    1.8.2.  Multihoming when both links are working
    1.8.3.  External multihoming monitoring system
    1.8.4.  Multihoming after a link fails
    1.8.5.  Potential problems with internal routing systems
  1.9.  Ivip's intended benefits
  1.10. Long term deployment
2.  Definition of Terms, Concepts and Functions
  2.1.  IMIP - Ivip-Mapped IP address
  2.2.  NIMIP - Non-Ivip-mapped IP address
  2.3.  BRIP - BGP Reachable IP address
  2.4.  UAIP - Un-Advertised IP address
  2.5.  DID - Destination Identifier
  2.6.  TELOC - Tunnel Endpoint Locator
  2.7.  IMAB - Ivip-Mapped Address Block
  2.8.  IMAB-DB - IMAB DataBase
  2.9.  IMAB-DBD - IMAB DataBase Dump
  2.10. UMUC - User Mapping Update Command
  2.11. SUMUC - Signed User Mapping Update Command
  2.12. SH/SN - Sending Host/Node
  2.13. RH/RN - Receiving Host/Node
  2.14. IRH/IRN - Ivip-mapped Receiving Host/Node
  2.15. MH/MN - Mobile Host/Node
  2.16. UAS - Update Authorisation System
  2.17. RUAS - Root Update Authorisation System
  2.18. US-IMAB - Update Stream specific to one IMAB
  2.19. US-Complete - Update Stream for the Complete Ivip system
  2.20. Replicator
  2.21. QSD - Query Server with full Database
  2.22. QSC - Query Server with Cache
  2.23. ITR - Ingress Tunnel Router
  2.24. ITRD - Ingress Tunnel Router with Database
  2.25. ITRC - Ingress Tunnel Router with Cache
  2.26. ITFH - Ingress Tunneling Function in Host
  2.27. ETR - Egress Tunnel Router
  2.28. ETFH - Egress Tunnel Function in Host
  2.29. TTR - Translating Tunnel Router for Mobile-IP
3.  The Crisis in Routing and Addressing
  3.1.  Interrelated needs and problems
  3.2.  Constraints on possible solutions
4.  Potential Solutions
5.  Comparison with LISP
  5.1.  LISP principles and mechanisms used by Ivip
  5.2.  LISP principles and mechanisms not used by Ivip
  5.3.  Additional principles and mechanisms in Ivip
6.  Ivip's goals, non-goals and challenges
7.  User Interface and Update Authorities
8.  Replicators
9.  Query Servers - QSD and QSC
10. Ingress Tunnel (ITR) strategies
11. Egress Tunnel (ETR) strategies
12. Mobile-IP with TTRs
13. IPv6 and longer term strategies
14. Loose ends
  14.1. ETRs checking src & dest addresses
    14.1.1. Short version
    14.1.2. ITR tunneled packet with source address of sending host
  14.2. Scaling the Replicator network
  14.3. Is fast, secure, Replication possible on the Internet?
  14.4. TTRs and Mobility
15. Security Considerations
16. IANA Considerations
17. Informative References
Appendix A.  Acknowledgements
Appendix B.  The Ivip acronym
Author's Address
Intellectual Property and Copyright Statements

1.  Introduction

1.1.  Brainstorming phase

The purpose of this Internet Draft is to contribute to the development of one or more proposals to resolve the problems of what might be called the Crisis in Routing and Addressing. Ivip is one proposal among potentially many.

Ivip is at an early stage of development and this I-D is part of what I regard as a brainstorming effort on the RAM mailing list. Consequently this I-D contains more exploratory and speculative material than the architectural RFC it may one day become.

Most of the discussion below focuses on IPv4 except where noted. The goal is to develop a practical, elegant, incrementally deployable model of Ivip for IPv4.
Once a promising model for IPv4 has been developed, full consideration should be given to IPv6 and to what degree these two separate Ivip networks might be integrated. Consideration should also be given to how a globally deployed Ivip system might support IPv6 and the scalable tunneling of IPv6 traffic over IPv4 and vice-versa.

The remainder of this long Introduction is intended primarily for readers - such as members of the RAM list - who are already familiar with RAWS, LISP and other proposals, and especially with the limitations of BGP routers, which are the primary reason why the Internet needs a new routing and addressing architecture.

Following this Introduction is a section describing in detail Ivip's major Terms, Concepts and Functions. The three sections following this provide a fuller grounding for readers who are new to this field, introducing the RAWS report on the Crisis, other solutions, and a comparison with LISP. Once these sections have been read, the Introduction should make more sense to readers who were not yet familiar with this material.

Following these are sections which contain further discussion and diagrams regarding various deployment scenarios and about how the ITR, ETR, TTR, Replicator and Query Server functions of Ivip can be implemented in conventional routers, in servers, and in some cases within hosts.

Finally, the "Loose ends" section contains some material which I have not had time to refine and integrate smoothly into this version of the draft. This includes a section on ensuring ETRs are not a backdoor around security arrangements which prevent attackers sending packets with spoofed source addresses. There is also a section which questions whether the Internet itself is a suitable basis for building the fast, secure, high-volume system of RUAS servers and Replicators.
Even if it were secured with cryptographic techniques, it would still be vulnerable to DoS attacks from botnets. The last "Loose ends" section describes Translating Tunnel Routers and how they may be used with Ivip's ITR system to provide much more efficient and flexible Mobile IP connectivity than is possible with current techniques.

While I believe the simple ITR and ETR behavior of Ivip is both satisfactory and superior to the more complex approach of LISP (although perhaps LISP 3 will involve simpler arrangements than described for 1 or 1.5 in the current LISP-01 I-D), I don't feel I have a robust enough approach to pushing the mapping data out to ITRDs and QSDs all over the Net. If that problem can be solved, I think Ivip has a reasonable chance of satisfying the criteria set forth in the RRG's Design Goals for Scalable Internet Routing [I-D.irtf-rrg-design-goals-01].

Grave problems will arise if no suitable new architectural solution is found to the Internet's problems in routing and addressing. Ivip is intended to facilitate a much finer splitting of IPv4 address space than BGP allows - and therefore a much greater utilisation of this space than is currently possible. Ivip is also intended to provide a better approach to IP address portability and multihoming so that fewer end-users will want to gain conventional PI address space and further burden DFZ (Default Free Zone) routers with additions to the global BGP routing table.

The iplane.cs.washington.edu project indicates there are approximately 63,000 BGP routers. (Lists of alias clusters in [iPlane].) Most of these will be transit and multihomed border routers. The remainder are singlehomed border routers. Transit and multihomed border routers are in the DFZ, and so need to develop a separate routing rule for each of the 220,000 or so prefixes which are advertised in the global BGP system.
Every DFZ router needs to communicate with each of its peers about each of these prefixes, with messages about each prefix typically propagating across the entire BGP system. Iljitsch van Beijnum estimates that each prefix for each peer consumes between 60 and 240 bytes of router memory - and some routers have dozens of peers. [van-Beijnum-BGP] Problems with the load this places on routers, and difficulties with the stability of the whole BGP system, are the most serious and fastest-growing problems at present - and threaten to make many of these (probably) 50,000+ routers obsolete as the number of BGP routes grows.

The size of these problems means that considerable resources can justifiably be devoted to introducing a new system. So while the problems to be overcome are daunting, the author of any such proposal can invoke the expenditure of millions of dollars of resources with ease, since other competing proposals will involve similar expenditures and since inaction would result in far higher expenditures still. However, a successful proposal must not only be the most promising of the alternatives, but must also be incrementally deployable. As Noel Chiappa wrote on the RRG list on 2007 July 13:

   "That is *the* problem in Internet engineering these days. Any old fool (well, sort of :-) can design a better network, or a jet airplane; but it takes a real genius to figure out how to turn a fabric biplane into a jet while it's flying! :-)"

Ivip requires no changes to host operating systems or applications. Nor does it require changes to the BGP routing system. Ivip requires new functionality within, or closely connected to, some existing BGP and internal routers. The intention is that this can be implemented with firmware and/or configuration changes.
In principle, the entire Ivip system could be introduced by adding specially programmed servers, with only configuration changes to the existing routers. However, the most likely deployment scenarios involve additional router functionality, as well as the creation of some globally coordinated networks of servers.

Ivip ITR and ETR behavior is relatively simple. The real challenges are in allowing end-users to securely control their part of the mapping database, getting the database information to the ITRs quickly and securely, implementing the ITR functions efficiently (including in servers and sending hosts rather than routers), and ensuring ETRs can't be used to circumvent security measures - while also ensuring that some networks will want to implement Ivip even when few people use it or know what it is.

Please use and adapt these ideas for your own proposals and suggest any improvements which could be made to this I-D, which was prepared in a hurry. I intend to create a better version 01 in mid to late August. In the meantime, please discuss this I-D on the RAM list - http://www1.ietf.org/mailman/listinfo/ram - or via private email. It is possible that discussions will be redirected to the RRG (IRTF Routing Research Group) list: http://www.irtf.org/charter?gtype=rg&group=rrg . I will attempt to list bug-fixes and planned improvements to this I-D at http://www.firstpr.com.au/ip/ivip/ .

1.2.  Postal redirection analogy

A simple and reasonably instructive analogy to Ivip is the Post Office's mail redirection system. Letters addressed to an original home address are redirected from the original destination's post office with a sticker (or within a new envelope) to a new address, which typically involves them being delivered via a second post office.
This often involves sub-optimal path lengths, for instance a letter sent from Boston to an original address in San Francisco being redirected to Manhattan. Optimal paths could be achieved - at a very high cost - if every sorting office recognised letters with redirected destination addresses, so the letter was redirected at its first point of contact with the sorting and forwarding system.

Ivip does not involve every router being able to do this, but uses a subset of routers with additional ITR (Ingress Tunnel Router) functionality. ITRs recognise packets which need to be redirected. They encapsulate and tunnel packets to another router (an Egress Tunnel Router - ETR), using an address obtained from a global database for this particular block of Ivip-mapped address space.

Ivip doesn't encapsulate and tunnel packets which, in the postal analogy, were addressed to ordinary addresses in streets which physically exist. A postal system which is closely analogous to Ivip would redirect every letter with a destination address in one of multiple new artificial streets or towns, which have no physical existence. The Post Office would create multiple "streets" such as Twenty-seventh Virtual St in Virtualville. It then assigns, for years or indefinitely, numbers in such streets to individuals, families and organisations.

A subset of sorting houses, through which every letter must pass before reaching a delivery office, would recognise every letter addressed to a street in Virtualville. For each such letter, using a central database, these specially upgraded sorting offices would place the letter in an envelope addressed to one of the Post Office's delivery offices. The database query consists of the full Virtualville address. The response consists simply of the postal address of whichever delivery office can best deliver the letter to its proper recipient.
Whenever the proper recipient moves to a new locality, they use a username and password system via the Web, or via a post office, to update the central database so their letters will be redirected to the delivery office in the new locality. When the encapsulated letter reaches the delivery office, the sticker or outer envelope is removed and the office has local knowledge of how to deliver the letter to its intended recipient.

With Mobile IP and existing postal redirection systems, the destination typically has a physical "care-of" address (although the Post Office's "poste restante" service does not require an address, just identification when picking up mail from a post office). With Ivip, the destination need not have any other IP address than its own Ivip-mapped IP address. In the postal analogy, the delivery office delivers letters to the correct recipient, which is not necessarily a house with an ordinary street address. In neither Ivip nor the new Virtualville postal routing and addressing architecture does the system specify exactly how the final router or delivery office should forward the packet or letter to its proper recipient. In all cases, however, the destination does have the Ivip-mapped address, or has the full Virtualville address emblazoned on it in some manner.

In the postal analogy, due to anti-terrorist security measures, every initial sorting office which processes letters posted in a locality will not forward any letters which come from an unrecognised address. The local system needs to recognise that letters with particular, previously locally registered, Virtualville sender addresses should be delivered normally, and that letters with sender addresses from other Virtualville streets and numbers, or from any address outside the local area, should be quarantined in a safe location where they will await scrutiny by the Office of Homeland Security.
Packets sent from hosts with Ivip-mapped addresses only need to pass muster in respect of their source address being locally recognised. They don't require any special delivery system, unless of course the destination address is Ivip-mapped too, in which case the packet will be forwarded to an ITR, which tunnels it to an ETR which forwards it to the destination host.

1.3.  LISP and ID/LOC separation

Many proposals have been made regarding additional protocol layers to take the place of IP addresses, which are widely regarded as performing two functions: identifying the end-point of a communication and specifying, as part of the address, information about where the end-point is located. The primary goal of all these proposals is that upper layer protocols would work with the identifier, and continue a communication session with the end-point, even when it becomes accessible via a different locator - for instance due to end-point mobility or switching from one provider network to another in a multihoming setting.

This goes beyond the functionality of the current two level DNS and IP address system. The current system is fine for a human user or a piece of software always commencing a communication session with a FQDN such as www.example.org. What the current system cannot cope with is continuing a session, such as an HTTP session over TCP, with the remote server when that server becomes only reachable via a different IP address.

ID/LOC separation proposals generally intend that higher layer protocols such as TCP can continue to operate on identifiers, with a lower, new, layer of protocol software translating these to whichever physical locators are needed to reach the server at each moment in time.

This would allow session continuity when a multihomed host becomes reachable only via a new IP address.
It would also enable locators to be allocated to physical sites in accordance with the dictates of route aggregation, which makes life easier for routers, while the allocation of identifiers need not be constrained by route aggregation or any other constraint regarding the physical topology of the network.

However, since physical routing is done on the lower level locators, which are still subject to topological constraints, ID/LOC separation doesn't necessarily allow complete portability of networks from one provider to another, since a network's internal routing configuration is set in part by numeric locator IP addresses, and these can't be advertised at arbitrary providers without compromising route aggregation and/or adding a further route to the global BGP routing table.

While Ivip is based on LISP (Locator/ID Separation Protocol) [I-D.farinacci-lisp], which is an ID/LOC separation protocol, I am not sure that Ivip meets all the formal requirements which proponents of ID/LOC separation might have for such a protocol. In a later section (Comparison with LISP) I attempt to list what Ivip takes from LISP, what it leaves out and what it adds.

Some ID/LOC proposals require non-backwards compatible changes to operating system and/or application software. Some use conventional IP addresses for both "identifier" and "locator". For instance SHIM6 [I-D.ietf-shim6-proto] (which is still being developed) works between IPv6 hosts with upgraded TCP/IP stacks and achieves multihoming, but not portability, on a purely host-to-host basis without any changes to routers or the addressing system.

LISP also uses some ordinary IP addresses for identifiers and others for locators. LISP requires no changes to hosts, BGP routers or the BGP routing system. It achieves its goals of portability, multihoming and Traffic Engineering (TE) with special ITR and ETR (Ingress and Egress) Tunnel Routers inside provider and end-user edge networks.
In the LISP variants which are most suitable for adoption, a centralised or distributed database controls the ITRs.

Ivip is based on some of LISP's principles, including ITRs, ETRs and using a subset of the existing address space as identifiers with the remainder being usable as locators. Ivip does not attempt LISP's communication between ITRs and ETRs. Nor does it involve LISP's explicit TE functions. Ivip has a very different method of distributing ID-LOC mapping information (instructions to ITRs on where to tunnel packets based on their original destination address) than is proposed in current LISP I-Ds.

Some of Ivip's ITRs are "anycast ITRs in the core" (meaning outside provider and AS-end-user edge networks) with the mapped addresses (identifiers, or EIDs in LISP terminology) being part of BGP advertised prefixes. In this way, packets sent by hosts in networks without ITRs will still be tunneled by an ITR and find their way to hosts with Ivip-mapped addresses. This "anycast ITRs in the core" system is an unusual form of anycast, and supports TCP and all other protocols, because all packets are tunneled to the one destination host. This system is believed to make Ivip much more incrementally deployable than LISP, because without these "anycast ITRs in the core", hosts with LISP/Ivip-mapped addresses would not be reachable from hosts in networks which have not installed an ITR.

Ivip may have more ambitious goals than LISP regarding the fine division of address space to serve the needs of millions of end-users and regarding how quickly the database(s) and ITR system can respond to user commands to change the mapping of their addresses.

Ivip has no explicit TE functions, but it is intended that some TE be achievable.
For instance, to achieve load balancing over two or more links to a multihomed site which has traffic arriving on multiple Ivip-mapped addresses, the end-user would choose, for each such Ivip-mapped address, which ISP's ETR the packets are tunneled to and therefore which link these packets travel on.

1.4.  One way tunnels to a single ETR

If host HA has a normal BGP-reachable IP address and host HB has an Ivip-mapped address, Ivip is only involved in tunneling packets sent by HA to HB. The typical arrangement is for the packet to be forwarded to an ITR which uses the packet's Destination Address (DA) as a key to its local copy of the mapping database, with the result being an IP address to which the packet will be tunneled.

IP-in-IP tunneling is used, with a single outer IP header added, using the original source address. The destination address is that of an ETR, and is provided by a copy of the database which the ITR either contains or can query. The end-user (who runs host HB) has previously set the database so all ITRs in the world will tunnel packets which are addressed to host HB's Ivip-mapped address to whichever ETR the end-user chooses.

When the encapsulated packet arrives at the ETR, the outer IP header is removed, and the original packet, as HA sent it and as the ITR received it, is forwarded to host HB. (The ITR typically copies the hop-count value from the original packet to the outer IP header and the ETR copies it from the outer IP header to the decapsulated packet.)

   ................                    ................
   .      N1      .                    .      N2      .
   .              .                    .              .
   . HA-----ITR~~~~~BR~~~~~~TR~~~~~~BR~~~~~ETR-----HB .
   .              .                    .              .
   ................                    ................

   Figure 1: Basic left to right packet flow - ITR in N1.

Figure 1 depicts left to right flow of a packet from host HA (7.7.7.7) to host HB (22.22.22.22). The "raw" packet, with DA = 22.22.22.22, is forwarded to the ITR in network N1.
22.22.22.22 is part of the 22.22.0.0/16 prefix, which is one of the Ivip-Mapped Address Blocks (IMABs) which all ITRs advertise. This means that every ITR which is a BGP router advertises itself as the destination for this prefix, and that every ITR which is an internal router will inject this route into the local routing system.

The one /16 prefix, in this example, burdens the BGP system with one extra route, but can be used to support the portable and multihoming address needs of hundreds or thousands of end-users. Without Ivip or a similarly effective system, some or all of these hundreds or thousands of end-users would get their own PI space, totalling far more than the 65,536 addresses of 22.22.0.0/16, and adding hundreds or thousands of routes to the global BGP routing table.

Ivip's capacity to reduce the growth of the BGP routing table and to enable the efficient use of IPv4 space by giving end-users precisely the number of addresses they need - not 256, 512, 1024 etc. addresses - rests on the RIRs developing an address management policy for Ivip-mapped address space which generally ensures that large blocks of addresses are assigned to the Ivip system, with each block being used to serve the needs of many end-users. It should not be difficult to develop and implement such policies.

Ivip is intended to serve the needs of end-users who need portability, multihoming and perhaps TE. Some of these end-users have already gained - or in the absence of a new routing and addressing architecture, would soon gain - an ASN and PI space to add to the BGP routing system. Ivip is also intended to serve the needs of end-users who do not have the resources to become an AS, gain a PI prefix etc., but who nonetheless need portability, multihoming and perhaps TE over multiple links to providers.
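The address and route arithmetic behind this argument can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and the list of end-user address needs are hypothetical, and it assumes that each end-user who takes PI space receives a power-of-two prefix no longer than /24, adding one route each to the global BGP table.

```python
from ipaddress import IPv4Network
from typing import Dict, Sequence


def pi_prefix_size(needed: int, longest_prefix: int = 24) -> int:
    """Addresses in the smallest power-of-two prefix, no longer than
    /longest_prefix, which covers `needed` addresses (assumed PI policy)."""
    if needed < 1:
        raise ValueError("needed must be at least 1")
    size = 2 ** (32 - longest_prefix)  # 256 addresses for a /24
    while size < needed:
        size *= 2
    return size


def compare(needs: Sequence[int], imab: str = "22.22.0.0/16") -> Dict[str, int]:
    """Compare per-end-user PI prefixes with exact allocations from one IMAB."""
    block = IPv4Network(imab)
    ivip_addresses = sum(needs)  # Ivip hands out exactly what is needed
    if ivip_addresses > block.num_addresses:
        raise ValueError("needs exceed the capacity of this IMAB")
    return {
        "pi_routes": len(needs),  # one BGP route per end-user prefix
        "pi_addresses": sum(pi_prefix_size(n) for n in needs),
        "ivip_routes": 1,  # the single IMAB prefix
        "ivip_addresses": ivip_addresses,
    }


# Hypothetical end-users needing 1, 3, 10 and 300 addresses.
summary = compare([1, 3, 10, 300])
```

With these hypothetical needs, the PI approach consumes four routes and four power-of-two prefixes, while the IMAB serves all four end-users with one route and only the addresses actually required.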
Portability of a single IP address between providers is not ordinarily considered a high priority goal, since a single host or NAT router and its DNS entry can easily be manually configured to a new IP address whenever a new provider is used. However there may be instances where an organisation has hundreds of branch offices, each with a single or a few IP addresses, which it wishes to remain fixed despite changing each office's singlehomed connection to one local provider or another, so its country-wide routing system does not need to be reconfigured frequently.

1.5.  Anycast ITRs

Multiple routers, usually each with an associated server, advertising the same prefix is known as "anycasting" [RFC1546] [ISC-Anycast]. Ivip's use of multiple anycast routers may be novel: tunneling packets to a single tunnel endpoint, which forwards the packets to a single host.

Each ITR either has a copy of the Ivip database (ITRD) or queries (ITRC) a QSD server (perhaps indirectly through one or more caching QSC servers) which does have a copy. The database's array for the 22.22.0.0 IMAB has 65,536 elements - one for each IP address. Each element contains a 32 bit IP address. The element for 22.22.22.22 has been set by the end-user to contain the address 54.32.1.0, which is the address of the ETR in Network N2.

The ~~~~ path in Figure 1 depicts the encapsulated packet being forwarded from the ITR, to N1's border router, to a transit router, to N2's border router and then to its destination, the ETR in N2. This transport of the encapsulated packet has been handled entirely by the standard BGP system and N2's internal routing system.

In this example, the BGP system sees only a packet with the Destination Address (DA) of 54.32.1.0. If there had been no ITR in N1, but the TR transit router in Figure 1 was an ITR - as shown in Figure 2 - then the BGP system would handle two different packets.
The first is a "raw" packet with DA = 22.22.22.22, which was forwarded to the ITR function of this transit router. The second is the encapsulated packet leaving this ITR transit router for the border router of N2, with its outer IP header having DA = 54.32.1.0.

   ................                    ................
   .      N1      .                    .      N2      .
   .              .                    .              .
   . HA-----IR------BR-----ITR~~~~~~BR~~~~~ETR-----HB .
   .              .                    .              .
   ................                    ................

   Figure 2: Basic left to right packet flow - anycast ITR in core.

In both Figures 1 and 2, the ETR removes the outer IP header, revealing the original packet. After updating its hop-count, the ETR forwards the decapsulated packet to the destination host. This requires either a direct connection to the destination host or support from N2's internal routing system. The latter involves the routing system recognising packets with DA = this particular IP address - 22.22.22.22 - as needing to be forwarded to this host, while (assuming N2 has no other hosts using Ivip-mapped addresses from this IMAB) other addresses within 22.22.0.0/16 are forwarded as usual. "As usual" means towards any ITR inside N2 or failing that, to N2's border router, because this prefix is one which is advertised in BGP by multiple anycast ITRs in the core.

A border router of a provider or AS-end-user network which is an ITR and advertises 22.22.0.0/16 to its BGP peers in other ASes also functions as an "anycast ITR in the core" because "raw" packets emerging from networks with no ITR will be forwarded to this ITR border router and be encapsulated and tunneled from there. Exactly why a network would provide this service for packets not associated with its network is a separate question.

A border router in N1 may be a convenient location to install ITR functionality.
A more likely arrangement is that it would not advertise 22.22.0.0/16 or any of the other IMABs in the Ivip system to its BGP peers outside N1 (so as not to attract packets originating from non-ITR networks). The border ITR would internally advertise 22.22.0.0/16 and the other IMABs so that all packets addressed to an Ivip-mapped IP address (IMIP) would be forwarded internally to this border router ITR.

To the picture given by Figures 1 and 2 three other concepts need to be added. Firstly, packets sent from hosts all over the Net to 22.22.22.22 are tunneled by ITRs to the one ETR at any one time. Secondly, the address of the tunnel endpoint used by all the ITRs can be changed within a short time, globally - ideally within a few seconds - by the end-user who controls the Ivip-mapping of 22.22.22.22. Thirdly, if the network in which the sending host is located does not have an ITR, the raw packet will be forwarded internally to the border router and then forwarded through the BGP system to the "nearest" (in BGP terms) ITR, which tunnels it to the end-user's chosen ETR.

Packets flowing from HB to HA do not require any involvement of Ivip. Each ITR and ETR shown in the previous and the following diagrams also performs whatever functions an ordinary router in its position performs. A packet sent from HB to HA is forwarded internally to N2's border router and then through BGP routers to the border router of N1, after which it is forwarded internally to HA. The packet may well pass through N2's ETR, but since its DA is not one of N2's ETR's IP addresses, that ETR forwards it normally. The packet may well pass through N1's ITR (Figure 1) or the core-ITR in Figure 2 - but since its DA is not within one of the Ivip system's IMABs, both of those ITRs behave like an ordinary internal router (Figure 1) or transit router (Figure 2) and forward the packet normally towards HA.
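The ITR behaviour described so far can be summarised in a short sketch: a per-IMAB array, indexed by offset within the block, holds one ETR address per Ivip-mapped address, and an ITR tunnels only those packets whose DA falls within an IMAB. The function names are hypothetical, the "tunnel" result is only a stand-in for real IP-in-IP encapsulation, and the handling of an address with no mapping (None, resulting in a drop) is an assumption which this introduction does not specify.

```python
import ipaddress

IMAB = ipaddress.ip_network("22.22.0.0/16")

# One element per address in the IMAB (65,536 for a /16).
# None means "no mapping set" (an assumption of this sketch).
mapping = [None] * IMAB.num_addresses


def set_mapping(imip: str, etr: str) -> None:
    """Record the ETR address chosen by the end-user for one IMIP."""
    offset = int(ipaddress.ip_address(imip)) - int(IMAB.network_address)
    mapping[offset] = ipaddress.ip_address(etr)


def itr_handle(da: str):
    """Decide what an ITR does with a packet whose destination is `da`."""
    addr = ipaddress.ip_address(da)
    if addr not in IMAB:
        # Not Ivip-mapped: behave like an ordinary router.
        return ("forward", addr)
    etr = mapping[int(addr) - int(IMAB.network_address)]
    if etr is None:
        # Assumption: an unmapped IMIP has nowhere to be tunneled.
        return ("drop", addr)
    # Encapsulate, with the outer header's DA set to the ETR's address.
    return ("tunnel", etr)


if __name__ == "__main__":
    set_mapping("22.22.22.22", "54.32.1.0")
    print(itr_handle("22.22.22.22"))
    print(itr_handle("192.0.2.10"))
```

The array lookup is constant time, which is one reason a flat per-address array is a plausible representation for a /16 IMAB. A real FIB would of course be implemented quite differently.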
If HA had an Ivip-mapped address too, then packets sent to it from HB would also need to go via an ITR and an ETR. These are not shown in the previous two diagrams.

1.6. Types of ETR

ETRs (with the exception of some TTRs - Translating Tunnel Routers, for mobile destination hosts) are always located in provider or AS-end-user networks. It is also possible for the destination host to perform its own ETR function, which requires it to have suitable software and a BGP-reachable care-of address. TTRs are discussed in a section below concerning mobility.

1.7. Types of ITR

ITRs are typically located in provider or AS-end-user networks. ITRs outside those networks - "anycast ITRs in the core" - handle packets sent from networks which have no ITR. It is also possible to perform the ITR function in the sending host, provided that host is not behind NAT. The NAT router itself, assuming it is not behind NAT, is a good place to perform the ITR function.

In this introduction, I assume each ITR handles the full range of IMABs in the Ivip system. However, to spread load over multiple ITRs in a single location, several could be configured so they each cover a fraction of the total Ivip-mapped address space.

1.7.1. ITRD - full database (push)

An ITRD is an ITR which has a real-time updated copy of the full Ivip-mapping database (or multiple databases, one for each IMAB). Its FIB is always up-to-date, instantly tunneling all packets received whose DA is within any one of the IMABs. An ITRD requires a very extensive FIB and a large amount of CPU RAM. An ITRD could be implemented in a server - but the highest performance ITRDs would always be those with a full ASIC-based router FIB hardware system.

1.7.2. ITRC - query, cache (pull) and notify

An ITRC does not keep a full copy of the database, but queries a nearby (ideally) Query Server which does have a full copy.
Query Servers are not described in detail in this introduction. The ITRC's FIB only tunnels packets for which the ITRC has recently received mapping information.

An ITRC is informed by its Query Server if the mapping changes for any IMIP (Ivip-mapped IP address) for which it recently received mapping information. This cache invalidation message is known as Notification, and is initiated by the Query Server which has a real-time updated copy of the full database for each of the IMABs in the Ivip system.

ITRCs could be implemented with a server, but the highest performance ITRCs will generally be routers with additional capabilities, using their existing FIB hardware to encapsulate packets.

1.7.3. ITFH - Ingress Tunnel Function in Host

An ITFH (Ingress Tunnel Function in Host) is an operating system implementation of an ITRC. As such, this is an additional layer of TCP/IP software in the upper part of the IP Layer 3 code, at the same level chosen for SHIM6 [I-D.ietf-shim6-proto]. This is suitable for hosts which are not behind NAT or for a NAT router itself, provided it is not behind NAT. There is absolutely no requirement for ITFH in Ivip, but in the longer term, if Ivip or something similar becomes widely deployed, the most cost-effective location to perform most or all encapsulation may be in the sending host or the NAT router.

Both ITFHs and ITRCs may not be able to gain mapping information fast enough to correctly tunnel all packets whose destinations are Ivip-mapped. Also, they may not be able to store all this information in their RAM, or implement all the mapping in their limited FIB functions. These "unmatched" packets (including those which are not novel, but which for one reason or another should be encapsulated but have not been) may be simply forwarded normally, in which case they will find their way to an ITRD. Alternatively, the ITRC or ITFH may be able to identify these packets and explicitly forward or tunnel them to a nearby ITRD.
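The pull-and-notify behaviour of an ITRC (or ITFH) described above can be sketched as a small cache in front of a query function. Everything named here is hypothetical: query_server stands in for a lookup at a QSD or QSC, the 600 second caching time is only an example value, and returning "forward" when no mapping is available models the option of letting an unmatched packet find its way to an ITRD.

```python
import time

CACHE_TIME = 600.0  # seconds; example value only, not specified here


class ItrcCache:
    """Query-on-demand mapping cache, updated by Notifications."""

    def __init__(self, query_server, now=time.monotonic):
        # query_server: callable taking a DA and returning an ETR address,
        # or None if no mapping is available for that DA.
        self._query = query_server
        self._now = now
        self._cache = {}  # DA -> (ETR address, expiry time)

    def handle(self, da):
        """Return ("tunnel", etr) or ("forward", da) for one packet."""
        entry = self._cache.get(da)
        if entry is not None and entry[1] > self._now():
            return ("tunnel", entry[0])
        etr = self._query(da)
        if etr is None:
            # Unmatched: forward normally, towards an ITRD.
            return ("forward", da)
        self._cache[da] = (etr, self._now() + CACHE_TIME)
        return ("tunnel", etr)

    def notify(self, da, new_etr):
        """Notification: the mapping of a recently queried DA has changed."""
        if da in self._cache:
            self._cache[da] = (new_etr, self._cache[da][1])


if __name__ == "__main__":
    cache = ItrcCache({"22.22.22.22": "54.32.1.0"}.get)
    print(cache.handle("22.22.22.22"))
```

A Notification only touches entries the ITRC has actually cached, which matches the idea that the Query Server informs only those ITRCs which recently asked about the affected address.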
1.8. Initial deployment

The simplest initial deployment of Ivip involves a single database, multiple anycast ITRs in the core, and one or more ETRs in each of multiple provider networks. Better performance would be achieved with ITRs in provider and AS-end-user edge networks.

The diagram below assumes a single or distributed database system which controls all ITRs. In later sections I describe how multiple databases, one for each IMAB, are distributed over multiple systems and their updates combined and distributed by a global Replicator system.

   ...........                   ............
   .   N1    .                   .    N3    .
   .         .                   .          .
   .         .                   .  /-IH5   .
   . H1----\ .                   . /        .
   .        BR1------ITR1-------BR3--ETR1   .      Multihomed
   . H2----/ . \      / \      / .\    \    .      end-user
   .         .  \    /   \    /  . H6   \- PE1-\   ...........
   ...........   \  /     \  /   ............   \  .   N5    .
                  \/       \/                    \ .         .
                  /\       /\                     CE1---IH9  .
   ...........   /  \     /  \   ............    / . \       .
   .         .  /    \   /    \  .          .   /  .  \-IH10 .
   . H3-ITR2 . /      \ /      \ . /------ETR2-/   .         .
   .        BR2-------TR1-------BR4---H7    .      ...........
   . H4----/ .                   . \-IH8    .
   .         .                   .          .
   .   N2    .  BR4 = ITR & ETR  .    N4    .
   ...........                   ............

   Figure 3: Simple multihoming scenario.

The following discussion relates to Figure 3. This represents a small section of the Internet, but we can assume it is the entire Internet for these examples. Networks N1 to N4 are provider (ISP) networks. N5 is the network of an end-user.

Current multihoming practice requires the end-user to have their own PI (Provider Independent) address space, which typically requires them to be an Autonomous System. This means they could run BGP routers, but this is not actually required. All that is required is that both of N5's providers have links to N5's CE1 (Customer Edge) router and that one or the other advertises N5's PI prefix at its border routers and forwards those packets to CE1.
I assume the reader is fully familiar with this approach to multihoming, and that it is understood that the central challenge in devising a new routing and addressing architecture for the Internet involves achieving multihoming without N5 having an unnecessarily large number of IP addresses assigned to it and without burdening the BGP system, both with an extra advertised prefix and with changes to this advertisement when, for instance, the link to N3 fails and N4's BR4 advertises the prefix instead. The sections following this Introduction provide more background information on these matters.

N1 is an unaltered provider network - it has no ITRs or ETRs. Therefore it is not possible (except via a TTR inside or outside N1) to have any hosts there using Ivip-mapped addresses.

N2 has an ITR but no ETR. Without an ETR (and ignoring TTRs for the rest of this discussion) N2 cannot have any hosts with Ivip-mapped addresses.

N3 has an ETR. The diagram shows one host IH5 with an Ivip-mapped address. In this discussion I will assume that each host has a single Ivip-mapped address, but it is perfectly possible for a host to have multiple such addresses, prefixes of such addresses etc. as well as having ordinary BGP-reachable non-Ivip-mapped addresses. N3 has a PE1 (Provider Edge) internal router which has a link to the end-user's site.

N4's border router BR4 is both an ITR and an ETR. N4's Provider Edge router (ETR2), which has a link to the end-user's site, is also an ETR. N4 has a host H7 with an ordinary address and IH8 with an Ivip-mapped address.

N5 has an Ivip-mapped prefix: 22.22.2.0/28 - 16 IP addresses. These are effectively PI addresses, because they have been obtained either from the Ivip system itself, or from whichever company (perhaps an ISP) is participating with the Ivip system and which has assigned the IMAB 22.22.0.0/16 to the Ivip system.
N5 probably pays a small annual fee for these addresses, and may need to justify its use of them, as pressure mounts to use IPv4 space efficiently.

N5's Ivip-mapped prefix consists of 16 contiguous IP addresses which happen to fit on binary boundaries. Initially we will consider multihoming for robustness, with all these 16 addresses treated as a prefix, and all tunneled to one ETR or another. In practice, the Ivip system and the ITRs will tunnel packets to whatever address the end-user chooses, subject to some areas of the address space being off-limits for tunneling and also subject to packets never being tunneled to any address which is Ivip-mapped. In this example, I assume that the end-user ensures that packets addressed to their addresses are always tunneled to an ETR of a provider they have a commercial relationship with.

1.8.1. Paths taken by packets

Here I will give examples of packet flows in Figure 3.

Packets to and from hosts with ordinary BGP Reachable IP (BRIP) addresses follow predictable paths, for instance: H1, BR1, ITR1 (acting as an ordinary transit router), BR3, H6. Packets sent by H6 to H1 follow the same path in reverse.

Packets sent by H1 to IH5 (with an Ivip-mapped address) follow this path: H1, BR1, ITR1 (which encapsulates the packet with IP-in-IP, DA = ETR1's IP address), BR3, ETR1 (decapsulates the packet), BR3 (assuming N3's internal routing system has an appropriate route to handle packets with DA = IH5's Ivip-mapped address), IH5.

Packets sent from IH5 to H1 follow a simpler path, because the destination address is an ordinary BRIP address - so the packet is handled by the usual internal and BGP systems, without involving Ivip mechanisms: IH5, BR3, ITR1 (acting as a conventional transit router), BR1, H1.

A packet from H3 to IH5 does not use the core-ITR ITR1, because its network N2 has its own ITR.
The path is: H3, ITR2 (which encapsulates the packet with DA = ETR1's IP address), BR2, TR1 (or perhaps ITR1, depending on BR2's choice of best path for the prefix which ETR1's BRIP matches), BR3, ETR1 (which decapsulates it, to restore the original packet with DA = IH5's Ivip-mapped address), BR3, IH5.

Packets from IH5 to H3 involve no Ivip handling and follow a path such as IH5, BR3, TR1, BR2, ITR2 (acting as an ordinary internal router, since the packet's DA is not part of an Ivip-mapped address block - IMAB), H3.

A packet from H4 to IH5 would follow a similar path to that just described, but initially it would travel to BR2, and then to ITR2. ITR2 advertises (or injects) the routes for all the IMABs into N2's internal routing system. BR2 forwards the packet to ITR2 for this reason. If there was no ITR in N2 (like the situation in N1) then BR2 would have forwarded the packet to one of its BGP peers, probably ITR1, which also advertises the same set of IMABs. I assume that the internal routing system route for packets addressed to any one of these IMABs takes precedence for BR2. Once the packet reaches ITR2, it is encapsulated and forwarded as previously described to ETR1, where it is decapsulated and forwarded to IH5.

A packet sent from H6 to IH5 would presumably be handled by N3's internal routing system, which presumably has a route specific for IH5's Ivip-mapped address. If not, then the packet will be forwarded out of N3, because N3 has no ITR, and will reach the nearest ITR, which is the core-ITR ITR1. There it will be encapsulated and forwarded to ETR1, to be decapsulated and forwarded through BR3 to IH5.
Similarly, a packet from H6 to IH9 will either be handled by N3's internal routing system - forwarded directly as a raw packet through BR3, ETR1 (as a normal internal router), PE1, CE1 and IH9 - or be forwarded out to ITR1, where it is encapsulated, forwarded to BR3 and ETR1, decapsulated and forwarded to PE1, CE1 and IH9.

N2 has its own ITR, so its hosts do not rely on external ITRs such as ITR1 when sending packets to hosts with Ivip-mapped addresses. N2 has no ETR, so it can't have any hosts with Ivip-mapped addresses.

N4 has one ITR and two ETRs (BR4 and ETR2), so it can have hosts with Ivip-mapped addresses. N4's hosts don't rely on external ITRs either when sending packets to Ivip-mapped addresses.

The hosts in N5 are all on Ivip-mapped addresses. When they send packets to hosts with Ivip-mapped addresses which are outside N5, these packets will need to be handled by an ITR - unless the destination host is within whichever provider network (N3 or N4) CE1 sends the outgoing packets to, and that provider network's internal routing system has routes for that destination host.

If IH9 sends a packet to IH8, while CE1 is sending outgoing packets along the link to N3, then the raw packet will be forwarded out of N3, since N3 has no ITR. The raw packet will be forwarded to ITR1, which will encapsulate it and tunnel the packet to BR4, which decapsulates it and forwards it to IH8. If there was no core-ITR such as ITR1 nearby, these packets would have to travel to the nearest core-ITR. This is assuming that N4's BR4, which is an ITR, is not advertising the Ivip IMABs to its BGP peers.

If there was no nearby core-ITR and N4's BR4 was advertising the Ivip IMABs, then in the previous example, the raw packet would be forwarded out of BR3 and find its way to BR4, which is acting like a core-ITR. BR4 could respond in two ways.
Firstly, BR4 would look into its database (if it was an ITRD - or use a Query Server if it was an ITRC) and find that the Ivip mapping for this address (IH8's) is to tunnel it to one of BR4's own addresses. It could encapsulate it, forward it to itself and decapsulate it. Secondly, before testing the packet against the Ivip database, BR4's FIB could first apply local routing rules to the packet, in which case the packet would be forwarded directly to IH8. This would be a rare, but perfectly valid, case where a packet sent to a host with an Ivip-mapped address completes the journey, in this case via three networks, without actually being tunneled.

It would be a public-spirited act for N4 to make its BR4 ITR functions available to packets arriving from its BGP peers. There could be a number of reasons why N4 does this, including simply wanting to encourage Ivip adoption, in the hope of saving money by not having to upgrade its DFZ routers as quickly as would be required without something like Ivip. Perhaps there could be some central collection of funds and subsidisation of core-ITRs - which BR4 would effectively become - if it advertised the Ivip IMABs to its BGP peers. However, any ITR which does this MUST forward all encapsulated packets without restriction.

For instance if ITR1 was an ordinary transit router and there was no other core-ITR anywhere close, then BR4, acting as a core-ITR, could be handling packets which have nothing directly to do with N4 or its customers. For instance, a packet from H2 to IH5 would follow this path: H2, BR1, TR1, BR4 (acting as an ITR, encapsulates it), TR2 (a new name for the transit router where ITR1 was), BR3, ETR1 (which decapsulates it), BR3 and IH5.
It would not be acceptable for N4 to make BR4 an anycast ITR for its BGP peers and only forward encapsulated packets received from those peers where the final destination was within N4.

N5 could have its own ITR, which would get the raw packet and encapsulate it - but perhaps N5 doesn't want to run an ITR, due to the capital cost, due to the high traffic volume of database updates for an ITRD, or, for an ITRC, due to the slow response times and extra traffic which result from reaching the Query Server(s) it depends on over its link to whichever provider network CE1 is currently sending outgoing packets to.

This discussion has involved a lot of low-level detail but I hope it has helped the reader understand various ways packets can flow with Ivip.

1.8.2. Multihoming when both links are working

The end-user arranges with N3 and N4 to configure their ETRs, internal routing systems etc. ready to accept encapsulated packets for its 22.22.2.0/28 prefix. This also involves N3 and N4 allowing packets with Source Addresses (SAs) from this prefix to be forwarded normally, including out of their border routers to the BGP system. In the case of N4, BR4 must accept outgoing packets with SAs within 22.22.2.0/28 to be forwarded to its BGP peers and to be accepted into its ITR function (if their DA matches one of the Ivip IMABs).

N5's CE1 router accepts incoming packets with DA matching 22.22.2.0/28 on either link, and forwards them to the local network, which is shown with two hosts IH9 and IH10 which both have Ivip-mapped addresses.

The administrators of N3 and N4 tell the end-user (the administrator of N5) the IP addresses of their two ETRs: ETR1 and ETR2. It would also have been possible for N4 to have the packets decapsulated by its BR4 which is N4's second ETR, as well as an ITR and border router.
However, this is a busy router and it makes more sense to have ETR2 do the decapsulating work. In this case - ETR2 doing the decapsulation - N4 doesn't have to alter its internal routing system to forward packets for N5's prefix, because the link to CE1 is connected directly to an interface of ETR2. N3 does need to configure its internal routing system to handle N5's prefix, unless some special tunneling is used to get the decapsulated packets from ETR1 to PE1.

1.8.3. External multihoming monitoring system

Not shown in this diagram is some kind of commercial monitoring system, which the end-user hires to keep a constant watch on the status of their multihoming arrangement. Monitoring of link failure etc. is not part of Ivip. There may be an argument for one or more IETF standardised protocols etc. for such a monitoring system. Here we assume there is a monitoring system which can rapidly and reliably detect any failure which affects N5's multihoming arrangement, including for instance the failure of the link to either ISP, the failure of either ISP's PE router, its ETR, or the ISP's entire connection to the Net.

The monitoring system probably needs to be located entirely outside N3, N4 or N5. In principle, it might be possible to locate it in N5, but the whole purpose of a monitoring system is to change the Ivip database once a fault occurs, so that the ITRs tunnel packets to an alternative ETR which has a working link to CE1. Any such commands need to be cryptographically secured, and a unidirectional system for such commands to whatever accepts commands to alter the mapping database might be vulnerable to a replay attack. I assume that the monitoring system needs a reliable two way link to whatever Update Authorisation Server (UAS) the end-user uses to alter the mapping of their Ivip-mapped addresses.
In that case, it is best that the monitoring system not be in the end-user network, because at the time of N3 link failure, two-way communication can't occur using the current ITR tunnels, which are still to N3's ETR.

It is conceivable that the UAS could be preconfigured to communicate with a monitoring system at the end-user site via its own tunneling of packets to one or more ETRs which are not currently tunneled to by the Ivip ITRs. That would best be achieved by an IETF standardised protocol. It is also conceivable that an external monitoring system might accept prompt, cryptographically secured, messages from some router or server in the N3 network the moment the link to CE1 went down. This too could be the subject of an IETF standardised protocol, but it would not directly involve ETRs or Ivip.

Ignoring for the moment packets sent by hosts in N3, N4 or N5, initially, all packets sent by hosts all over the world to any address in N5's Ivip-mapped prefix are sent via ETR1, because that is where the end-user and/or the monitoring system has configured the database to map these 22.22.2.0/28 addresses.

In this example, the end-user has given the monitoring system the private key, username and password etc. which are necessary for the monitoring system to automatically change the mapping, via the UAS which handles the end-user's Ivip-mapped addresses.

1.8.4. Multihoming after a link fails

The monitoring system sends frequent probe packets to CE1, by tunneling packets to both ETR1 and ETR2. The monitoring system might also monitor the current state of the mapping of the end-user's 16 Ivip-mapped addresses. It could do this either by gaining a real-time feed of database changes, or by querying a Query Server (which would use Notify to instantly inform the monitoring system of any change).
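A monitoring system of the kind assumed here might be structured roughly as follows. This is only a sketch: probe_via and update_mapping are hypothetical callables standing in for the probe mechanism and for an authenticated session with the UAS, and the threshold of three consecutive missed probe rounds is an arbitrary illustrative choice. Ivip itself does not define any of this.

```python
import ipaddress

PREFIX = ipaddress.ip_network("22.22.2.0/28")  # N5's 16 Ivip-mapped addresses
MISSED_ROUND_LIMIT = 3  # arbitrary example threshold


class Monitor:
    def __init__(self, etrs, probe_via, update_mapping):
        self.etrs = list(etrs)         # first entry is the initial mapping
        self.active = self.etrs[0]
        self._probe = probe_via        # callable: etr -> True if CE1 answered
        self._update = update_mapping  # callable: (address, etr) -> None
        self._missed = 0

    def tick(self):
        """Run one probe round; remap every address if the active path failed."""
        results = {etr: self._probe(etr) for etr in self.etrs}
        if results[self.active]:
            self._missed = 0
            return self.active
        self._missed += 1
        if self._missed < MISSED_ROUND_LIMIT:
            return self.active
        for etr in self.etrs:
            if etr != self.active and results[etr]:
                # Change the mapping of all 16 addresses to the working ETR.
                for addr in PREFIX:
                    self._update(str(addr), etr)
                self.active = etr
                self._missed = 0
                break
        return self.active


if __name__ == "__main__":
    reachable = {"ETR1": False, "ETR2": True}
    m = Monitor(["ETR1", "ETR2"], reachable.get, lambda a, e: print(a, "->", e))
    for _ in range(MISSED_ROUND_LIMIT):
        m.tick()
```

Requiring several missed rounds before remapping is one simple way to avoid reacting to a single lost probe. A real system would need to weigh that delay against how quickly connectivity must be restored.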
At some point in time, the inability of the monitoring system to receive responses to probe packets sent via ETR1 causes it to decide this link has failed. It uses the credentials supplied by the end-user and initiates a session with the UAS by which this end-user controls the mapping of its addresses. Once logged in, the monitoring system could issue separate commands to change the mapping for each of the 16 IP addresses, or a single command for all 16 together. It changes their mapping from the IP address of ETR1 to the IP address of ETR2.

There is no particular reason other than N5's internal networking convenience why its addresses should be a conventional prefix on binary boundaries, as they are in this example. The Ivip system can handle individual addresses and arbitrary ranges of addresses with equal ease.

The precise details of the UASes, databases, update streams, database dump files, Replicators, ITRs and QSD/QSC Query Servers are given in later sections of this I-D. The discussion here gives a rough idea of what is achieved by these systems.

The command from the monitoring system - or from any other system or a web-browser human interface session with anyone or anything with credentials accepted by the UAS - will cause the UAS to hand down a User Mapping Update Command (UMUC), with its signature (therefore a SUMUC), to another UAS which delegated it the responsibility for whatever ranges of Ivip-mapped addresses it is authoritative for. This command causes a change in the database for the particular IMAB which the end-user's addresses are part of. This results in the change being incorporated within a second or so into multiple identical UDP packets which are sent to 30 or so "Level 1 Replicators". (There may be other ways of achieving the same results, but this is the plan I am pursuing at present.)
Three or four levels of replicators reliably propagate the changes to the global network of ITRs and QSDs (Query Servers with a full copy of the Database). This causes a nearly instant (say a few seconds delay, but ideally a fraction of a second) change in the FIBs of the ITRDs all over the Net - so that all packets arriving with DA matching 22.22.2.0/28 will now be tunneled to ETR2, instead of to ETR1.

All ITRCs which recently (within some standard caching time, such as 600 seconds) requested mapping from a QSD (perhaps via one or more QSCs - Query Servers with Cache) and received a response concerning any one of the 16 addresses which have just had their mapping changed will quickly (within a fraction of a second?) receive a Notification from the QSD (via the same chain of zero or more QSCs) which provided the response. The Notification causes all these ITRCs to change their tunneling to ETR2 as well. ITFH functions in hosts behave and are notified in exactly the same way as just described for ITRCs.

Connectivity is restored, as long as N4, its ETR2, the link to CE1 etc. are still working. CE1 also needs to have changed its outgoing packet path to be via ETR2. Perhaps the monitoring system could inform it of the change, if CE1 had not already determined that there was a problem with the link to N3.

1.8.5. Potential problems with internal routing systems

There are some potential problems during this failure and changeover time which I will briefly mention. I would appreciate any assistance understanding the likely behavior of provider internal routing systems in this situation. I understand that typically, the internal routing system will rapidly respond to the broken link, but would like to know more about all this.

When the link to CE1 fails (which could be due to any failure in CE1, the link, PE1, the internal routing system etc.)
can the internal routing system of N3 be relied upon to quickly cancel the special route it has for forwarding packets whose DA matches 22.22.2.0/28 to PE1? If not, then there is a potentially serious problem with hosts within N3 not being able to send packets to N5.

If N3 can't guarantee that its internal routing system will quickly remove any such routes, and so allow packets addressed to 22.22.2.0/28 to find their way out of N3 like the packets addressed to the rest of the 22.22.0.0/16 IMAB (where they will find their way to an ITR such as ITR1, or any ITR within N3), then perhaps it would be better if N3 never made such a route in its internal routing system. In this scenario, all packets from hosts inside N3 to 22.22.2.0/28 would need to go via an ITR, and ETR1 would use an explicit tunnel to get decapsulated packets to PE1.

1.9. Ivip's intended benefits

From the above examples, it can be seen that a global Ivip system, or something similar, is capable of having large amounts of address space assigned to it, which it can subdivide with very fine resolution (single IPv4 addresses) and with very rapid response times (probably a few seconds, but perhaps less with ideal arrangements) so that the addresses can be portable between any ISP with an ETR. This portability directly supports multihoming which can be controlled at a "site level" (range of IP addresses all at once) or down to an individual host (single IP address) level.

For IPv6, I envisage Ivip mapping each /64 to a particular ETR.

This portability and multihoming - and whatever TE is possible with Ivip - requires no changes to host operating systems or applications. The ITFH function is a strictly optional concept, which would be attractive for some hosts and NAT routers in the longer term but which is not required at any time, including initial introduction.
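The mapping granularities mentioned in this section (a single IPv4 address, or a /64 prefix for IPv6) amount to choosing the unit of address space under which a destination address is looked up. A minimal sketch follows, using Python's standard ipaddress module. The function name mapping_key is hypothetical and the addresses used are examples only.

```python
import ipaddress


def mapping_key(da: str):
    """Return the unit of address space whose mapping applies to `da`."""
    addr = ipaddress.ip_address(da)
    # One mapping per IPv4 address, or per /64 prefix for IPv6.
    prefix_len = 32 if addr.version == 4 else 64
    return ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)


if __name__ == "__main__":
    print(mapping_key("22.22.22.22"))
    print(mapping_key("2001:db8:1:2:3:4:5:6"))
```

Two IPv6 addresses within the same /64 therefore share one mapping, while every IPv4 address can be mapped independently.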
The use of "anycast ITRs in the core" means that hosts in unaltered provider and AS-end-user networks are all capable of sending and receiving packets to and from hosts with Ivip-mapped addresses.

There are cost and administrative challenges in deploying the entire Ivip system, including especially the anycast core-ITRs. However, these costs and difficulties are arguably far less challenging than what may be the two remaining alternatives: firstly to pay for and ensure the installation of ITRs in every provider and AS-end-user network as LISP is widely believed to require, or secondly, to do nothing and allow all the routers in the DFZ to become swamped by continued growth in the global BGP routing table, and so need replacement with new, more expensive, models.

Ivip or something like it seems to offer the only chance we have for efficiently using limited IPv4 address space. Ivip is unconstrained by binary boundaries, "route aggregation" etc. Only when addresses can be assigned according to direct need, rather than in large chunks as they have been to date, can the address space be used efficiently. An example of efficient use would be to have the majority of the 3.7 billion available IP addresses ((0 to 223 inclusive, except 10 and 127) * 256 * 256 * 256 = 3.724 billion) actively used, either for an individual host or for a NAT device which supports multiple hosts on a private network. There are no reliable estimates of actual IPv4 utilisation, but in early 2007, a random ping survey indicated there were about 108 million ping-responsive hosts, with much higher densities in some advertised prefixes [RW ping survey].

Ivip can also be used to achieve some TE benefits, by steering traffic of individual Ivip-mapped addresses to one ETR or another. Ivip's ability to support highly efficient mobile-IP is discussed in a later section.
So too is the possibility that it could be used to greatly facilitate highly scalable IPv6 tunneling over the existing IPv4 system.

None of this places any further burden on the BGP system. Ivip's benefits should greatly reduce the impetus for end-users and perhaps providers for gaining and advertising PI addresses in the global BGP system.

This I-D proposes changes which are pervasive and unprecedented. There are many questions to be explored, security problems to be resolved etc. The scope of this project goes beyond the IETF developing protocols and recommended procedures, since it requires cooperation amongst providers, end-users and RIRs, who must approve of address space being used for this novel purpose.

There is nothing technically preventing one or more Ivip systems being created today, perhaps as profitable enterprises hiring out their IP addresses to customers - as long as RIRs approve. Although it may be impossible and/or undesirable to prevent the creation of multiple independent Ivip systems which behave as described here, the rest of this I-D concentrates on the establishment of a single global Ivip system. (Multiple Ivip systems need not know about each other - it is not disastrous if an ETR tunnel end-point of one Ivip system's mapping is actually an address which is Ivip-mapped in another system.)

This introduction has provided a general overview of Ivip, for those with some familiarity with the crisis in routing and addressing. Sections below contain a more comprehensive statement of the problem space, goals and potential solutions. Following that I explore in greater detail the various aspects of the Ivip system.

This is a very early stage of development and I hope many people will point out faults, suggest improvements, and be inspired to create their own proposals to these challenging problems.
One luxury this field enjoys is that we can invoke large resources and make uncommonly bold plans - because there is a dearth of easy alternatives and the costs of doing nothing are expected to be so high. 1.10. Long term deployment The above discussion primarily relates to Ivip's capacity to provide important benefits to those who adopt it, while maintaining reachability from hosts in networks which have made no changes, such as installing ITRs or ETRs. The most likely deployment actions will involve the networks of Update Authorisation Servers, Replicators, ITRDs, ITRCs and Query Servers. Although all these functions should be capable of being implemented in software on ordinary servers (albeit with many gigabytes of RAM for the QSDs and ITRDs) it is likely that most network operators will require the ITRD and ITRC functions to be performed on existing or future router systems. In the longer term, assuming Ivip or something similar is widely adopted, it can be expected that there will be widely available, auto-discovered, QSC and QSD services which can support queries from ITRCs and the ITFH functions in hosts. An ITFH function in a host operating system is the most cost- effective way of performing the Ingress Tunneling function of Ivip. The cost will be essentially zero for the software, and there is generally plenty of CPU power and RAM available to do the work. Assuming the Replicator network will be largely built by and shared by providers and AS-end-users and assuming this system propagates updates throughout the world in a few seconds, then it is possible that the Notification arrangement will make the cheaper ITRC routers an attractive alternative to the full database feed, large RAM, very large FIB ITRD routers (or their server-based alternatives). If an ITRC can get an up-to-date response to a query about any IP address from a local QSC - in a fraction of a second - then it may be acceptable for it to do this for every novel packet it receives. 
In that case, the ITRC handles all packets without delay, providing the performance of an ITRD without the need for a full database feed and without the same large FIB and RAM requirements (assuming of course that the ITRC is not attempting to handle packets addressed to millions of Ivip-mapped addresses at once).

If ITRCs can be so successful, then so can ITFHs which have sufficient RAM and CPU power. An ITFH costs nothing and always achieves optimal paths, since there is no deviation from the shortest path towards a separate ITR.

An ITFH function would probably become mandatory in any web server at a hosting company. The alternative would be a large investment in ITRCs and/or ITRDs. Similarly, ITFH functions in the NAT functions of DSL and HFC cable modems would also be an effectively zero cost alternative to the provider network deploying large numbers of ITRDs and ITRCs. The provider would still need to maintain a responsive QSD and QSC network. (I tend to think of this being an "in-host" function because these modems, although technically routers, have no hardware FIB and the ITRC function is performed entirely in software.)

The proliferation of peer-to-peer filesharing and other applications presents something of a challenge for ITRCs and ITFHs. An ITRD has no difficulty with this traffic, since its large FIB is ready to encapsulate packets with any Ivip-mapped destination address. However, a smallish ITFH function in the NAT router section of an ADSL modem will have some limitations on memory for its cached mapping information. A large number of hosts behind the NAT, each firing off packets to thousands of separate Ivip-mapped host addresses, would place a significant burden on the ITFH, including a frequent need to contact the nearest Query Server.
However, hopefully most users behind a NAT firewall, including especially the hundreds of millions of DSL, HFC cable and fibre home and SOHO end users, will have no need to have their NAT on an Ivip-mapped address.

This is a highly speculative and optimistic vision for a proposal which is less than a month old. If such widespread deployment eventuated, the long-term stable outcome might resemble what the proponents of ID-LOC separation have long preferred: a new layer (ITFH) of software in the TCP/IP stacks of many hosts. However, such changes to hosts would be purely to increase efficiency and reduce costs, not to ensure reachability - which is already provided by a sufficiently widely distributed system of core-ITRs.

ETR functions can also be performed in hosts, or at least in NAT devices for hosts behind NAT. The NAT device could be an ETR for specifically identified hosts, each with a care-of address in the private network. In this case, the NAT ETR somewhat resembles a TTR, since the destination host sends its outward-going packets through the same device.

These visions of ubiquitous Ivip adoption are probably unnecessary and unrealistic. Only a subset of hosts or end-user networks will benefit from real portability and multihoming.

Future versions of this I-D will more fully explore the highly promising use of the ITR system to beam packets to TTRs for mobile IP.

Future versions of this I-D will more fully explore the potential for using the IPv4 Ivip system for tunneling IPv6 packets in a highly scalable fashion, for using Ivip with IPv6, and for using IPv6 Ivip to tunnel IPv4 packets.

2.
Definition of Terms, Concepts and Functions

In the context of the extensive Introduction, this is a comprehensive set of definitions not just of new terms, but of the main concepts and functions which make up the current Ivip proposal. I explore in greater detail in sections below how the various forms of ITR etc. are used, but have included considerable detail here. There is some repetition of material from the Introduction.

Some of the terms defined here are identical or similar to those used in LISP and in general discussion. Others are different from roughly equivalent terms used in LISP.

There has been a long discussion on the RAM list about the precise meaning of the terms "Identifier" and "Locator". I am trying to avoid these terms as much as possible with Ivip, because of the evident confusion they cause. Whether an item of information such as an IP address should be considered or referred to as an "Identifier" or a "Locator" depends very much on the context in which it is used - so these terms tend to describe usage, rather than any intrinsic quality of the item.

The long Introduction above has used some of these terms, but not all. Eventually the Introduction may be rewritten to use all these terms consistently, and this section moved in front of that introductory material. For now, I want the Introduction to be accessible to readers without learning much new terminology. However, for the more detailed description of Ivip principles and mechanisms below, we need to use the new terms extensively.

This is quite a detailed definition of terms, which gives some insight into the operation of the whole Ivip system.

[To do: references for LISP, APT etc. in definitions below.]

2.1. IMIP - Ivip-Mapped IP address

Within the global unicast address space of IPv4 or IPv6, a subset of the addresses is covered by the one or more IMABs (Ivip Mapped Address Blocks, as described below). Every such address is an IMIP.
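The IMIP test is purely a question of range membership. The following is a minimal sketch of that test, assuming the set of IMABs is available as a list of prefixes; the function name and the example prefixes (taken from examples elsewhere in this I-D) are illustrative only and are not part of any defined protocol.

```python
import ipaddress

# Hypothetical list of IMABs; a real system would learn these from the
# Ivip database system. The prefixes are examples only.
IMABS = [
    ipaddress.ip_network("22.22.0.0/16"),
    ipaddress.ip_network("29.0.0.0/20"),
]


def is_imip(address, imabs=IMABS):
    """Return True if the address lies inside any IMAB.

    The current mapping of the address plays no part in the test.
    """
    addr = ipaddress.ip_address(address)
    return any(addr.version == net.version and addr in net for net in imabs)
```

An address outside every IMAB is, by the definition in the next section, a NIMIP.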
The fact that the relevant part of the Ivip database system (the particular IMAB-DB as defined below) may contain a null entry (zero) for this particular address (meaning to drop the packet, rather than tunnel it somewhere) does not alter the fact that this address is an IMIP. Similarly, if the current mapping is to an unreachable address, or to the wrong ETR, or to no ETR etc., the address is an IMIP simply because it is within the range of one of the Ivip system's IMABs.

2.2. NIMIP - Non-Ivip-mapped IP address

Within the global unicast address space of IPv4 or IPv6, every address which is not an IMIP (is not within one of the IMABs) is a NIMIP.

2.3. BRIP - BGP Reachable IP address

A BRIP is an ordinary IP address which is within one of the currently advertised BGP prefixes, excluding those prefixes which are for IMABs, meaning they are used to advertise Ivip mapped addresses (IMIPs).

Whether or not there is actually a host or router at this address is not important. The criterion is that the global BGP system has an advertisement for it, and that therefore ordinary BGP routers will forward packets with this DA to whichever router advertises the relevant prefix.

BRIP addresses include those which are anycast by all systems other than Ivip. For instance, I understand that some root nameservers are implemented with multiple servers using anycast. Those addresses are BRIPs too.

(This discussion assumes a single global Ivip system. How to define this term when there are multiple Ivip systems, including those which are not known publicly, would be trickier.)

2.4. UAIP - Un-Advertised IP address

Any global unicast IP address which is not part of a currently advertised BGP prefix is a UAIP.

UAIPs include addresses which have not been allocated by the IANA to any RIR, and which have not been assigned by an RIR (or other address assignment authority) to any end-user.
The remainder of the UAIPs are in regions of the address space which have been assigned to a provider or AS-end-user but which they are not, at the moment, advertising. (This assumes that no router ever advertises a prefix its operators are not entitled to advertise, by virtue of that prefix not having been allocated or assigned.)

[To do: link to Geoff Huston's site and my ping survey page's table.]

2.5. DID - Destination Identifier

This is roughly synonymous with LISP's "EID" (Endpoint ID). A DID is an IP address which is an IMIP. "IMIP" is a subset of all the possible IP addresses. We can know that a packet's DA is within this IMIP set, so we know this specific address refers to a DID, of some particular IRH/IRN (Ivip-mapped Receiving Host/Node). A host or any non-ITR router doesn't recognise this. It is one of the tasks any kind of ITR must perform to recognise that the packet's address is in the IMIP set, and therefore is a DID which must be used to look up mapping - in an internal set of copies of the IMAB-DBs or via some external Query Server.

2.6. TELOC - Tunnel Endpoint Locator

A TELOC is a BRIP address which we, or an ITR, reasonably believe is the address of an ETR - because this address is found in the database as the mapping for one or more IMIPs.

The ITR will encapsulate the packet, using the appropriate TELOC as the DA of the outer IP header. To all routers, the packet is just an ordinary packet addressed to some BRIP address. When it arrives at its destination, the idea is that this will be an ETR which decapsulates the original packet and forwards it to the host with the original DID address. However, the ITR doesn't know for sure this will happen. It simply tunnels the packet to the TELOC.

"TELOC" is related to LISP's "RLOC" (Routing Locator), except I think that some LISP material uses "RLOC" to refer to any IP address which is not an EID.
I think this is rather too loose a use of a single term, so for Ivip, "BRIP" means any advertised address which is not an IMIP. "DID" refers to the specific address of a packet, which is an IMIP, and "TELOC" refers to a specific address to which a packet is tunneled.

2.7. IMAB - Ivip-Mapped Address Block

(This is what I previously referred to as a "master-subnet".)

An IMAB is a contiguous range of address space for which a single RUAS (Root Update Authorisation System) is authorised to control the mapping, and for which it does so via a single stream of update packets (US-IMAB) and a single IMAB-DBD (IMAB DataBase Dump) file.

While the database structure, update messages etc. work fine for arbitrary starting and ending points for an IMAB, it is important that the IMAB can be advertised as a single BGP prefix.

A straightforward prefix on binary boundaries can be an IMAB, such as 29.0.0.0/20.

Assuming IPv4 for the rest of this definition, and assuming a /24 limit on the longest prefix which is admitted to the BGP system, all IMABs need to be on /24 boundaries. They should not involve a prefix any shorter than /8.

An IMAB may straddle simple binary boundaries, as long as it is still acceptable to be advertised within BGP. For instance 29.0.1.0/20 is also a valid IMAB, covering 29.0.1.0 to 29.0.16.255. 29.0.1.128/20 would not do, because it straddles a /24 boundary.

It is not permissible to use a range such as 29.0.1.0 to 29.0.15.255 as an IMAB, since this does not match a full /19, /20 or /21 range.

The reason for these restrictions is that when an ITRD (full "push" database ITR) downloads an IMAB-DB, decodes it and applies all real-time updates to it, it is then able to handle packets for the address range of the IMAB. At that point in time, it advertises the IMAB's prefix to its BGP peers.
In order to reduce the number of advertised BGP routes and to reduce churn in the way they are advertised, it is desirable for every area of address space covered by a single database dump and by a single stream of update packets to match a single prefix which can be advertised in BGP. Where a single large range of contiguous addresses is for some scaling reason handled with separate database dumps and update streams, it should be divided into separate IMABs. This increases the number of BGP advertised prefixes, but may be justifiable, for instance within a large (eg. /8) prefix of IMIP space, so that ITRs can load share by each handling a subset of the entire /8. 2.8. IMAB-DB - IMAB DataBase This refers to the body of data which specifies the Ivip mapping of the individual IPv4 addresses (or /64s for IPv6) for a single IMAB. Within a RUAS (Root Update Authorisation System) there exists one or more copies of the Master IMAB-DB for each IMAB this RUAS is authoritative for. This is updated in real-time by Update Commands directly from end-users or from branch and leaf UASes (Update Authorisation Systems). ITRDs (full database ITRs) and QSDs (Query Servers with the full Database) maintain as best they can a real-time updated copy of each IMAB-DB for each IMAB in the Ivip system. This is a Slave copy of the IMAB-DB. The state of the slave copy is that it lags behind the master, ideally by only fractions of a second, but in practice probably by a few seconds - or more if there is congestion or lost packets in the Replicator system. The slave copy of the IMAB-DB directly controls the FIB of the ITRD, and how the QSD responds to queries. (In a server-based ITRD, the array which contains the raw mapping data is the FIB, because the packet handling code simply indexes into the appropriate location in the array for the appropriate IMAB, and reads the 32 bit result there.) 
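The array-as-FIB idea in the parenthetical above can be sketched as follows. This is only an illustration under stated assumptions: IPv4 only, one array per IMAB, a stored value of zero meaning "drop", and class and method names (ImabTable, update, lookup) which are hypothetical rather than part of any defined interface.

```python
import ipaddress
from array import array


class ImabTable:
    """Slave copy of one IPv4 IMAB-DB, held as a flat array of TELOC values."""

    def __init__(self, prefix):
        net = ipaddress.ip_network(prefix)
        self.base = int(net.network_address)
        # One entry per IMIP in the IMAB. Type code "L" holds at least
        # 32 bits. Zero is the null mapping (drop the packet).
        self.table = array("L", [0]) * net.num_addresses

    def update(self, start, count, teloc):
        """Apply one mapping update: 'count' IMIPs from 'start' map to 'teloc'."""
        first = int(ipaddress.IPv4Address(start)) - self.base
        if first < 0 or count < 1 or first + count > len(self.table):
            raise ValueError("update falls outside this IMAB")
        value = int(ipaddress.IPv4Address(teloc))
        for index in range(first, first + count):
            self.table[index] = value

    def lookup(self, da):
        """Return the TELOC for a packet's DA, or None if it is to be dropped."""
        index = int(ipaddress.IPv4Address(da)) - self.base
        if not 0 <= index < len(self.table):
            raise KeyError("DA is not within this IMAB")
        value = self.table[index]
        return None if value == 0 else ipaddress.IPv4Address(value)
```

The lookup is a subtraction and one array read, which is why a server-based ITRD needs RAM in proportion to the amount of IMIP space but very little CPU per packet for the mapping step itself.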
Changes to the IMAB-DB may cause the QSD to send Notifications to child QSCs, ITRCs or ITFHs which previously received query responses concerning one or more IMIPs for which the mapping has changed.

Whereas LISP and APT carry a potentially large amount of information for each IP address or prefix within their database system (eg. multiple ETR addresses, TE parameters for choosing dynamically between multiple ETRs and in the case of APT, the end-user's public key), the Ivip database structure is extremely simple.

Each element of the database contains a single IP address: 32 bits for IPv4 or 128 bits for IPv6. Typically, this is the address of an ETR, but in fact it could be any address, subject to certain off-limits ranges, including the prohibition of any address which is an IMIP. In practice, the value of the IP address would always point to a BRIP address, not to an unadvertised UAIP address.

Consequently, the dump and the update messages for this database can be highly compressed and easily interpreted. (Any protocol handling these dumps or update messages should be extendable, in a backwards compatible way, to incorporate further elements, but I can't think of a use for them at present.)

The easiest way to think of this database is as an array, where location 0 refers to the first IMIP in the IMAB. It is also possible to structure the database as a series of prefix rules, so for instance 16 contiguous addresses on binary boundaries with the same mapping could be specified by a rule to this effect, rather than with 16 separate IP addresses.

For IPv4, I will assume the database is simply an array. For IPv6, it would probably be best to structure the database as prefix rules, since so many more address bits may vary over the range of the IMAB. (I guess IPv6 was designed by people who wrote programs in high level languages, rather than electronic hardware engineers!)

2.9.
IMAB-DBD - IMAB DataBase Dump

This is a file, typically compressed, which carries the full contents of the master IMAB-DB at some point in time. It is made available quickly at multiple servers so ITRDs and QSDs can download a copy when they boot up, or periodically afterwards.

The dump file format needs to be carefully standardised. It should have an extendable format, and be compact for all typical data patterns. Probably a series of binary elements followed by a long array would be fine, all gzipped. However, maybe a specialised compression algorithm would be more efficient, be easier to implement at the ITRD or QSD, or provide some other benefits.

The dump file needs to specify: the format of the file, such as by the RFC version it adheres to; the time and date it was created; a number identifying the RUAS which generated it; a sequence number matching such a number in an update stream packet which signifies a dump was made at that instant; the AFI (Address Family Identifier) of the address space covered; the starting address and range of the address space covered; the BGP prefix which will be advertised once the ITRD has this data loaded and fully updated (perhaps this is redundant); finally, the array of addresses in some compressed form.

There probably needs to be a CRC as well, with the ITRD or QSD able to ensure by some cryptographic means that the data is valid and really originates from the RUAS.

2.10. UMUC - User Mapping Update Command

A UMUC is whatever action the end-user performs on one or more different user-interfaces of whatever UAS (Update Authorisation System) they use to change the mapping of their one or more IMIPs. The system would be able to tell the user the current mapping and also confirm that a requested change to the mapping was to an acceptable address.
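The acceptability check could look something like the following minimal sketch. It assumes a command consists of a starting IMIP, a count of addresses and one target address, and that the UAS knows the range the user controls, the set of IMABs, and the set of ordinary advertised BGP prefixes. All names here (Umuc, accept_update and its parameters) are hypothetical, credentials are not modelled, and the null (zero) mapping is not handled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Umuc:
    """An accepted update command (hypothetical field layout)."""
    start: ipaddress.IPv4Address
    count: int
    teloc: ipaddress.IPv4Address


def accept_update(start, count, teloc, user_range, imabs, advertised):
    """Return a Umuc if the request is acceptable, else raise ValueError."""
    first = ipaddress.IPv4Address(start)
    target = ipaddress.IPv4Address(teloc)
    if count < 1:
        raise ValueError("range must cover at least one address")
    last = first + (count - 1)
    # The user may only change the mapping of IMIPs they control.
    if first not in user_range or last not in user_range:
        raise ValueError("range is outside the user's IMIPs")
    # Mapping to an IMIP is prohibited.
    if any(target in net for net in imabs):
        raise ValueError("target address is an IMIP")
    # The target must be in an advertised prefix, otherwise it is a UAIP.
    if not any(target in net for net in advertised):
        raise ValueError("target address is a UAIP")
    return Umuc(first, count, target)
```

A rejected request raises an error which the user interface could turn into an error message, and an accepted one yields a value which is ready to be signed and handed down the tree.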
For now, I will assume that all UMUCs are for valid mapping addresses - so a UMUC is a successfully accepted update command from the end-user, or from some person or system with the end-user's credentials. There probably needs to be a protocol by which a request to change to an invalid address, for example a UAIP, is rejected with an error message.

The command takes the form of a starting IMIP, a range, and a single IP address to which the mapping of these one or more IMIPs will be changed.

The UMUC exists only after the UAS has verified the credentials, the addresses and the new mapping address as being valid. The UMUC is then ready to be handed down either to alter the IMAB-DB itself, or to another UAS which achieves the same outcome.

2.11. SUMUC - Signed User Mapping Update Command

This is the information contained in a UMUC, signed by the UAS which accepted it from the user (or by some lower UAS in the tree), being handed down the tree to another UAS, perhaps the RUAS of the tree, so that the recipient UAS can verify the signature and regard the UMUC as authoritative.

2.12. SH/SN - Sending Host/Node

The host computer, or a router, which sends the packet in question. Other than the local network's checking the SA (Source Address) of the packet to decide whether it is from an authorised address, there is no difference in Ivip whether the sending host or node has an IMIP or a BRIP address.

2.13. RH/RN - Receiving Host/Node

The host computer, or a router, with an ordinary BRIP address (or prefix) which is intended to be the final recipient of the packet in question.

2.14. IRH/IRN - Ivip-mapped Receiving Host/Node

The host computer, or a router, with an IMIP address (or prefix of IMIP addresses) which is intended to be the final recipient of the packet in question.

An IRH or IRN does not need any address or prefix other than the one it has via the Ivip system.
However, it may have one or more addresses in the local network and it may have more than one IMIP address or prefix, each perhaps using a different ETR.

2.15. MH/MN - Mobile Host/Node

A host computer, or a router, with an IMIP address (or prefix of IMIP addresses) which it is using via one or more two-way tunnels it establishes with one or more TTRs (Translating Tunnel Routers).

A Mobile Host or Node typically [To do - what is the proxy MIPv6 mode where this is not true??] has a "care-of" address in the one or more networks it is currently connected to. It also needs special software which operates from the care-of address, running the tunnel to and from the TTR, and connecting that tunnel with the main TCP/IP stack in the host or node.

Please see the "Loose ends - TTRs and Mobility" section for a fuller description of how Ivip can help with Mobile IP.

2.16. UAS - Update Authorisation System

This is a general term for a system which is operated by an organisation and plays some role between the user making a UMUC and the actual IMAB-DB being changed.

Some UASes accept UMUCs as their inputs. Those which do not must accept SUMUCs from other UASes. A UAS may have end-user interfaces and links to branch or leaf UASes higher in the tree.

Leaf UASes are at the ends of branches of a tree composed of UASes, with a single Root UAS at the base.

Each UAS SHOULD be implemented as two or more linked but redundant servers, similar to the master and one or more slave arrangement of nameservers, with all of them being authoritative in terms of their interactions with other UASes and with end-users.

2.17. RUAS - Root Update Authorisation System

A RUAS is the authoritative UAS for one or more IMABs. Therefore, it periodically generates - say every 10 minutes - an IMAB-DBD file. It also continually produces a stream of updates.

The RUAS MUST be implemented as two (three?)
or more redundant servers in geographically and topologically well-separated locations.

The interactions between the RUAS and its branch and leaf UASes SHOULD be governed by some new IETF standards to ensure it is easy and robust to run these systems and have them interoperate securely.

The set of other UASes each RUAS may interact with may be different for the authorisation tree for each of the potentially multiple IMABs it handles. The branch and leaf UASes in each such tree may also be members of other trees of this RUAS (for other IMABs) and of trees rooted in other RUASes.

A RUAS may be a leaf or a branch in some other RUAS's tree, but in that role the system and its servers only behave as an ordinary UAS.

2.18. US-IMAB - Update Stream specific to one IMAB

This is a stream of data, at present assumed to be UDP packets (but perhaps implemented in another way, such as a multicast system) by which the real-time updates to the mapping data for any one IMAB are conveyed.

One or more identical US-IMAB streams are generated for each IMAB for which the RUAS is authoritative. So each RUAS could be generating these streams for multiple IMABs.

As described in a section below, these streams are replicated and delivered, with high reliability, to ITRDs and QSDs all over the Net - ideally within a second or so.

2.19. US-Complete - Update Stream for the Complete Ivip system

This is the combined set of all US-IMAB streams which each ITRD or QSD needs. To what extent it is simply the sum of all US-IMAB packets simply replicated, or to what degree the first level of replicators compacts the data to reduce the number of packets, is yet to be determined. There are also problems to be solved when this US-Complete is missing packets.

Theoretically, all Replicators get two copies of all US-IMAB streams, for redundancy.
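One way a receiver of two identical copies of a stream could combine them is sketched below. This is only an illustration: it assumes every update packet carries a per-stream sequence number which increases by one (the dump file description implies update packets carry some sequence number, but the packet format is not defined in this I-D), and the class and method names are hypothetical.

```python
class StreamMerger:
    """Merge redundant copies of one update stream into a single ordered feed."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # next sequence number to deliver
        self.pending = {}           # packets received ahead of next_seq

    def receive(self, seq, payload):
        """Accept a packet from either copy of the stream.

        Returns the list of payloads which can now be applied, in order.
        Duplicates and already-delivered packets return an empty list.
        """
        if seq < self.next_seq or seq in self.pending:
            return []
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self):
        """Sequence numbers lost so far on both copies of the stream."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```

With this arrangement a packet lost on one copy is filled in by the other, and only a packet lost on both copies shows up in missing(), which is the case where some further recovery mechanism would be needed.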
Ideally, each ITRD and QSD will get two separate US-Complete streams from two separate Replicators in widely topologically distinct locations on the Net, to enhance robustness. This is a crude doubling of bandwidth, but it might be better than something more complex with lower bandwidth.

2.20. Replicator

A system of Replicators forms a redundant, reliable, high-speed distribution system for update streams. The Replicator system is only roughly described in this I-D. Its job is to get packets which together make up at least one US-Complete stream to every ITRD and QSD which needs it.

Replicators could be implemented in routers, but are probably best implemented in ordinary software on a Linux/BSD etc. server. They don't need hard drive storage and do no caching of data.

Replicators could be located within, or as stubs to, transit routers or border routers. Within large provider or AS-end-user networks, they would be servers or perhaps implemented in internal routers. An ITRD or QSD could also operate as a Replicator.

2.21. QSD - Query Server with full Database

Like ITRDs, QSDs get a full feed of updates (at least one copy of US-Complete) from one or more Replicators. Like ITRDs, when they boot, they download individual IMAB-DBD files for each IMAB in the Ivip system. I write more about this in a section below on ITRs.

Once their slave copies of the complete set of IMAB-DBs are up-to-date and being continually updated, they are ready to respond to queries. The query protocol needs to be defined, and is the same for queries from ITRCs, ITFHs and QSCs - Query Servers which Cache.

The QSD needs to keep a record of responses sent out, and cache times (which ideally might be a single fixed time, to make it easy to implement).
It keeps a watch on incoming changes to the many IMAB-DBs, and if any change affects IMIPs which were covered by a response it sent out which could be cached by another device, it sends out a Notification to that device, with the new information.

A QSD could be integrated with a Replicator function, and perhaps an ITRD function - or for that matter an ETR function too.

QSDs have no routing functions, so it would be overkill to implement this in a router. They need a lot of memory, so the best way to implement a QSD is probably on an ordinary server with one or more gigabit Ethernet interfaces. No hard drive is required, except perhaps for logging purposes.

2.22. QSC - Query Server with Cache

A QSC could be implemented in a router. It does not route packets, but its memory and computational requirements are likely to be modest compared to those of a QSD. There is no need for a full feed of US-Complete data. However, there must be one or more upstream QSDs - or perhaps QSCs with upstream QSDs. The easiest way to implement this would be software on a modest server, which would only need a hard drive for logging purposes.

In addition to handling queries from cache or by passing the query to one or more upstream QSDs or QSCs, the QSC needs to keep a record of responses sent out to its queriers - which are ITRCs, ITFHs or other QSCs. When it receives a Notification from its upstream QSD/QSC, it needs to look at those records and decide which of its queriers to send the Notification to.

Small sites could use one or more QSCs for local ITRCs and ITFHs, relying on one or more external QSDs to answer all queries. This saves bringing a full US-Complete feed into the site and it saves on the RAM needed for a full QSD.

2.23.
ITR - Ingress Tunnel Router

A general term for a router or server which accepts packets whose DA is an IMIP and which encapsulates each such packet, with the outer IP header having a DA of some BRIP address the end-user chose as the mapping for this IMIP. That address will presumably cause the packet to arrive at an ETR, which decapsulates it and forwards the packet to the Destination Node.

The ITR has a locally configured set of limits which prevent it from tunneling packets to certain ranges of addresses, including those defined for protecting critical infrastructure against Ivip malfunction, and including all IMAB addresses. This set of limits is downloaded regularly and securely, so that over time, these limits can be altered.

2.24. ITRD - Ingress Tunnel Router with Database

An ITR with a full copy of all IMAB-DBs, updated in real time by the US-Complete it gets from one or ideally two Replicators. The updates alter the local copy of each IMAB-DB and cause a corresponding change in the FIB of the router, which finds and tunnels every incoming packet with an IMIP DA. (Unless the address in the database for that IMIP is zero or within a banned region, in which case the packet is dropped.)

ITRDs can be implemented in a suitable router with lots of RAM, CPU power and a very high capacity FIB, in terms of the ability to tunnel packets and in terms of how many rules can be applied, down to potentially millions of /32 (IPv4) or /64 (IPv6) prefixes.

I explore in a section below how an approximately 1 gigabit ITRD could be built using commonly available server hardware. For a well developed Ivip system, this will require quite a few gigabytes of RAM - since the best way to implement the database and FIB is as a series of arrays with 32 bits (128 bits for IPv6 - urrgh!) for each mapped address (or /64 for IPv6).

An ITRD might also implement the Replicator, QSD and/or ETR functions.

2.25.
ITRC - Ingress Tunnel Router with Cache

An ITR without a full copy of all the IMAB-DBs - and so not requiring a US-Complete stream from one or more Replicators. The ITRC gains mapping information from a nearby QSD, perhaps via one or more intermediate QSCs.

It may hold every packet it receives with an IMIP DA until it requests and receives mapping information. In this case, it handles every packet with DA within an IMAB - generally as quickly as a full ITRD.

Whenever an ITRC chooses to request mapping information from the one or more QSD/QSC systems it relies upon (two separate systems might be more robust, especially if the query and response are sent via UDP), its request specifies a single IP address, the DID of this packet, which it already knows is an IMIP address.

The response it receives will concern that DID address, and potentially one or more IMIP addresses above and below this address - all of which have the same mapping. So the response will consist of a starting address, a range, and a TELOC IP address which will become the DA for the encapsulated packet for any incoming packet with a DA within this range. There may also be an explicit caching time for this response, or perhaps a default, system-wide, constant caching time such as 600 seconds.

The ITRC uses this mapping information, updating its FIB accordingly, for the caching time. At the end of that time, it may choose to make another query - which it would ordinarily only do if it is still receiving packets within that range.

At any time during the caching period, if the QSD which answered the query (or provided an answer to a QSC which actually answered this ITRC's query) recognises a change in the relevant IMAB-DB which affects the range of addresses in the response this ITRC received to its query, then the QSD will send a Notification.
The Notification may pass through multiple QSCs, but will reach this ITRC and any other ITRCs which received similar responses. ITRCs do not need a massive FIB, but if they are a router, their FIB needs to be able to encapsulate packets and handle a substantial number of rules, depending on the volume and nature of the traffic. CPU involvement would be modest to substantial. An ITRC could be implemented in a server with modest memory requirements. It requires only modest bandwidth (compared to a full US-Complete feed) for the queries, responses and Notifications with its one or more parent QSDs or QSCs. An ITRC faces some choices regarding which packets to try to gain mapping information for. Firstly, it needs some way of identifying incoming packets as having a DA which matches one of the IMIPs or ranges of IMIPs which it already has mapping information for. Those packets should be encapsulated immediately according to that mapping information. Secondly, the FIB needs a way of detecting which packets arrive with IMIP DAs, but which are not currently matched by one of the existing encapsulation rules. I guess the most advanced routers such as the CRS-1, M120 and MX960 have such flexible ASIC and RAM FIBs that with suitable firmware, they could do this sort of thing. I would be surprised if lesser routers could be programmed to do this sort of thing efficiently. Also the router needs to reliably monitor which of its currently cached rules are still being used by packets. Furthermore, the router may need an efficient way of only requesting mapping information for packets whose DA appears more than once. If the ITRC doesn't quickly (fraction of a second) gain the mapping information for every IMIP packet it receives, and/or if its RAM or FIB can't hold all these rules and mappings, then it has to decide what to do with packets which it cannot at present tunnel to the correct address. One option is to drop the packets - but this is unlikely to be acceptable. 
Another is to let the packet be forwarded towards a peer router which also advertises the complete set of IMAB prefixes. If that peer is an ITRD, or this path leads to some ITRD in the core, then it is probably acceptable to let a small proportion of packets pass like this.

Alternatively, these untunneled packets, assuming the router can identify every one, could be forwarded or tunneled to a nearby ITRD. A bunch of ITRCs could therefore take most of the load, with the ITRD instantly tunneling a fraction of the network's total DA=IMIP packets.

An ITRC might also implement the QSC and/or an ETR function.

2.26. ITFH - Ingress Tunneling Function in Host

A host which is not behind a NAT could have additional software in its TCP/IP stack to perform the ITRC functions described above. It needs a good link to a nearby QSD/QSC system - so this would not be suitable over a dialup modem or radio link.

Host software, CPU power and RAM are free, provided there is enough of them. This would greatly reduce the load on any ITRCs and perhaps ITRDs in the rest of the network. An ITFH function would be highly desirable in every web server in a hosting company.

As with ITRCs, ITFHs need to have some kind of backup ITRD to handle packets they can't tunnel.

As with ITRCs, ideally the location of two or more nearby QSDs or QSCs should be auto-discovered. Likewise the location of two or more ITRDs if there is a way of explicitly tunneling packets to them when the ITFH doesn't have the mapping or FIB capacity to tunnel them itself.

The ITFH device doesn't need to be on a BRIP address (neither does an ITRD or ITRC, but I usually assume "routers" are on BRIP addresses), but it cannot be behind a NAT. A host performing NAT functions for some hosts on a private network is a good place to implement ITFH, as long as this host is not behind NAT itself.
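The caching behaviour shared by ITRCs and ITFHs, as described in the ITRC section, might be sketched as follows. This is only an illustration under stated assumptions: the class and field names (MappingResponse, ItrcCache, store, lookup) are hypothetical and nothing here is a defined Ivip protocol or message format. A response covers a starting address and a range with one TELOC, entries expire after the caching time (600 seconds is the example default from the text), and a Notification is treated simply as a newer response which replaces whatever it overlaps.

```python
import ipaddress
from dataclasses import dataclass

DEFAULT_CACHE_SECONDS = 600  # example system-wide default mentioned in the text


@dataclass
class MappingResponse:
    """Hypothetical shape of a QSD/QSC response (or Notification)."""
    start: int                            # first IMIP address covered, as an integer
    count: int                            # number of consecutive addresses covered
    teloc: int                            # tunnel endpoint address for that range
    ttl: float = DEFAULT_CACHE_SECONDS    # caching time in seconds


class ItrcCache:
    """Hypothetical in-memory mapping cache for an ITRC or ITFH."""

    def __init__(self):
        self._entries = []  # tuples of (start, end_exclusive, teloc, expires_at)

    def store(self, resp, now):
        """Cache a response; a Notification is stored the same way."""
        end = resp.start + resp.count
        # Discard any cached entry which overlaps the new range. Parts of an
        # old entry outside the new range are lost and simply get re-queried.
        self._entries = [e for e in self._entries
                         if e[1] <= resp.start or e[0] >= end]
        self._entries.append((resp.start, end, resp.teloc, now + resp.ttl))

    def lookup(self, dest, now):
        """Return the TELOC for dest, or None if a fresh query is needed."""
        self._entries = [e for e in self._entries if e[3] > now]  # expire
        for start, end, teloc, _expires in self._entries:
            if start <= dest < end:
                return teloc
        return None


def ip(text):
    """Convert dotted-quad text to an integer address."""
    return int(ipaddress.IPv4Address(text))
```

A real device would also need to bound the number of entries and evict old ones when RAM or FIB space runs short; that is left out here to keep the sketch short.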
The most common NAT situation is a DSL or cable modem (or an optical home/SOHO adaptor too). I have referred to performing Ingress Tunnelling functions in such a modem as ITFH, but I guess they are formally a router, not a host, so maybe it would be a purely software-based ITRC function as a firmware upgrade.

ITRCs and ITFHs could easily be overwhelmed by a large number of different DA addresses inside the caching period, so they need to be able to drop old cached mapping data when their RAM or FIB can't handle it. They need to be in a network position where an upstream ITRD will always find their packets. In principle, with Ivip, this is always the case, depending on how congested the nearest "anycast core-ITR" is.

2.27. ETR - Egress Tunnel Router

An ETR is a router or a server which receives encapsulated packets on one of its one or more BRIP addresses, strips off the outer IP header, copying its hop-count to the inner packet, and then by some means ensures the resulting packet is delivered to the IRH/IRN (the receiving host/node with an Ivip-mapped address).

There needs to be some local network management system which can tell the IRH/IRN - or at least the end-user by some means - where the one or more usable ETRs are. This management system may also need to ensure the local routing system can deliver decapsulated packets with DA=DID to the IRH/IRN. The ETR is not necessarily the device to be responsible for this, because ETRs can die and there should be another available to select by the end-user changing the Ivip-mapping of their IMIP.

Ivip ETRs don't need any fancy functions, management or protocols - they just accept any IP-in-IP packet they get on one or more of their BRIP addresses, decapsulate it, and - if the DA matches an address the ETR and the local routing system are ready to handle - forward the packet to its destination host or link to the end-user's site.

2.28.
ETFH - Egress Tunnel Function in Host

I haven't given much thought to this. Maybe it would be useful for a host with a local care-of address to do its own ETR functions, rather than relying on a separate ETR. Perhaps the host has another link to another network for multihoming. This resembles some mobile-IP situations.

2.29. TTR - Translating Tunnel Router for Mobile-IP

A TTR behaves like an ETR as far as the Ivip system (IMAB-DBs and ITRs) is concerned - it is simply a device with a BRIP address to which packets are tunneled. A MN/MH establishes a two-way tunnel to the TTR from its care-of address, which can be behind NAT. The MN/MH may have such tunnels to other TTRs, including via different edge networks - such as one link over WiFi, another over UMTS and a third via an Ethernet cable. I have not looked into the details of such tunnels, but let's say it is a two-way IP-in-IP tunnel or some other type, perhaps with compression and encryption.

A TTR may be in the provider network with which a MN/MH has a link, in which case the provider network probably runs it and pays for it. In this case, the TTR will need to be authorised to forward the packets with the source address being that of the MN/MH's IMIP addresses.

A TTR may be outside the current provider network where the MN/MH has its care-of address. In this case, the end-user will probably need secured access to it, and have to pay some TTR network for using the TTR. This TTR network might have central monitoring systems and autodiscovery software in the MN/MH which automate the process by which the MN/MH finds TTRs, and by which the TTR network controller changes the mapping by issuing a UMUC to whichever UAS handles the end-user's IMIP address(es).

Please see the "Loose ends - TTRs and Mobility" section for a fuller description of how Ivip can help with Mobile IP.

In some ways, the TTR resembles a home-agent in Mobile IPv6.
However, there can be TTRs all over the world, inside and outside provider networks. The MN/MH can choose two or more "nearby" TTRs and either by itself, or more likely via some centralised monitoring system, cause the ITRs of the world to beam packets to whichever TTR it has the best link with. This means highly optimal paths in both directions between correspondent hosts anywhere and MN/MHs anywhere, including as the MN/MH moves from one place to another.

Handover from one TTR to another in the event a radio link fails is unlikely to be fast enough to support glitch-free VoIP, but this still represents a tremendous boost for mobile IP. Firstly, it gives optimal paths without any fixed home-agent. Secondly, it will work fine with IPv4!

Mobile IP is not such a big thing that a global system of ITRs and databases etc. would be built for it alone. But we clearly need to build something like this to avoid the entire BGP system being swamped. Once built, it enables a great new approach to Mobile IP.

I can't see anything specific which needs to be added to Ivip itself - the UASes, Replicators, ITRs etc. - specifically for facilitating mobile IP. However, since mobile IP involves many changes to mapping, compared to end-users who use Ivip for portability or multihoming without fancy TE, mobile-IP end-users are going to be placing quite a load on the entire Ivip system. They need to pay for this in some way, and it could be argued that the whole Ivip system is so valuable to mobile-IP end-users that it could be run at a profit just by charging them for their frequent changes and generally rather low traffic volumes.

3. The Crisis in Routing and Addressing

I don't have time to do this section properly. Obviously there needs to be reference to the RAWS report etc. [IAB-RAWS-website] and to work which is yet to be completed such as RADIR's "Problem Statement" and the RRG's "Design Goals I-D".
Here are some points which attempt to summarise the situation.

3.1. Interrelated needs and problems

Internet routers cannot handle millions of separate routes. Therefore, in order to maximise the number of separate subnets which are advertised at border routers, while decreasing the number of routes each router needs to implement, "route aggregation" must be maximised.

Route aggregation can be maximised by end-users obtaining Provider Assigned (PA, AKA Provider Aggregatable) addresses from their ISPs rather than becoming Autonomous Systems and obtaining their own Provider Independent (PI) addresses.

Route aggregation can also be maximised by trying to encourage ISPs and AS-end-users to connect prefixes which are close to each other in the address range at topologically close points in the Internet.

Route aggregation can also be maximised by giving ISPs and AS-end-users rather large subnets of address space, infrequently, rather than frequently giving them more numerous smaller subnets, which would necessarily break up the address space into more divisions.

Large and medium sized end-users, such as companies and businesses, need IP addresses which are portable, meaning PI addresses. This has long been allowed for IPv4, but the general absence of this to date for IPv6 has been a barrier to the adoption of IPv6. [To do: link to ARIN policy which apparently allows this, and to proposal for RIPE policy change.]

These end-users also need PI addresses because this is currently the only way of achieving multihoming. Multihoming is maximised when the prefix can be advertised in widely different parts of the network topology, which is directly at odds with Route Aggregation.

All end-users must become Autonomous Systems and invest heavily in BGP expertise and routers before they can obtain PI address space and therefore portability and multihoming.
[Is this true, or can an end-user get their own PI space without becoming an AS? I think someone told me they could, but this surprised me.]

Many such end-users and millions of businesses and organisations with fewer resources want and arguably need both portability and multihoming.

Consequently, the only way large end users can meet their needs for portability and multihoming is to advertise more and more prefixes, fuelling growth in the "global BGP routing table" to the point where ISPs, AS-end-users and other router operators (transit providers) fear overly rapid obsolescence of routers and very high replacement costs for more powerful routers.

Fresh supplies of IPv4 address space are projected to run out in 2010. [To do: link to Geoff Huston's site.] Yet if routers had always been capable of handling tens or hundreds of millions of routes, there would be no shortage yet, because the address space would have been handed out in smaller, more efficiently used chunks, because there would have been no imperative to maximise route aggregation.

Some administrative and technical changes must be made to allow and encourage the more efficient use of the 3.7 billion IP addresses available in the IPv4 Internet. Current utilisation rates of the portion which has been assigned so far cannot be known for certain, but are probably 10% or less on average, while some areas of the address space are used with efficiencies of 20% or more. [RW ping survey]

3.2. Constraints on possible solutions

The constraints upon solutions are quite oppressive.

It is not practical to upgrade or alter the operating systems or application programs of any Internet computers, including servers and desktop machines.

While some improvements in the BGP network may be achieved, the benefits which might be achieved are not of the scale which would be required to cope with current growth rates in the "global BGP routing table".
Ideally, most of the BGP routers in use today will still be usable for five or more years, but this will only occur if end-user needs can be met while halting or drastically slowing the growth in the "global BGP routing table".

A wholesale move to IPv6 is not possible - and IPv6 packets are harder for routers to classify, while the problems of the growth in the IPv6 BGP routing table are just the same as for IPv4.

Any new system must be backwards compatible with existing software, ISP and AS-end-user edge network structure, BGP routers and the existing BGP routing system.

Any new solution must be incrementally deployable. This means that there must be some immediate benefit for those who first make the effort to install the new system.

The most likely solution is a major change to the Internet's routing and addressing architecture, in the form of an "overlay network" which provides new flexibility for portability and multihoming - and ideally for TE - for many more end-users without them needing to become Autonomous Systems, gain PI space or add to the number of routes in the global BGP routing table.

4. Potential Solutions

[To do: point to LISP I-Ds. SHIM6, eFIT - APT, Tony Li's and Geoff Huston's BGP ID and other BGP potential improvements. Maybe list more esoteric non-backwards compatible ID/Loc split work.]

5. Comparison with LISP

This section contains three lists of principles and mechanisms: those in LISP which are used by Ivip, those in LISP which are not used by Ivip, and those which are not in LISP and which are in Ivip.

These lists concern principles and mechanisms, not outcomes or functional features of each entire system.
This is based on my potentially incorrect understanding of LISP, based on the current I-Ds LISP-01 and LISP-CONS-01. There is also LISP-NERD which I am not really considering here.

5.1. LISP principles and mechanisms used by Ivip

The basic concept of using some existing IP addresses to identify a final destination host, and others to locate an ETR which can deliver the packets to that host. This is in contrast to some Id/Loc separation schemes which propose major alterations to the TCP/IP stack so these two types of addressing are performed by separate types of address which are not at all the same - and with one or both of these being incompatible with the existing concept of an IP address.

Changing some parts of the routing and addressing system, but not host operating systems or applications. Both LISP and Ivip are invisible to hosts.

Applicability in principle to IPv6, with most work at present developing a good system for IPv4. Potential carriage of IPv4 traffic over IPv6 tunnels and vice-versa.

Intended to be incrementally deployable, to minimise the software changes required in routers and to require no hardware changes in routers. However, to what extent the ITR functions of encapsulation - and for LISP, quite complex communication with ETRs - can be implemented in existing FIB hardware would depend very much on the model of router. (Perhaps it is fine for these communications to be done by the main CPU. LISP's encapsulation is more complex than Ivip's and I think it often, or always, involves a nonce - a single-use random number.)

The ITR term and basic concept. (With Ivip the ITR is purely for encapsulating a packet to be sent to an ETR. Ivip does not involve messaging between ITR and ETR, either in headers or separate packets.) ITRs are globally distributed and are close to sending hosts - to encapsulate packets which are destined for LISP/Ivip-mapped hosts in networks all over the Net. (LISP ITRs are in the network of the sending host, which is the ideal location for Ivip
(LISP ITRs are in the network of the sending host, which is the ideal location for Ivip Whittle Expires January 16, 2008 [Page 49] Internet-Draft Ivip Architecture July 2007 ITRs, but Ivip also has "anycast ITRs in the core".) The ETR term and basic concept. (Ivip ETRs simply decapsulate packets. They do not engage in communication with the ITR, or make use of any information in the encapsulation header except the hop- count, which as with LISP, is copied to the decapsulated packet.) With both LISP and Ivip, the ETR or the local routing system needs to be configured to deliver the decapsulated packets to the destination host. Except for Ivip's TTR, which is a type of ETR, ETRs are always inside provider or end-user networks, and are close to the host for which they are decapsulating packets. Providing portability (not LISP 1.0 or 1.5?), multihoming and TE, without involving BGP and without requiring any changes to host applications or operating systems. (Ivip has no explicit TE capability and its TE goals may be more modest, and require external control systems.) Depending on the response time of the database and ITR system, both LISP and Ivip will ideally support mobility while existing communication sessions continue on the same EID (DID for Ivip) IP address. Not altering all the BGP routers. The best place to put ETR and ITR functions in LISP is often in border routers, which are BGP routers, but I understand that transit routers are typically unmodified. (Ivip requires a large number, but still only a small proportion, of "transit" BGP routers to be ITRs - although the same effect can be achieved if many border routers are ITRs which accept raw packets from unmodified networks and tunnel them to wherever the packets need to be tunneled - not just to ETRs in the network in which the border router is located.) Some kind of global database, distributed databases etc. to control all ITRs at the same time. 
(The current Ivip proposal for how that data is altered by users and distributed to ITRs is very different from the current LISP proposals.)

5.2. LISP principles and mechanisms not used by Ivip

The EID and RLOC terminology. Ivip uses new terms which are comparable, but which have somewhat different meanings. Likewise, while Ivip may to some extent be an instance of an Id/Loc separation system, I haven't made this a focus of the design or terminology.

LISP-00 (January 2007) used IP-in-IP encapsulation, as does Ivip. LISP-01 (July 2007, or at least the draft I have seen) uses UDP encapsulation instead.

Recursive tunneling and recapsulating tunnels. There is nothing in Ivip to prevent an ETR sending packets in a tunnel to somewhere else, but it is not contemplated in the Ivip architecture, except for TTRs, which are ETRs with a two-way tunnel established by the mobile node.

LISP header, UDP encapsulation and UDP message packets between ITR and ETR. Ivip uses no special header or UDP - just IP-in-IP encapsulation. It could use UDP or some other encapsulation method, but at present I think IP-in-IP is fine. The Ivip IP-in-IP header contains no extra information, nonces etc. So Ivip isn't really a protocol between ITR and ETR, but a plan for running ITRs and ETRs, with protocols, databases, replicators etc. so the end-users can securely and quickly alter the tunneling behavior of all the world's ITRs.

Ivip only works with a centralised database or a set of centralised databases. There is no equivalent to the more ad-hoc arrangement of LISP 1.0 or 1.5, in which ETRs are the authoritative source of mapping information and in which they communicate back to the ITR in various ways.

Ivip encapsulated packets never have the ITR's own IP addresses as the source address, as is done for at least some encapsulations with LISP 1.5 and perhaps other variants too.
Ivip's outer IP header has a DA of the ETR's address and the SA of the sending host.

I think LISP includes some functions for testing and confirming reachability between ETR and ITR. Also, I think the ETR can signal to the ITR which of the various alternative ETRs should currently be used or not used. Ivip doesn't involve any reachability tests or communication about reachability. A practical multihoming arrangement would require some separate monitoring system which constantly tests reachability, and/or receives reports about non-reachability, so it can send update information to the Ivip database to select a different ETR.

I understand that with at least some LISP variants - perhaps only 1.0 or 1.5 - the ITR in a network with some sending hosts with EID (I call them "LISP-mapped") addresses will NAT the packets sent by these hosts to ordinary non-LISP-mapped addresses. So the packet received by the ordinary destination host has a SA of the ITR, which is an ordinary non-LISP-mapped address which is reachable via BGP. This enables the destination host to send packets back in the opposite direction, making the hosts with EID addresses reachable, as long as these EID hosts initiate the communication. (I don't understand how the destination host could then distinguish between the packets sent from different sending hosts behind this NAT arrangement.) The encapsulated packet leaving an Ivip ITR has the original SA and a new DA (of the ETR) in its outer IP header.

LISP's explicit TE parameters (Priority and Weight), which are held in the database and communicated to the ITR, may require the ITR to make some complex decisions, which I think would be difficult or impossible in existing FIB hardware. Although I think these TE functions are powerful and elegant, this extra complexity in the database and the ITR is not a part of Ivip.
Ivip's database maps each IPv4 address (or each /64 for IPv6) to a single 32 bit (128 bit for IPv6) address. There is no other data in each element of the database array, although "0" means "drop the packet" and it is possible that some ranges of values, such as above 224.0.0.0, may be used for special purposes in the future. (I have no idea what for.)

LISP is intended to handle multicast in some way. I don't yet understand this or understand how multicast is used over the public Internet, so Ivip at present is only concerned with global unicast addresses.

Ivip is not concerned with the reachability of what for LISP are the RLOC addresses of ETRs. The Ivip ITR tunnels packets to a particular IP address for each Ivip-mapped IP address. What happens if there is no ETR there, or no host, or no route to that address, is not defined within Ivip. It is up to the end-user and whatever multihoming monitoring system they use to ensure the database contains an IP address which does something useful for them - and which does not create problems for others.

The LISP-01 I-D only discusses LISP 1.0 and 1.5, so it is not possible to know what encapsulation techniques are to be used for LISP 3.x. In the RAM mailing list (2007 July 13 - msg01703) Dino Farinacci discussed whether there would be Map-Reply messages from ETRs to ITRs resulting from the ETR receiving a data packet: not for CONS and NERD and perhaps for APT. It is not clear what communication, if any, would take place between the ITR and ETR in these 3.x variants - either in the encapsulation scheme for getting raw packets to the ETR, or in separate message packets in either direction. Nor is it clear whether LISP 3.x would use for the source address of the outer header of its encapsulated packets the ITR's address or that of the sending host.

For LISP 1 and 1.5, the encapsulation is UDP with the outer SA (source address) being one of the ITR's RLOC addresses.
Ivip, as currently defined, uses IP-in-IP encapsulation with the SA of the outer IP header being that of the sending host - the SA of the inner IP header. Please see the "Loose Ends: ETRs checking src & dest addresses" section below for discussion of why I believe it is best for the outer SA to be the same as the inner SA - because it greatly simplifies enforcing a particular kind of security filtering.

There are some disadvantages if "outer SA = inner SA" is used with IP-in-IP encapsulation, because the tunneled packet carries no indication of which ITR encapsulated it. If this is found to be a serious enough problem, then Ivip could have its encapsulation changed to use UDP, like LISP, retaining the "outer SA = inner SA", but including some Ivip data at the start of the UDP body, containing at least the address of the ITR.

Any such use of UDP and an extra "Ivip header" structure, which would of course have a structure which supports adding new elements to it in the future, would add to the overhead in every tunneled packet, and require further work at the ITR and ETR. Such an extendable header would have the advantage of enabling Ivip to do things in the future which are not currently contemplated.

Most of the complexity of LISP 1 and 1.5 ITR and ETR behavior is absent from Ivip. It is not known how complex LISP 3.x ITR and ETR behaviour is, but LISP in general attempts explicit TE functions which are not part of Ivip.

The LISP 3.x database contains more information and is more complex than Ivip's - which makes it more difficult to implement full database (push) ITRs in LISP than in Ivip. Likewise, any push mapping database distribution system for LISP is likely to be even more challenging than Ivip's, for a given rate of updates and number of recipient devices.
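To make the contrast concrete, the per-packet work of an IPv4 ITRD under the scheme described in this section might look roughly like the sketch below. It is illustrative only. The function names and the idea of indexing a Python list by (DA - IMAB base) are assumptions made for the example, and copying the inner TTL into the outer header at the ITR is also an assumption, since the text only specifies the hop-count copy at the ETR. What comes from the text: one 32 bit value per mapped address, 0 meaning "drop the packet", plain IP-in-IP encapsulation (IP protocol number 4), and outer SA = inner SA.

```python
import struct
import ipaddress

IPPROTO_IPIP = 4  # IANA protocol number for IPv4-in-IPv4


def checksum(header):
    """Internet checksum of an even-length header (checksum field zeroed)."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner, etr):
    """Prepend an outer IPv4 header with DA = ETR and SA = inner SA."""
    ttl = inner[8]            # assumption: outer TTL starts as the inner TTL
    src = inner[12:16]        # outer SA = inner SA, as the text proposes
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + len(inner), 0, 0,
                         ttl, IPPROTO_IPIP, 0,
                         src, struct.pack("!I", etr))
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + inner


def tunnel(inner, table, imab_base):
    """Look up the inner DA in a flat per-address array and encapsulate.

    table holds one integer per address in the IMAB; 0 means drop.
    Returns the encapsulated packet, or None if the packet is dropped.
    """
    dest = struct.unpack("!I", inner[16:20])[0]
    offset = dest - imab_base
    if not 0 <= offset < len(table):
        raise ValueError("destination is not inside this IMAB")
    etr = table[offset]
    if etr == 0:
        return None
    return encapsulate(inner, etr)
```

The lookup is a single array index with no further decision logic, which is the property the comparison above relies on. A hardware FIB would of course not be implemented this way.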
This Ivip I-D is longer than LISP's because I explore a wide variety of ways in which Ivip could be used and because I explore the full user-control, database and update distribution system in some detail.

5.3. Additional principles and mechanisms in Ivip

Any of these principles and mechanisms might be applied to another system, including LISP - but some of them would only make sense in LISP if other aspects of LISP were changed to more closely resemble Ivip.

Ivip has three kinds of ITR function: ITRD, ITRC and ITFH, while LISP has only the first two: the full database and the caching variety. ITFHs could easily be added to LISP.

Ivip could have the ETR function performed in the destination host. This resembles mobile IP in that the host must have additional TCP/IP software and must have a conventional care-of address. This may conflict with some goals, such as efficient use of address space. This could easily be added to LISP.

Ivip's tree-structured UAS system, with multiple such systems feeding update packets via a distributed replication network to ITRDs and QSDs is quite different from any current LISP proposal I know of. This could easily be adopted by LISP.

I think the QSD and QSC system, with Notification (cache invalidation) messages is unlike anything in LISP. This could be adopted by LISP, I think.

Support for TTRs for mobile IP. I don't think this is explicitly mentioned in LISP I-Ds, but it could be adopted by LISP, as long as the TTRs fulfilled all the ETR functions of LISP, which are more demanding than what Ivip requires.

6. Ivip's goals, non-goals and challenges

[To do.]

7. User Interface and Update Authorities

Here are some ideas about how mapping information might be controlled.
This is different from keeping the authoritative data in multiple locations and having to work backwards to find the authoritative server, as in the DNS, and as described in draft-meyer-lisp-cons. Nonetheless, the authority to change the mapping for the IP addresses is delegated in a distributed fashion similar to the DNS.

The entire Ivip system has a number (perhaps thousands or even tens of thousands) of IMABs (Ivip-Mapped Address Blocks). The mapping for each IMAB is controlled by a single body of data - an IMAB-DB. This is maintained by a single system, typically comprising multiple redundant servers separated from each other, called a RUAS (Root Update Authorisation System). In the diagram below, I draw this as a single entity, but in fact it acts as one entity while being physically distributed over several servers. Each RUAS can be authoritative for one or more IMABs.

Once an ITRD has a complete, real-time updated, slave copy of a particular IMAB's IMAB-DB, it can program its FIB to match and tunnel these packets according to every entry in the IMAB-DB. So then it can advertise the BGP prefix for this IMAB and start accepting and tunneling packets. For this reason, it is desirable that there be a 1:1 relationship between the IMAB, which is defined as a contiguous range of IMIP address space which can be advertised as a single BGP prefix, and a single IMAB-DB database which carries the mapping for this range.

The single IMAB-DB generates two things which are used by ITRDs and QSDs. Firstly a regular (every 10 minutes?) IMAB-DBD dump file is generated, and can be downloaded by any recently booted ITRD or QSD. Secondly, there is a stream of update packets - a US-IMAB - which is specific to this particular IMAB. The Replicator system gets this stream to the ITRDs and QSDs which need it - and they typically need it fast, without missing packets, and with the US-IMAB of every other IMAB in the Ivip system.
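The way a recently booted ITRD or QSD might combine a dump file with the update stream can be sketched as follows. This is a rough illustration, not the US-IMAB format: the update fields (seq, start, count, teloc), the use of a sequence number to detect a missed update, and the class name ImabReplica are all assumptions invented for this example.

```python
class ImabReplica:
    """Hypothetical slave copy of one IMAB-DB held by an ITRD or QSD."""

    def __init__(self, base, dump, dump_seq):
        self.base = base          # first address of the IMAB, as an integer
        self.table = list(dump)   # one mapped address per IMIP; 0 = drop
        self.seq = dump_seq       # last update already reflected in the dump

    def apply(self, seq, start, count, teloc):
        """Apply one update from the stream.

        Returns True if the replica is still consistent, or False if an
        update was missed and a fresh dump (or a resend) is needed.
        """
        if seq <= self.seq:
            return True           # already covered by the dump or applied
        if seq != self.seq + 1:
            return False          # gap in the stream
        offset = start - self.base
        if offset < 0 or offset + count > len(self.table):
            raise ValueError("update falls outside this IMAB")
        self.table[offset:offset + count] = [teloc] * count
        self.seq = seq
        return True
```

A device would buffer updates which arrive while the dump is downloading and then replay them through apply(), relying on the sequence check to skip anything the dump already contains.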
The idea is that for scaling purposes, especially due to problems with FIB rule capacity and packet handling speed, it may be desirable to split a total ITRD function over several co-located ITRDs, each of which handles a fraction of the total set of IMABs. In this case, each ITRD needs to be able to tell its one or more Replicators to send it US-IMABs only for the IMABs it is handling. Likewise, when each such ITRD boots, it only needs to download a subset of the IMAB-DBDs to get started. The format of these US-IMAB streams is discussed in the section on Replicators.

In this section, I depict a single tree of delegated responsibility for the user control of mapping of one IMAB. The Root UAS at the base of the tree is run by Company X - RUAS-X. RUAS-X could be authoritative for other IMABs, and each such tree of delegation may have the same set of other UAS systems, or it could be different. Each delegation tree is separate from the delegation trees of other IMABs, even if they look similar, because the tree includes specific subsets of the whole IMAB address range as one of the defining characteristics of its branches and leaves.

The initial action which leads to the database being changed is a UMUC (User Mapping Update Command), generated by the user (manually, by the user's equipment, or by a system authorised by the user). For authorising and feeding UMUCs to RUAS-X, there is a tree as depicted in Figure 4. Delegation of authority flows up the tree as the total address range of the IMAB is split at each branching junction.

This tree structure involves data, in the form of SUMUCs (Signed User Mapping Update Commands), flowing down towards the root of the tree. (Data would also flow up the tree so each user-interface leaf could tell end-users what their current mapping was, could test their requests against constraints etc.)
The idea is that RUAS-X could delegate control of one or more subsets of the IMAB's total range of addresses to some other system, which in turn would delegate control to other systems. There would be no absolute limit on the height (usually called depth) of these hierarchies.

Ultimately, from the point of view of the end-user, there needs to be a username and password (or some crypto private key challenge response system) by which they manually (via a web interface), or in some automated way, control the DID to TELOC mapping of their one or more Ivip-mapped addresses. The system would also tell them what the current mapping is, and enable them to test a potential new mapping address to see whether it was valid, given that it must be a BRIP, and be outside certain well-known ranges which are protected from being tunnel endpoints in order to ensure critical infrastructure can't accidentally or maliciously be interfered with by tunneled packets from ITRs.

The servers which handle the end-user interaction need to be at the leaves of this tree structure, so as not to burden the RUAS-X database servers themselves with this messy stuff. This enables various companies to give different kinds of control for the Ivip-mapping of the IP addresses their branch of the tree controls.

Figure 4 does not show RUAS-X having any user interface servers, but it could. The simplest arrangement would be the RUAS having simply a user-interface server and no tree of other UASes.

There would need to be IETF standardised methods by which some server could execute a UMUC with the user-interface servers of any of these UASes. This standardisation would be especially important for multihoming, because some reasonably trusted company could run an automated monitoring system, and have the credentials (username, password, key etc.) stored in their system so their system can change the mapping of one or more IP addresses the moment one link seems not to be working. Also, the company which controls a particular range of the Ivip-mapped space (such as X, Y or Z in Figure 4) may offer such a multihoming monitoring system itself.

The tree in this example controls an IMAB with the address range 20.0.0.0 to 20.3.255.255. Let's say company X has authority (perhaps direct from the Ivip system or because X assigned this space which it got from an RIR to the Ivip system) over the entire range 20.0.0.0 to 20.3.255.255. It sublets to Y a quarter of this: 20.1.0.0 to 20.1.255.255. I am making these examples on binary boundaries, but there is no reason why the divisions should be like this. It would be just as possible for X to delegate to Y an arbitrary subset of the whole range, or the entire range, or just one IP address.

X's Root Update Authorisation Server (RUAS) has a private key for signing all the IMAB-DBD dumps it periodically creates and makes available. (Actually, it probably signs a message which attests to the MD5 hash of each IMAB-DBD file.) This key is also used in some way with a corresponding public key so Replicators and/or ITRDs and QSDs can check that the US-IMAB they are getting has not been corrupted. (This could be quite an onerous task.)

The rest of the Ivip system - the Replicators, ITRDs and QSDs - neither know nor care about company Y or Z, or about any particular end-user. All the rest of the Ivip system knows is the various instances like RUAS-X, each an organisation with a public key for authenticating streams of US-IMAB update packets they generate, and the corresponding IMAB-DBD dump files, for a given subset of the total Ivip-mapped address space. This could be any arbitrary subset, but for simplicity I will assume that X only has authority over this one IMAB 20.0.0.0 to 20.3.255.255.
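The containment rule behind this delegation (a UAS can only hand on address space that lies inside its own range) can be checked mechanically. The following is a minimal sketch, assuming ranges are held as inclusive first/last IPv4 addresses; the names `Range` and `may_delegate` and the two extra test ranges are hypothetical, chosen only for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive IPv4 address range (hypothetical helper, not an Ivip type)."""
    first: ipaddress.IPv4Address
    last: ipaddress.IPv4Address

    @classmethod
    def parse(cls, first: str, last: str) -> "Range":
        r = cls(ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
        if r.first > r.last:
            raise ValueError("range start is above range end")
        return r

    def size(self) -> int:
        return int(self.last) - int(self.first) + 1

    def contains(self, other: "Range") -> bool:
        return other.first >= self.first and self.last >= other.last


def may_delegate(parent: Range, child: Range) -> bool:
    """A UAS may only delegate address space it controls itself."""
    return parent.contains(child)


imab_x = Range.parse("20.0.0.0", "20.3.255.255")   # X's whole IMAB
range_y = Range.parse("20.1.0.0", "20.1.255.255")  # the quarter sublet to Y
single = Range.parse("20.2.0.7", "20.2.0.7")       # one address, not on a binary boundary
outside = Range.parse("20.4.0.0", "20.4.0.255")    # not X's to delegate
```

Because the check works on arbitrary first/last pairs rather than prefixes, it covers the case mentioned above where a delegation does not fall on a binary boundary.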
Let's say Y delegates control of some of its space to company Z, and that Z has an end-user U, who needs to control the mapping of one or more IP addresses in Z's range. Z has various interfaces by which U can do this, with its own arrangements for authentication, for monitoring a multihoming system and making changes automatically etc. Hopefully there would be one or more automated, host-to-server, IETF-standardised protocols so all end users could have standardised software for talking to whichever company's servers they use to control the mapping of their IP address(es).

[Diagram: User-R, User-S, User-T, User-U and Multihoming Monitoring Inc. reach UAS-Z through its web interface and other protocols. UAS-Z feeds down into UAS-Y, and UAS-Y feeds down into RUAS-X (the Root Update Authorisation Server of company X); further branches, labelled "Other companies like Y and Z", join the tree on the way down. RUAS-X supplies multiple web servers with IMAB-DBD files and sends its update streams downwards. Other RUASes like RUAS-X, each authoritative for mapping one or more IMABs, likewise produce regular IMAB-DBD dumps and US-IMAB update streams to securely control the ITRs and Query Servers. Each downward line depicts 30 or so streams of identical packets for each US-IMAB stream - one for each Level 1 Replicator, which are depicted in the next section.]

Figure 4: Delegation tree of UASes above one RUAS.

Let's say User-U wants to change the mapping of their one IMIP - or a range of IMIPs - to a new TELOC via a web interface.
User-U does this via Z's website, authenticating him-, her- or it-self by whatever means Z requires, and gives the command (UMUC) to map their IMIP to a new IP address (typically the address of another ETR). This causes UAS-Z to generate a signed copy of this update command (a SUMUC, according to some future IETF standard, of course) and to send it to UAS-Y.

The SUMUC consists of three items (assuming IPv4 for simplicity): a starting address for which IMIP address this update covers, a range, being at least one, and a new mapping value, which will also be a 32 bit integer. It could also include a time in the future at which the update should be executed.

Exactly how these UASes communicate is for future consideration. I guess TCP/IP, with multiple links and each set of servers which constitutes a UAS somehow behaving as one entity.

UAS-Y trusts this SUMUC because it can authenticate UAS-Z's signature. It strips off the signature and adds its own, before passing the SUMUC down to the next level: RUAS-X. RUAS-X likewise has a copy of UAS-Y's public key and within a fraction of a second of U initiating the UMUC, the IMAB-DB is altered accordingly.

Authority is delegated up the tree, because UAS-Y will only accept update commands if they are signed by one of its branch UASes, and for the particular address range that UAS has been authorised to control.

User-U may have given their username and password etc. to Multihoming Monitoring Inc. so this company can monitor their multihoming links and change the mapping as soon as one link goes down. UAS-Z doesn't know or care who actually makes the change - as long as they can authenticate themselves for whatever IMIP or range of IMIPs they want to change the mapping of.

There is no need for PKI in any of this, I think.

I believe that a pure "pull" system such as draft-meyer-lisp-cons will be too slow to respond.
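Returning to the SUMUC relay between UASes described earlier in this section, a minimal sketch of the check-and-re-sign step might look as follows. Everything here is an assumption made for illustration: the three-field wire layout, the `Sumuc` and `Uas` names, the range given to UAS-Z, the TELOC address, and above all the use of HMAC-SHA256 with shared keys, which merely stands in for whatever public key signature scheme might eventually be standardised.

```python
import hashlib
import hmac
import ipaddress
import struct
from dataclasses import dataclass


def addr(text: str) -> int:
    """IPv4 dotted quad to a 32 bit integer."""
    return int(ipaddress.IPv4Address(text))


@dataclass(frozen=True)
class Sumuc:
    """Body of an IPv4 SUMUC: start address, range length, new mapping."""
    start: int     # first IMIP address covered
    length: int    # number of addresses covered, at least one
    mapping: int   # new mapping value (TELOC) as a 32 bit integer

    def encode(self) -> bytes:
        # Hypothetical layout: three unsigned 32 bit big-endian fields.
        return struct.pack("!III", self.start, self.length, self.mapping)


def sign(key: bytes, body: Sumuc) -> bytes:
    # HMAC-SHA256 is a stand-in for a real signature scheme.
    return hmac.new(key, body.encode(), hashlib.sha256).digest()


class Uas:
    """One node of the delegation tree, such as UAS-Y."""

    def __init__(self, own_key: bytes) -> None:
        self.own_key = own_key
        self.children = {}  # child name -> (key, first address, last address)

    def delegate(self, name: str, key: bytes, first: int, last: int) -> None:
        self.children[name] = (key, first, last)

    def relay(self, child: str, body: Sumuc, signature: bytes) -> bytes:
        """Check a child's SUMUC, then return this UAS's own signature."""
        key, first, last = self.children[child]
        if not hmac.compare_digest(sign(key, body), signature):
            raise PermissionError("bad signature")
        end = body.start + body.length - 1
        if not (body.length >= 1 and body.start >= first and last >= end):
            raise PermissionError("outside the delegated address range")
        return sign(self.own_key, body)  # the child's signature is dropped


# UAS-Y accepts an update from UAS-Z and re-signs it for RUAS-X.
KEY_ZY = b"key-shared-by-Z-and-Y"
KEY_YX = b"key-shared-by-Y-and-X"
uas_y = Uas(own_key=KEY_YX)
uas_y.delegate("UAS-Z", KEY_ZY, addr("20.1.8.0"), addr("20.1.8.255"))
update = Sumuc(start=addr("20.1.8.5"), length=1, mapping=addr("192.0.2.1"))
for_x = uas_y.relay("UAS-Z", update, sign(KEY_ZY, update))
```

The range test is what makes authority flow up the tree: a correctly signed command is still refused if it touches addresses outside what that branch was given.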
draft-meyer-lisp-cons has "push" elements, but they do not push mapping data towards ITRs, just information about where the authoritative CAR can be found. Since we are going to build a global system of ITRs, we might as well build a really fast way of controlling them.

8. Replicators

Please consider the following section, which depicts an unencrypted UDP-based system for collecting and fanning out the update streams to hundreds of thousands of ITRDs and QSDs, as being just an attempt at finding a solution to this major engineering problem.

In the "Loose ends" section below, titled "Is fast, secure, Replication possible on the Internet?", I suggest that a secure, fast, Replicator system will require robust authentication of each packet's data, whether the packets are sent via TCP or UDP. I also discuss the difficulty of ensuring that the RUASes and first and second levels of Replicators can withstand DDoS attacks - so perhaps this part of the system would best be done with private network links.

This section of the Ivip design is not yet close to finding a good approach to the problem of pushing mapping information securely and rapidly to hundreds of thousands of ITRDs and QSDs. Hopefully this ambitious work will inspire others to contribute ideas or develop different and better plans. I believe it is vital to make the system a fast, secure, push system, rather than the likely very slow system based on querying and caching of LISP-CONS.

Multiple companies, organisations etc. which have one or more IMABs in the Ivip system each have their own RUAS (Root Update Authorisation Server) system, as described in the previous section. RUAS-X in Figure 4 is the central store of the mapping database for at least one IMAB.
RUAS-X could handle multiple separate IMABs but the following example only considers one IPv4 IMAB. There could be a potentially large number of such RUAS systems, maybe hundreds or up to tens of thousands. Ideally there would be no more than a few dozen or a few hundred.

Each RUAS periodically, say every 10 minutes, generates a compressed IMAB-DBD "dump" file for each of its IMAB-DBs and makes it available for download by HTTP or FTP on multiple redundant servers. Each dump file has a timestamp and a sequence number which matches a message in the US-IMAB stream for this IMAB.

Each RUAS continually generates a UDP stream of updates, also timestamped - the US-IMAB - for each of its IMABs. One of the messages in that stream may be "a dump file was generated now".

Each RUAS system generates many (say 30) identical streams from different locations. Maybe it generates an update message packet (actually 30 identical such packets, each to a different Level 1 Replicator as discussed below) as soon as the incoming updates fill a UDP packet or after one second elapses. If no SUMUCs come in for ten seconds, maybe it sends a time-stamped update message anyway, with no updates. Each message needs a 64 bit sequence number, and a 32 bit or similar identifier for which IMAB it is updating the mapping of.

The distributed system of Replicators is configured to reliably distribute the contents of the update streams produced by RUAS-X - and likewise every other RUAS which has one or more IMABs in the Ivip system.

A newly booted ITRD (Ingress Tunnel Router with full database) or QSD (Query Server with full Database) performs the following procedure, for each of the IMABs in the Ivip system.

The ITRD or QSD is receiving from the replicator system many individual US-IMAB streams of updates, including the one for the IMAB this example concerns, which is coming from RUAS-X.
The ITRD/QSD monitors the US-IMAB stream, waiting for the flag which says a dump has been created. It then buffers all subsequent updates in the stream, waits until the IMAB-DBD dump file is available (which could take some seconds) and then starts to download the IMAB-DBD file. By the time it arrives, perhaps ten or twenty seconds of updates will have been buffered.

The ITRD/QSD unpacks the dump file into an array in RAM which is 4 bytes for every IP address in the IMAB. (This is an IPv4 example.) It then applies the buffered updates, bringing the data totally up-to-date with the last received update. Then it continues to apply all subsequent update messages as they arrive from the replicator system.

At this point in time, the ITRD/QSD has an up-to-date copy of RUAS-X's IMAB-DB for this IMAB. A QSD can start answering queries about it. An ITRD can advertise this IMAB's BGP prefix. Soon it will receive packets addressed to this IMAB and can encapsulate them and forward them to its BGP peers, according to ordinary BGP-derived FIB rules for the TELOC destination addresses in the outer IP headers. These packets are soon forwarded to their ETRs.

It would be important to have a close or perfect match between the address range of each IMAB and the BGP advertisement which the ITRs make for it. We want each ITR either advertising or not advertising a BGP prefix. We don't want excessive churn in the advertisements, such as advertising a small subnet (longer prefix) when one IMAB's mapping is complete and then withdrawing this to advertise a larger subnet when an adjacent IMAB's mapping data is complete.

On the other hand, if there was a massive IMAB, like a whole /8, it would be good for some ITRDs to advertise a subset of this, if the total ITR load was to be split among several ITRDs by making each one only handle a subset of Ivip-mapped address space.
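The boot procedure described above (wait for the dump marker, buffer, load the dump, replay the buffer, then go live) is essentially a three-state machine. The following is a minimal sketch for one IPv4 IMAB, assuming a simplified update message whose field names are invented for illustration; downloading and decompressing the dump is left to the caller.

```python
from array import array
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Update:
    """One US-IMAB message (hypothetical fields, not a defined format)."""
    seq: int                   # sequence number within this IMAB's stream
    dump_marker: bool = False  # True for "a dump file was generated now"
    offset: int = 0            # first address, relative to the IMAB base
    length: int = 0            # number of addresses remapped
    mapping: int = 0           # new 32 bit mapping value


class ImabReplica:
    """Local copy of one IMAB-DB held by an ITRD or QSD."""

    WAITING, BUFFERING, LIVE = range(3)

    def __init__(self) -> None:
        self.state = self.WAITING
        self.dump_seq: Optional[int] = None
        self.pending: List[Update] = []
        self.table: Optional[array] = None

    def on_update(self, u: Update) -> None:
        if self.state == self.WAITING:
            # Earlier updates are already reflected in the coming dump.
            if u.dump_marker:
                self.dump_seq = u.seq
                self.state = self.BUFFERING
        elif self.state == self.BUFFERING:
            self.pending.append(u)  # held until the dump has been unpacked
        else:
            self._apply(u)

    def on_dump_loaded(self, seq: int, values: Sequence[int]) -> None:
        if self.state != self.BUFFERING or seq != self.dump_seq:
            raise ValueError("dump does not match the marker being waited on")
        # "I" is normally 4 bytes per entry, matching the text above.
        self.table = array("I", values)
        for u in self.pending:
            self._apply(u)
        self.pending.clear()
        self.state = self.LIVE  # safe to answer queries or advertise now

    def _apply(self, u: Update) -> None:
        for i in range(u.offset, u.offset + u.length):
            self.table[i] = u.mapping
```

Only once the replica reaches the `LIVE` state would an ITRD advertise the IMAB's prefix, which keeps the advertisement tied to the IMAB as a whole.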
Splitting something big like a /8 into four or 16 smaller IMABs places only a slight extra burden on the global BGP routing table, but it means the process of booting, downloading IMAB-DBD files, buffering a stream of US-IMAB updates, etc. can be done in smaller chunks, including especially smaller allocations of RAM inside the ITRD or QSD. This would also facilitate finer control of load balancing when a single ITRD couldn't handle the traffic in one location, and several ITRDs were used there with different IMABs for each one.

Periodically the ITRD or QSD could repeat the process of downloading an IMAB-DBD dump file, buffering the US-IMAB stream (which it would also be applying to its working copy of the IMAB-DB) and then building a second array in RAM while the current one is being updated. When the process was complete, it would switch to using the second one for its queries or mapping functions, freeing the first area of memory. Theoretically, the two bodies of data at switchover time should be the same.

This rolling complete refreshing of the local copy of each IMAB-DB would be done for Justin - Justin Case.

Perhaps the ITRD or QSD uses non error-corrected RAM and a high-energy particle, such as from radioactive decay, ripped through one of the chips. Even if it did use ECC RAM (much more expensive . . .) debris from an upper-atmosphere cosmic ray impact shower can rip through a CPU or other chip in the system and write false data there. (I have had two occasions where a perfectly stable Pentium III and Pentium IV system simply froze. I figure it was soft errors in the CPUs, probably from cosmic ray debris. Burying the server underground would help.)

An alternative might be to periodically send, as part of the US-IMAB stream, some hash or CRC values for parts of the IMAB-DB as the RUAS currently sees it. These can be checked at each ITRD or QSD, and if there is a mismatch, this could trigger a complete reload as just described.
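The CRC alternative could work block by block, so that a mismatch also shows roughly where the local copy has gone wrong. This is a minimal sketch, assuming the mapping table is the in-memory array of 32 bit values described earlier; the block size and function names are arbitrary choices for illustration, and the RUAS and the ITRD or QSD would have to agree on byte order, which the sketch does not address.

```python
import zlib
from array import array

BLOCK = 4096  # addresses per checksummed block (an arbitrary choice)


def block_crcs(table: array, block: int = BLOCK) -> list:
    """CRC-32 of each block of the mapping table, in address order."""
    raw = memoryview(table).cast("B")  # view the table as plain bytes
    step = block * table.itemsize
    return [zlib.crc32(raw[i:i + step]) for i in range(0, len(raw), step)]


def mismatched_blocks(local: list, announced: list) -> list:
    """Indexes of blocks whose local CRC differs from the RUAS's value."""
    return [i for i, (a, b) in enumerate(zip(local, announced)) if a != b]


# The RUAS's view of a small table, and a local copy with one flipped entry.
ruas_table = array("I", [0] * 10000)
local_table = array("I", ruas_table)
local_table[5000] = 1  # simulated soft error

announced = block_crcs(ruas_table)  # would arrive in the US-IMAB stream
suspect = mismatched_blocks(block_crcs(local_table), announced)
```

A non-empty `suspect` list would be the trigger for the complete reload. Comparison is only meaningful if both sides compute the CRCs at the same point in the update sequence.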
Since some or many of the packets coming from the RUAS systems to the Level 1 Replicators might be short, perhaps the Level 1 Replicators should have a way of combining shorter packets into longer ones, to reduce the total number which need to be sent through the rest of the Replicator system. This could be dodgy, since a single missing packet at that point could cause some difference in the streams leaving different Level 1 Replicators.

If there were 30 Level 1 Replicators, RUAS-X might generate streams to every such Replicator. If RUAS-X consisted of three servers, each could send 10 streams, or maybe more for some kind of redundancy. (Maybe two streams to every Level 1 Replicator?) Level 2 Replicators typically receive streams from two Level 1 Replicators for redundancy.

There could be hundreds of systems like RUAS-X feeding UDP update streams to the Level 1 Replicators. There are major scaling problems here, but by judicious design, I hope they can be overcome. This removes any central system for handling the data, with all the reliability, administrative and political dependencies that would probably entail.

So an ITRD would boot and advertise various prefixes as it acquired the full mapping information for each IMAB.

(See "Figure 4: Delegation tree of UASes above one RUAS".)

      \   |   /           } Update information from end-users
       \  V  /            } directly or via child UAS systems.
        \ | /
         \|/
       RUAS-X --------->-------------------[IMAB-DBD HTTP server 1]
         /|\ \
        / | \ \----[IMAB-DBD HTTP server 2]
       /  |  \ \
      /   V   \ \-- etc.
          |    \
          |
          |               30 UDP streams of identical realtime
          |               updates to the 30 Level 1 Replicators
          |               for each IMAB.
          |
          |
  \  \    |    /  /       Each of the 30 Level 1 Replicators gets a
   \  \   V   /  /        stream from every RUAS such as RUAS-X -
    \  \  |  /  /         one stream for every IMAB.
    [Replicator-N]
      / / | \ \
     / /  V  \ \          Each of 30 Level 1 Replicators sends 30
    /  |  |  |  \         "full streams" (the sum of all the streams
          |               it receives from systems like RUAS-X) to
          |               Level 2 Replicators.
           \
            \     /
             \   /
              \ /         Level 2 Replicator gets two (ideally
         [Replicator]     identical) full streams from two of the
           / / | \ \      Level 1 Replicators. From this pair
          / /  V  \ \     of US-IMAB streams it constructs a
         /  |  |  |  \    single stream with (hopefully) no
               |          missing packets. It sends that to
              /           each of 30 Level 3 Replicators.
             /
      \     /
       \   /
        \ /               Level 3 Replicator gets two (ideally
    [Replicator]          identical) full streams from two of the
     / / | \ \            Level 2 Replicators. It does the same
    / /  V  \ \           as described above - constructs a single
   /  |  |  |  \          complete stream and sends it to 30
  /   |  |  |   \         ITRDs or QSDs.
 /    |  |  |    \
/     |  |  |     \       All these replicators are cheap
         |  |             diskless Linux/BSD servers with one or
         |  |             two gigabit Ethernet links. They would
         |  |             ideally be located on stub connections
         |  |             to transit routers, though the Level
    \    |  |             3 (or 4 etc. if desired) might be at
     \   |  |             the border of, or inside, provider and
      \  |  |             ASN-end-user networks.
       \    |
        \
       ITRD QSD           ITRDs and QSDs ideally get two or more
            /|\           ideally identical full feeds of updates -
           / | \          so generally a missing packet from one
          / QSC \         is fine since the other stream has the
         /  /|\  \        same packet.
        /  / |
       /  /  |            Both therefore have a real-time updated
      / QSC  |            copy of all the IMAB-DB databases of all
  ITRC  /|  ITRC          RUASes like RUAS-X. Queries go up to the
       / |                QSD or to a QSC which has a cached answer.
      /  |                Responses go back down to the requester
  ITFH  ITRC              which is either a QSC or one of the two
                          "pull and be notified" types of caching
                          ITRs: ITRC (ITR with Cached mapping) and
                          ITFH (Ingress Tunnel Function in Host).

Figure 5: Three levels of Replicator drive ITRDs and QSDs.

The figures quoted below are wild guesses for the example.
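As a rough check on the scale implied by Figure 5, the fan-out can be worked through numerically. This is only a sketch using the figure's illustrative numbers (30 Level 1 Replicators, 30 output streams per Replicator, two input streams per downstream device); the function name is made up for the example.

```python
def replicator_counts(level1: int = 30, outputs: int = 30,
                      inputs: int = 2, levels: int = 3) -> list:
    """Devices at each level, ending with the ITRDs and QSDs served.

    Each device sends `outputs` streams and each downstream device
    consumes `inputs` of them, so every level multiplies the count
    by outputs / inputs.
    """
    counts = [level1]
    for _ in range(levels):
        counts.append(counts[-1] * outputs // inputs)
    return counts


level1, level2, level3, endpoints = replicator_counts()
print(level1, level2, level3, endpoints)  # 30 450 6750 101250
```

With these guesses, three levels reach on the order of a hundred thousand ITRDs and QSDs, and each further level multiplies that by 15.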
Exactly what the database sizes will be, the update rates, the data rates of updates etc. depends on many factors. I want a system which can ultimately scale to handling one or two billion IPv4 addresses, with some of these having reasonably frequent updates due to mobility - with those mobile end-users paying per update to help finance this system.

The average Level 2 Replicator gets two full streams from two widely separated Level 1 Replicators. With 30 Level 1 Replicators each sending 30 streams, this means there can be 15 * 30 = 450 Level 2 Replicators, each of which sends out 30 streams to Level 3 Replicators. The pattern continues with 15 * 450 = 6750 Level 3 Replicators, each of which has 30 output streams, with most ITRDs and QSDs getting two streams, from widely separated Level 3 Replicators.

With this push to ITRDs and QSDs, pull via queries to QSDs, and the QSDs notifying (carefully directed push) their child QSCs or ITRCs of an update - those which recently (10 minutes?) made a query whose answer covered one or more addresses affected by an update - the entire global ITRD, ITRC and ITFH system should get updates within a few seconds of the end-user making their change.

There would be some agreed, centrally coordinated, system by which the Level 1 Replicators and the ITRDs and QSDs could recognise which RUAS systems were currently a part of the Ivip system, and the IP address ranges of each of their IMABs. That could be as simple as some organisation pointing to them with DNS from a domain of theirs. It could also be in the form of an agreement for the Replicators and ITRs to accept updates from a generally expanding list of RUASes. This would involve central coordination, but it doesn't involve centralised flows or storage of data - just non-real-time configuration information which the operators of Replicators, ITRDs and QSDs would follow.

For scaling purposes, some ITRDs may not cover the entire set of IMABs.
Then two or more ITRDs at the same site could split the load among themselves. In that case, an ITRD which covers a subset should be able to request of its upstream (typically two, but maybe three or more) Replicators only to send those US-IMAB streams for the IMABs it is advertising. This means the packet format needs to be easy for the Replicators to recognise and classify, which would be more complex if some or all of the full stream packets contained data collected from separate packets sent by separate RUAS systems to the Level 1 Replicators.

In a RAM mailing list message I mentioned how a Replicator, ITRD or QSD could request of an upstream Replicator another copy of some packets it was missing. This sounds messy. Maybe the same result could be achieved via making the packets available in a structured manner, such as with file names with their sequence number in ASCII as the name, available in the same web servers which are used to supply IMAB-DBD dump files. These files only need to be kept for 10 or 20 minutes at most, assuming there is a dump every 10 minutes or so.

9. Query Servers - QSD and QSC

I was going to write about the Query Servers here, but there is a pretty complete description of them in the "Definition of Terms, Concepts and Functions" section.

10. Ingress Tunnel (ITR) strategies

It should be quite attractive to make an ITRD from a mass-market motherboard with one or two gigabit Ethernet interfaces on board, a dual core CPU, Linux/BSD booting from a USB stick or Flash drive, and suitable software. Depending on the number of addresses which are Ivip-mapped, multiple gigabytes of RAM will be required, but this is now very cheap. Perhaps a 1 Gbps ITRD can be made for USD$1000 or so.
I discuss this in: http://www1.ietf.org/mail-archive/web/ram/current/msg01628.html and will probably write up a better version for a later version of this I-D. Servers are not for everyone, but they are cheaper than routers. Similar ordinary, low-cost, mass-market motherboards are also ideal for making Query Servers and Replicators.

Here is an idea for combining numerous caching (pull) ITRCs with one or a few full-database (push) ITRDs, to achieve generally optimal paths whilst not delaying the first packets - as might otherwise be the case with a caching ITRC. However, if there is a QSD not far away, perhaps there will be no delay, but more a problem with the ITRC not wanting to do a query and an FIB entry for every unique IMAB destination address incoming packets might have. In that case, the job of the backup ITRD(s) is to catch packets which the ITRCs stand back from and, in cricket metaphor, "let go to the (wicket) keeper".

I also discuss having a caching ITFH function as part of host operating systems, to reduce the load on ITRs in the edge network or beyond.

Let's say the operators of a provider or AS-end-user network are totally hip to Ivip. Ideally, they would install ITRDs all over their network. However this will be costly in terms of the traffic flow of US-Complete updates to each ITRD - and in terms of the cost of an ITRD due to its need to decode the updates, store them, and write them to its FIB (which must have a huge capacity) as they arrive.

Let's say the edge network is rather large, and the operators only want to have a single full database ITRD. They could rely on one in the core, but they want to ensure their users don't depend on anything outside their network which might be loaded by traffic from other not so swinging networks.

The operators want to have a few hundred ITRCs, all over their network. These query a system of QSD and QSC query servers.
Here are some ways a caching ITRC can forward packets in the Ivip-mapped address ranges (meaning they are addressed to one of the IMABs) if it doesn't already have some mapping information. If the ITRC does have mapping for the address of the packet, its FIB will encapsulate the packet and send it on its way to the ETR, wherever this may be.

Let's say each caching ITRC has some mechanism for detecting the fact that a significant number of packets it receives have a destination address which is within one of the IMABs, but for which it has not yet asked for mapping information. Ideally this would be a counter per IP address, but that would be extremely unwieldy or impossible, so perhaps there is some simpler system such as a sampling scheme which examines the destination addresses of 1% of the incoming packets (when the router's CPU has nothing else to do). Then an algorithm searches for two packets in the last few minutes which are for the one IP address within an IMAB, but for which the ITRC has not yet asked for mapping information. Then the ITRC can ask for the mapping information for this address and update its FIB whenever it arrives. If the address is part of a larger subnet which has the same mapping, the response will say so, and so the FIB rule for that subnet will be in place for future packets.

I think a distributed system for handling mapping requests such as that of draft-meyer-lisp-cons could take several seconds to get a reply back to this ITR. Even with the snappier system of QSDs and QSCs we may not want to delay the packets for which the ITRC doesn't yet have mapping information. Nor do we want to insist the ITRC encapsulates every packet.
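The sampling idea above (look at about 1% of packets, and query once the same unmapped IMAB address has been sampled twice within a few minutes) can be sketched as follows. The class name, the three minute window and the injectable random source are assumptions made for illustration; a real ITRC would also need to prune old entries, which is omitted here.

```python
import ipaddress
import random


class QueryTrigger:
    """Decides when an ITRC should ask for mapping of a sampled address."""

    def __init__(self, imabs, sample_rate=0.01, window=180.0,
                 rng=random.random):
        self.imabs = [ipaddress.ip_network(n) for n in imabs]
        self.sample_rate = sample_rate
        self.window = window      # "the last few minutes", in seconds
        self.rng = rng            # replaceable so tests are deterministic
        self.last_seen = {}       # sampled address -> time of last sample
        self.requested = set()    # addresses already queried

    def observe(self, dst: str, now: float) -> bool:
        """Return True if a mapping query should be sent for dst."""
        if self.rng() >= self.sample_rate:
            return False          # packet not sampled
        address = ipaddress.ip_address(dst)
        if address in self.requested:
            return False
        if not any(address in net for net in self.imabs):
            return False          # not an Ivip-mapped address
        previous = self.last_seen.get(address)
        self.last_seen[address] = now
        if previous is not None and self.window >= now - previous:
            self.requested.add(address)
            del self.last_seen[address]
            return True           # second sighting inside the window
        return False


# Deterministic example: sample every packet instead of 1%.
trigger = QueryTrigger(["20.0.0.0/14"], rng=lambda: 0.0)
first = trigger.observe("20.1.2.3", now=0.0)
second = trigger.observe("20.1.2.3", now=10.0)
```

Because only sampled packets are counted, an address needs far more than two packets before it is likely to trigger a query, which is the intended bias towards destinations carrying significant traffic.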
The task then is to organise the ITRs so that every packet gets handled by an ITR, with the "unmatched" ones (those the ITRC - the first ITR the packet is handled by - doesn't have mapping for) being handled by some probably longer path than is ideal, while the bulk of the traffic has no path delays, because the local ITRCs have already got mapping data for the destination addresses of these "bulk" packets.

The diagrams below use ITRCs, ITRDs, IRs (ordinary internal routers) and BRs (border routers - connecting to the global BGP system). I assume the ITRD and ITRC functions are performed by routers which also do all the other things routers are expected to do in such a location.

This plan (Figure 6) is all inside the provider or AS-end-user network.

     ................
     .  AS network  .
     .              .
            ITRD    .             }
           /    \   .  /          }
 H--\     /      \  . /           }
     \   /        \ ./            }
 H----ITRC0--------BR---          }
     /  |  \      / .\            }
 H--/   |   \    /  . \           }
        |    \  /   .  \          }
        |     IR    .             }
        |    /  \   .  /          }
 H--\   |   /    \  . /           }
     \  |  /      \ ./            }  BGP transit &
 H----ITRC1--------BR---          }  border routers
     /  |  \      / .\            }  of the Internet
 H--/   |   \    /  . \           }
        |    \  /   .  \          }
        |     IR    .             }
        |    /  \   .  /          }
 H--\   |   /    \  . /           }
     \  |  /      \ ./            }
 H----ITRC2--------BR---          }
     /  |  \      / .\            }
 H--/   |   \    /  . \           }
        |    \  /   .  \          }
        |     IR    .             }
        |    /  \   .  /          }
 H--\   |   /    \  . /           }
     \  |  /      \ ./            }
 H----ITRC3--------BR---          }
     /              .\            }
 H--/               . \           }
                    .  \          }
                    .             }
   ..................

Figure 6: Internal ITRD for unmatched packets.

Plan A for Figure 6 is that every caching ITRC has its FIB set up so that packets addressed to every IP address in the IMABs will either be encapsulated and tunneled to the ETR near the host with the mapped address, or will be encapsulated and tunneled to a single IP address of the full database ITRD (at the top of the diagram).
This way, all packets addressed to an address within an IMAB, but which are "unmatched" (and therefore not tunneled to an ETR) by the first ITR they reach will probably go on a longer than optimal path via ITRD, before being encapsulated and tunneled to the proper ETR. ITRD needs to be expecting encapsulated packets arriving on one of its IP addresses, but it would be easy for it to pop off the IP-in-IP header and then put the packets through its FIB which will quickly encapsulate each one addressed to an IMIP, tunneling it to its ETR.

Ideally, these initially "unmatched" packets will be a small proportion of the total outgoing traffic addressed to IMIPs - and the main traffic will flow through the internal routers along optimal paths and then out via the nearest border router.

In Figure 6, the first internal routers are all ITRCs but ITRCs don't need to be the closest router to the sending hosts - as I depict in Figure 7.

Plan B is for the routing system of the network in Figure 6 to correctly handle each ITRC spitting out a packet it knows is addressed to an IMIP but which it doesn't yet have mapping data for - with the internal routing system forwarding that packet to the one internal ITRD. The internal routing system needs to ensure these "unmatched" packets are always forwarded towards ITRD. It would be acceptable or desirable if they pass through one or more further ITRCs on their way to ITRD. If a packet did reach an ITRC which had mapping information for it, then that would be fine, because it would be tunneled from there.

Maybe it would work fine if each ITRC accepts packets addressed to the IMABs and forwards those which were not mapped and encapsulated by its FIB on a link which leads them closer to ITRD - which advertises these IMABs.
For this to work, it would be vital for none of the border routers to
announce paths for these IMABs - unless the border router sent the
packets towards the ITRD rather than out to the Internet. The border
routers would advertise (inject) routes in the internal routing
system for all the BGP announced prefixes other than the Ivip IMABs
and those prefixes for which this local network is the destination.
So any packet sent from inside the network would eventually find its
way to an ITR.

Figure 6 could be applied to LISP. Figure 7 can't be, because LISP
doesn't involve EIDs being part of prefixes which are advertised in
BGP.

Alternatively, in Figure 7, the network has no full database ITRD and
would rely on the closest ITR(s) (presumably an ITRD) in the BGP
system to handle packets which its local ITRCs let pass without
encapsulation. This would be cheaper for the network, and would not
require a constant inflow of update data for an ITRD. However, ITRCs
work best when there is a local QSD, so any substantial network
probably needs to bring in one or two full US-IMAB feeds to keep at
least one QSD fully updated.

        ................
        .  AS network  .
        .              .
                       .  /
   H--\                . /     /
       \               ./     /
   H----ITRC0--------BR-----TR
       /  |  \      /  .\      \
   H--/   |   \    /   . \      \
          |    \  /    .  \      \
          |     IR     .   TR-----ITRD---
          |    /  \    .  /
   H--\   |   /    \   . /
       \  |  /      \  ./                  BGP transit &
   H----ITRC1--------BR---                 border routers
       /  |  \      /  .\                  of the Internet
   H--/   |   \    /   . \
          |    \  /    .  \
          |   ITRC2    .   TR----
          |    /  \    .  /
   H--\   |   /    \   . /       /
       \  |  /      \  ./       /
   H-----IR          BR-----ITRD---
       /  |  \      /  .\      /
   H--/   |   \    /   . \    /
          |    \  /    .  \  /
          |   ITRC3    .   TR
          |    /  \    .  /
   H--\   |   /    \   . /
       \  |  /      \  ./
   H-----IR          BR---
       /               .\
   H--/                . \
                       .  \
                       .
      ..................

   Figure 7: ITRCs but no ITRD in the network.

This shows how all paths taken by packets generated by hosts will
need to pass through at least one ITRC before exiting a border
router.
A feature of both Figure 6 and 7 is that there are a large number of
ITRCs. The difference is the location of the ITRD which handles the
packets that are not mapped by the ITRCs, because the ITRC hasn't yet
made a query or received a response.

A smaller network which doesn't want to have either an ITRD
(expensive, because of its large RAM and massive FIB capabilities,
unless implemented in software on a server) or a QSD receiving the
full database updates, will need to rely on some external system to
answer the queries of its ITRCs (and any ITFHs in hosts).

Ideally, there would be a way that every ITRC (and ITFH) could
automatically discover:

1 - Two or more addresses (of QSDs or QSCs) to which mapping queries
    should be sent.

2 - How to handle packets for which it has not yet cached any mapping
    information. For instance, what IP address to tunnel them to so
    they reach an ITRD, or some other way of handling them.

The ITRC would need to be able to discover how these change after
boot time too, so perhaps the information could come with a caching
time.

11. Egress Tunnel (ETR) strategies

[To do. I think most people understand the simple role of an ETR.
They can generally be placed at border routers, internal routers,
Provider Edge routers etc. However please see the Loose ends section
for thoughts on ETRs filtering packets.]

12. Mobile-IP with TTRs

[To do. I will probably write some more here in the future. See the
RAM list discussions around 17 to 18 June for discussion of TTRs and
mobile-IP.]

13. IPv6 and longer term strategies

[To do.]

14. Loose ends

I plan to refine the following material and integrate it properly
with a future version of this I-D.

14.1. ETRs checking src & dest addresses

14.1.1. Short version

A short version of this section, which is based on a RAM-list message
of 2007 July 14, is:

The easiest and most robust way to enable a network to enforce on its
ETRs the rule that encapsulated packets from ITRs outside the network
must not contain inner packets with a source address (SA) which
matches one of the network's own prefixes is (along with some other
requirements) to break with convention and require ITRs to tunnel the
packet with the outer SA = the inner SA. That is, the packet sent by
the ITR to the ETR has the same source address as the sending host.

This means that no-one and nothing in the destination network (or
anywhere after the ITR), including the ETR itself, can find which ITR
tunneled the packet - unless the encapsulation method carries extra
data which includes the ITR's address, which is not the case with
Ivip or with current LISP plans.

The reason is that it is probably very difficult or perhaps
impossible to make all ETRs inside the network filter the
decapsulated packets to drop those which arrived from an external ITR
and have the inner SA matching a local prefix (this would be a packet
with a spoofed source address) - so it is better to achieve the same
desired protection by:

1 - Requiring all ITRs to set the outer SA = inner SA.

2 - Letting the border routers continue to drop packets arriving from
    outside the network with SA matching any one of the network's
    local prefixes.

3 - Requiring all ETRs to drop all decapsulated packets with an SA
    (inner SA) which is not identical to the SA of the outer header
    (outer SA).

Having the outer SA = inner SA also has the benefit that traceroute
functions normally.
The current LISP 1 and 1.5 definition has the outer SA = ITR's
address, which means the sending host gets no traceroute results for
any router between the ITR and the ETR - and perhaps not from the ETR
either.

I also explore the idea for LISP or Ivip of there being a service so
the current and recent history of the database (multiple databases
perhaps, as is the case with Ivip) can be queried to see when any
mapping - of any EID (LISP) or DID (Ivip) address - has pointed to a
particular (RLOC) IP address. This would be vital for debugging
problems with end-users setting the mapping incorrectly, for
determining why streams of encapsulated packets arrived in some
unwelcome fashion etc. It would be impossible to prevent this sort of
analysis of the mapping data.

14.1.2. ITR tunneled packet with source address of sending host

I tend to agree with what Iljitsch van Beijnum wrote:

"I don't think it's a good idea to have node Y send packets where the
source address is X, both because this claims that the sender is
different from his/her actual identity and because return traffic,
such as ICMP messages, will then end up at (arguably) the wrong node.

"Knowing the address of the encapsulating TR is also useful if the
decapsulating TR ever wants to get in touch with it."

However, I think there are some strong arguments for making the outer
SA = the inner SA, which is contrary to the conventional sensible
notion that any packet created by node Y should have its SA = Node
Y's address.

The general arrangement is:

   HA-----ITR~~~~~~~~~~~~~~~~~~ETR------HB

   Figure 8: Basic ITR-ETR tunnel.

where HB has the LISP/Ivip-mapped address (maybe HA has such an
address too, but that doesn't matter).
The current LISP-01 I-D only describes LISP 1 and 1.5, so there's no
way of knowing whether with LISP 3.x the outer SA (Source Address of
the UDP packet which encapsulates data packets when the ITR tunnels a
data packet to an ETR) will be the original packet's SA (HA's IP
address) or the ITR's address. In LISP 1 and 1.5 the outer SA is
definitely the ITR's address (see Definition of terms: ITR).

In LISP 3.x I am not sure to what degree the ITR sends messages to
the ETR, or whether the ETR sends anything back to the ITR.

Dino's recent message (RAM list msg01703) indicates that the 3.x
approaches using CONS or NERD do not involve the ETR sending a
Map-Reply message to the ITR and that perhaps with APT
(http://tools.ietf.org/html/draft-jen-apt-00) there would be such
messages. So I am not sure whether some or all 3.x variants of LISP
involve no messages at all from the ETR to the ITR.

If there is no requirement for such messages or information exchange,
then maybe the ETR doesn't need to get any information from the ITR
in data packets, or presumably by any other means. In that case, the
ETR wouldn't need to know the address of the one or more ITRs which
are tunneling packets to it.

In that case, it may be possible for LISP to adopt the same "outer SA
= inner SA" approach I currently favor for Ivip. As far as I can tell
at present, this would have the same benefits for LISP as for Ivip -
much greater ease of a network protecting itself from a particular
form of attack which I will call "internal source address spoofing",
and the retention of the sending host's ability to fully traceroute
the path taken by the packets it sends.

If LISP or Ivip uses the ITR's address for the outer SA then HA will
find that traceroute does not produce any results for any routers
between the ITR and ETR.
Whether or not it would produce a response from the ETR depends on
whether the ETR treats the decapsulated packet just like any newly
arrived packet or not. I regard this as a substantial argument
against using the ITR's address for the outer SA.

I am assuming in all this that both LISP and Ivip ITRs and ETRs
follow the principle that the ITR copies the TTL (Time to Live) from
the packet it is encapsulating to the header of the new packet which
contains it (the IP-UDP IP header for LISP and the IP-in-IP IP header
for Ivip). Similarly, the ETR takes the TTL from the outer IP header
and copies it to the TTL field of the IP header of the decapsulated
packet. (These operations are specified for Ivip, but are not
actually specified in LISP, except for similar concepts with
recursive or re-encapsulating tunneling.)

My previous message contained some detailed thoughts on why requiring
ITRs to make the outer SA = inner SA makes it much easier for the
network in which the ETR is located to ensure that packets arriving
from outside the network and being decapsulated by its ETRs do not
produce decapsulated packets with SAs from local prefixes.

Here I develop this line of argument further - in support of the
initially unpalatable notion of having the ITR tunnel the packet with
outer SA = the packet's original SA (inner SA). I then examine what
benefits and difficulties would result from following convention and
having the ITR use one of its own addresses as the outer SA.

Ivip has no communication from ITR to ETR or from ETR to ITR. There
is no header, other than the outer header of IP-in-IP encapsulation.
Ivip could be redefined to use the ITR's address as the outer SA, but
at present I think it is best not to.
Ivip could be redefined to use UDP encapsulation as is currently
defined for LISP 1 and 1.5 - then various other items such as the
ITR's address could be included in the encapsulated packet - but I am
trying to keep Ivip simple. If the problem of not being able to find
an errant ITR was considered great enough, then Ivip could use UDP
encapsulation and include the ITR's IP address and maybe other items
of information in the body of the UDP packet, before the raw data
packet itself.

In this discussion, when I refer to "network" I mean any Autonomous
System network (provider or for an end-user) in which ETRs are
deployed. I don't refer in this discussion to the edge networks of a
single-homed or multihomed end-user which relies upon LISP-mapped or
Ivip-mapped addresses. Those edge networks have different
requirements for preventing "internal source address spoofing", which
I discuss in Note 1 at the end.

Here is a description of the particular security problem I am
concerned about:

         .............                  ....................
          N1         .                  . N2
                     .                  .
                     .                  .
        H1---ETR1~~~BR1~~~~~~~~TR~~~~~~BR1~~~~~ETR1---MN1
                     .        /         . \        \
         .............       /          .  \---H2   \--MN2
                            /           .   \
                           /            .    \-H3
                          /             .
  AT1~~~~~~~~~~~~~~~~~~~TR              ....................

   Figure 9: Diagram for explaining SA filtering problem.

N1 has a host H1 which is sending a packet to Mapped Node MN1.
Whether MN1 is a host with one or multiple IP addresses, or a link to
a router at a multihomed end-user's site, doesn't matter. Nor does it
matter whether H1's address is Ivip-mapped or not.

Also, it probably doesn't matter whether the ETR gets the
decapsulated packet to MN1 via a direct link or via relying on N2's
internal routing system to forward the packets to MN1.

This problem only concerns packets flowing left to right in this
diagram. How MN1 gets packets to H1 is a separate matter and is not
affected by this filtering and security problem.
I assume N2 has BR2 set up to drop any packet which arrives from the
outside world where the SA matches any of the one or more prefixes N2
advertises to its BGP peers. If N2 doesn't bother to do this, there
is no point in fussing over how the ETRs should filter packets to
achieve the same purpose.

I assume (for reasons discussed in my first message in this thread
and in Note 2 below) that all ETRs will drop any packet which has a
DA for any address other than the set of hosts/routers which it knows
it can deliver decapsulated packets to, with LISP/Ivip-mapped
addresses.

This means that if the ETR decapsulates a packet and finds its DA is
for H2, then it drops the packet. This also means that if it
decapsulates a packet and finds its DA is for some other
LISP/Ivip-mapped address which it can't deliver packets to (either
directly or via support from the local routing system) then this
packet will be dropped too.

The purpose of this "internal source address spoofing" protection is
to stop MN1 receiving a packet with SA matching any of N2's
BGP-advertised prefixes, for instance the address of H2. Without this
protection, an attacker AT1 can easily create an encapsulated packet,
with outer DA = ETR1, inner DA = MN1 and inner SA = H2.

It is not the purpose of "internal source address spoofing"
protection to stop ETR1 from decapsulating and forwarding an inner
packet to MN1 when its SA is any LISP/Ivip-mapped address, including
some address of another host/node MN2, which happens to be "within"
N2 or potentially (multihomed link which may not be working)
connected to N2. An attacker can already do this by sending a packet
with a spoofed SA to any ITR, or by generating its own encapsulated
packet.

The only purpose is to prevent attackers spoofing source addresses of
the non-LISP/Ivip-mapped addresses within N2. Attackers are assumed
to be outside N2. (It's all over if an attacker is inside N2.)
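The attack described above can be made concrete with a small sketch.
Everything in it is an assumption for illustration: the addresses
are documentation-range examples standing in for the nodes of
Figure 9, and the two functions only model the two filters assumed in
the text (the border router's filter on the SA it can see, and the
ETR's filter on the inner DA).

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical addresses standing in for the nodes of Figure 9.
N2_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]  # N2's BGP prefixes
ETR1 = "198.51.100.1"     # ETR in N2, on a BGP reachable address
H2 = "198.51.100.20"      # ordinary host in N2
MN1 = "203.0.113.5"       # LISP/Ivip-mapped address served by ETR1
AT1 = "192.0.2.66"        # attacker, outside N2
ETR1_DELIVERABLE = {MN1}  # mapped addresses ETR1 knows how to deliver to


@dataclass
class Encapsulated:
    outer_sa: str
    outer_da: str
    inner_sa: str
    inner_da: str


def in_n2(addr):
    a = ipaddress.ip_address(addr)
    return any(a in prefix for prefix in N2_PREFIXES)


def border_router_accepts(pkt):
    # The border router sees only the outer header.
    return not in_n2(pkt.outer_sa)


def etr_accepts_da(pkt):
    # The ETR drops inner packets it cannot deliver (for instance DA = H2).
    return pkt.inner_da in ETR1_DELIVERABLE


# AT1's packet: outer DA = ETR1, inner DA = MN1, inner SA = H2 (spoofed).
attack = Encapsulated(outer_sa=AT1, outer_da=ETR1, inner_sa=H2, inner_da=MN1)
reaches_mn1 = border_router_accepts(attack) and etr_accepts_da(attack)
```

In this model `reaches_mn1` is true even though the inner SA lies
within N2's prefixes: neither of the two assumed filters inspects the
inner SA, which is the gap the rest of this section sets out to
close.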
I start with some further assumptions:

A1 - Whatever new architecture is adopted - LISP, Ivip etc. - the new
     architecture must not force a lower level of security than
     currently exists (RRG Design Goals 3.9) and should not make it
     significantly more difficult, costly or error-prone to ensure
     the same levels of security are maintained.

A2 - Therefore, for networks which protect against "internal source
     address spoofing" the new architecture must make it easy to
     maintain this protection for packets being decapsulated by ETRs.

A3 - That the ETR needs to decapsulate packets which were
     encapsulated by ITRs in this same network. I argue in Note 2 at
     the end of this message why this is a reasonable requirement.

A4 - We can't expect the border router to perform deep packet
     inspection on every incoming packet - for instance to find any
     packet which looks like it might be intended to be decapsulated
     by an ETR, and to then decapsulate it, and filter it according
     to the SA of its inner packet.

So we can assume that BR1 drops any packet arriving from outside if
the SA matches, for instance, H3's (ordinary, BGP reachable) address.

14.1.2.1. Preventing SA spoofing when outer SA = inner SA

In the above framework, it becomes very easy to protect against
"internal source address spoofing" if all ITRs make their outer SA
equal to the source address of the sending host (the inner SA). All
that needs to be added is that ETRs drop any packet whose inner SA
does not match the outer SA.

This means that there are two classes of spoofed SA packets being
filtered:

1 - Those sent as ordinary packets, from outside. This is handled by
    the border router's existing filters.

2 - Those in inner packets with an outer header with DA = an ETR.
    This is performed in two stages - by the border router and then
    by the ETR dropping packets where inner and outer SA do not
    match.
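The two-stage filtering just described can be sketched as follows.
This is only an illustration under stated assumptions: the prefix,
the host addresses and the function names are hypothetical, and the
`from_outside` flag simply models whether the encapsulated packet had
to cross the border router.

```python
import ipaddress

# Hypothetical prefix and addresses, for illustration only.
N2_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]
H1 = "192.0.2.10"      # genuine sending host outside N2
AT1 = "192.0.2.66"     # attacker outside N2
H2 = "198.51.100.20"   # ordinary host inside N2
H3 = "198.51.100.30"   # another ordinary host inside N2


def is_local(addr):
    a = ipaddress.ip_address(addr)
    return any(a in prefix for prefix in N2_PREFIXES)


def border_router_accepts(outer_sa):
    # Stage 1: existing border filter, applied to the only SA it sees.
    return not is_local(outer_sa)


def etr_accepts(outer_sa, inner_sa):
    # Stage 2: the ETR requires inner SA to be identical to outer SA.
    return ipaddress.ip_address(outer_sa) == ipaddress.ip_address(inner_sa)


def delivered(outer_sa, inner_sa, from_outside):
    """True if the inner packet survives both filtering stages."""
    if from_outside and not border_router_accepts(outer_sa):
        return False
    return etr_accepts(outer_sa, inner_sa)
```

An attacker who wants the inner SA to be H2 has two choices: use an
honest outer SA, in which case the ETR drops the packet because the
two SAs differ, or set the outer SA to H2 as well, in which case the
border router drops it. A packet encapsulated by an ITR inside N2 on
behalf of H3 never crosses the border router and passes the ETR's
check.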
This seems to be a bullet-proof arrangement, and works fine with
encapsulated packets created by ITRs in the network. These are
assumed not to arise from attackers, since attackers are defined to
be outside the network.

14.1.2.2. Preventing SA spoofing when outer SA = ITR's address

In this arrangement, the ETR can't drop packets with inner SA !=
outer SA. So there is no way the ETR can use this simple technique to
extend the border router's filtering to the inner SA.

The only alternative is for the ETR to drop the inner packet if both
these conditions are met:

1 - The inner SA matches any of the network's BGP advertised
    prefixes.

and

2 - The outer SA does not match any of these prefixes.

These two conditions would be met if the packet arrived in
encapsulated form from outside the network while pretending to be
sent from inside the network.

Assuming the ETR needs to accept encapsulated packets from ITRs
inside the network (Note 2 below) then both these tests are required.

However, this is likely to be prohibitively difficult for an ETR to
perform.

Firstly, the FIB hardware of a proper router isn't necessarily able
to perform these gymnastics. (The decapsulation and "drop if inner SA
!= outer SA" test is still tricky, but does not involve any knowledge
of the potentially numerous local prefixes.)

Secondly, the list of prefixes this network advertises could be very
large indeed. This would make it perhaps impossible for FIBs to cope
with such a list.

Thirdly, we want to be able to do ETR functions in servers, not just
hardware-FIB "routers", including in the destination host (if it has
a BGP-reachable care-of address). There is no way with ordinary
software of applying a huge list of rules to the inner SA to decide
if the packet should be dropped, and then applying the same set of
rules to the outer SA as well.
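For comparison with the simple equality test, the two-condition test
described above can be sketched like this. The prefix list and names
are hypothetical; the point of the sketch is only that every ETR
would need an up-to-date copy of `local_prefixes` and would have to
match both SAs against it for every packet.

```python
import ipaddress


def matches_any(addr, prefixes):
    a = ipaddress.ip_address(addr)
    return any(a in prefix for prefix in prefixes)


def etr_should_drop(outer_sa, inner_sa, local_prefixes):
    """Two-condition test for when the outer SA is the ITR's address.

    Drop only if the inner SA claims to be local (condition 1) while
    the encapsulating ITR is not local (condition 2).
    """
    inner_is_local = matches_any(inner_sa, local_prefixes)
    outer_is_local = matches_any(outer_sa, local_prefixes)
    return inner_is_local and not outer_is_local


# Hypothetical example: a network advertising two prefixes.
local_prefixes = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/25"),
]
```

Here a packet from an outside ITR (192.0.2.1) carrying an inner SA of
198.51.100.20 is dropped, while the same inner SA arriving from an
ITR at 198.51.100.1 inside the network is accepted.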
Finally, even if routers and software ETRs could do this, there are
serious problems with the network's control system finding all these
ETR functions and ensuring they comply with these rules.

With my idea towards the end of Note 2 below - that ETRs should know
how to deliver packets directly to the destination host, rather than
use the internal routing system (which is compatible with LISP-01
page 11 point 7: "attached destination host") - there is no need for
a general system to control all ETRs at once (as would be required if
every ETR was to decapsulate packets for any host in the network with
a LISP/Ivip-mapped address, relying on the local routing system to
get the packets there).

14.1.2.3. What is lost by making outer SA == inner SA?

By defying convention and having ITRs send tunneled packets without
their own IP address in the SA of the outer header, we lose certain
things:

1 - We can't find directly which ITR tunneled the packet, once it
    has left the ITR.

2 - Therefore, we can't get a message to that ITR, or to whoever runs
    it.

It is an uncomfortable thing to propose such an arrangement, but here
I explore exactly what would be lost. As with all this stuff, I could
be mistaken and be missing many important things - so please let me
know what I have missed.

Ivip requires no communication from an ETR to an ITR, or vice-versa,
so nothing is lost with this arrangement. Some variants of LISP do
require this communication, so if this "outer SA == inner SA"
approach was adopted for LISP (for instance because it seems to be
the only practical or reasonable way of allowing a destination
network to maintain current security limits) then the LISP header
would need to contain the ITR's address.

I think the remaining reasons for wanting to know the ITR's address
are to do with coping with unwelcome packets.
Here is a possibly incomplete list of the scenarios which could lead
to the perception of unwelcome packets arriving from an ITR. In the
case of packets from an attacker, I will assume that the outer SA
(Ivip) or any "ITR address" in a LISP header contains a bogus
address, which may be part of the attack by encouraging victim V1
(whoever's host gets the unwelcome packets) to send messages to
victim V2 who probably runs an ITR, or to V3 which is whoever runs
the LISP/Ivip database system, with possible negative consequences
for further victims who might use LISP/Ivip-mapped addresses:

a - The packets are sent to an ETR but have inner DAs for
    LISP/Ivip-mapped addresses which the ETR is not configured to
    deliver to a destination host. This could include an address
    which is a BGP reachable address, or some other LISP/Ivip-mapped
    address other than the small subset of those addresses for which
    the current ETR can deliver packets.

b - The packets are sent to an ETR and have an inner DA which is one
    the ETR is configured to deliver packets for - however the flow
    of packets is excessive in volume, is regarded by that host as
    irrelevant or unwelcome etc.

c - The packets are sent to an address which is not an ETR - it may
    be of a host, an ordinary router or to some address which has no
    destination node.

In all these cases, if the encapsulated packets come direct from an
attacker (that is they have not been generated by a proper ITR) then
there is no point in looking at the outer SA. That will probably not
lead to any clues about the location of the attacker. Any attempt to
complain about an attack from that outer SA address will probably
cause V1 to drag other victims into the attacker's ploy.
If the packets do come from one or more genuine ITRs, then I think
one of the following must be true:

e - The one or more ITRs are functioning properly, with fully updated
    databases (an ITRD or an ITRC or ITFH with access to properly
    updated query servers, and getting notifications from those query
    servers in the event of a database change for some mapping
    information they cached).

f - The one or more ITRs are not functioning correctly. Maybe their
    FIB is broken and doesn't reflect their RIB (copy of database or
    cached mapping information). Maybe they are not properly updated.
    For this reason, ITRDs which for some reason are not getting
    updates or which have detected some corruption should probably
    stop forwarding packets. This is a tricky business I will write
    more about in the future. I also want to write about methods of
    detecting errant ITRs from the sender's end, perhaps by some
    method such as sending commands to ITRs, or making an ITR respond
    to a traceroute in some way which indicates the address it is
    going to tunnel the packet to.

In the case of 'e' above, there is no point in knowing which one or
more ITRs are tunneling the packets, because there is nothing wrong
with these or any other ITR - and similar problems can be expected
with packets being tunneled by any of the world's hundreds of
thousands of ITRs (or millions, with ITFHs widely deployed). The
problem is either with the contents of the mapping database(s) or
with the behavior of sending hosts. In Note 3 I discuss how to
resolve the first problem. The second problem has nothing to do with
the LISP or Ivip system, although perhaps changing the mapping to
"drop" or pointing it to some other ETR could resolve some problems.

In the case of 'f', it would be good to find the ITR which is sending
the packets which are considered unfriendly.
Ivip as I am currently proposing it would prevent anyone from finding
out which one or more broken or out-of-synch ITRs are causing the
trouble.

This problem is similar to having some router out in the Net
malfunction, forwarding packets to some place they don't belong.
Generally, the packets wouldn't get to an ETR or a host, because not
even a malfunction in a local router could do this (unless the victim
host was on a single link, rather than a LAN, from the errant
router).

The broken ITR is not generating the packet, but it is tunneling a
packet, which should go to some ETR that would be happy to receive
it, to some other address where the packets are not welcome.

If the outer SA was the ITR's address, then victim V1 could
potentially find who runs this ITR and complain - but it could be
tricky finding out who to complain to, unless there was some global
register of ITRs, which won't include the hundreds of millions of
ITFHs on host computers if Ivip is widely implemented in the future.

There is something lost by not being able to identify the genuine ITR
which mistakenly tunneled the packets. However, if the ITR was
functioning properly, there is no point in finding out its address or
who owns it - the problem is not with the ITR but can only be
resolved by either changing the mapping information or by altering
the behaviour of sending hosts - which is no different from the
situation with unwelcome packets today without LISP or Ivip.

Even if it was possible to identify the genuine ITR which tunneled
the unwelcome packets, why should the operator of that ITR shut it
down just because V1 complains about it? Who is to say that the
complaint is not the work of an attacker?

There does need to be a way for V1 to find out the recent history of
its IP address being involved in the mapping database. I write more
about this in Note 3 below.

14.1.2.4. Note 1 - Edge networks and internal source address spoofing

A single-homed or multihomed "edge network" which uses purely
LISP-mapped or Ivip-mapped addresses has a very different set of
conditions in which it might protect against "internal source address
spoofing".

Firstly, it has no BGP connections to the Internet. It only receives
incoming packets via one or more links to provider networks. Here, I
assume that it relies upon ETRs in those provider networks (see
Figure 3 in the Ivip I-D). These ETRs feed the inner packets (that is
the packets with DA = some LISP/Ivip-mapped address, where this is
within the edge network's address range, such as the address of IH9
in Figure 3) directly to the edge-network's router, Ethernet switch
or whatever. However, if the edge network gets from each of its one
or more providers one or a few of the provider's PA addresses, on
which it runs its own ETRs, and then routes only the inner packets
produced by these ETRs to the rest of the edge network, then similar
principles apply.

The edge network does not contain any ETRs, because ETRs do not
reside on addresses which are LISP/Ivip-mapped. Therefore, the edge
network doesn't need to worry about packets emerging from ETRs within
its own network.

I can think of two scenarios which require different approaches to
protecting against an attacker (implicitly any host outside this edge
network) sending packets with spoofed local addresses - meaning
addresses within the range of LISP/Ivip-mapped addresses of this edge
network.

14.1.2.4.1. No ITRs in edge network

The edge network has no ITRs - including any ITFHs - which might be
tricky to establish if ITFHs became a common part of operating
systems . . . but then an ITFH will always send queries to some Query
Server, so if there was a way of preventing this from succeeding at
the network's router, then this would prevent any ITFH function
working.
In this case, the edge network relies on its internal routing system
to forward packets from its hosts to its hosts.

In this case, raw packets with DA matching a LISP or Ivip-mapped
address range will be forwarded directly to the correct host if that
is their DA, or to the router and out one of the links to the
provider network if they don't match one of the edge network's
addresses. There they will soon be encapsulated by one or a series of
ITRCs, ITRD etc.

Protection against packets from the outside with spoofed local SAs
must be done by the edge network's router - it must drop any incoming
packet with a SA which matches one of the edge network's
LISP/Ivip-mapped addresses.

14.1.2.4.2. Edge network contains ITRs

The edge network has its own ITRD, ITRC and/or ITFH functions - and
these may encapsulate packets which were addressed to one of the edge
network's LISP/Ivip-mapped addresses.

This is probably a bad idea, since it makes it much more difficult,
or impossible, to protect against "internal source address spoofing".

There are various reasons I won't explore here why this is a messy
arrangement to be avoided, but for instance how can the edge
network's router know whether decapsulated packets from an ETR
originated in its own network or were decapsulated from a packet of
an attacker?

14.1.2.5. Note 2 - ETRs must handle packets from ITRs in the same
network

(See assumption A3 above.)

In a large network, for scaling purposes, there need to be lots of
ITRs. We can't make all the ITR functionality be in the border
routers.

Also, at least with Ivip, it would be advantageous to allow and
encourage caching ITR functions in sending hosts (both those on
ordinary BGP reachable IP addresses and those with Ivip-mapped
addresses). This is an ITFH function.
It needs to send queries to QSD or QSC query servers, and it doesn't
necessarily have to tunnel every packet - because those it doesn't
tunnel will (in a well designed network) be forwarded to (or perhaps
explicitly tunneled to) one or more ITRCs or an ITRD which can
encapsulate them. The ITFH greatly reduces the load on ITRCs and
ITRDs, without any cost to anyone and with the path taken by the
packets to the ETR being entirely optimal, since they never need to
go via an ITRC or ITRD. (ITRCs and ITFHs may not tunnel all packets,
so they need a backup of some other ITRCs or ideally an ITRD to
handle these.)

I assume ITFHs can't easily be detected (except by detecting or
blocking their requests to query servers - maybe the autodiscovery
system returns a message "ITFHs not supported here", which may be as
simple as an empty list of Query Servers) and that they can't be
highly managed, at least in terms of rapid changes to their behavior.
For instance there could be thousands or hundreds of thousands of
ITFHs in a large network, in many hosts or in DSL modem NAT functions
(I call this "function in host" because the "router" is just a CPU
and software, with no FIB hardware etc.).

Nonetheless, I expect there to be autodiscovery arrangements for an
ITRC or ITFH to find where to send queries to and perhaps where to
tunnel packets it doesn't encapsulate for some reason, but which
should be - for instance because it doesn't have mapping information
for that packet's DA.

It is probably much more robust and easier to plaster ITRs all over
the network and to encourage the adoption of ITFHs in many hosts and
DSL etc. modem-routers - than to try to ban ITFHs and centralise all
ITRs in a few places where their activities can be carefully
controlled.
If all the ITRs in a network could be carefully controlled, then it would be possible to ensure that the local routing system took precedence over encapsulation, so that if a host H3 wanted to send a packet to MN1, then the packet would be sent via the local routing system to MN1, and not encapsulated and tunneled to ETR1. However this is unlikely to be practical, because it could be difficult or impossible to immediately change the internal routing system and all local ITR behaviour to reflect the fact that MN1's one or more LISP/Ivip-mapped addresses have just become reachable inside this network, or have just become unreachable. If the local routing system forwards packets to MN1 - this must be stopped the moment MN1 is no longer reachable - such as if MN1 has moved and changed its mapping to some other ETR or another network's ETR. Even if the local routing system forwards packets to MN1, then it would still be best if any packet which is encapsulated by an ITR anywhere will be delivered properly. Generally, I think there should be no exception to the rule "if the raw packet finds its way to an ITR which encapsulates it according to the current state of the database(s), then it definitely will be delivered to the currently selected ETR". A local routing system which attempts to get the packet to the destination host might be acceptable, provided it changed its behavior very rapidly to reflect the contents of the mapping database. Overall, I think it will be best if: 1 - One or more ETRs have some explicit tunnel system for getting decapsulated packets to the destination host, rather than relying on the local routing system. 
2 - Hosts inside the same network should rely on ITRs (and perhaps their own ITFH function) to deliver the packets, in accordance with the current state of the database(s) - and not have this interfered with by the local routing system trying to take packets directly to the destination host. (See the discussion below in "TTRs and Mobility" about an equivalent situation where a TTR has a route to a mobile node, and might forward a packet there directly. It would be best if it allowed an ITR, and therefore the mapping database, to decide where the packet should go to, since the mobile node may want packets sent over some other link.)

3 - We don't want to concentrate all the ITR functions in border routers. To maintain optimal path lengths within the network, we want the packets to encounter an ITR ASAP, including in the sending host's own ITFH function.

4 - Therefore, we need ITRs all over the network, and these ITRs will be encapsulating packets to be sent to ETRs in the network, which will directly get the decapsulated packets to the proper destination.

All this only applies to Autonomous System networks - not the end-user networks which consist only of LISP/Ivip-mapped addresses.

14.1.2.6. Note 3 - A search-mapping service to debug LISP/Ivip mapping

This proposal would work with Ivip as currently defined - and perhaps with some forms of LISP.

I think there really needs to be some kind of service like this, for LISP or Ivip. There are some security and privacy implications of such a service being open to anyone to query about any IP address, but there is absolutely no way of preventing such a service being created, so there is no point in trying to prohibit misuse of such a service.
This service could be implemented as part of the main Ivip system, but there is no need for this to be done, since one or more separate companies, organisations or individuals could set up their own system to do the same job.

The service has a server which gets a full feed of the update stream - "US-Complete" in the current Ivip I-D. It can also download the IMAB-DBD periodic dumps of the complete set of mapping databases. It is absolutely required that this information be freely available to anyone, so anyone can set up their own ITRD and QSD-QSC-ITRC-ITFH systems. No practical measures could prevent anyone from gaining access to this information.

The service analyses the data and stores a copy of its analysis in some database, covering the last few weeks of activity. Then, the service is able to definitively answer queries such as:

1 - What is the history of RLOC (LISP) or TELOC (Ivip) mapping of this particular EID (LISP) or DID (Ivip) address?

2 - What is the history of the RLOC/TELOC mapping of any EID/DID over the past minutes, weeks etc. which involved the ITRs being told to tunnel packets to this particular IP address?

This means that if the unwelcome packets resulted from something in the mapping system, now or in the past, V1 could find out for sure which one or more EID/DID addresses were involved. Then, it could quickly establish which RUAS (Ivip Root Update Authorisation System) the update was made through. If that RUAS considered the complaint to be genuine, it could try to resolve it with the end-user whom it authorises (directly or via one or more branch and leaf UASes) to control the mapping of this IP address.
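The two queries above amount to indexing the same update records in two directions. A minimal in-memory sketch follows, assuming each update can be reduced to a (time, DID, TELOC) triple. The Update and MappingHistory names, and the use of text-form addresses, are hypothetical; the real US-Complete update format is not modelled here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

# Hypothetical record layout: the real update stream format is not modelled.


class Update(NamedTuple):
    time: float   # when the mapping change was made
    did: str      # the Ivip-mapped address (DID) whose mapping changed
    teloc: str    # the ETR/TTR address the ITRs were told to tunnel to


class MappingHistory:
    """Stores updates and answers the two history queries."""

    def __init__(self) -> None:
        self._by_did: Dict[str, List[Update]] = defaultdict(list)
        self._by_teloc: Dict[str, List[Update]] = defaultdict(list)

    def record(self, upd: Update) -> None:
        # Index every update both ways so each query is a single lookup.
        self._by_did[upd.did].append(upd)
        self._by_teloc[upd.teloc].append(upd)

    def history_of_did(self, did: str) -> List[Update]:
        """Query 1: every mapping this DID has had, oldest first."""
        return sorted(self._by_did.get(did, []))

    def dids_mapped_to(self, teloc: str) -> List[str]:
        """Query 2: every DID that was ever mapped to this address."""
        return sorted({u.did for u in self._by_teloc.get(teloc, [])})


history = MappingHistory()
history.record(Update(10.0, "203.0.113.7", "192.0.2.1"))
history.record(Update(20.0, "203.0.113.7", "198.51.100.1"))
history.record(Update(15.0, "203.0.113.8", "192.0.2.1"))
```

A recipient of unwelcome tunneled packets would use the second query with its own address, then the first query on each DID returned to see when the mapping was in force.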
Spies, dodgy detectives and the security authorities will be watching the changes in the database, so anyone with something to hide (which includes ordinary folk who are targeted by nosey authorities and those with malicious intent) needs to consider how changes to their mapping might leak information to others.

BTW, future generations will want to know why LISP, Ivip or whatever was foisted on the Internet - because it is certain to add to the difficulty of understanding and managing the system, with its own set of gotchas and security problems. They will hopefully realise that BGP couldn't be asked to do much more and that IPv6 wasn't ready for mass adoption, and doesn't solve the major problems anyway.

14.1.2.7. Note 4 - Finding errant ITRs

The problem of finding an errant ITR is tricky or impossible for the recipient of the tunneled packets if the outer packet header and other information doesn't identify the ITR. This can be resolved, while keeping the outer SA = inner SA to help with filtering, by adopting UDP encapsulation, with the first part of the packet's data being a field containing the ITR's address, followed by the raw data packet.

The sender of the original packets is in a better position to find which ITR is handling them by doing a traceroute. (Unless the ITR is an ITFH in the sending host.) A full traceroute should also show the ETR the packets go to and the final destination host.

This debugging situation is potentially made messier by a number of things.

Firstly, the routing of packets to ITRCs and ITRDs is not necessarily stable.

Secondly, the packets may at first have passed through one or more ITRCs (and perhaps the host's own ITFH) before being encapsulated by some ITRC or ITRD which is errant - but now the packets are handled by the ITFH or a nearby ITRC and so never go through the errant ITR.
Thirdly, this traceroute really needs to be performed from the original sending host, or some host in the same part of that network, to exactly the same destination IP address - the Ivip-mapped address of the destination host. Whoever does this test should check the mapping of that address first, to find the correct address of the ETR these packets are supposed to be tunneled to. If that doesn't appear on the traceroute, then maybe the ITR is doing something wrong - such as operating from corrupt or incomplete update information, or having something wrong with its FIB data.

Finally, the problem might have occurred for a while when the mapping was in one state, but now the mapping has changed and no problem can be found.

There are going to be lots of ITRDs and ITRCs around the Net, most of them probably not very closely managed. An errant one of these, or an errant Query Server, will cause some packets to be dropped or at least not sent to the correct ETR. But that behavior will only happen for the specific Ivip-mapped address, from some set of sending hosts - and the fault may disappear when new mapping information arrives or after the errant ITR decides it has lost synch with the mapping update stream and has either taken itself offline (letting the packets be tunneled by another ITR) or has since reloaded the mapping data.

This is an area which will require a lot of thought.

14.2. Scaling the Replicator network

I don't know of any system resembling the Replicator system - so a great deal of work will be required to figure out an architecture which can reliably deliver streams of packets to hundreds of thousands or millions of ITRDs and QSDs all over the Net.

The system described above assumes that a single Replicator can receive two complete US-Complete streams of packets and send out some number of copies, such as 30 or so.
The idea is that since each Replicator is generally going to get two copies of some packet, from its two previous level Replicators, it sends 30 copies of a packet of a particular IMAB number and sequence number as soon as one is received, and then ignores the second copy (assuming it contains the same information).

Replicators need to have pretty reliable levels of packet reception and delivery - which can be difficult to ensure.

There can be peaks in the streams of packets - I am not sure how to regulate this, except by some feedback from the first level of Replicators to the various RUASes, causing them - or some of them - to hold back on updates for a moment so as not to overload the Replicator system.

If the volume of updates becomes too much, the simple expedient is to build a second parallel system of Replicators, with the new system handling updates from one subset of the RUASes and the original one handling updates from the remainder.

The ITRD system could also be split, with one set of ITRDs handling one set of IMABs and therefore advertising them, and another set handling the remainder. Perhaps one set is more optimised for rapid changes for mobile end-users, so these end-users would get Ivip-mapped addresses in the IMABs of the higher-speed network, and their pay-per-update fees would fund that system.

It is not obvious how a similar split could be achieved for QSDs and QSCs. That would require two sets of ITRCs, or at least a single ITRC or ITFH which knew two separate QSCs or QSDs to query, depending on whether the DID in question belonged to one set of IMABs or the other. That would be very messy.

14.3. Is fast, secure, Replication possible on the Internet?

There are probably various ways of using UDP packets for updates with detection of missing packets and of spoofed packets.
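As one illustration of duplicate suppression and missing-packet detection, the sketch below models a single Replicator which forwards the first copy of each update and ignores the second. It assumes that sequence numbers are kept per IMAB and increase by one per update, which the text does not specify. The Replicator class and its receive() method are hypothetical names, and spoofed-packet detection is not attempted.

```python
from typing import Dict, List, Set, Tuple

# Assumption (not stated in the draft): sequence numbers are per-IMAB and
# increase by exactly one for each update packet.


class Replicator:
    """Forwards the first copy of each update; tracks gaps in the stream."""

    def __init__(self, fanout: int = 30) -> None:
        self.fanout = fanout
        # (imab, seq) pairs already forwarded. Unbounded here; a real
        # system would need to age entries out.
        self.seen: Set[Tuple[int, int]] = set()
        self.highest: Dict[int, int] = {}          # imab -> highest seq seen
        self.missing: Set[Tuple[int, int]] = set()  # gaps not yet filled

    def receive(self, imab: int, seq: int, payload: bytes) -> List[Tuple[int, int, bytes]]:
        """Return the copies to send downstream (empty for a duplicate)."""
        key = (imab, seq)
        if key in self.seen:
            # Second copy, typically from the other upstream Replicator.
            return []
        self.seen.add(key)
        self.missing.discard(key)  # a late packet fills an earlier gap
        prev = self.highest.get(imab)
        if prev is not None and seq > prev + 1:
            # Everything between prev and seq has not arrived yet.
            for s in range(prev + 1, seq):
                self.missing.add((imab, s))
        if prev is None or seq > prev:
            self.highest[imab] = seq
        return [(imab, seq, payload)] * self.fanout


rep = Replicator(fanout=30)
copies = rep.receive(7, 1, b"update-1")
duplicate = rep.receive(7, 1, b"update-1")
```

The missing set is what a recovery mechanism would act on, for instance by requesting the absent updates from an upstream Replicator. How such recovery would work is left open here, as it is in the text.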
Such detection could limit the time incorrect data was being sent to ITRDs and there could be various methods of recovery.

In order to protect against false information being used by the ITRD, authentication of each update packet's data will be required.

TCP could be used, such as with "HMAC Protected TCP Connections" as suggested in LISP-CONS. I guess this is draft-touch-tcpm-tcp-simple-auth (work in progress). But this can be disrupted by spoofed packets.

Even if attacks aimed at injecting bogus mapping information into ITRDs could be prevented, Level 1 of the Replicator system could be disrupted by a flood of packets from a botnet.

RFC 4732, Internet Denial-of-Service Considerations [RFC4732] describes various types of DoS, but notes that there is no absolute way of protecting against them:

"As a result, almost all Internet services are vulnerable to denial-of-service attacks of sufficient scale. In most cases, sufficient scale can be achieved by compromising enough end-hosts (typically using a virus or worm) or routers, and using those compromised hosts to perpetrate the attack."

One way of making parts of the system invulnerable to DDoS attacks would be to have parts of the RUAS and Replicator system interconnected with private network links - so RUASes and the first few levels of Replicators are not using Internet addresses at all. This adds enormously to the difficulty and cost of setting the system up. Perhaps it is best to design a system which is as robust as possible for deployment on the open Internet and consider using private network links closer to the time of deployment.

14.4. TTRs and Mobility

The global ITR system of LISP, or Ivip etc. could be used to direct packets to "Translating Tunnel Routers" (TTRs). These would be located in multiple locations, and a mobile node would find one or more of them topologically nearby and establish a two way tunnel to each TTR.
Each TTR would be capable of being somewhat like a home-agent - accepting packets to be sent to the mobile node and forwarding outgoing packets from the mobile node to the local network and the Internet.

This mobile use of Ivip does not involve the database or the ITRs in any new type of functionality, other than, perhaps, "mobility" implying a higher rate of updates than for multihoming or simple portability, and with the general hope or expectation that a change in the database will result in changed tunneling very quickly - ideally in a fraction of a second.

Traditional Mobile IP involves a fixed home-agent router, and the mobile node usually having an address from the network that router handles. Sub-optimal paths usually result, since the correspondent node may be near the mobile node, but both are far from the home-agent. Traditional Mobile IP works with IPv6 and requires no new functions in the correspondent node as long as the (typically) suboptimal paths via the home-agent are used. New software in the correspondent host enables it to send packets more directly to and from the mobile node.

Ivip will enable IPv4 and IPv6 correspondent nodes with no special mobility software to have generally optimal paths to and from the mobile node - which will require additional mobility software.

Normal Ivip does not require the destination host to have any IP address other than its Ivip-mapped address. Mobility usually involves the mobile node acquiring a care-of address in whatever network it is currently using (or multiple networks, if it is using multiple radio, wired Ethernet etc.) and establishing a tunnel from there to a home-agent. The mobile use of Ivip also involves the mobile node having one or more care-of addresses - which may be behind NAT, as long as the tunnel arrangement to the TTR can be established from behind a NAT.
Using the ITR system to direct packets from correspondent nodes all over the Net to the currently active TTR will lead to generally optimal, or close to optimal, paths to that TTR. Since the TTR is typically close to the mobile node, the total path length will generally be close to optimal. The ability of the mobile node to choose its own TTR as it acquires new connections to the Net means it can physically move and establish new TTRs, and have the ITRs tunnel packets to whichever TTR it chooses. So a mobile node could move physically across the world, if it could maintain some kind of Internet connection, whilst retaining all along the one Ivip-mapped address (or multiple addresses, or a /64 for IPv6), on which long-lasting sessions could be conducted. If the mobile node had two TTRs at one time, with the ITRs tunneling to TTRA, it wouldn't matter that the database and ITR network might take some seconds to change the tunneling to TTRB. As long as the mobile node accepted incoming packets from both TTRs at once, then there should be few problems. Switching to another TTR because the current one is unreachable (to the Net or from the mobile node) is likely to take a few or many seconds - so it would not be possible to use this global Ivip network to achieve split-second changeovers and so have only sub-second loss of connectivity. A mobile node would need its own mobile software to find TTRs and to establish tunnels to them. The mobile node would also need to decide which TTR to send its outgoing packets on. Access to TTRs would probably involve paying a fee, unless it was within the network the mobile node is currently connecting with. Some central system to help mobile nodes find nearby TTRs would also be needed. This centralised system would probably be a commercial service, not directly connected with the Ivip system, but would have the credentials required to alter the mapping data for the end-user's Ivip-mapped address(es). 
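The point made above about a mobile node accepting packets from both TTRs during a mapping change can be illustrated with a toy model. Everything in it is an assumption made for illustration: the delivered() function, the ITR names, and the idea that each ITR switches from TTRA to TTRB at its own instant within a few seconds of the mapping change.

```python
from typing import Dict, List, Set, Tuple

# Toy model only: ITR names and switch times are invented for illustration.


def delivered(
    packets: List[Tuple[float, str]],
    itr_switch_time: Dict[str, float],
    accepted_ttrs: Set[str],
) -> List[bool]:
    """For each (send_time, itr) packet, report whether the mobile node gets it.

    Each ITR tunnels to TTRA until its own switch time and to TTRB from then
    on. The mobile node only receives packets arriving via a TTR to which it
    currently holds a working two-way tunnel.
    """
    results = []
    for send_time, itr in packets:
        ttr = "TTRB" if send_time >= itr_switch_time[itr] else "TTRA"
        results.append(ttr in accepted_ttrs)
    return results


# Two ITRs learn of the mapping change at different moments.
switch = {"itr1": 1.0, "itr2": 3.0}
traffic = [(0.0, "itr1"), (0.0, "itr2"), (2.0, "itr1"),
           (2.0, "itr2"), (4.0, "itr1"), (4.0, "itr2")]

# Tunnels to both TTRs are kept up for the whole changeover.
both = delivered(traffic, switch, {"TTRA", "TTRB"})
# The tunnel to TTRA is dropped as soon as the change is requested.
only_b = delivered(traffic, switch, {"TTRB"})
```

With both tunnels held open no packet is lost however unevenly the ITRs switch. With only the TTRB tunnel, every packet still tunneled to TTRA is lost, which is the several-second outage described above for the case where the current TTR has become unreachable.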
This centralised system would probably monitor connectivity to the mobile node via the multiple TTRs and direct the mobile node about which one is best to send outgoing packets on. This central system would also probably control the mapping, so if the currently used TTR and its link to the mobile node became non-functional, the central system would quickly change the mapping to another TTR. In this respect, the system would be doing the same job as a centralised multihoming monitoring and failure detection system.

A router or server which performs TTR functions may also be an ITRC or ITRD, at least for encapsulating and tunneling packets which are sent by the one or more mobile nodes it connects with. Two mobile nodes which are sending packets to each other while using the one TTR would have their packets either routed directly within the TTR or would have them encapsulated by the ITR function in that device and then decapsulated by the ETR part of the TTR function.

In principle, ITFH could be used in the mobile host, but this would add mapping query packets to the traffic of the link, which can reasonably be assumed to be a slow and expensive radio link in many cases. It is better to leave the ITR function to the TTR-ITRC device, which has connections to a nearby QSD/QSC and to an ITRD to handle packets it doesn't yet have mapping for.

The TTR could also integrate an ITRD, but this would require it to get a continual feed of mapping updates. Generally, the more TTRs there are and the closer they can be to wherever mobile devices connect, the better - so an integrated ITRC is probably the best choice.

The basic diagram of using a combined TTR and ETR is as follows.

   ................              ............
   .      N1      .              .    N2    .
   .              .              .          .
   . CN1----ITR1~~~~~BR~~~TR~~~BR~~~~~TTR1===PE==\
   .              .        \     .          .     =
   ................         |    ............     =
                            |                     =
                            |                     =
                            |                    MN1
                 .........  |    ...........      =
                 .  N4   . /     .   N3    .      =
                 .       ./      .         .      =
                 . TTR2==BR======BR==========PE==/
                 .       .       .         .
                 .........       ...........

   ~~~~ 1-way Ivip tunnel
   ==== 2-way tunnel established by Mobile Node to TTR

Figure 10: Mobile IP with two TTRs.

This shows only one correspondent node CN1, but of course any number of correspondent nodes would be using their nearby ITRs to tunnel packets to the currently chosen TTR, which is TTR1. Packets sent by CN1 travel to the internal ITR ITR1, where they are tunneled through N1's BR (Border Router), the TR (Transit Router) and N2's BR to the tunnel endpoint (DA of outer IP header) TTR1. There, the ETR function of TTR1 decapsulates the raw packet and then re-encapsulates it in whatever way is required for the 2-way tunnel to MN1.

The mobile node MN1 has established 2-way tunnels to two TTRs. TTR1 is inside the access network N2 - for instance this might be a GPRS or UMTS cellular mobile link to N2.

MN1 also has an airport lounge or in-flight WiFi link to N3. Alternatively, the link could be via an Ethernet cable in an office LAN setting, or an Ethernet or WiFi link in a home which gives it an address behind a NAT. In any of these cases, it establishes a 2-way tunnel to TTR2 which is in a separate network from N3. Perhaps TTR2 is operated by a commercial TTR network operator and the end-user pays to use this TTR. The TTRs of this company could be located all over the Net, close to various provider networks, or located within them - and the mobile nodes find them with the help of some centralised control system this company provides.

Exactly what sort of 2-way tunnels are established is a matter for the mobile node and TTR to decide - this has nothing directly to do with Ivip.

Currently, the mapping for MN1's IP address, in whichever IMAB it is located, causes the ITRs to tunnel packets to TTR1.
Assuming MN1 is currently directing its outgoing packets along the line to TTR1, the flow of packets from MN1 follows the 2-way tunnel to TTR1, which decapsulates the packets from the 2-way tunnel. Since CN1 is on an ordinary BRIP address, there is no involvement of the ITR function which TTR1 is assumed to have. From TTR1, the packet uses ordinary BGP forwarding, via N2's BR (Border Router), the TR (Transit Router), N1's BR, ITR1 (operating as an ordinary router, since the DA of this packet is not within an Ivip-mapped IMAB) and to the destination: CN1.

In general TTR1 forwards the packets it receives from MN1 to the rest of the Net. If the packet's DA is within an Ivip IMAB then this packet would typically be handled by the ITR function which I suggest should usually be integrated into a TTR.

An exception might be made if the packet was addressed to some other mobile node - MN2 (not shown) - to which TTR1 also has a tunnel. If TTR1 routes those packets directly to MN2, then this is the equivalent of the internal routing system directly forwarding packets with Ivip-mapped addresses to local hosts which have those addresses - as in point 2 of the discussion above about "Note 2 - ETRs must handle packets from ITRs in the same network". As noted in that discussion, this is probably a bad idea. What really matters is the current mapping for that Ivip-mapped address of MN2, not the fact that TTR1 happens to have a link to it.

It is best for TTR1 to have its own ITR function to decide where any packet with DA = an Ivip-mapped address (including those with DA = a mobile node TTR1 has a tunnel from) should be tunneled to.
If the mapping database has TTR1 as the TELOC for MN2's DID, then TTR1's ITR function could either encapsulate the packet and tunnel it to TTR1's own ETR function (which would lead to it being sent on the 2-way tunnel to MN2) - or it could forward the packet to its 2-way tunnel input system directly.

If TTR1 didn't have an internal ITR function, it would be best if it let the packet out where it would find a nearby ITR which would tunnel it according to the current state of the mapping database. This may tunnel the packet back to TTR1's ETR function - or to some other TTR - whichever MN2, the end-user or the centralised management system has decided is best.

The mapping database and therefore the ITRs know the best way to handle any packet addressed to MN2's Ivip-mapped address. The fact that TTR1's "local routing system" has a link to MN2 is not as important as the mapping information for MN2's address.

A centralised control system, perhaps operated by the same company which runs TTR2 and hundreds of other TTRs, is not shown here. Suppose this system determines that it would be best to use the TTR2 link instead. It simply changes the mapping (using the credential previously supplied by the end-user) and within a few seconds all ITRs will be tunneling packets to TTR2 instead.

The centralised control system would probably be in regular communication with its corresponding software in MN1. This system doesn't need to rely on the Ivip system (database and ITRs) for this communication, since it can easily create its own encapsulated packets and send them to TTR1 and TTR2.

15. Security Considerations

There is clearly a plethora of potential security problems with Ivip. Any system which controls the tunneling of all packets addressed to one or more Ivip-mapped addresses is a tempting target for many attackers.
Due to the limited time available to prepare this 00 draft, consideration of security matters is deferred until subsequent versions.

16. IANA Considerations

[To do.]

17. Informative References

[I-D.farinacci-lisp]
   Farinacci, D., "Locator/ID Separation Protocol (LISP)", draft-farinacci-lisp-01 (work in progress), June 2007.

[I-D.ietf-shim6-proto]
   Bagnulo, M. and E. Nordmark, "Shim6: Level 3 Multihoming Shim Protocol for IPv6", draft-ietf-shim6-proto (work in progress), April 2007.

[I-D.irtf-rrg-design-goals-01]
   Li, T., "Design Goals for Scalable Internet Routing", draft-irtf-rrg-design-goals (work in progress), July 2007.

[IAB-RAWS-website]
   Meyers, D., "IAB Workshop on Routing and Addressing - resources and presentations", December 2006.

[ICANN-DNS-attack]
   "DNS Attack Factsheet 1.1", March 2007.

[ISC-Anycast]
   Abley, J., "Hierarchical Anycast for Global Service Distribution", March 2003.

[RFC1546]
   Partridge, C., Mendez, T., and W. Milliken, "Host Anycasting Service", RFC 1546, November 1993.

[RFC4732]
   Handley, M., Rescorla, E., and IAB, "Internet Denial-of-Service Considerations", RFC 4732, December 2006.

[RW ping survey]
   Whittle, R., "Probing the density of ping-responsive-hosts in each /8 IPv4 prefix and in different sizes of BGP advertised prefix", March 2007.

[iPlane]
   "iPlane Datasets", July 2007.

[van-Beijnum-BGP]
   van Beijnum, I., "Encoding routing information in bitmaps", August 2001.

Appendix A. Acknowledgements

Thanks to the following people for LISP and for helping me in other ways: Noel Chiappa, Olivier Bonaventure, Brian Carpenter, Dino Farinacci, Vince Fuller, Joel M.
Halpern, Geoff Huston, Ved Kafle, Eliot Lear, Simon Leinen, Tony Li, Jeroen Massar, Dave Meyer, Chris Morrow, Dave Oran, Robert Raszuk, Jason Schiller, John Scudder, K. Sriram, Markus Stenberg, Christian Vogt and Kilian Weniger.

This I-D is the first attempt at documenting the Ivip proposal - a month after I first began devising it. Hopefully one or more ideas within this proposal will prove to be of lasting value.

Appendix B. The Ivip acronym

The Internet is widely known for its positive commercial, cultural and political impacts. Perhaps in the longer term the Internet's interpersonal benefits may become better recognised.

I have lived most of my life in Melbourne, Australia. Without the Internet's free, open, global, person-to-person and one-to-many communications, I would never have known my wife Tina, who comes from Houston, Texas.

One evening we were watching Doris Day, Rock Hudson and Tony Randall in the 1961 romp "Lover Come Back". Advertising executive Jerry Webster (Rock Hudson) finds himself in trouble - from which he believes he can extract himself by convincing a dancer (Edie Adams) that he will introduce her to Hollywood by making her the star of a promotional campaign for a hot new product. She is keen and keeps asking him what the product is. Casting his eyes around the room, he sees a newspaper with a headline about a VIP. "Vip!" he exclaims. He spends the rest of the movie trying to figure out what this great new product will be.

The next night I thought up "anycast ITRs in the core, with EIDs advertised in BGP" to make the LISP proposal incrementally deployable. I wanted a name for a new proposal . . .

Initial meanings for ViP (later Ivip) included "Versatile redIrection of Packets" and some others. "Internet Vastly Improved Plumbing" came a few days later, and is the most memorable so far. Ivip's semantics are user extendable.

"Ivip" is brief, distinctive and easy to pronounce ("eye-vip" as in "ivory"). Capitalisation is user-configurable, but the first character, upper case 'i', SHOULD be capitalized, because I believe the Internet richly deserves its name remaining a proper noun - and to discourage pronunciation such as "ivip" as in "itch".

The capital "I" raises a potential problem with sans-serif fonts such as Helvetica, since it is indistinguishable from lower-case "L". This has bedevilled the 3GPP term "Iub" (capital 'i') which is far more widely known outside the organisation as "lub" (lower-case 'L').

"IViP" looks good in print but is annoying to type. Like "iViP", "IViP" is reminiscent of the 1990s, while Ivip is in fact a 1960s engineering product: www.firstpr.com.au/ip/ivip/tv-ad/.

Author's Address

   Robin Whittle
   First Principles

   Email: rw@firstpr.com.au
   URI:   http://www.firstpr.com.au/ip/ivip/

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).