Network Working Group                                         R. Whittle
Internet-Draft                                          First Principles
Intended status: Experimental                          February 18, 2008
Expires: August 21, 2008

Ivip Mapping Database Fast Push
draft-whittle-ivip-db-fast-push-00.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 21, 2008.

Copyright Notice

Copyright (C) The IETF Trust (2008).

Whittle                  Expires August 21, 2008                [Page 1]
Internet-Draft              Ivip DB Fast Push              February 2008

Abstract

Ivip (Internet Vastly Improved Plumbing) is a proposed map-encap system which is intended to provide a solution for the routing scaling problem - supporting growing numbers of end-user networks with multihoming, traffic engineering and portability, without further growth in the global BGP routing table. Ivip is also intended to provide other benefits, including a new form of IPv4 and IPv6 mobility and better utilization of IPv4 address space.
To achieve these benefits, Ivip relies on a "fast mapping database push" system, which is required to securely and reliably deliver updates to the mapping database to hundreds of thousands - or potentially millions - of ITRs (Ingress Tunnel Routers) and Query Servers (QSes) all over the Net, ideally within a few seconds. This ID describes the requirements of such a system and how it could be implemented so as to cope with very large numbers of updates and ITR/QS sites.

Table of Contents

1. Introduction
1.1. Outline of the RUAS, Launch and Replicator systems
1.2. Background assumptions
1.3. It may not be so daunting...
2. Objections to push and hybrid push-pull schemes
3. Ivip compared with other map-encap schemes
4. Benefits of Fast-Push
4.1. Modular separation of the multihoming restoration functions
4.2. Reduction in the size of the mapping information
4.3. Reduced ITR and ETR functionality
4.4. Greater security through simplification and modularization
4.5. IPv4 and IPv6 mobility with generally optimal path lengths
4.6. Better suited to future enhancements
5. Goals, Non-Goals and Challenges
5.1. Goals
5.2. Non-goals
5.3. Challenges
6. Definition of Terms
6.1. RLOC address space
6.2. Mapped address space
6.3. MAB - Mapped Address Block
6.4. UAB - User Address Block
6.5. Micronet
6.6. RUAS - Root Update Authorisation System
6.7. UAS - Update Authorisation System
6.8. UMUC - User Mapping Update Command
6.9. SUMUC - Signed User Mapping Update Command
6.10. MABUS - Update Stream specific to one MAB
6.11. Launch server
6.12. Replicator
6.13. QSD - Query Server with full Database
6.14. QSC - Query Server with Cache
6.15. ITR - Ingress Tunnel Router
6.16. ITRD - Ingress Tunnel Router with full Database
6.17. ITRC - Ingress Tunnel Router with Cache
6.18. ITFH - Ingress Tunneling Function in Host
6.19. ETR - Egress Tunnel Router
6.20. TTR - Translating Tunnel Router for Mobile-IP
7. Update Authorities and User Interfaces
7.1. RUAS Outputs
7.1.1. Updates every second
7.1.2. MAB snapshots
7.1.3. Missing packet servers
7.2. Authentication of RUAS-generated data
7.2.1. Snapshot and missing packet files
7.2.2. Mapping updates
7.3. RUAS - UAS interconnection
8. The Launch system
8.1. Phase 1 - collecting updates from RUASes
8.2. Phase 2 - checksum comparison
8.3. Phase 3 - identical update streams
9. Replicators
9.1. Scaling limits
9.2. Managing Replicators
10. Security Considerations
11. IANA Considerations
12. Informative References
Appendix A. Acknowledgements
Author's Address
Intellectual Property and Copyright Statements

1. Introduction

The aim of this ID is to establish that the fast push approach to map-encap schemes is practical and desirable for very large numbers of micronets (EIDs in LISP terminology) and rates of change of the mapping database. It is too early to quantify scaling limits and costs - and likewise there are no concrete design goals for the future.

This ID is the first detailed step to developing at least one kind of global, fast push, mapping distribution system. Others may well be developed. Each such proposal provides a challenge to those who advocate a "full pull" global query server system (such as LISP-ALT), since the arguments for "full pull", with its inherent delay of some or many initial packets, rely largely on how impractical or undesirable it would be to use a full push or hybrid push-pull system instead.

This ID describes in some detail the most novel and perhaps difficult part of the Ivip system. The rest of Ivip's functionality will be comparatively easy to implement compared to the equivalents in other systems.
For instance, the fast push system means that ITRs do not need complex mapping information, do not need to probe ETRs for reachability and do not need to make decisions about which ETR to tunnel packets to.

While contemplating this ambitious proposal, the reader is requested to remember that a successful implementation of something like Ivip would add immense value to the Internet - and not just by saving money due to solving the routing scalability problem. Immense value would be added by better utilisation of IPv4 address space and by the system's ability to provide a new form of mobility, for both IPv4 and IPv6, with generally optimal path lengths, few changes to the mobile host and no changes required for the correspondent host.

The benefits of a scheme such as this should motivate considerable effort to develop and deploy some kind of fast push map-encap scheme. These benefits are not just for the long-term good of the Net or Humanity, but include direct benefits to those who provide the new form of address space, and to those end-users who adopt it.

Ivip's overall architecture is described in [I-D.whittle-ivip-arch]. This ID is the first in a series to describe particular aspects of the proposed system, starting with the most ambitious feature of the design: a system to push large numbers of small items of "mapping data" to potentially millions of sites all over the Net, securely and reliably - and ideally within a few seconds. Please see the Ivip homepage http://www.firstpr.co.au/ip/ivip/ for further material and latest updates, including the text of IDs which are delayed by the IETF submission cut-off dates.

Ivip is one of several "map-encap" schemes currently being considered by the IETF Routing Research Group.
Others include LISP (Locator/ID Separation Protocol) [I-D.farinacci-lisp], APT (A Practical Transit Mapping Service) [I-D.jen-apt] and TRRP (Tunneling Route Reduction Protocol) [TRRP]. The most unusual and demanding part of Ivip's fast-push system is the network of "Replicator" servers which fan the mapping updates out to the full database ITRs (ITRDs) and full database Query Servers (QSDs) at recipient sites. Before describing this, several subjects are discussed in some detail: 1. The benefits which the fast push system brings to Ivip, compared to other map-encap schemes. 2. The goals, non-goals and challenges of this fast push system. 3. How multiple RUAS (Root Update Authorisation System) systems combine their mapping changes into a form which can be fanned out to the "Replicators". 1.1. Outline of the RUAS, Launch and Replicator systems In this ID, the largest part of the fast push system is comprised of thousands (perhaps several hundred thousand in the long term future) of essentially identical "Replicator" servers. There may be other, better, approaches, but this serves as a starting point. There is a single stream of packets which carry the combined mapping updates for the whole Ivip mapped address space. A finite number (ten to a few dozen at most) of RUASes work together with a shared "Launch system" of distributed servers, which generates multiple identical streams of update packets over secure links to the first level of Replicators. At the first level, each Replicator receives two identical streams, over separate authenticated and encrypted links, from two different Launch servers in different geographical locations, and over different physical long distance links. The Launch system and perhaps the first level (1) of Replicators will probably be implemented with private network links, rather than relying on open Internet addresses which are subject to flooding attacks. 
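As a rough illustration of the arrangement outlined in this section, in which a Replicator receives two identical streams and passes the first-arriving copy of each packet to its next-level neighbours, the following Python sketch may help. The packet identifier, the class name and the callback-style outputs are assumptions made for illustration only; this ID does not define them, and authentication and encryption of the links are not modelled.

```python
from typing import Callable, List, Set

# Hypothetical output type: a function that sends (packet_id, payload)
# to one next-level Replicator, ITRD or QSD.
Output = Callable[[int, bytes], None]


class Replicator:
    """Toy model of one Replicator: two identical inputs, many outputs."""

    def __init__(self, outputs: List[Output]) -> None:
        self.outputs = outputs
        # Identifiers of packets already forwarded. A real implementation
        # would need to age these out; this sketch keeps them all.
        self.seen: Set[int] = set()

    def receive(self, packet_id: int, payload: bytes) -> bool:
        """Handle a packet arriving on either input stream.

        Returns True if this was the first copy and it was fanned out,
        False if it duplicated a packet already forwarded.
        """
        if packet_id in self.seen:
            return False
        self.seen.add(packet_id)
        for send in self.outputs:
            send(packet_id, payload)
        return True
```

Because either input can supply the first copy, a packet lost on one upstream link is still forwarded as long as it arrives on the other, and each downstream receiver sees each packet once.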
If a packet goes missing from one stream, it will probably be present in the second. As the packets arrive, the Replicator takes the first one from either stream and sends its contents out simultaneously on a larger number of similar links to the next level of Replicators. Consequently, the delay time for update information passing through a Replicator is measured in milliseconds, and is comparable to the delays experienced in routers.

In this way, each Replicator consumes two identical streams from geographically and topologically different sources, and fans the content of the streams out to some larger number of Replicators, ITRDs or QSDs at the next level. This number of output streams per Replicator may be in the tens to one hundred range, depending on the volume of updates. Initially, it would be quite high, when update rates are low - meaning that the initial global Replicator network could serve the growing number of ITRDs and QSDs with few levels of Replicators, and with each one fanning out updates to a large number of Replicators at the next level. (It is possible to imagine multiple parallel Replicator networks to share the load, but this is not contemplated further in this ID.)

After some number of levels of replication, determined by local conditions, the streams deliver the update information at an ITRD or QSD. Ideally, each such end-point receives two streams from two geographically dispersed Replicators. These need not be at the same level, so the system is relatively flexible, and each Replicator will generally be sending a complete stream of packets.

The Launch system generates the stream as a variable number of packets on a regular schedule, such as every second. Data within each packet enables ITRDs and QSDs to authenticate the mapping information, and to request from remote servers any packets which did not arrive.

1.2. Background assumptions

For the purposes of this discussion, it is assumed there will be a single global Ivip system, with multiple organisations being responsible for the management of the various blocks of address space which are managed with Ivip.

It would be technically possible to run multiple Ivip systems, or Ivip-like systems, in parallel, with separate networks of ITRs, or with separate database fast push systems and some separate ITRs with some ITRs handling traffic for multiple such systems.

It would also be possible for an organisation to establish an Ivip-like system, without reference to any IETF RFCs, and to conduct a business renting out address space in small, flexible, chunks, with portability and multihoming via any ISP who provides the requisite, relatively simple, ETRs. Likewise for the mobility potential of Ivip.

However, for simplicity, this ID assumes that Ivip development will be coordinated into a single global system, as DNS is, following appropriate IETF engineering work and administrative decisions in RIRs and other relevant organisations. A development timeframe of 2009 to 2011 is assumed, with widespread deployment being achieved in the 2013 to 2015 timeframe.

Except where noted, it is also assumed that all full database ITRs and Query Servers receive a single global body of mapping data. An alternative to be considered in the future is more complex, and has various problems, but may be of value: that each site may choose to receive a full push feed of mapping information for only some parts of the global database, and rely on access to query servers in another network when packets must be handled which are addressed to micronets not included in the pushed subset. This approach is contemplated in LISP-NERD [I-D.lear-lisp-nerd].
In addition to the global fast push database update distribution system discussed in this ID, Ivip also involves Query Servers sending "notifications" to ITRs which recently requested mapping for a micronet whose mapping has just changed. This is a second form of push - on a local scale - and will be discussed in a future ID concerning ITRs and Query Servers. (It is also discussed in the ivip-arch-00 ID.)

The fast push system is complemented by a second system (discussed later in this ID) by which ITRs or Query Servers initiate downloads of snapshots of sections of the database - for initial boot up - and by which they can request specific update packets which did not arrive via the fast push system.

This ID concentrates on IPv4, since the future map-encap scheme is urgently needed for IPv4, but will not be so urgent for IPv6 for at least several more years. In principle, the same arrangements will apply for IPv6, with a different and more verbose data format than the 12 or so bytes required for each IPv4 mapping update. It may make sense to defer finalisation of any future IPv6 map-encap scheme until substantial operational experience has been gained with the IPv4 scheme.

A contrary perspective is that IPv6 will never be widely adopted until end-users have multihomed (and portable) address space. Since SHIM6 cannot provide the required network-centric approach to multihoming (though SixOne [I-D.vogt-rrg-six-one] may achieve this), the only way of providing multihoming to large numbers of IPv6 end-user networks without the unwanted bloat in the DFZ routing table is to deploy a good map-encap scheme ASAP.

1.3. It may not be so daunting...

Ivip documentation is written with a preference for detailed discussion over terseness. So Ivip IDs may appear rather daunting at first.
Hopefully these IDs will be clearly understandable, and the reader will recognise that the future map-encap scheme is a momentous development, requiring detailed consideration.

This ID focuses on handling billions of micronets and potentially thousands or tens of thousands of updates a second. Ideally, with good design, some more elegant approaches can be found than those presented below. Also, during initial deployment, the demands on the fast push system will be far lighter than those anticipated below, so the system might initially be somewhat simpler.

In the initial stages of introduction, there may be little need to deploy dedicated servers for the "Replicator" functions, since the volume of updates may be so light as to make it practical to run this software on existing servers, such as nameservers.

Furthermore, in the early years of introduction, when there are hundreds of thousands or a few million micronets, the low level of update packets (compared to the highest imaginable levels contemplated below) should enable each Replicator to fan out to many more next-level Replicators than would be possible when hundreds of millions or billions of micronets are handled by the system. This would mean fewer levels of Replicators, fewer Replicators and generally faster delivery of the mapping information than would be possible with current technology if the system were handling billions of micronets.

2. Objections to push and hybrid push-pull schemes

Objections to full push or hybrid push-pull map-encap schemes constitute arguments for full pull schemes such as LISP-ALT, including:

1. The size of the database (primarily the number of micronets multiplied by the average size of the mapping data) will grow to be so large that it will be impractical or undesirable either to concentrate the data in any one place, or to make copies of it to multiple locations.
This ID is intended to show that a push scheme, in this case fast push, can scale to very large update volumes and numbers of micronets.

2. The rate of change to the database will grow to the extent that it will be impractical or undesirable to send all those changes to all ITRs (full push) or to some ITRs and Query Servers (hybrid push-pull). Ivip's flexibility addresses this second question to a significant degree by enabling the one consistent architecture to be deployed with local decisions about how far to push the mapping data, and therefore over how much remaining distance most ITRs must send map requests and receive responses. As long as the optimal number of full database query servers in the world is a few hundred or more, then Ivip's hybrid push-pull approach is clearly superior to a global query server system, because the paths of queries and responses will be much shorter and therefore more reliable and cheaper.

3. That any degree of push typically involves sending mapping data to sites which will not use it. This is a valid concern and Ivip is not intended to provide mapping changes for end-users necessarily free-of-charge, just as the TCP/IP protocols are not intended to be used in a way in which any one party persistently sends unwanted packets to the service of any other party. Administrative and business arrangements for this, to deter frequent changes and/or to ensure end-users' mapping changes involve a contribution to the cost of the fast push system, will be discussed in the planned ivip-deployment ID.

3. Ivip compared with other map-encap schemes

LISP-NERD [I-D.lear-lisp-nerd] is a "full push" map-encap system, in which the full mapping database and updates are "pushed" to every ITR. Updates are sent from servers in response to periodic requests from ITRs.
Ivip's fast push involves a dedicated network of "Replicator" servers, which push a continual stream of updates to all full database ITRs (ITRDs) and full database Query Servers (QSDs). These devices passively receive the updates, which arrive ideally within a few seconds of the end-user changing their mapping. Because Ivip involves caching ITRs (ITRCs), there is no need to push the full set of database updates to every ITR, thus overcoming the primary inefficiency and scaling objections to a "full push" scheme.

LISP-ALT [I-D.fuller-lisp-alt] is a "full pull" system, with a global ALT network by which ITRs send mapping queries to the authoritative query servers, which are typically ETRs. (ALT also involves sending initial traffic packets by this global network, where they also constitute a request for mapping information.) The primary benefit of a "full pull" system is that the mapping database is fully distributed, and no traffic or hardware is involved in pushing the mapping data anywhere. This means the end users can have as much mapping information as they like, and change it as frequently as they desire, without requiring that these changes be sent to ITRs and QSes all over the world.

The primary objection to such a scheme is that the necessarily global nature of the query server network will often delay the delivery of initial packets by times which are likely to cause significant slowdowns in session establishment, causing potential difficulties for higher level protocols and dissatisfaction for users. Other objections include difficulty trading off caching time for faster responses to mapping changes, and bottlenecks in the ALT network and in the few authoritative Query Servers (ETRs).

TRRP [TRRP] too involves a global query server system, based on a separate DNS-like network, so the same difficulties arise with initial packets in a new communication session potentially being delayed for large fractions of a second, or longer.
In Ivip, all mapping queries are handled by local query servers, which are likely to be faster, more reliable and involve less overall query-response traffic than any global system such as LISP-ALT or TRRP.

APT [I-D.jen-apt] is the only proposal other than Ivip which involves "hybrid push-pull" - pushing the full database to a subset of the ITRs and to full database Query Servers (APT's Default Mappers integrate both functions), with the remainder of the ITRs sending their mapping queries to a local Default Mapper. APT involves a new instance of BGP operating on existing routers, to flood the mapping changes to all participating ISPs. This is a much slower form of push than is intended with Ivip's new protocols and specialised "Replicator" servers.

4. Benefits of Fast-Push

Many of the benefits of Ivip are entirely dependent upon the ability to convey to every full database ITRD and QSD in the world an end-user's command to change the mapping of one of their micronets (one or more contiguous IPv4 addresses or /64 prefixes for IPv6). Before describing the goals and potential implementation of the fast-push system, the benefits will be discussed in some detail.

The future map-encap architecture should be as powerful and flexible as possible - to solve the immediate routing scalability problem (which is closely bound to the IPv4 address depletion problem) and to provide as many other benefits as possible. For instance Ivip is intended to provide a new form of efficient mobility.

A widely deployed map-encap scheme is a powerful piece of infrastructure which may in the future play a role in migrating from IPv4 to IPv6 or some other future Internet addressing architecture.
The high speed with which information can be transmitted to the sites containing ITRDs and QSDs is likely to make such a system more suitable for architecturally important tasks in the future which cannot be foreseen today.

Since the future map-encap architecture is a major addition to the Internet, with its new kind of address space ideally being adopted ubiquitously by end-users large and small, it makes sense to implement the architecture with specifically designed protocols and servers which enhance the new architecture's modularity, power, speed and scope for future enhancements.

This general principle and the specific reasons listed below are strong arguments for developing an ambitious and novel proposal such as Ivip. However, the new protocols and software which will be needed for this fast push system are not necessarily highly demanding. All elements of the proposed fast push system can be implemented as software on conventional servers. The overall fast push system will ideally be a much more secure, reliable, predictable and easy to manage system than any global query server system such as LISP-ALT.

4.1. Modular separation of the multihoming restoration functions

Map-encap schemes other than Ivip (LISP, APT and TRRP) are based on the assumption that, due to the vast size of the mapping database and/or its rapid rate of change, it is or will in the future be impossible for the end-user's wishes to be conveyed to all the world's ITRs within a few seconds.

This assumption is untested. Perhaps this ID will convince some people that the assumption is wrong. Perhaps it will fail to do so, and hopefully a better proposal for a fast push mapping distribution system will be developed.
It would be a terrible lost opportunity if the new architecture was built on the assumption that it must be based on pure pull, or slow push, when in fact it is possible and clearly more desirable to use fast push.

With the assumption that fast push is impossible, or for some reason undesirable, each ITR must make its own decisions about multihoming service restoration. For instance, the ITR must be given two or more ETR addresses and some criteria for choosing which one to tunnel traffic packets to. The decision could involve Traffic Engineering (TE) functions such as load balancing, but the most important decision is which ETR to send traffic to when one or more of the ETRs is unreachable.

This means that each individual ITR needs to determine reachability to each ETR listed in the mapping information and to make decisions based on this reachability and the criteria contained in the mapping information.

Consequently, these proposals would result in the following tasks being built into the map-encap scheme itself:

1. The exact methods by which each ETR's reachability could be determined, presumably by each ITR operating in isolation.

2. Similarly, any other reachability functions, such as determining whether an ETR is capable of delivering packets to the destination network.

3. The logic of all decisions regarding ensuring continued connectivity for multihomed networks, and likewise for TE. These need to be codified as part of the map-encap protocol, because they need to be part of the functional specification for all ITRs.

4. Similarly, the logic of these decisions needs to be fixed as part of the map-encap system in order that a format for mapping information can be defined.

5. Since these functions involve ITRs probing ETRs, it is also necessary for the map-encap scheme to standardise the ways ETRs respond to such probes. This may involve ETRs making decisions based upon their own reachability and the reachability of other ETRs (however determined).
Consequently, a great deal of complex functionality needs to be defined in RFCs and implemented in every ITR and ETR. This becomes frozen into the map-encap scheme, making it difficult to implement even minor variations on these functions once the system is widely deployed.

The inability of these other schemes to give the end-user direct real-time control of how ITRs handle packets whose destination address falls within one of their micronets means that the map-encap scheme is a monolithic system. In addition to tunnelling packets from ITRs to ETRs, these systems force all end-users to rely on each system's inbuilt functions for detecting reachability, making decisions about where to send packets etc.

Unless the map-encap scheme is made exceedingly complex (with consequent development delays, costs and security problems with ITRs and ETRs) it is likely that some or many end-users will be dissatisfied with the limited functionality the system provides. Similarly, the system cannot be used for any other purpose without a complete upgrade to all ITRs, and possibly ETRs.

Ivip provides a map-encap scheme whose sole function is to collect traffic packets into ITRs and to tunnel them to the ETR the end-user specifies for whichever micronet the packet is addressed to. Although ITRs and ETRs do need to work together to solve some Path MTU Discovery and Fragmentation problems, the ITRs are not involved at all in determining reachability or making any decisions.

The rapid (ideally, a few seconds or less) response of all Ivip ITRs to the end-user's mapping commands means that end-users can (and must) supply their own multihoming monitoring system and make their own decisions about how to control the behaviour of ITRs, for multihoming, TE, portability or whatever other purposes the end-user requires.
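To illustrate how simple an end-user's own control logic could be, the following Python sketch picks the most preferred ETR that the end-user's monitoring currently reports as reachable, and signals a mapping change only when that choice differs from the current mapping. Everything here is an assumption for illustration: the class and function names, the preference list, and the reachability dictionary, which stands in for whatever monitoring system the end-user supplies. How the resulting change is submitted to the mapping system is not shown.

```python
from typing import Dict, List, Optional


def select_etr(preference: List[str], reachable: Dict[str, bool]) -> Optional[str]:
    """Return the most preferred ETR currently judged reachable, or None."""
    for etr in preference:
        if reachable.get(etr, False):
            return etr
    return None


class MultihomingController:
    """Hypothetical end-user logic deciding which single ETR a micronet maps to."""

    def __init__(self, preference: List[str]) -> None:
        self.preference = preference          # ETR addresses, most preferred first
        self.current: Optional[str] = None    # ETR the micronet is mapped to now

    def evaluate(self, reachable: Dict[str, bool]) -> Optional[str]:
        """Return the ETR to request in a mapping change, or None if no change."""
        choice = select_etr(self.preference, reachable)
        if choice is None or choice == self.current:
            # Nothing reachable, or the mapping is already correct.
            return None
        self.current = choice
        return choice
```

A controller built with the preference list ["192.0.2.1", "198.51.100.1"] would request 192.0.2.1 while both ETRs are reachable, stay silent while nothing changes, and request 198.51.100.1 once the first ETR is reported unreachable. Since the decision lives entirely with the end-user, it can be replaced by any other policy without changes to ITRs or ETRs.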
There may well be a role for IETF work regarding detecting reachability of multihomed networks via various ETRs, but this is not part of the current Ivip proposal. End users can supply their own systems, make manual decisions, or hire the manual or automated services of other organisations to control the mapping of their micronets.

This is a completely modular approach to multihoming etc. - in contrast to the other proposals which monolithically build those functions into their proposed global networks and protocols.

4.2. Reduction in the size of the mapping information

With real-time end-user control of ITR behavior, it is not necessary to provide multiple ETR addresses, together with priority information regarding multihoming etc. Consequently, the quantity of mapping information in each update can be greatly reduced.

In Ivip, for each micronet, only three items of information are specified:

1. Start address of the micronet: 4 bytes for IPv4, 8 for IPv6 (assuming /64 granularity).

2. Length of the micronet, as an integer number of IPv4 addresses or IPv6 /64s: in principle 4 bytes for IPv4 and 8 for IPv6, but in practical terms, half these figures are probably adequate.

3. Address of the sole ETR to which packets addressed to this micronet should be tunnelled: 4 bytes for IPv4 and 16 bytes for IPv6.

Note: Ivip is less functional than the other schemes in one important respect. The other schemes provide TE in the form of load spreading over multiple ETRs for each given micronet (EID prefix, in LISP terminology). Ivip has no such capability. TE for a single Ivip micronet consists solely of steering the traffic for this micronet to one ETR or another. Load sharing for a single IPv4 address or IPv6 /64 is not possible with Ivip.
However, if the traffic can be split over multiple such IPv4 addresses or /64s, then each can be made into a separate micronet so that load sharing can be achieved by mapping each micronet to a different ETR. Despite this limitation, Ivip may prove to be better for many TE applications due to end-users being able to fine-tune the mapping in real-time.

All other schemes involve the specification of (typically) two or more ETR addresses, plus other information regarding priorities and service restoration decisions. Ivip's more compact mapping information makes the task of distributing updates easier than for a monolithic scheme in which ITRs make multihoming restoration decisions. Ivip may involve a greater number of updates, so this advantage may be reduced or reversed. However, Ivip's functionality is different from that of competing schemes, so direct comparisons of the compactness of mapping updates are not particularly illuminating.

Since mapping information must be stored in every full database ITR or Query Server, Ivip's more compact mapping is an advantage in terms of storage space compared to that required for LISP-NERD or APT.

Another consideration is that the other schemes use full prefixes for their micronet/EID lengths, which is more compact, but less flexible, than Ivip's integer number of IPv4 addresses or IPv6 /64s.

4.3. Reduced ITR and ETR functionality

As noted above, the fast push system enables real-time end-user control of the world's ITRs, removing the need for decision making and reachability probing from ITRs and ETRs. This contributes to Ivip being simpler to design, deploy and manage.

4.4.
Greater security through simplification and modularization

Similarly, the many security problems, including Denial of Service (DoS) problems, which arise in other schemes when ITRs receive mapping information from distant, unknown ETRs are avoided when the ITR no longer needs to make decisions about reachability and multihoming service restoration.

Instead, security of the mapping information needs to be assured as part of the design of the fast push system. Since this consists of a limited number of streams of data, from well-established sources, this should be easier in general than relying on ITRs and ETRs to communicate across the Net, without prior arrangements, and without prior knowledge of each other's existence.

4.5. IPv4 and IPv6 mobility with generally optimal path lengths

Ivip enables end-users to exercise fast, essentially real-time, control of which ETR packets addressed to their micronet(s) are tunnelled to by the global system of ITRs. This enables a new form of mobility with some unique and favourable characteristics compared to traditional approaches to mobile IP. This is discussed further in [I-D.whittle-ivip-arch] and in a forthcoming ID devoted to Mobility.

Briefly, the idea is that the mobile host (or whatever device is the recipient of traffic for a micronet of addresses) retains its IP address wherever it is located, and establishes one or more care-of addresses in various networks. For instance, a laptop or cellphone may have a WiFi connection to ISP A and so a temporary care-of address (perhaps or probably behind NAT) in that network. It then establishes a link via 3G to ISP B, with another care-of address there.

The mobile device needs to establish tunnels from each care-of address to one or more ETR-like devices, which are optimised for mobility. These Translating Tunnel Routers (TTRs) combine ITR and ETR functions with the ability to authorise and service a two-way encrypted tunnel established from the mobile device.
An external, distributed system of servers enables the mobile host's software to choose TTRs which are either within, or close to, the network it is currently connected to. The TTRs and the TTR location systems would typically be operated by companies who charge end-users. The mobile device sends outgoing packets to the TTRs, which are able to forward them to the rest of the Internet, perhaps performing ITR encapsulation at that point.

The mobile device and/or some external system controls the mapping of the micronet for this device's address space, causing all the world's ITRs to tunnel traffic packets to one or the other of the two TTRs which the device has connections to. Assuming the TTRs are relatively close to each point of connection to the separate networks, then total path lengths from corresponding hosts will generally be optimal or close to optimal. There is no "home agent" or "triangle routing". The system should work fine with both IPv4 and IPv6, with no changes required for corresponding hosts, and only some additional software, rather than actual host stack changes, for the mobile host.

Ivip's fast push system is instrumental in enabling this new form of mobility. Mobility such as this cannot be achieved with a slow push system, or with a pure pull system such as LISP-ALT - unless perhaps such a system had a fast, global-scale notify (cache invalidation and mapping data update) system, which would probably be more complex and less secure than Ivip's fast push system.

Even when not used for multihoming or mobility, the real time control of mapping enables the micronet address space of end-users to be completely portable between any ISPs with suitable ETRs. Portability and multihoming are the most important goals being considered by the RRG [I-D.irtf-rrg-design-goals] (though "portability" is generally described in other terms).
These are marketable attributes of the new address space. The real time mobility which Ivip can provide is still more marketable, and a further reason to expect that the new architecture will be adopted willingly and profitably by ISPs and end-users alike, rather than due to them having to be cajoled into using it, for instance on the basis that it is the responsible way to obtain address space compared to gaining conventional BGP-managed PI space.

This form of mobility is not available via other map-encap schemes. It does not seem to be widely known, or considered to be a possibility by most mobile IP developers - probably because they either haven't heard of the concept of a global ITR-ETR network, they don't think any such thing will be built, or they haven't contemplated that such a network could be driven by a fast push mapping distribution system.

4.6. Better suited to future enhancements

A well designed centralised database update distribution system may be more suitable than a global query system such as LISP-ALT for enhancement in the future, in which the ITR-ETR system is required to perform new and unanticipated functions.

For instance, perhaps the ITR-ETR system could be used in some creative way, with special addressing arrangements, to provide automatic communication between IPv4 hosts and IPv6 hosts, via gateways which the ITRs would tunnel packets to. Perhaps this could be done with no IPv4 host changes and some minimal IPv6 host changes. This is a highly speculative suggestion, but is an example of how the ITR-ETR network could be used to create, or support, an important new architectural development. Some ITRs and Query Servers could be upgraded to the new functionality and the information to control these new functions would be sent as part of the main stream of updates, in a distinct format which would be ignored by standard ITRs and Query Servers.
5. Goals, Non-Goals and Challenges

5.1. Goals

The overall goal of the fast push system is to enable end-users, who manage the mapping of their one or more micronets of address space, to securely, reliably and easily communicate their mapping change command to some organisation with which they have a business relationship, so that that change will be propagated to every full database ITR and Query Server as soon as possible.

"As soon as possible" means typical delay times of a few seconds, ideally zero seconds, but in practice probably four to five seconds. (Most of this delay is in the RUAS and Launch systems, which could be optimised in the future to process the updates much faster than this, without affecting the much larger Replicator system.)

"Reliably" means that in the great majority of cases, the ITRs and Query Servers receive every mapping change as expected, but that in the relatively rare event of this being impossible due to packet loss, the device can recover from this situation within one or at the most two seconds by requesting a copy of the packet from a remote server. Reliability also involves robustness against DoS attacks. This can never be completely protected against for any device on the open Internet, since its link(s) can easily be flooded by packets sent from botnets etc.

"Securely" means that each full database ITR and Query Server which receives the updates will be able to instantly verify that the updates are genuine, rather than the result of an attacker who might, for instance, send forged packets to that device or to some other part of the fast push system.

The mapping change command, as received by the ITR or Query Server, consists, as noted above, of a starting address and length specification of the micronet, followed by the address of the ETR. A zero for the ETR address indicates the ITR should drop the packets.
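As an illustration of the three-field mapping change command just described, the following is a minimal sketch for IPv4 only. The fixed 12-byte big-endian layout, and the names IPV4_UPDATE, pack_ipv4_update, unpack_ipv4_update and is_drop, are assumptions made for this example; this ID does not define a wire format, and authentication and stream context are omitted.

```python
import ipaddress
import struct

# Hypothetical layout (not defined by this ID): start address (4 bytes),
# micronet length in addresses (4 bytes), ETR address (4 bytes), big-endian.
IPV4_UPDATE = struct.Struct("!III")


def pack_ipv4_update(start, length, etr):
    """Encode one IPv4 mapping update as 12 bytes."""
    if length < 1:
        raise ValueError("micronet length must be at least one address")
    return IPV4_UPDATE.pack(
        int(ipaddress.IPv4Address(start)),
        length,
        int(ipaddress.IPv4Address(etr)),
    )


def unpack_ipv4_update(data):
    """Decode 12 bytes into (start address, length, ETR address)."""
    start, length, etr = IPV4_UPDATE.unpack(data)
    return ipaddress.IPv4Address(start), length, ipaddress.IPv4Address(etr)


def is_drop(etr):
    """An ETR address of zero tells the ITR to drop packets for the micronet."""
    return int(etr) == 0
```

For example, pack_ipv4_update("192.0.2.16", 8, "203.0.113.1") encodes a micronet of eight addresses starting at 192.0.2.16; the addresses are documentation examples only.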
Multiple mapping updates would be embodied in a datastream providing suitable context for a stream of such updates for IPv4, with a separate set of packets probably handling another, similar, type of mapping information for IPv6. The data format needs to provide for open-ended extensions in the future and to support authentication at the time of reception.

The mapping change command, as sent by the end-user, or by some other organisation or device which has the end-user's credentials, would involve the length of the micronet being checked to ensure it is the same as the currently configured length of the micronet which starts at that location. The end-user's command might be part of an encrypted exchange involving a challenge-response protocol and the end-user's private key. Alternatively, an encrypted link could be used, such as via HTTPS, and a conventional username and password given as part of the command.

The end-user would previously have communicated directly or indirectly with their RUAS to configure their total assigned address space into one or more micronets. This ID concentrates on the changes to existing micronets.

The ITR and Query Servers should reject change commands for micronets which overlap previously defined micronets which had a non-zero ETR value. So to the ITR or Query Server, a micronet mapped to zero can be remapped in whole or in part to any address, including zero, or can become part of another encompassing micronet mapped to any address. Micronets which are currently mapped to a non-zero address can only have their mapping changed for the entire micronet.

From this it can be seen that the ITRs and Query Servers perform minimal sanity checking on the mapping changes they receive, once they have been authenticated.
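The acceptance rule above can be sketched as follows. This is one reading of the rule, not a specification: the Micronet tuple, the use of plain integers for addresses, and the function names are hypothetical, and the update is assumed to have been authenticated already.

```python
from typing import Iterable, NamedTuple


class Micronet(NamedTuple):
    start: int   # first address, as an integer
    length: int  # number of IPv4 addresses (or IPv6 /64s)
    etr: int     # ETR address as an integer; 0 means "drop"


def overlaps(a, b):
    """True if the two address ranges share at least one address."""
    return a.start < b.start + b.length and b.start < a.start + a.length


def accept_update(existing: Iterable[Micronet], update: Micronet) -> bool:
    """Apply the minimal ITR/QS check described in the text.

    A micronet with a non-zero mapping may only be remapped as a whole,
    so any overlap with it must be an exact match of start and length.
    Space currently mapped to zero may be re-divided freely.
    """
    for m in existing:
        if m.etr != 0 and overlaps(m, update):
            if (m.start, m.length) != (update.start, update.length):
                return False
    return True
```

With existing micronets (100, 10, etr 5) and (110, 20, etr 0), an update for (100, 10) is accepted, an update for (100, 5) is rejected, and an update for (115, 5) is accepted because it falls entirely within space mapped to zero.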
A considerable level of sanity checking is therefore to be performed in each RUAS - for instance to ensure that micronets are never mapped to an address which is part of any micronet. (In LISP terminology: "the ETR address must be an RLOC".) There may also be additional lists of addresses which all RUASes are prohibited from using as ETR addresses.

RUASes and the multiple servers of the Launch system are few in number and will be administered carefully, so this ID does not consider automated aids to their management and debugging. However, the Replicators will be numerous and operated by a wide range of organisations. It is a goal of this proposal to maximise the degree to which this network can be robustly and easily managed, rather than requiring a great deal of manual configuration etc. This goal is noted in the current ID, but addressing it is left for future work.

In order to debug the way the Ivip system is used, such as transient erroneous or malicious mapping updates which cause packets to be tunnelled to addresses where they are not welcome, there will need to be a system which monitors all mapping changes and keeps a lasting record of them. Then, aggrieved parties can search such a system for the address on which they received the unwanted packets, and so determine the micronet involved. This enables the aggrieved party to complain to the RUAS which is responsible for that micronet.

This "mapping history" function could be performed by one or multiple separate systems, each simply taking a feed from the Replicator system. Something like this needs to exist for all map-encap schemes. This is not pursued in greater detail in the current Ivip IDs.

5.2.
Non-goals

Apart from checking the ETR address against any specific exclusion lists (such as specific prefixes, private and multicast space) and to ensure it is not part of a Mapped Address Block (MAB - a BGP advertised prefix containing micronets), the entire Ivip system takes no interest in whether there is a device at that address, whether the address is advertised in BGP, whether there is or was an ETR at that address, whether the ETR is reachable or whether the ETR can deliver packets to the micronet's destination device. These are all matters which fall under the responsibility of the micronet's end user.

It is not a goal of the system to keep mapping changes secret from any party. This would be impossible. Therefore, it cannot be a goal of this or probably any map-encap scheme that in a mobile setting, the movement of an individual's device from one network to another could not be inferred by anyone who monitors the mapping updates. Consequently, there are fundamental privacy and security limitations to the use of this new form of address space. End users who want or need to keep their physical location secret will need to make other arrangements than direct reliance on Ivip.

Query Servers will issue map replies with a caching time of their own choosing. It is not a goal of the fast push system to allow end-users to affect that caching time. This reduces the amount of data in each update, and enables operators of Query Servers to use their own rules or algorithms to optimise the various costs and benefits of longer or shorter caching times in their own network. The longer the caching time, the less often the Query Server will be queried about a particular micronet, but the longer the period for which it must send notifications to any ITR which made such a query. Long caching times may burden the memory of ITRs which handle many micronets, and the proliferation of P2P traffic means that ITRs will often be handling packets addressed to a broadly scattered set of micronets.
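The ETR address check described at the start of this section (exclusion lists plus "not part of a MAB") could look roughly like the sketch below. The exclusion prefixes, the example MAB and the function name etr_address_allowed are illustrative assumptions; the actual lists would be agreed between RUASes and are not specified in this ID.

```python
import ipaddress

# Hypothetical exclusion list: private and multicast IPv4 space.
EXCLUDED = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4")
]

# Hypothetical Mapped Address Block (a documentation prefix, for illustration).
MABS = [ipaddress.ip_network("198.51.100.0/24")]


def etr_address_allowed(etr, mabs=MABS, excluded=EXCLUDED):
    """Return True if an RUAS could accept this ETR address.

    Zero is always allowed, since it means "drop". Nothing is checked about
    reachability or whether an ETR actually exists at the address; per the
    text, that is the end-user's responsibility.
    """
    addr = ipaddress.ip_address(etr)
    if int(addr) == 0:
        return True
    return not any(addr in net for net in list(excluded) + list(mabs))
```

Here etr_address_allowed("198.51.100.7") is refused because the address lies inside the example MAB, while an address outside every listed prefix is accepted without any further test.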
As part of handling PMTUD and Fragmentation, ITRs may discover that an ETR to which they are attempting to tunnel packets is unreachable. There is no provision in the current Ivip proposal for this to be communicated back to other ITRs or to the RUASes. There could be some benefits to this if it could be done securely and so as not to allow DoS attacks, but in the current proposal, it is the sole responsibility of the end-user to determine that the ETR selected is reachable. This could be achieved quite well by hiring the services of a widely distributed monitoring service, with servers at many physical and topological locations in the Net. These servers tunnel packets to the ETR, just as an ITR would, so they are sent to the destination network, where some process reports their arrival to the monitoring system. This could be a good area for IETF engineering work, but is not part of the current proposal.

Replicators perform a best-effort copying of mapping update packets. They do not store these packets for any appreciable time or attempt to request a packet in the sequence which is missing from their two or more input streams.

5.3. Challenges

There are obvious challenges in building a global network which is distributed, to avoid any single point of failure, whilst also being highly reliable, coordinated and secure. For this network to propagate information from one of many input points to a very large number (potentially millions) of endpoints, with very low levels of loss, is a further challenge on the open Internet.

The Replicator system needs to operate on the open Internet, as do the end-users' methods of interaction with the RUASes, directly or indirectly. However, the RUASes, the Launch servers and the level 1 Replicators are probably best connected using private network links.
The closest existing technology to what is required may be Reliable Multicast, but this is optimised for long block lengths. This technology should be considered in greater depth as an alternative to what is proposed here, but the rest of this ID is based on the assumption that novel techniques are required.

Building a new, moment-to-moment crucial, architectural structure into the Internet is a serious undertaking, and conservative approaches using established techniques have obvious advantages because the component protocols are already implemented and well known. Assuming no such techniques can do the job, it is a challenge to devise some new techniques which RRG members will confidently assess as being capable of robust implementation, without significant risk of the design later being found to have fundamental flaws.

Every map-encap scheme faces challenges in convincing first the RRG, then the IESG, that the proposed architecture is necessary, desirable and better than all alternatives. Assuming the proposal is developed to the point of becoming Standards Track RFCs, the proposal needs to be enthusiastically adopted by ISPs and end-users of all sizes. A proposal which relies for its adoption on notions of impending doom if not adopted, or on coercion, cajoling or appeals to benevolence is not going to be widely adopted. The future map-encap scheme needs to be very widely adopted in order to solve the immediate problem of routing scaling, and to make a serious contribution towards better utilization of IPv4 address space.

Ivip's difficulties in this respect will hopefully be fewer than those of competing schemes, because money can probably be made from the outset not just by renting out micronet space for multihomed end-users of all sizes, but from using the same techniques, plus a global network of TTRs, for the new approach to mobility.
Internet history is littered with ambitious protocols and business ventures which never delivered. Ivip, or any other map-encap scheme, will need broad support from ISPs, end-users and RIRs before it can be widely adopted. Hopefully, fast push will be widely regarded as both practical and desirable.

6. Definition of Terms

6.1. RLOC address space

Borrowing LISP's Routing Locator term, RLOC describes any address or range of addresses in which packets are delivered to the destination via conventional BGP routing mechanisms. All BGP advertised address space today is RLOC space.

6.2. Mapped address space

Once Ivip is operational, a growing subset of the total space used will be handled by ITRs tunnelling the packets to an ETR, which delivers the packets to the destination. As such, this address space is "mapped" by the Ivip map-encap scheme. Therefore, it can be divided into smaller sections than is possible with BGP (a granularity of 256 addresses for IPv4, due to restrictions on lengths of advertised routes) and each such section can be used via any ETR in the world.

6.3. MAB - Mapped Address Block

A MAB is a BGP advertised prefix which is Mapped address space rather than RLOC space. ITRs all over the Net advertise this prefix, tunnelling the packets to ETRs according to the current mapping for the destination address of each packet.

A MAB could, in principle, be as large as a /8. Larger MABs are preferred in general, because each one burdens the BGP system with only a single advertisement, but includes the Mapped address space of many end-users. However, for reasons discussed below - including load sharing between ITRs and ease of initially loading snapshots of the mapping database - it may be best if MABs are more typically in the /12 to /17 range.

6.4.
UAB - User Address Block

Each MAB typically contains address space which has been assigned by some means to many (perhaps tens of thousands) separate end-users. A UAB is a contiguous range of addresses within a MAB which is assigned to one end-user.

A MAB could be assigned entirely to one end-user - as might be the case if the end-user converted a prefix of theirs which was previously conventional RLOC space to be managed by the Ivip system. Generally speaking, MABs are ideally large (short prefixes) and each contains space for multiple end-users.

An end-user might have multiple UABs in a MAB, but for simplicity it is assumed that each has a single UAB. UABs are specified by starting address and length - they need not be on power of two boundaries.

UABs are important constructs for the entities which control the mapping information, but are not seen or used by ITRs or the fast push mapping distribution system.

6.5. Micronet

Following Bill Herrin's suggestion, the term "micronet" refers to a range of Mapped address space for which all addresses have the same mapping. In LISP and APT, these are known as EID prefixes. In Ivip, a micronet need not be on binary boundaries - it is specified by a starting address and a length, in units of single IPv4 addresses or IPv6 /64 prefixes.

An end-user could use their entire UAB as a single micronet, or they could split it into as many micronets as they wish, and change these divisions dynamically.

Any micronet which is mapped to address zero will cause ITRs to drop packets addressed to this micronet. A micronet can be defined within the whole or part of a contiguous range of address space which is currently mapped to zero, by the fast push mapping distribution system carrying an update message specifying the new micronet's starting address, its length, and a non-zero address for its mapping.

6.6.
RUAS - Root Update Authorisation System

Multiple RUASes collectively generate the total stream of mapping update messages. Each RUAS is responsible for one or more MABs. There may be a dozen to perhaps a hundred RUASes.

End-users with Mapped address space have an arrangement either directly with the RUAS which handles the MAB their space is located within, or indirectly through an organisation such as a UAS.

6.7. UAS - Update Authorisation System

A UAS is the system of an organisation which accepts mapping change commands from end-users, and conveys them directly - or perhaps indirectly via another UAS - to the RUAS which handles the relevant MAB. An RUAS which accepts mapping update commands from end-users does so via its own UAS system.

A UAS accepts upstream input from end-users and/or other UASes. It generates output to downstream RUASes and/or other UASes. One UAS may have relationships with multiple RUASes.

A MAB may be assigned to an RUAS and control of parts of this may be delegated to multiple UASes. A single UAS may work only with a single RUAS, or with multiple and perhaps all RUASes. Whether the MAB itself is administratively assigned (by an RIR, or some national Internet Registry) to the UAS or to the RUAS is not important in a technical sense. End-users will need to choose address space with care, according to the RUAS (and any UASes) it depends upon, because the reliability of this MAB's address space will forever be dependent on these organisations.

The number of RUASes will be limited to enable them to efficiently and reliably work together to create a single stream of updates for the entire Ivip system. The ability of UASes to act as agents for RUASes and/or to have their own MABs which they contract an RUAS to handle the mapping for, enables a large number of organisations to compete in the sale/rent of Mapped address space.

6.8.
UMUC - User Mapping Update Command

A UMUC is whatever action the end-user performs on one or more different user-interfaces of whatever UAS they use to change the mapping of their one or more micronets.

The system would also be able to tell the user the current mapping and also confirm that a requested change to the mapping was to an acceptable address. For instance, the system would generate an error if the mapping was to a disallowed address - multicast, Mapped address space, private address space or some other prefix to which the Ivip system does not support the tunnelling of packets. Similarly, an error would be generated if the end-user attempted to change the mapping for some address space outside their UAB, or if they defined a new micronet within that space with non-zero mapping, which overlapped some addresses for which the mapping was currently non-zero.

For the sake of discussion, it will be assumed that all UMUCs have passed these basic sanity tests at the UAS and are for valid mapping addresses - so a UMUC is a successfully accepted update command from the end-user, or from some person or system with the end-user's credentials.

There could be many methods by which this command is communicated, including HTTPS web forms with username and password authentication. Challenge-response SSL sessions might be more suitable for automated mapping change systems, such as a multihoming monitoring system which the end-user authorises to control the mapping of some or all of their UAB.

In addition to authentication, the command takes the form of the starting address of the micronet, the length of the micronet, and a single IP address to which this micronet will have its mapping changed.

6.9.
SUMUC - Signed User Mapping Update Command

This is the information contained in a UMUC, signed by the UAS which accepted it from the user (or by some other UAS), being handed down the tree to another UAS or to the RUAS of the tree, so that the recipient UAS/RUAS can verify the signature and regard the UMUC as authoritative.

6.10. MABUS - Update Stream specific to one MAB

This is a stream of data by which the real-time updates to the mapping data for any one MAB are conveyed.

For the purposes of discussion, the RUASes and the Launch system are assumed to work in a synchronized fashion, generating a body of updates for each MAB once a second. (Probably the case of no updates will be codified specifically in the update stream, rather than just resulting in no mention of the MAB.)

Each RUAS will generate one MABUS for each of its MABs. So each second, the RUASes collectively generate a variable length body of update information for every MAB in the Ivip system.

The MABUS consists primarily of mapping updates: micronet starting address, length and mapping address. These are all covered by a common authentication system for this MAB, so that ITRDs and QSDs can verify that the updates are genuine.

The MABUS also periodically contains other messages for the ITRDs and QSDs. At present, the only such message is to the effect that a snapshot of the mapping database for this MAB has been made, and is available with a particular filename from multiple servers.

The RUASes work together with the Launch system and the Replicator network to deliver every one second body of the MABUS, for every MAB, to every ITRD and QSD in the Net.

6.11. Launch server

A small number (such as 8) of widely dispersed Launch servers are operated by the RUASes and work together to generate, every second, multiple identical streams of packets to Replicators in the first level (1) of the Replicator system. The Launch server receives its input in the previous second from the RUASes.
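Because the Launch servers emit multiple identical streams, a downstream Replicator (Section 6.12) can forward whichever copy of each sequence-numbered packet arrives first and ignore the later duplicate. The sketch below illustrates only that merging step. The class and method names are hypothetical, outputs are modelled as plain callables, and per-link encryption, per-second resetting of state and real UDP I/O are all left out.

```python
class Replicator:
    """Forward the first-arriving copy of each numbered update packet."""

    def __init__(self, outputs):
        # Each output is a callable taking (sequence_number, payload);
        # in a real system this would be a send on a downstream link.
        self.outputs = list(outputs)
        self.seen = set()

    def receive(self, seq, payload):
        """Handle a packet from any input stream.

        Returns True if the packet was replicated, False if a packet with
        the same sequence number had already arrived on another stream.
        """
        if seq in self.seen:
            return False
        self.seen.add(seq)
        for send in self.outputs:
            send(seq, payload)
        return True

    def missing(self, expected):
        """Sequence numbers expected but not received on any input stream."""
        return sorted(set(expected) - self.seen)
```

A packet is lost to downstream devices only if the same sequence number is missing from every input stream; missing() reports such gaps but, in keeping with the best-effort role described in Section 5.2, the sketch makes no attempt to re-request them.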
6.12. Replicator

A cross-linked, tree-like system of Replicators forms a redundant, reliable, high-speed distribution system for delivering mapping updates to full database ITRs and Query Servers all over the Net.

Each Replicator receives one or more (typically two) streams of update packets from an upstream Replicator or Launch server. These two source streams should come from widely topologically separated sources, ideally over two separate physical links. For instance a Replicator in Berlin might receive its update streams from London and Berlin, from two sources in Berlin which are in different ISP networks, or from any combination which minimises the likelihood that both sources will be disrupted by any one fault.

The Replicator identifies the packets in each input stream by a simple sequence number at the start of the payload. It expects a particular set of packet numbers, and for each number, the first packet to arrive is replicated to its multiple output streams. In this way, unless the same numbered packet is lost from both input streams, each Replicator receives the full set of mapping update packets for this second, and sends them to tens or perhaps hundreds of downstream devices, which are other Replicators, or full database ITRs and Query Servers.

The receive and send links use UDP packets which are encrypted separately for each link, as discussed below. This prevents an attacker from spoofing these packets and so altering the behavior of ITRs.

Replicators could be implemented in routers, but are probably best implemented in ordinary software on a GNU-Linux/BSD etc. server. They do not cache information and they don't need hard drive storage. A full database ITRD or Query Server could also operate as a Replicator.

6.13. QSD - Query Server with full Database

Like ITRDs, QSDs get a full feed of updates from one or more Replicators.
Like ITRDs, when they boot, they download individual snapshot files for each MAB in the Ivip system. This is discussed further in a later section. Query Servers, ITRs and ETRs will be discussed in greater detail in future Ivip IDs, and are discussed in ivip-arch-01.

QSDs respond immediately to queries from nearby caching ITRs and from caching Query Servers - and send notifications to these if mapping data changes for a micronet which was the subject of a recent query.

QSDs have no routing or traffic handling functions. They need a lot of memory, so the best way to implement a QSD is probably on an ordinary server with one or more gigabit Ethernet interfaces. No hard drive is required, except perhaps for logging purposes. A QSD could be integrated with a Replicator function, and perhaps an ITRD function - or for that matter an ETR function too.

6.14. QSC - Query Server with Cache

A QSC could be implemented in a router. It does not route packets, but its memory and computational requirements are likely to be modest compared to those of a QSD. There is no need for a full feed of updates from the Replicator system. However, each QSC must be able to get mapping information from one or more upstream QSDs - or perhaps via QSCs which themselves access upstream QSDs.

The easiest way to implement this would be software on a modest server, which would only need a hard drive for logging purposes.

6.15. ITR - Ingress Tunnel Router

"ITR" is a general term for a router or server which accepts packets with Destination Address = a Mapped address (that is, an address managed by Ivip, and not delivered directly by conventional BGP routers). The ITR determines the mapping for the micronet which encompasses the destination address, and encapsulates the packet with an outer header addressed to the mapped address - where it will presumably be decapsulated by an ETR.

ITRs need not be located on RLOC addresses.
However, it is likely that the larger ITRs will be. ITRs can be on Mapped addresses, but cannot be behind NAT.

6.16. ITRD - Ingress Tunnel Router with full Database

An ITRD is an ITR with a full copy of the current mapping database. When it boots, it downloads snapshots and then brings the data up-to-date, and maintains it in this state, with updates received from one - or ideally two or more - Replicators. Consequently, an ITRD is able to tunnel every packet addressed to Mapped address space to the appropriate ETR.

ITRDs can be implemented in a suitable router with lots of RAM, CPU power and high capacity dedicated FIB hardware. Lower traffic rates could be handled by a suitably powerful server, without any hardware FIB. An ITRD might also implement the Replicator, QSD and/or ETR functions.

6.17. ITRC - Ingress Tunnel Router with Cache

An ITRC is an ITR without a full copy of the mapping database - and so not requiring a constant stream of updates from one or more Replicators. The ITRC gains mapping information from a nearby QSD, perhaps via one or more intermediate QSCs. When a packet arrives for which it has no mapping, the ITRC may buffer the packet until it has requested and received the mapping information. Since the QSD is local (within metres, kilometres or at most a few hundred km), the maximum buffering time should be milliseconds or tens of milliseconds. Subsequent packets can be tunnelled immediately.

Alternatively, rather than buffering the packet, the ITRC may pass it on to where it will enter a full database ITR, or perhaps another ITRC which already has the mapping information for the relevant micronet.

Like an ITRD, an ITRC could be implemented in a conventional router with high-speed FIB - assuming the FIB is capable of the tunnelling function - or in a server without any specialised FIB hardware.
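
As a rough sketch of the buffer-then-tunnel behaviour described in this section, the following Python fragment holds packets for an unmapped destination until a mapping reply arrives. All names are hypothetical, no Ivip query protocol is defined here, and for brevity the cache is keyed by single destination address rather than by micronet range.

```python
class ItrcSketch:
    """Hypothetical caching ITR: buffer on a miss, tunnel once mapped."""

    def __init__(self, send_query):
        self.send_query = send_query   # callable(dest) that asks the QSD/QSC
        self.cache = {}                # destination address -> ETR address
        self.waiting = {}              # destination address -> buffered packets

    def packet_in(self, dest, packet):
        """Return (etr, packet) pairs that can be tunnelled right now."""
        if dest in self.cache:
            return [(self.cache[dest], packet)]
        if dest not in self.waiting:
            self.send_query(dest)      # only one query per unmapped destination
        self.waiting.setdefault(dest, []).append(packet)
        return []

    def mapping_in(self, dest, etr):
        """Handle a mapping reply: cache it and release buffered packets."""
        self.cache[dest] = etr
        return [(etr, p) for p in self.waiting.pop(dest, [])]
```

The alternative mentioned above, passing the packet on towards a full database ITR instead of buffering it, would replace the buffering branch of packet_in.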
While an ITRD requires large memory capacity and a constant stream of updates from two or more Replicators, an ITRC requires memory only according to the number of micronets for which it is currently handling traffic. This makes the ITRC function much more practical to implement in "hardware routers", which have generally smaller and more expensive memories than whatever is possible with commonplace PC-like servers. An ITRC might also implement the QSC and/or ETR function.

6.18. ITFH - Ingress Tunneling Function in Host

A host which is not behind a NAT could have additional software in its TCP/IP stack to perform the ITRC functions described above. It needs a good link to a nearby QSD/QSC system - so this would not be suitable over a dialup modem or radio link. Host software, CPU power and RAM are generally free of incremental cost in this setting. This would greatly reduce the load on any ITRCs and perhaps ITRDs in the rest of the network. An ITFH function would be desirable in every web server in a hosting company, assuming the servers had sufficient CPU and RAM resources.

A host performing NAT functions for some hosts on a private network is a good place to implement ITFH, as long as this host is not behind NAT itself. The most common NAT situation is a DSL or cable modem or an optical home/SOHO adaptor. Technically these are routers, but they are inexpensive and purely software based, and therefore might be thought of as "hosts".

ITRCs and ITFHs could be overwhelmed by a large number of different micronets inside the caching period, so they need to be able to drop old cached mapping data when their RAM or FIB can't handle it. Then, they need to be in a network position where an upstream ITRD will always find the packets they emit which they cannot encapsulate. With Ivip, this is always the case, depending on how congested the nearest "anycast ITR in the DFZ" is.

6.19. ETR - Egress Tunnel Router

An ETR is a router or a server which receives encapsulated packets on one of its one or more RLOC addresses, strips off the outer IP header, copying its hop-count to the internal packet, and then by some means ensures the resulting packet is delivered to the destination host or network.

Unlike in other schemes, Ivip ETRs are not involved in reachability testing by ITRs. However ITRs need to do some probing for PMTUD and Fragmentation management purposes. ETRs will also generally need to respond to probing by other systems such as a multihoming management system, which is independent of the Ivip system, and which decides how mapping for a micronet should be changed to ensure continued service via alternative ETRs.

6.20. TTR - Translating Tunnel Router for Mobile-IP

A TTR behaves, in part, as an ETR - a device with an RLOC address to which packets are tunnelled so that they will be decapsulated and delivered to the destination host or network, which in this case is a Mobile Node (MN). The MN establishes a two-way tunnel to the TTR from its care-of address, which can be behind NAT. The MN may have such tunnels to other TTRs, including via different edge networks.

A TTR is also a means by which the MN can send packets out to the Internet at large. The TTR may simply emit the packets, or may integrate an ITRD or ITRC function within itself.

7. Update Authorities and User Interfaces

We now commence a detailed discussion of the fast push mapping distribution system itself, starting with the systems which accept commands from end-users (or their authorised representatives or systems) and prepare the information for the Launch system. This is the early stage of an ambitious design, so a number of options are contemplated.
The final authority to control mapping information is fully devolved to end-users, who by means of a username and password or some other authentication method, are able to issue commands to define micronets within their UAS, and to map each micronet to any ETR.

However the physical authority to control the mapping of all Mapped space within a single MAB rests with a single RUAS. That RUAS may be acting for a UAS who is the assignee of the MAB. The RUAS may be the assignee and may delegate control to one or more UASes. The RUAS may have relationships directly to the end-users of this MAB, through its own UAS.

Here we discuss the flow of information and trust between these various entities, in real-time, so that every second (for example, the actual time period will need to be carefully considered) each RUAS assembles a body of update information for each of its MABs.

In the diagrams below, each RUAS or UAS is depicted as a single entity. Each such entity acts as a single functional block, but will typically be implemented as a redundant system over several servers.

7.1. RUAS Outputs

7.1.1. Updates every second

Every second, for each MAB the RUAS is authoritative for, the RUAS generates a set of mapping updates, and works with other RUASes to integrate this into the next second's output from the Launch system.

As previously mentioned, these updates are primarily actual mapping updates for individual micronets within the MAB, but also contain occasional messages to the effect that a snapshot of this MAB's full mapping database has been made and is, or soon will be, available via various servers.

7.1.2. MAB snapshots

Every few minutes (or some other time period, as chosen by the RUAS, but with some reasonable maximum defined by a BCP) the RUAS makes a copy of the complete mapping information for a MAB.
Snapshots for each MAB are independent of each other, and so can be done with different frequencies.

The snapshot is in a format which needs to be standardized, so it can be downloaded and understood by any ITRD or QSD, now and in the future. This data format needs to be extensible to cover new kinds of mapping information and other functions not yet anticipated - which will be ignored by devices which are not capable of these functions.

The exact format for this is for future work, but for instance would begin with some identifying information about the MAB, a block defining that the following data concerns IPv4 micronet mapping information (and snapshot announcements), with the possibility of other blocks containing different kinds of data. Binary format would probably be best, and the file could be gzipped for distribution.

Each such file will be given a distinctive name, according to a standardised format, which indicates at least the MAB starting address and length, and the time of the snapshot.

The snapshot process will take a second or two to complete from the time it is initiated, and the resulting file will be copied to a number of servers, ideally located in a variety of locations around the Net. Each such server would be run by the RUAS directly, or as part of all RUASes working together.

The servers can probably be conventional HTTP servers, so that ITRDs and QSDs can download the snapshots when needed. There is scope for some careful design with DNS so that there is an automatic structure in the domain names of these servers, enabling an expandable system to be automatically used by ITRDs and QSDs without manual configuration.

These files will be publicly available, and need to be made available for somewhat longer than the cycle time of snapshots. So with a ten minute snapshot cycle, the previous snapshot should be available for a while - probably 10 minutes or so - after the new one is available.
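
Returning to the file naming described above: the naming rule is left for future work, but a purely illustrative sketch may help fix ideas. The following Python function builds a name from the MAB's starting address, its prefix length and the snapshot time in UTC. The layout, separators and file extension are assumptions made for this example only, and only IPv4 is handled.

```python
import ipaddress
from datetime import datetime, timezone


def snapshot_name(mab, taken_at):
    """Build a hypothetical snapshot file name for one IPv4 MAB.

    mab      -- the MAB in prefix notation, e.g. "20.0.0.0/14"
    taken_at -- timezone-aware datetime of the snapshot
    """
    net = ipaddress.ip_network(mab)
    # Zero-pad each octet so that names sort in address order.
    octets = ".".join(f"{int(o):03d}"
                      for o in str(net.network_address).split("."))
    stamp = taken_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{octets}_{net.prefixlen}_{stamp}.snap.gz"
```

A fixed-width, sortable name of this kind would let a server keep the current and previous snapshots side by side for the overlap period mentioned above.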
Snapshots are downloaded by ITRDs and QSDs when they boot, and if they suffer a disruption in mapping updates which necessitates a reload of this part of the complete mapping database. To facilitate this, MABs should not be so large - or at least should not contain so many micronets - as to make individual snapshot files excessively large.

At boot time, or when resynching, the ITRD or QSD will monitor the update streams for each MAB until a snapshot announcement is found. It will then buffer all subsequent updates and download the snapshot as soon as it is available. Once the snapshot has arrived, and been unpacked to RAM, the buffered updates are applied to it. Then, this MAB's part of the mapping database is up-to-date and the ITR can begin advertising this MAB, and therefore tunnelling all packets which are addressed to this MAB.

In order to reduce total path lengths, it would be desirable if an ITRD or QSD in a given location could access a nearby snapshot server. It may be desirable to have every snapshot of every MAB in a single server, or a single set of servers which are accessed by geographically close ITRDs and QSDs.

Anycast is not a good technology for this, since file retrieval is best done via TCP sessions. The ITR system itself can't be used, to avoid circular dependencies - so the servers must be on RLOC addresses. Likewise, any DNS servers involved in this server system need to be strictly on RLOC addresses.

Each ITRD or QSD needs to be configured with, or to automatically discover, two or more such servers which are relatively close, so the data can be found despite one server being down.

Perhaps these servers could be identified in a carefully structured DNS hierarchy:

   xxxxx.yyyy.ipv4.ivipservers.net

Where xxxxx is one of an extendable list of localities and where yyyy uniquely identifies the RUAS.
If snapshots from all RUASes were pooled into a single server, the latter would not be necessary. However, it may be better to let each RUAS run its own network of servers, which may involve a choice to use the same servers in some or many instances as are used by other RUASes.

Initially, an RUAS may have a single update server for Australia, and some standardised list of xxxxx locations defines "au" as being the value to be used by any ITRD or QSD which seeks this RUAS's server which is closest to Australia. Later, the list could be extended for more specific locations, such as "syd-au", "mel-au" etc. Then, every RUAS would need to generate DNS entries for these as well, and point them to whatever server was appropriate. In the event they had no server in Melbourne, they could make that FQDN resolve to the same IP address as their only Australian server, in Sydney.

From the point of view of the ITRD or QSD, seeking an update for a given MAB of a particular RUAS, the address to request the file from could be made up from the RUAS identifier yyyy which is contained in the snapshot announcement (in the stream of mapping updates), concatenated with a locally configured "xxxxx" and "ipv4.ivipservers.net". In the event that this server was unavailable, one or more locally configured alternatives to this initial "xxxxx" value could be tried - including one or more for nearby countries.

The most significant 24 bits of the MAB's starting address (probably 48 bits for IPv6, assuming this is the granularity of BGP advertisements) would be transformed into a text string such as 150.101.072. A similar transformation of the precise time of the snapshot would result in a second text string, and these would be used to reliably identify the appropriate directory and file in the server.

7.1.3. Missing packet servers

The cross-linked tree-structured Launch and Replicator systems should provide a robust method of delivering the complete set of MAB updates every second, to every ITRD and QSD. There may be more subtle and efficient methods than this somewhat brute-force approach, which involves typically a doubling of the amount of update traffic in the pursuit of robustness. However, the rate of updates will only be problematic by current standards at a date so far in the future that the technology of the day will render the task far less daunting than it would now be.

In the event that an ITRD or QSD misses one or more packets, it will be able to easily identify which are missing, due to the sequence numbers built into their payloads. This will transform easily into an address to use by which the missing one or more packets can be retrieved, probably via HTTP. Similar arrangements - probably the same servers as those just mentioned - would be used to locate the missing packet and download it.

7.2. Authentication of RUAS-generated data

Careful consideration must be given to how ITRDs and QSDs can quickly and reliably ensure that the information they receive ostensibly from each RUAS is genuine. At this early stage of development, the model is pretty simple.

7.2.1. Snapshot and missing packet files

Each RUAS has a key pair and signs the MAB snapshot and missing packet files. ITRDs and QSDs can verify the signature by reference to certificates signed by some higher authority, or by some alternative arrangements.

Both these types of files are only handled occasionally, so the overhead in performing crypto operations is insignificant.

7.2.2. Mapping updates

This principle does not apply to the update information contained in packets received from the Replicator system.
It would be onerous to individually authenticate each packet, or each body of updates from each RUAS contained in potentially multiple packets. Instead, at the current early stage of development, a different model is proposed. No doubt this can be improved upon.

The Launch system servers will receive signed information, each second, from all the RUASes. Only when all such servers agree that the information they received is authenticated will any of them send that RUAS's updates to the Replicator network.

The first level (1) of the Replicator network involves manually configured, encrypted, links to Launch servers, with each Replicator receiving a full stream of update packets from two or more widely distributed Launch servers. Those links will involve encrypted UDP packets so that each stream can be known to have originated at a specific Launch server. The destination device will establish the encrypted link with the source device.

It is proposed that the subsequent levels of Replicators use the same techniques, so that there is implicit trust in the data received from the two (or perhaps more) upstream Replicators. This would be a fragile arrangement with a single upstream source, but since there are two sources, with identical contents, it will be a simple matter in each Replicator to detect a condition in which one stream differs from another. That will not prove which stream is correct, but it would be enough to show that an attacker has gained control of one upstream Replicator - enabling the current Replicator to shut down and so not propagate bogus mapping information. Loss of a single Replicator will generally not affect the reliable delivery of updates, due to the cross-linked nature of the network.

However, there remains a chance that an attacker's packet could be replicated all the way to an ITRD or QSD. There, it could cause traffic packets to be tunnelled to the attacker's chosen location.
One approach to preventing this is to have each ITRD and QSD authenticate every packet, or multi-packet body of update information, from each RUAS, by each packet carrying a digital signature. This seems expensive, but perhaps it would be practical.

Another approach would be to have the Launch system add one or more packets to the stream, containing MD5 (or some other hash function) "checksums" of either each packet, or each body of update information from each RUAS. It would be trivial to have a checksum for the entire second's worth of updates, but then a single missing packet would make it impossible to check the rest. The MD5 checksums could be sent twice, for robustness, and some care would be needed in deciding on their granularity.

A separate checksum for every packet would be conceptually simple and enable individual packets to be accepted immediately, even if another packet was not received and so required a "missing packet" request. However, this increases the number of MD5 checksums to transmit. The current proposal is to have an MD5 checksum for each MAB for which updates are received, which may be less than a packet, or perhaps more.

7.3. RUAS - UAS interconnection

This section depicts a single tree of delegated responsibility for the user control of mapping of one MAB. The Root UAS at the base of the tree is run by Company X - RUAS-X. RUAS-X could be authoritative for other MABs, and each such tree of delegation may have the same set of other UAS systems, or it could be different.

Each delegation tree is separate from the delegation trees of other MABs, even if they look similar, because the tree includes specific subsets of the whole MAB address range as one of the defining characteristics of its branches and leaves.
The initial action which leads to the database being changed is a user generated (manually or by the user's equipment or by a system authorised by the user) UMUC (User Mapping Update Command).

For authorising and feeding UMUCs to the RUAS-X, there is a tree as depicted in Figure 1. Delegation of authority flows up the tree as the total address range of the MAB is split at each branching junction.

This tree structure involves data, in the form of SUMUCs (Signed User Mapping Update Commands) flowing down towards the root of the tree. (Data would also flow up the tree so each user-interface leaf could tell end-users what their current mapping was, could test their requests against constraints etc.)

The idea is that RUAS-X could delegate control of one or more subsets of the MAB's total range of addresses to some other system, which in turn could delegate control to other systems. There would be no absolute limit on the height (usually called depth) of these hierarchies.

The servers which handle the end-user interaction need to be among the leaves of this tree structure, so as not to burden the RUAS-X database servers themselves with details of user interaction. This enables various companies to give different kinds of control for the Mapping of the IP addresses their branch of the tree controls.

Figure 1 does not show RUAS-X having any user interface servers, but it could. The simplest arrangement would be the RUAS having simply a user-interface server and no tree of other UASes.

There would need to be IETF standardised methods by which some server could execute a UMUC with the user-interface servers of any of these UASes. This standardisation would be especially important for multihoming, because some reasonably trusted company could run an automated monitoring system, and have the credentials (username, password, key etc.)
stored in their system so their system can change the mapping of one or more micronets the moment one link is detected to be faulty. Also, the company (such as X, Y or Z in Figure 1) which controls a particular range of the Mapped space may offer such a multihoming monitoring system itself.

The tree in this example controls an MAB with the address range 20.0.0.0 to 20.3.255.255. In this example, company X has been assigned by an RIR the entire range 20.0.0.0 to 20.3.255.255. Company X sublets to Y a quarter of this: 20.1.0.0 to 20.1.255.255. These divisions are on binary boundaries, but they need not be. It would be just as possible for X to delegate to Y an arbitrary subset of the whole range, or the entire range - or just one IPv4 address or IPv6 /64.

X's Root Update Authorisation Server (RUAS) has a private key for signing all the MAB snapshot files it periodically creates and makes available.

In this example, company Y delegates control of some of its space to company Z, and Z has an end-user U, who needs to control the mapping of a UAB containing one or more micronets in Z's range. Z has various interfaces by which U can do this, with its own arrangements for authentication, for monitoring a multihoming system and making changes automatically etc. Ideally there might be one or more automated, host-to-server, IETF-standardised protocols so all end users could have standardised software for talking to whichever company's servers they use to control the mapping of their IP address(es).

User-R   User-S   User-T   User-U   Multihoming
   \        \        |        |     Monitoring
    \        \       |        |     Inc.
     \      ...................     /
      \-----.  Web interface  .----/
            . other protocols .
            .      etc.       .
            .......UAS-Z.......
                     |
                     |            Other companies
                     |            like Y and Z
                     |           /-----<----/
                      \    |    /
                       \   |   /
                        \  |  /
                         UAS-Y
                           |
                           |           /----<-----/
                            \    |    /
                             \   |   /
                              \  |  /
                               RUAS-X   Root Update Authorisation
                                 |  \   Server, company X
                                 |   \
                                 V    \->-[ Multiple web servers  ]
                                 |        [ for MAB snapshot and  ]
                                 |        [ missing packet files. ]
                                 |
         |       |       |       |       |
         V       V       V       V       V
         |       |       |       |       |

   Other RUASes like RUAS-X, each authoritative for mapping one or
   more MABs and producing regular MAB snapshots and update streams
   which are sent to all ITRDs and Query Servers.

   Each line depicts 8 streams of packets with identical payloads -
   one stream for each of the 8 Launch servers.

Figure 1: Delegation tree of UASes above one RUAS.

When user-U (or a device or system with user-U's credentials) changes the mapping of their micronet via a web interface, this is achieved via Z's website, with the user (or device or system) authenticating by whatever means Z requires. This causes UAS-Z to generate a signed copy of this update command (a SUMUC) and to send it to UAS-Y.

The SUMUC consists of three items (assuming IPv4 for simplicity): a starting address for the micronet this update covers, a range (>=1), and a new mapping value (ETR address), which will also be a 32 bit integer. The SUMUC could also include a time in the future at which the update should be executed.

UAS-Y trusts this SUMUC because it can authenticate UAS-Z's signature. It strips off the signature and adds its own, before passing the SUMUC down to the next level: RUAS-X.

RUAS-X likewise has a copy of UAS-Y's public key and within a fraction of a second of U initiating the UMUC, the master copy of this MAB's database in RUAS-X is altered accordingly. (This would be a distributed, redundant, database system.)
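
The three-item SUMUC and the verify, strip and re-sign step at UAS-Y can be sketched as follows. This is an illustration under stated assumptions, not a proposed format: the 12 byte packing and the function names are hypothetical, and HMAC-SHA256 with shared keys stands in for the public-key signatures described in the text, purely so the example needs nothing beyond the Python standard library.

```python
import hashlib
import hmac
import ipaddress
import struct


def pack_sumuc(start, length, etr):
    """Pack start address, range (>= 1) and ETR address into 12 bytes."""
    if length < 1:
        raise ValueError("range must be >= 1")
    return struct.pack("!III",
                       int(ipaddress.IPv4Address(start)),
                       length,
                       int(ipaddress.IPv4Address(etr)))


def sign(key, body):
    # Stand-in for a real public-key signature (assumption, see lead-in).
    return hmac.new(key, body, hashlib.sha256).digest()


def resign(body, sig, upstream_key, own_key, allowed):
    """Verify the upstream signature, check the delegated range, re-sign.

    allowed -- (low, high) inclusive integer addresses delegated to the
               upstream UAS that sent this SUMUC.
    """
    if not hmac.compare_digest(sig, sign(upstream_key, body)):
        raise ValueError("signature does not verify")
    start, length, _etr = struct.unpack("!III", body)
    low, high = allowed
    if start < low or start + length - 1 > high:
        raise ValueError("update is outside the delegated range")
    return sign(own_key, body)   # old signature dropped, own one added
```

In this sketch UAS-Y would call resign() with the range it delegated to UAS-Z, and RUAS-X would repeat the same check using the range it delegated to UAS-Y, so each level only accepts commands for addresses its branch is authorised to control.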
Authority is delegated up the tree, because UAS-Y will only accept update commands if they are signed by one of its branch UASes, and for the particular address range that UAS has been authorised to control.

User-U may have given their username and password etc. to Multihoming Monitoring Inc. so this company can monitor their multihoming links and change the mapping as soon as one link goes down. UAS-Z doesn't know or care who actually makes the change - as long as they can authenticate themselves for whatever micronet they want to change the mapping of.

8. The Launch system

In this discussion 8 Launch servers will be assumed. The exact number could be varied over time. Initial introduction could no doubt be done with a simpler system, but the purpose of this discussion is to explore how the system could scale to very large numbers of micronets and updates per second.

The exact logic of the Launch system remains to be determined. The following is a rough guide to how it might be done.

The task of the Launch system is, every cycle - in this example every second - to collate the update information from all the RUASes, agree on what has been collected, and then to generate multiple streams of packets containing that information, from multiple locations, to the widely geographically dispersed level 1 Replicators.

Links between the Launch servers might best be done via private links to avoid packet flooding attacks. Likewise the links to level 1 Replicators.

Each Launch server has a link to every other Launch server, and every RUAS has a link to every Launch server. This may seem rather over-engineered, but the system will be robust in the event of failure of quite a few of these links, and the task at hand is a momentous one, deserving considerable effort to make it fast and reliable.

The exact details of how packets are handled, information combined into packets etc.
remains for future work.

Each Launch server may be a single physical server, with a live backup at the same address, or a redundant cluster of servers which behaves as one.

While the Launch servers are sending out the update packets for one second, they are comparing notes about updates to be sent in the next second and collecting updates to be sent in the second after that. Perhaps this one second timing clock will prove to be too ambitious, or the operations may be broken into four phases, rather than three.

8.1. Phase 1 - collecting updates from RUASes

In phase 1, all RUASes attempt to send their complete set of updates to every Launch server, where they are buffered in readiness for Phase 2. The Launch server authenticates this information, by standard cryptographic means based on the public key of each RUAS. The contents of each RUAS's updates are then collected, and an MD5 (or some other hash algorithm) checksum (actually a digest) is created for each one.

8.2. Phase 2 - checksum comparison

Each Launch server sends to every other Launch server its record of the checksums of the updates received from each RUAS. This enables each Launch server to identify its state as one of the following:

o  Normal: no received set of checksums includes updates from more or different RUASes than were received by this Launch server, and all the checksums agree with the local values. Therefore, this Launch server has established that it correctly received the complete set of updates.

o  Missing updates: One (maybe some higher figure) or more received lists contained checksums from an RUAS for which this Launch server did not correctly receive any updates. Therefore, this Launch server has established that it has missed out on updates from one or more RUASes.
o  Invalid updates: The local checksum value for one or more RUAS sets of updates does not equate to two or more checksums from other Launch servers, which themselves are equal. The Launch server has established that it received an erroneous copy of at least one RUAS's set of updates.

Each Launch server now sends a signed message to the other Launch servers, containing the state determined above: Normal, invalid updates or missing updates.

Those Launch servers which are in the Normal state count how many others are also in this state. If the number is above some "quorum" constant, say 4 in an 8 server system, then each such Launch server is ready to send the collected updates in phase 3.

These Launch servers independently process the same update data into a series of packets, with sequence numbers which can easily be identified by the recipient devices - initially level 1 Replicators. Those packets are stored, ready for transmission in phase 3.

Normally, all 8 Launch servers will receive the same information correctly, and so will participate in phase 3. The purpose of this constant is to ensure that there will not be a condition in which only one or two Launch servers participate in phase 3. The idea is that the updates will be launched into the Replicator network robustly, or not at all.

With further development work, it should be possible to fine-tune this system to adequately guard against single or multiple points of failure, but also to ensure that the system only sends out data when it can send from at least three, or four, or some constant number of Launch servers. Careful analysis will be required to anticipate various failure modes.

RUASes monitor the output of the Launch system, and if a particular second's worth of updates are not sent, then the RUAS will send them again soon.
This raises some potential ordering difficulties, where one second contains a command to map a micronet to zero, and the next second contains a command to map part of it to some valid address. While these could be combined in the one second, if they were not, and the first second was not sent, then the second second's command would fail in the ITR, because it would be defining a new smaller micronet in part of a micronet which was not at the time mapped to zero. Further work is required, but the RUAS can predict the problems which the ITR would have, and generate suitable updates to make the same results occur.

The above algorithm will need to be extended so that a flaky RUAS, which only transmits to a few Launch servers, will not cause the quorum test to fail, due for instance to two Launch servers getting its updates, and the rest recognising that they didn't.

8.3. Phase 3 - identical update streams

Those Launch servers which have the full set of update data now send the packets they generated, in separate encrypted streams, to level 1 Replicators. It would probably be best if the packets are sent in numeric sequence, with sending times decided to spread the packets over the whole second.

Exactly how many level 1 Replicators there are, and how many are driven by each Launch server, will be a matter for further work.

The result will be, in each cycle, that the full set of updates is either sent out robustly by all or almost all level 1 Replicators, or not sent at all. Even if there is a relatively high packet loss from some or many of these, and some broken links, all, or almost all level 2 Replicators will receive a full set of packets.

9. Replicators

Further work is required to reach a more precise description of how the update information is placed in packets, and signed in such a way that ITRDs and QSDs can be sure they have received the correct information.
If we assume that this problem can be solved, then the following description of the functionality of individual Replicators and the way they are arranged will lead to an understanding of how they form a robust, packet amplifying, global network for delivering the output of the Launch system to a million or more ITRDs and QSDs. (See "Figure 2 Tree of UASes above one RUAS".)

[Diagram not recoverable from the source text. Its annotations are listed below, from the top of the figure to the bottom.]

o  RUAS-X receives update information from end-users, directly or via leaf UAS systems. It also feeds snapshot and missing packet HTTP servers (server 1, server 2, etc.).

o  RUAS-X sends 30 individual streams of identical real-time updates to the 8 Launch servers - for RUAS-X's MABs. Each of the 8 Launch servers gets a stream from every such RUAS.

o  Launch server N: The 8 Launch servers have links with each other, and each second, all, or most of them, send streams of update packets to a number of level 1 Replicators. For instance 32 in this example, with each Launch server sending packets to 16 Replicators.

o  Level 1 Replicators: Even with packet losses and link failures, most of the 32 level 1 Replicators receive a complete set of update packets, which they replicate to 16 level 2 Replicators.

o  Level 2 Replicators: In this example, each Replicator consumes two feeds from the upstream level, and generates 16 feeds to Replicators in the level below (numbered one above the current level). So each level involves 8 times the number of Replicators. These figures might be typical of later years with a billion micronets. However, in the first five or ten years, with fewer updates, the amplification ratio of each level could be much higher.

o  Levels 3 to 6: Replicators are cheap diskless Linux/BSD servers with one or two gigabit Ethernet links. They would ideally be located on stub connections to transit routers, though the Level 5 and 6 Replicators (32,000 and 128,000 respectively) might be at the border of, or inside, provider or larger end-user networks.

o  ITRDs and QSDs get two or more ideally identical full feeds of updates - so occasional packets missing from one are no problem, since the other stream provides a packet with an identical payload.

Figure 2: Multiple levels of Replicators drive hundreds of thousands of ITRDs and QSDs.

9.1.  Scaling limits

The Replicator system is scalable to any size simply by adding Replicators. Assuming two input streams for each Replicator, N output streams gives an N/2 amplification of stream numbers per level. N could be quite high in the early years of introduction, when the number of micronets and updates is small by comparison with the design target of one to ten billion micronets, with accompanying update rates driven by their use for handheld mobile devices.

First, a maximal IPv4 example will be considered. Assume a billion micronets, most of them for single IP addresses. Presumably most of these will be for individual end-users, at home or with mobile devices. The update rate will be relatively low for multihoming the home and office-based micronets, but the update rate for mobile devices could be much higher. Half a billion mobile micronets, each with an update every 3 hours, involves about 46k updates a second, on average. The raw data of each IPv4 mapping update is about 12 bytes, so adding 50% protocol overhead, this is about 833k bytes a second - roughly 7 Mbps on average. Peak data rates would be higher.
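The two estimates in this section can be recomputed with a short sketch so that the inputs are easy to vary. The function names and parameters are illustrative only, and the sketch assumes that every Replicator at a level takes the same number of input feeds and produces the same number of output feeds.

```python
def replicators_per_level(level1, feeds_out, feeds_in, levels):
    """Return the number of Replicators at each level, starting at level 1.

    Each level is (feeds_out / feeds_in) times larger than the one above,
    which is the N/2 amplification when feeds_in is 2.
    """
    counts = [level1]
    for _ in range(levels - 1):
        counts.append(counts[-1] * feeds_out // feeds_in)
    return counts


def update_load(micronets, hours_between_updates, bytes_per_update, overhead):
    """Return (updates per second, bytes per second, megabits per second).

    `overhead` is the protocol overhead as a fraction, e.g. 0.5 for 50%.
    These are averages; peak rates are not modelled.
    """
    updates_per_second = micronets / (hours_between_updates * 3600)
    bytes_per_second = updates_per_second * bytes_per_update * (1 + overhead)
    megabits_per_second = bytes_per_second * 8 / 1e6
    return updates_per_second, bytes_per_second, megabits_per_second


# Inputs taken from the text: 32 level 1 Replicators, 16 feeds out, 2 feeds in,
# and half a billion mobile micronets updating every 3 hours at 12 bytes each.
levels = replicators_per_level(32, 16, 2, 3)
load = update_load(500_000_000, 3, 12, 0.5)
```

With 32 level 1 Replicators, 16 output feeds and 2 input feeds, the first three levels come to 32, 256 and 2048 Replicators.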
By the time such large update rates eventuate, Replicators based on commodity PCs will be able to handle such rates, and the bandwidth involved will not seem as frightening as it is today.

While a pure pull system can scale effortlessly to any number of micronets, with any rate of change to the mapping, it can't support mobility - which is the only reason there would ever be such large numbers of micronets or updates. Any initially "pure pull" system which could support mobility would require either short caching times and so massive volumes of queries and responses, or would require a "notification" system rivalling the fast push system described here.

IPv6 could theoretically involve tens of billions of micronets - and the mapping data would be more voluminous due to the long addresses involved. Still, a system based on principles such as described in this ID would be well placed to be the most scalable solution to the problem.

In a system such as this, there needs to be some financial charge for each update - which need not be so high as to deter the majority of end-users.

At some point, with extremely large numbers of micronets and updates, the fast push system would become unwieldy, even with the technology of the day. However, realistic projections are impossible to make at this stage of development. The question is whether a system such as this is practical and desirable, considering the benefits it provides over a pull and cache, or pull with notify system. A "pull with notify" system on a global scale is likely to be more complex and insecure than a fast push system.

Ivip involves a fast push system to some depth in the network, as chosen by operators given all the local conditions, update rates, bandwidth costs, technological capabilities of servers etc.
Beyond that, Ivip uses query and cache with notify - but only over short distances where the delay times are short, the path lengths are a small fraction of the distance around the planet, and where costs are low and reliability high, compared to a global query server system.

It is difficult to quantify the limits of a system such as this, or the tasks it will need to perform in the future. However, if an architecture such as this seems feasible, its design should be developed further so that more concrete estimates can be made of its short-term cost and worth, and of its long-term potential to scale to very large sizes.

9.2.  Managing Replicators

Replicators should be easy to create and deploy. Any substantial server with the requisite software, in a suitable location, will do the job. However, a successful system will require some mechanisms which ensure reliable operation with a minimal amount of configuration and ongoing management.

In the current model, each Replicator normally receives feeds from two upstream Replicators, and generates some number N of feeds for downstream devices.

Each Replicator should be able to request and quickly gain a replacement feed from another upstream Replicator if one of those it is using becomes unavailable, or unreliable. This requires that Replicators in general be operating below capacity, so that when others in their level fail, they can take up the slack. This needs to be locally configured beforehand, with upstream Replicators of organisations which have agreed to provide the feeds, and with downstream Replicators of organisations which have requested them.

It is possible to imagine a sophisticated, distributed, management system for the Replicator network. This could be developed over time, since for initial deployment, considerable manual configuration and less automation would probably be acceptable.
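The replacement-feed behaviour described in this section might be sketched as below. This is only an illustration under stated assumptions: the Upstream and Replicator classes, their fields and the top_up method are hypothetical, the list of candidate upstream Replicators is taken to be configured beforehand, and no request protocol between Replicators is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Upstream:
    name: str
    capacity: int          # maximum number of feeds this Replicator can send
    in_use: int = 0        # feeds it is currently sending
    healthy: bool = True   # False once it is unavailable or unreliable

    def has_spare(self):
        return self.healthy and self.in_use < self.capacity


@dataclass
class Replicator:
    candidates: list       # pre-agreed upstream Replicators, in preference order
    wanted: int = 2        # normally two upstream feeds
    active: list = field(default_factory=list)

    def top_up(self):
        """Drop failed feeds and request replacements from spare capacity."""
        self.active = [u for u in self.active if u.healthy]
        for u in self.candidates:
            if len(self.active) >= self.wanted:
                break
            if u not in self.active and u.has_spare():
                u.in_use += 1          # the upstream takes up the slack
                self.active.append(u)
        return [u.name for u in self.active]
```

The sketch only works if upstream Replicators run below capacity, which is the operating condition this section calls for: a candidate with no spare feeds is skipped and the next configured candidate is tried.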
10.  Security Considerations

There are many potential security problems with any bold new architectural addition to the Internet. This ID mentions some authentication and security issues and possible solutions to them, but the full consideration of security will occur as the proposal is fleshed out in greater detail.

11.  IANA Considerations

[To do as more detail is developed about data formats and communication protocols.]

12.  Informative References

[I-D.farinacci-lisp]
   Farinacci, D., "Locator/ID Separation Protocol (LISP)", draft-farinacci-lisp-05 (work in progress), November 2007.

[I-D.fuller-lisp-alt]
   Farinacci, D., "LISP Alternative Topology (LISP-ALT)", draft-fuller-lisp-alt-01 (work in progress), November 2007.

[I-D.irtf-rrg-design-goals]
   Li, T., "Design Goals for Scalable Internet Routing", draft-irtf-rrg-design-goals-01 (work in progress), July 2007.

[I-D.jen-apt]
   Jen, D., Meisel, M., Massey, D., Wang, L., Zhang, B., and L. Zhang, "APT: A Practical Transit Mapping Service", draft-jen-apt-01 (work in progress), November 2007.

[I-D.lear-lisp-nerd]
   Lear, E., "NERD: A Not-so-novel EID to RLOC Database", draft-lear-lisp-nerd-03 (work in progress), January 2008.

[I-D.vogt-rrg-six-one]
   Vogt, C., "Six/One: A Solution for Routing and Addressing in IPv6", draft-vogt-rrg-six-one-01 (work in progress), November 2007.

[I-D.whittle-ivip-arch]
   Whittle, R., "Ivip (Internet Vastly Improved Plumbing) Architecture", draft-whittle-ivip-arch-01 (work in progress), January 2008.

[TRRP]
   Herrin, W., "TRRP", February 2008.

Appendix A.  Acknowledgements

[I-D.whittle-ivip-arch] includes a list of people who have helped in some way with this project. Some have helped a great deal and I thank them all. This is not to say that any of these people necessarily support Ivip as currently described.

Author's Address

Robin Whittle
First Principles
Email: rw@firstpr.com.au
URI: http://www.firstpr.com.au/ip/ivip/

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).