INTERNET DRAFT                                          S. Bandyopadhyay
draft-shyam-real-ip-framework-54.txt                      March 26, 2019
Intended status: Experimental
Expires: September 26, 2019

    An Architectural Framework of the Internet for the Real IP World
                  draft-shyam-real-ip-framework-54.txt

Abstract

This document proposes an architectural framework of the internet in the real IP world. It describes how a three-tier mesh structured hierarchy can be established in a large address space by dividing it into regions and into sub regions inside each region. It shows how to make a transition from private IP to real IP without making significant changes to the existing network. It introduces the VLSM tree routing protocol. It introduces another protocol for host identification with provider independent addresses. Building on the useful work done for IPv6, it provides all the necessary inputs from which a specification of IP with a 64 bit address space may emerge.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 26, 2019.

Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Bandyopadhyay          Expires September 26, 2019               [Page 1]

Internet Draft             Real IP Framework              March 26, 2019

Table of Contents

1. Introduction
2. Background
3. A Three-tier mesh structured hierarchical network
   3.1. Route propagation
   3.2. Determination of prefix lengths
      3.2.1. A pseudo optimal distribution of prefixes in a 64 bit architecture
      3.2.2. Whether to go for a two-tier or three-tier hierarchy
   3.3. Issues related to Satellite communications
4. VLSM tree routing protocol
   4.1. Setting default route inside VLSM tree
   4.2. Router address space
   4.3. Network management and support of explicit route option
      4.3.1. VLSM tree routing protocol messages
   4.4. IP VPN with MPLS inside VLSM tree
      4.4.1. Extension to RSVP-TE to support IP VPN inside VLSM tree
5. Provider Independent addressing, name services and multihoming
   5.1. PI address Resolution
      5.1.1. Record Format
      5.1.2. Messages
      5.1.3. Master file and data file
      5.1.4. Zone maintenance and transfers
6. Issues related to IP mobility
   6.1. Changes expected with the specifications related to IP mobility
7. Refinements over existing IPv6 specification
8. Distributed processing and Multicasting
9. Transition to real IP from private IP
10. IANA Consideration
11. Security Consideration
12. Acknowledgments
13. Normative References
14. Informative References
15. Author's Address

1. Introduction

The transition from IPv4 to IPv6 is in progress. Work has been done to upgrade individual nodes (workstations) from IPv4 to IPv6. There are also established documents describing how routers and switches can support IPv4 and IPv6 packets simultaneously in order to make the transition possible [1]. The CIDR [2] based hierarchical architecture of the existing 32-bit system is expected to continue in IPv6, with a larger address space. There are concerns that the number of BGP table entries is becoming too large in the existing system [3]. At the same time, there are proposals to extend the Autonomous System number from 16 bits to 32 bits to meet demand [4].

The challenge lies in making the transition from IPv4 to a real IP world smooth, with the fewest possible changes. The term "real IP environment" refers to an environment where hosts in a customer network possess globally unique IP addresses and communicate with the rest of the world without the help of NAT [5].

This document reflects the changes required to the BSD 4.4 source code wherever applicable.

2. Background

The existing system works with an Autonomous System (AS) layer and an inter-AS layer, using the CIDR approach. To meet demand within the 32-bit address space, Autonomous Systems of various sizes maintain a CIDR based hierarchical architecture. With the help of NAT [5], a stub network can maintain a user-ID space as large as a class A network and can meet its need to communicate with the rest of the world with very few real IP addresses. With CIDR and NAT applied together across the entire space, most of the 32-bit address space is effectively used as network ID.

With the traditional CIDR based hierarchy, a node with a shorter prefix (a larger address block) can be divided into a number of nodes with longer prefixes. Each divided node can be further subdivided into nodes with still longer prefixes. This process can continue until no further division is possible. The point worth noting is that at each step the designer of the network has to anticipate its future expansion, so that the resource is never exhausted. This leads the designer to allocate far more than is needed, which leaves address space unused. The problem is aggravated once a resource does get exhausted. For example, a node of prefix /16 can be divided into a number of nodes of prefix /24. If any one of the /24 nodes is exhausted, the resources of the other /24 nodes cannot be used even if they are available.

In the IPv4 environment, service providers make a desperate attempt to provide internet services with the help of NAT. For example, a large educational institute meets its current requirement with 4 real IP addresses: one for its mail server, one for its web server, one for its ftp server and one for its proxy server, which provides web based services to all of its users.
In general, these services are used by an organization of any size (it may be 400 or even 40000). In the current scenario, the CIDR based tree has been built using these components together. When private IP is replaced with real IP, each customer network will require IP addresses based on its size and requirement.

Transitioning from private IP to real IP basically requires the following components:

o A solution for site multihoming with provider assigned address space

o A strategy to replace private IP with real IP

o A solution to uniquely identify a host in a real IP environment

o A solution that lets individual nodes and routers/switches work with IPv4 and next generation IP simultaneously.

A solution for site multihoming has been provided in a separate document [8]. Section 9 shows how to make a transition from private IP space to real IP space with provider assigned addresses, using the CIDR based approach itself and without reorganizing the existing provider network. Section 5 provides a solution for identifying a host uniquely with a number in a real IP environment. RFC 4213 [1] has already described the transition mechanism from IPv4 to IPv6 for individual nodes and routers.

Transitioning to real IP will eliminate the extra routing entries associated with multihomed sites and thus will reduce the size of the BGP table substantially. Assignment of addresses requires an architectural framework. It may continue with the existing CIDR based architecture (provided that transitioning to real IP is enough to handle all routing related issues for ever) or may adopt a different approach. A mesh structured hierarchy will reduce the growth of routing entries in a CIDR based environment and will also be convenient for distributing network resources in the long run.

This document also tries to resolve and enhance several issues that were carried over from the deployment of IPv6.
It shows that a 64 bit address space is good enough for all practical purposes. Building on the useful work done for IPv6, it provides all the necessary inputs from which a specification of IP with a 64 bit address space may emerge.

3. A Three-tier mesh structured hierarchical network

As Autonomous Systems of various sizes are supported, Autonomous Systems and the nodes inside them can be viewed as lying on the same plane within the address space. If the network can be viewed as lying on different planes, routing issues become simpler. If the network is designed with a fixed prefix length for the Autonomous System everywhere, the routing information for the rest is confined to the other part of the network prefix. This means that the maximum AS size is assigned to every AS irrespective of its actual size. This is made possible by using a large address space and dividing it into a number of regions of fixed size. Thus the entire network can be viewed as a network of inter-AS layer nodes. Each node in the inter-AS layer can act in one of three ways: only as a router in the inter-AS layer; as a router in the inter-AS layer with an Autonomous System attached to it through a single point of attachment; or as an Autonomous System with multiple Autonomous System border routers (ASBRs) appearing like a mesh. Thus a two-tier mesh structured hierarchy is established between the AS layer and the inter-AS layer, with each AS having a fixed prefix length.

By definition, an Autonomous System is a small area within the entire network that maintains its own independent identity and communicates with the rest of the world through specific border routers.
In a similar manner, if a larger area (say, a region or state) is considered as a network of Autonomous Systems that maintains its own identity by communicating with the rest of the world through some border routers (say, state border routers), a mesh structured hierarchy can be established within the inter-AS layer. The inter-AS layer will be split into inter-AS-top and inter-AS-bottom. To maintain this hierarchy, each node of inter-AS-top needs to have multiple regional or state border routers (say, SBRs) through which it communicates with the rest of the world, in the same way that an Autonomous System maintains ASBRs. Thus, the entire network will appear as a network of nodes of the inter-AS-top layer. To maintain the hierarchy, each node of inter-AS-top needs to have a fixed prefix length, i.e. each node of inter-AS-top will be assigned a maximum (fixed) number of Autonomous Systems.

Thus, with a three-tier mesh structured hierarchy in the network layer, the network ID can be viewed as A.B.C. If pA, pB and pC are the prefix lengths of the inter-AS-top, inter-AS-bottom and AS layers respectively, there can be up to 2^pA nodes at the topmost layer, up to 2^pB nodes in the inter-AS-bottom layer of each top layer node and up to 2^pC nodes in the AS layer of each AS. Thus the entire space is divided into a fixed number of regions and each region is divided into a fixed number of sub regions. This division is expected to be made based on geography, population density, demand and related factors.

Let nMaxInterASTopNodes be the maximum number of nodes assigned at the topmost layer, nMaxInterASBottomNodes the maximum at the inter-AS-bottom layer and nMaxASNodes the maximum at the AS layer, where nMaxInterASTopNodes <= 2^pA, nMaxInterASBottomNodes <= 2^pB and nMaxASNodes <= 2^pC.

3.1. Route propagation

With the hierarchy established, routing information that is established inside a node of inter-AS-top does not need to be propagated to another node of inter-AS-top. The entire routing information of the inter-AS-top layer needs to be propagated to the inter-AS-bottom layer. So, each router of the inter-AS layer will have two tables of information, one for inter-AS-top and another for the inter-AS-bottom of the inter-AS-top node that it belongs to. BGP, with minor modification, will work well with one rule applied at the SBRs: an SBR will not propagate the routing information of the inter-AS-bottom layer of its domain to an SBR of a neighboring domain, i.e. the SBR of one top layer node will propagate only inter-AS-top routing information to the SBR of another top layer node. Inside a node of inter-AS-top, routing information of both inter-AS-top and inter-AS-bottom needs to be propagated from one ASBR to a neighboring ASBR. Inside a top layer node A, the routing information for another top layer node B will have two parts: the list of SBRs through which a packet will traverse from top layer node A to B, and the list of ASBRs through which the packet will traverse from one AS to another inside A. In terms of BGP, the AS_PATH attribute will be split into two parts, one for the top layer and another for the bottom layer. Within the same node A, routing information from one AS to another AS will not have any top layer information, i.e. the top layer information will be set to NULL.

Similarly, each node of the AS layer will have three tables of routing entries: one for inter-AS-top, one for inter-AS-bottom and another for the routing information inside the Autonomous System itself.

Introduction of hierarchy at the inter-AS layer reduces the size of the routing table substantially. If hardware resources allow a flat address space to be maintained at each layer, the problems related to CIDR can be avoided.
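
The propagation rule at the border routers can be illustrated with a short sketch. This is not part of the proposal itself: the Route record, its field names and the function routes_to_advertise are hypothetical, and the sketch only assumes that every route is tagged with the layer it belongs to.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    layer: str              # "top" (inter-AS-top) or "bottom" (inter-AS-bottom)
    dest_top: int           # destination top layer node
    dest_as: Optional[int]  # destination AS; None for a top layer route


def routes_to_advertise(rib: List[Route], local_top: int, peer_top: int) -> List[Route]:
    """Select the routes a border router sends to a neighbouring border router."""
    if peer_top == local_top:
        # ASBR to ASBR inside the same top layer node: both tables are propagated.
        return list(rib)
    # SBR to an SBR of another top layer node: inter-AS-bottom routes stay hidden.
    return [r for r in rib if r.layer == "top"]


rib = [
    Route("top", dest_top=7, dest_as=None),
    Route("top", dest_top=9, dest_as=None),
    Route("bottom", dest_top=3, dest_as=120),
]
print(len(routes_to_advertise(rib, local_top=3, peer_top=3)))  # same top layer node
print(len(routes_to_advertise(rib, local_top=3, peer_top=7)))  # neighbouring top layer node
```

A real implementation would apply this selection as an export policy in BGP; the sketch only shows which table crosses which boundary.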
With flat address space, no hierarchical relationship needs to be established between any two nodes in the same layer. So, all the nodes inside each layer can be used until they are exhausted. With flat address space (i.e. without prefix reduction), BGP tables will have at most nMaxInterASTopNodes + nMaxInterASBottomNodes entries.

An IGP such as OSPF has provision to divide an AS into smaller areas. OSPF hides the topology of an area from the rest of the Autonomous System. This information hiding enables a significant reduction in routing traffic. With the support of subnetting, OSPF attaches an IP address mask to a route to indicate the range of IP addresses that the route describes. This reduces routing traffic, because the nodes inside the range need not be described individually, but it introduces another level of hierarchy. If subnetting is avoided in the AS layer (at the cost of additional computation inside the SPF tree), each area can be configured dynamically from a free pool of addresses based on its requirement. So, an AS can be divided into a number of areas of heterogeneous sizes, with nodes drawn from a free pool of address space.

Similarly, the concept of area can be introduced in the inter-AS-bottom layer the way it works in OSPF. The area border routers in the inter-AS-bottom layer have to behave exactly as an ABR behaves in OSPF, i.e. an area border router will hide the topology inside an area from the rest of the world and will distribute the information collected inside the area to the rest. It will also distribute the routing information collected from outside to the nodes inside. To implement this, the protocol running in the inter-AS layer (say BGP) will have to introduce a 'cost' factor. This cost factor can be interpreted as the cost of propagating a packet from one AS to another.
The protocols running inside the AS layer (RIP, OSPF, etc.) will have to supply the cost for a packet to travel from one ASBR to another. All the protocols must behave in unison in supplying this information. The cost factor is needed by a remote node that sends a packet to a node inside an area when more than one area border router is equidistant from that remote node. Thus the inter-AS-bottom layer (i.e. one inter-AS-top level node) can be divided into a number of areas of heterogeneous sizes, with AS nodes drawn from a free pool of address space. BGP adopts a technique called route aggregation. Along with route aggregation it reduces the routing information within a message. In a similar manner, introducing areas inside the inter-AS-bottom layer will not only reduce the complexity of the protocol, but will also reduce the size of a BGP packet substantially.

With this architecture, each node (router) inside an AS is represented as A.B.C. Each node may or may not be attached to a network that acts as a leaf node (i.e. a network that will not act as a transit). To use the user-id space properly and to support customer networks of heterogeneous sizes, the user-ID space needs to be divided into subnet-ID and user-ID. In essence, a VLSM (variable length subnet mask) type of approach, in the form of a tree, has to be adopted at each node of an AS. So, each node of the AS layer will act as the root of a tree whose leaves are independent small customer networks that act as stubs. As the routing information of the inter-AS layer and the AS layer need not be passed to any node inside the VLSM tree, each router inside the tree should maintain a default route for any address outside its network/domain. With this approach, the load on each router of the service providers will become negligible.
Protocols that support VLSM with MPLS/VPN have to be implemented inside the tree. Inside the VLSM tree, all the physical ports of a switch have to be configured with the subnet mask. A lightweight routing protocol can be developed on top of a static routing table by setting a default route inside the VLSM tree.

The fundamental assumptions on which this architecture rests can be summarized as follows:

i) The entire network can be viewed as a network of regions or states, where each region or state has its own identity and communicates with the rest of the world through some state border routers. Each region or state is a network of Autonomous Systems. Each region, as well as each Autonomous System inside it, will have a fixed (maximum) prefix length.

ii) Hardware resources are available such that a flat address space can be maintained at the inter-AS layer.

Introduction of a mesh structured hierarchy will have several advantages:

o Load at each router will be reduced substantially.

o The CIDR style approach and the complexity related to prefix reduction can be avoided.

o A mesh structured hierarchy will distribute traffic evenly.

o Physical cable connections can be optimized.

o Administrative issues will become easier.

3.2. Determination of prefix lengths

With this architecture, an IP address can be described as A.B.C.D, where the D part represents the user id. Each router in the inter-AS layer will have two tables of information, one for inter-AS-top and another for the inter-AS-bottom of the inter-AS-top node that it belongs to. Each node of the AS layer, in contrast, will have three tables of routing entries: one for inter-AS-top, one for inter-AS-bottom and another for the routing information inside the Autonomous System itself. In the worst case, a node inside an AS needs to maintain nMaxInterASTopNodes + nMaxInterASBottomNodes + nMaxASNodes entries in its routing table.
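
As a rough illustration of the worst-case bound, the following sketch compares the sum of the three tables with a single flat table that lists every router. The prefix lengths used here (10, 16 and 12 bits) are assumed for the example only, the function names are hypothetical, and the flat baseline is an assumption made for comparison rather than something this document specifies.

```python
def hierarchical_entries(n_top: int, n_bottom: int, n_as: int) -> int:
    """Worst-case entries held by a node inside an AS (sum of its three tables)."""
    return n_top + n_bottom + n_as


def flat_entries(n_top: int, n_bottom: int, n_as: int) -> int:
    """Entries needed if every router were listed in one flat table (assumed baseline)."""
    return n_top * n_bottom * n_as


# Illustrative prefix lengths, assumed for this example only.
pA, pB, pC = 10, 16, 12
n_top, n_bottom, n_as = 2 ** pA, 2 ** pB, 2 ** pC

print(hierarchical_entries(n_top, n_bottom, n_as))  # 1024 + 65536 + 4096
print(flat_entries(n_top, n_bottom, n_as))          # 2 ** 38
```

The point of the comparison is that the hierarchical bound grows with the sum of the layer sizes, while a flat table grows with their product.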
Dynamic allocation of an area from a free pool of address space is more frequent at the AS layer than at the inter-AS-bottom layer. As OSPF supports all the features needed, it can be considered the default choice in the AS layer. The existing implementation of OSPF (Version 2) supports subnetting, by which an entire area can be represented as a combination of network address and subnet mask. With this approach, the routing table is reduced substantially. With the removal of subnetting, every node inside an area will have an entry in the routing table (OSPF Version 1). So the determining factor is the maximum number of nodes inside an AS that OSPF can support once subnetting support is removed. The prefix length of the AS layer will be determined by this factor.

With the introduction of hierarchy in the inter-AS layer, the number of entries in the BGP routing table will be reduced substantially. Even if pA and pB are both selected as 16, the number of routing entries comes within the admissible range of the existing BGP protocol. But it is the responsibility of IANA to come up with a scheme for selecting nMaxInterASTopNodes and nMaxInterASBottomNodes. Each top level node will have nMaxInterASBottomNodes nodes. It would be a waste of address space if each country were assigned a top level node (e.g. China has a population of 1,306,313,800 whereas Vatican City has only 920, according to a census of 2006). So a moderate value of nMaxInterASBottomNodes is desirable, with which larger countries will have a number of top level nodes; e.g. each state of the USA can be assigned a top level node. With the introduction of areas in the inter-AS-bottom layer, each top level node can be divided into a number of areas of heterogeneous sizes.
So, a group of neighboring countries with smaller populations can share the address space of a top level node. Similarly, the user-id space has to be decided based on the largest area that a VLSM tree should span. All these issues are entirely geopolitical and have to be decided by IANA.

3.2.1. A pseudo optimal distribution of prefixes in a 64 bit architecture

To make optimal use of cable connections, the VLSM tree is expected to be as short as possible. Also, any single organization may prefer to have its user-id space under the same network id. So, a 16 bit user-id may be insufficient for places like a large university campus, whereas 32 bits would be too large. Hence, a 24 bit user-id is a moderate choice; it is the class A address space in IPv4 (also used as the space for private IP).

As published in 1998 [6], OSPF can support an area with 1600 routers and 30K external LSAs. So, 11 bits are needed to support this space. On the assumption that OSPF can support a much larger address space as hardware technology advances, and to keep the space open for future expansion, 12 bits are assigned to the AS layer. 16 bits are assigned to the inter-AS-bottom layer. So if, on average, a 16 bit equivalent of the user-id space is used (i.e. one out of 256) and an 8 bit equivalent of nodes is used inside an AS (16% of 1600), a top level node (with a 16 bit equivalent of AS nodes) will generate 2^40 IP addresses, which gives 8629 IP addresses per person in Japan (with a population of 127417200; Japan is 10th from the top in the population list of the world).
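
The arithmetic behind these figures can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones quoted in the paragraph above.

```python
ospf_routers = 1600                # routers per area, as cited from [6]
# Smallest number of bits n such that 2**n >= 1600.
bits_for_routers = (ospf_routers - 1).bit_length()

user_bits_used = 16                # 16 bit equivalent used out of the 24 bit user-id space
as_bits_used = 8                   # 8 bit equivalent of nodes used inside an AS
bottom_bits = 16                   # AS nodes under one top level node

addresses = 2 ** (user_bits_used + as_bits_used + bottom_bits)
population_japan = 127_417_200

print(bits_for_routers)                   # bits needed for 1600 routers
print(2 ** as_bits_used / ospf_routers)   # share of the 1600 routers in use
print(2 ** 24 // 2 ** user_bits_used)     # "one out of" this many user-ids in use
print(addresses // population_japan)      # addresses per person
```

Running it gives 11 bits, a share of 0.16, one out of 256 user-ids, and 8629 addresses per person, matching the values stated in the text.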
So, even if every country with a population less than or equal to that of Japan is assigned a top level node, and every province/state of the countries with larger populations is assigned a top level node each, the total number of nodes will come well under 1024. If a number of neighboring countries with smaller populations share a top level node, the total number of top level nodes will come down further. This suggests that a 62 bit equivalent space (10 (pA) + 16 (pB) + 12 (pC) + 24 (user-id)) will be good enough for unicast addresses. This distribution expects OSPF to support 65K (64K+1K) external LSAs.

The distribution of address space will be finalized in consultation with IANA. Provisionally, it may appear as follows.

The 64 bit address space may be divided into two 63 bit blocks:

i. Global unicast addresses, with the most significant bit set to 0. This space is equally divided between provider assigned (PA) address space and provider independent (PI) address space.

   a) Provider assigned address space with prefix 00.

   b) Provider independent (PI) address space with prefix 01. Provider independent address space will be used for customers who would like to retain their number even after changing their providers. As routing will be based on PA addresses, each PI address will be associated with at least one PA address. The most significant aspect of PI addressing is that it is independent of the architectural framework of the provider network; even if the architectural framework changes, the same format of PI addressing can be maintained. Once implemented, the PI address of a node will be the number generally used by common people. Section 5 describes issues related to PI addressing in detail.

ii. Address space with the MSB set to 1, which will be distributed among the remaining uses. Each of them will have a fixed prefix. This distribution will be based on the requirements and the work that has already been done in connection with IPv6:

   a) Address space for multicasting with a prefix set to 1001.
   b) Address space for link-local addresses: link-local addresses will have a prefix 1010.

   c) Router address space: this space will be used by the routers and will have a prefix 111.

   d) Address space for private IP: each customer network can maintain private address space for communication among its users. This space will be distributed among all the customer sites of a corporate that maintains VPN services. A 32 bit address space should be good enough for private IP. Private address space will have a 32 bit prefix in which the leading 4 bits are set to 1100 and the rest are set to 1.

The rest of the address space has been kept for future use.

3.2.2. Whether to go for a two-tier or three-tier hierarchy

Establishing hierarchy in the inter-AS layer reduces the number of BGP entries to a great extent, but leads to inefficient use of address space for geopolitical reasons. If the hierarchy in the inter-AS space is removed, the entire 26 bit (10+16) space will be available for a single layer and the inter-AS space will be used to its full extent, but the number of external LSAs (and/or the number of entries in the BGP table) will increase dramatically. So, it depends on the extent to which OSPF can support external LSAs.

BGP expects the packet length to be limited to 4096 bytes. BGP manages to work within this limit through prefix reduction in the CIDR based environment. As the number of inter-AS nodes increases, BGP has to change this limit in order to work in a flat address space. The alternative is to divide the inter-AS space into a number of areas as described in Section 3.1. The area border routers will advertise the aggregated information to the rest of the world. BGP may have to incorporate both options at the same time.
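
Returning to the address space division listed in Section 3.2.1, the sketch below packs a provider assigned address and classifies a 64 bit value by its leading bits. It is only an illustration under assumptions: the field order 00 | A | B | C | user-id and the function names are hypothetical, while the field widths (10, 16, 12 and 24 bits) and the prefixes are taken from the text.

```python
P_A, P_B, P_C, USER = 10, 16, 12, 24   # field widths from Section 3.2.1
PRIVATE_PREFIX = 0xCFFFFFFF            # 1100 followed by 28 one bits (32 bit prefix)


def pack_pa(a: int, b: int, c: int, user: int) -> int:
    """Pack a provider assigned address as 00 | A | B | C | user-id (assumed order)."""
    if not (a < 2 ** P_A and b < 2 ** P_B and c < 2 ** P_C and user < 2 ** USER):
        raise ValueError("field out of range")
    return (a << 52) | (b << 36) | (c << 24) | user   # the top two bits stay 00


def unpack_pa(addr: int) -> tuple:
    """Split a provider assigned address back into (A, B, C, user-id)."""
    return ((addr >> 52) & 0x3FF, (addr >> 36) & 0xFFFF,
            (addr >> 24) & 0xFFF, addr & 0xFFFFFF)


def classify(addr: int) -> str:
    """Name the block a 64 bit address falls in, based on its leading bits."""
    bits = format(addr, "064b")
    if bits.startswith("00"):
        return "provider assigned unicast"
    if bits.startswith("01"):
        return "provider independent unicast"
    if bits.startswith("1001"):
        return "multicast"
    if bits.startswith("1010"):
        return "link-local"
    if bits.startswith("111"):
        return "router"
    if addr >> 32 == PRIVATE_PREFIX:
        return "private"
    return "reserved for future use"


addr = pack_pa(5, 300, 42, 1234)
print(classify(addr), unpack_pa(addr))
print(classify(0b1001 << 60))
print(classify((PRIVATE_PREFIX << 32) | 7))
```

The two prefix bits plus the 62 bits of A, B, C and user-id fill the 64 bit address exactly, which is why no padding appears in pack_pa.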
As the number of nodes in the inter-AS layer increases, the inter-AS space has to be split into two separate planes in order to reduce the number of entries in the routing table. So, a two-tier hierarchy can be considered an interim state on the way to a three-tier hierarchy. If the currently available data turns out to be good enough to support the present need, it will be worth examining how far it can support future needs. Assignment of inter-AS nodes in a two-tier hierarchy should be based on geographical distribution, as if it were part of a three-tier hierarchy. Otherwise, introducing a three-tier hierarchy in the future will become another difficult task.

Based on a report from the year 2011, BGP supports ~400,000 entries in the routing table. With this growing trend, BGP may have to change the limit on packet length even in a CIDR based environment. With the introduction of a two-tier hierarchy, the number of entries in the routing table will come down drastically, and with the three-tier approach it will come down further.

3.3. Issues related to Satellite communications

Establishing hierarchy in the inter-AS layer assumes that the only way two autonomous systems in two different top level nodes can communicate is through their SBRs. If two autonomous systems inside the same top level node communicate through satellite, this will be considered a direct link between them. Whenever autonomous system 'ASa' of top level node 'A' communicates with autonomous system 'ASb' of top level node 'B' through satellite, the traffic has to go through their state border routers, i.e. a satellite port inside 'A' that communicates with a satellite port inside 'B' will be considered a state border router. If multiple such ports exist inside node 'A', all of them will be equidistant from any port inside 'B'.
This requires any satellite port inside 'B' to have prior knowledge of the list of autonomous systems that are under the purview of each port inside 'A'. So, all the satellite ports of 'A' have to exchange such groups of information with all the satellite ports of 'B' and vice versa. Each such group of autonomous systems can be considered a cluster of autonomous systems inside an area of a top level node. If the number of such ports is small, some heuristics can be applied while assigning AS numbers in order to reduce the processing time during the circuit establishment phase. It will become difficult to maintain such heuristics once the number of such ports becomes large. So, in the case of satellite communication, the advantage of establishing hierarchy inside the inter-AS layer diminishes as the number of satellite ports increases.

If a private corporate maintains its own satellite channel to communicate between its offices at distant locations, all of these offices will be considered to be under the user-id space of its network. Service providers that provide satellite services to end-site customers can operate in the usual manner, as they will provide connections to customer networks that act as stubs.

4. VLSM tree routing protocol

This section describes a lightweight routing protocol applicable inside a VLSM tree. It is based on setting a default route inside the VLSM tree. Inside a VLSM tree, all the physical ports of a switch have to be configured with their associated domain (i.e. NetAddress/NetMask). The routing table will contain static routes based on the entries configured on these ports.

4.1. Setting default route inside VLSM tree

Section 3.1 explains that there is no need to pass the routing information of the external world down into a VLSM tree that acts as a stub. Inside a VLSM tree, a node with a shorter prefix (a larger address block) can be divided into a number of nodes with longer prefixes.
Each divided node can be further subdivided into nodes of still lower
prefixes. This process can continue as long as desired, or until no
further division is possible.

The figure in this section shows a typical arrangement of a VLSM tree
of a service provider's network with IPv4 address space. Switch SW-A
is connected to the outside world and maintains the global routing
table. It acts as the root of a VLSM tree that acts as a stub. It has
been assigned an address block 11.1.16.0/20, which is distributed
among its four children SW-B, SW-C, SW-D and SW-E with the approach
of VLSM. Switch SW-B further divides its address space between
switches SW-F and SW-G. Switch SW-F assigns an address block
11.1.16.0/24 to customer network CN-A. Switch SW-G assigns address
blocks 11.1.20.0/24 and 11.1.21.0/24 to two customers CN-B and CN-C,
whereas switch SW-E assigns address block 11.1.30.0/24 to customer
network CN-D.

Routing inside the tree takes place with the following principle.
Inside the tree, if a node (switch/router) that is assigned a domain
(NetAddr/NetMask) receives a packet destined to somewhere outside of
its domain, it needs to forward the packet to its parent in the
hierarchy.
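
The forwarding principle can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the class
`Node`, the helpers `build_example` and `trace`, and the result
strings "GLOBAL-TABLE" and "ICMP-UNREACHABLE" are hypothetical names
introduced here, not part of the protocol. The domains are those of
the example arrangement in this section.

```python
import ipaddress


class Node:
    """One switch of the tree with its configured domain and ports."""

    def __init__(self, name, domain, parent=None):
        self.name = name
        self.domain = ipaddress.ip_network(domain)
        self.parent = parent   # name of the parent switch, None at the root
        self.ports = {}        # configured domain -> attached child name

    def next_hop(self, dst):
        dst = ipaddress.ip_address(dst)
        if dst not in self.domain:
            # Outside our own domain: default route towards the parent.
            # The root has no parent and falls back to its global table.
            return self.parent or "GLOBAL-TABLE"
        for net, child in self.ports.items():
            if dst in net:
                return child
        # Inside the domain, but no port is configured for this address.
        return "ICMP-UNREACHABLE"


def build_example():
    """Build the example tree (assumed helper, for illustration only)."""
    nodes = {}

    def add(name, domain, parent=None):
        nodes[name] = Node(name, domain, parent)
        if parent:
            nodes[parent].ports[ipaddress.ip_network(domain)] = name

    add("SW-A", "11.1.16.0/20")
    add("SW-B", "11.1.16.0/21", "SW-A")
    add("SW-C", "11.1.24.0/22", "SW-A")
    add("SW-D", "11.1.28.0/23", "SW-A")
    add("SW-E", "11.1.30.0/23", "SW-A")
    add("SW-F", "11.1.16.0/22", "SW-B")
    add("SW-G", "11.1.20.0/22", "SW-B")
    for switch, block, customer in [
        ("SW-F", "11.1.16.0/24", "CN-A"),
        ("SW-G", "11.1.20.0/24", "CN-B"),
        ("SW-G", "11.1.21.0/24", "CN-C"),
        ("SW-E", "11.1.30.0/24", "CN-D"),
    ]:
        nodes[switch].ports[ipaddress.ip_network(block)] = customer
    return nodes


def trace(nodes, first_switch, dst):
    """Follow next_hop decisions until the packet leaves the switches."""
    path, hop = [], first_switch
    while hop in nodes:
        path.append(hop)
        hop = nodes[hop].next_hop(dst)
    path.append(hop)
    return path
```

For instance, `trace(build_example(), "SW-F", "11.1.21.116")` walks
up to SW-B and then down through SW-G. No routing advertisement is
involved; each decision uses only the node's own domain and its
configured ports.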

                         +--------------+
                         |     SW-A     |
                         | 11.1.16.0/20 |
                         +-+-+------+-+-+
                           | |      | |
           +---------------+ |      | +----------------+
           |                 |      |                  |
    +------+-----+ +---------+--+ +-+----------+ +-----+------+
    |    SW-B    | |    SW-C    | |    SW-D    | |    SW-E    |
    |11.1.16.0/21| |11.1.24.0/22| |11.1.28.0/23| |11.1.30.0/23|
    +---+----+---+ +------------+ +------------+ +--+---------+
        |    |                                      |
        |    +------+                               |
        |           |                            +--+--+
+-------+----+ +----+-------+                    |CN-D |
|    SW-F    | |    SW-G    |                    +-----+
|11.1.16.0/22| |11.1.20.0/22|                 11.1.30.0/24
+--+---------+ +--+------+--+
   |              |      |
   |              |      |
+--+--+        +--+--+ +-+---+
|CN-A |        |CN-B | |CN-C |
+-----+        +-----+ +-----+
11.1.16.0/24 11.1.20.0/24 11.1.21.0/24

If a host in CN-A wants to send a packet to an address 11.1.21.116,
the CE router of CN-A forwards it to SW-F. SW-F finds the destination
address of the packet to be outside of its domain and forwards the
packet to its parent SW-B. SW-B finds a port that has been configured
with the matching destination address and forwards the packet to its
child SW-G. Switch SW-G sends the packet to customer network CN-C,
whose block 11.1.21.0/24 contains the destination.

If a host in CN-B wants to send a packet to 11.1.17.120, the CE
router of CN-B forwards the packet to SW-G. SW-G finds the
destination address of the packet to be outside of its domain and
forwards the packet to its parent SW-B. SW-B finds a port that has
been configured with the matching destination address and forwards
the packet to its child SW-F. SW-F finds the destination address to
be within its domain, but no port has been configured with the
matching destination address, so it generates ICMP UNREACHABLE.

If a host in CN-C wants to send a packet to 16.2.22.116, the CE
router of CN-C forwards the packet to SW-G. SW-G finds the
destination address of the packet to be outside its domain and
forwards the packet to SW-B. SW-B forwards the packet to its parent
SW-A.
SW-A finds the destination address of the packet to be outside its
domain, consults the global forwarding table and forwards the packet
through the right port.

4.2. Router address space

Section 2.2.7 of RFC 1812 states, "a router that has unnumbered point
to point lines also has a special IP address, called a router-id in
this memo. The router-id is one of the router's IP addresses (a
router is required to have at least one IP address). This router-id
is used as if it is the IP address of all unnumbered interfaces."

A router-id is selected based on the domain (NetAddress/NetMask) that
it is associated with. The prefix of the domain is embedded within
the router-id, in its least significant bits. For a prefix of 'n'
bits in a 32 bit address space, the 32-'n' bits at the beginning of
the address form a leading field. This field starts with the prefix
"111" (see section 3.2.1), followed by a set of '1' bits, and ends
with a '0' bit. Therefore, to get the prefix of the domain, the
router-id is scanned from the MSB towards the LSB until the first '0'
bit is encountered. The rest of the bits till the end form the
prefix. The leading field needs at least 4 bits (the three bits "111"
followed by '0'), so the prefix can be at most 32-4, i.e. 28 bits.
So, the minimum size of a domain that a router can assign is 16
addresses. With this approach, locators (i.e. routers) and
identifiers can be routed based on the same routing table. This can
be defined as an association between locators and identifiers.

4.3. Network management and support of explicit route option

Section 4.1 has shown how routing is achieved using a static route
table based on the ports configured with their associated domain.
Standard routing protocols usually advertise networks, based on which
the routing table is constructed. There is no such need here.
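
The router-id layout of section 4.2 can be illustrated with a short
Python sketch. It assumes one reading of that section: the leading
32-'n' bits are all '1' except for a single terminating '0', and the
'n' prefix bits follow directly. The function names
`encode_router_id` and `decode_router_id` are hypothetical and are
used only for this illustration.

```python
import ipaddress


def encode_router_id(domain: str) -> str:
    """Embed the prefix of an IPv4 domain in the low bits of a router-id."""
    net = ipaddress.ip_network(domain)
    n = net.prefixlen
    if net.version != 4 or n > 28:
        raise ValueError("expected an IPv4 prefix of at most 28 bits")
    lead_len = 32 - n                    # leading field: ones, then one '0'
    lead = ((1 << lead_len) - 1) & ~1    # e.g. 4 bits -> 0b1110
    prefix_bits = int(net.network_address) >> lead_len
    return str(ipaddress.IPv4Address((lead << n) | prefix_bits))


def decode_router_id(router_id: str) -> str:
    """Scan from the MSB to the first '0'; the remaining bits are the prefix."""
    value = int(ipaddress.IPv4Address(router_id))
    bit = 31
    while bit >= 0 and (value >> bit) & 1:
        bit -= 1
    if bit < 0 or bit > 28:
        raise ValueError("router-id must start with 111 and contain a 0 bit")
    n = bit                              # number of bits below the '0'
    prefix_bits = value & ((1 << n) - 1)
    network = prefix_bits << (32 - n)
    return f"{ipaddress.IPv4Address(network)}/{n}"
```

Under these assumptions the domain 11.1.16.0/20 maps to the router-id
255.224.176.17 (eleven '1' bits, one '0' bit, then the 20 prefix
bits), and decoding that value returns the original domain.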
When a router tries to establish a circuit with another, it may
contact a PCE to get the best possible route within a set of routes.
On getting the best possible path, it sets up the circuit using the
explicit route option. As there is only one path between any two
nodes inside a tree, the explicit route option is not meaningful for
communication between two nodes within the same tree. It may,
however, be required when a node in one VLSM tree communicates with a
node in another VLSM tree. To support this feature, the root of a
VLSM tree needs to maintain an image of the entire tree. A PCE can
get this image by contacting the root of the tree. Network management
system software can also get the status of the entire tree by
communicating with the root of the tree.

It is possible to construct this tree with the network management
approach, by means of polling and generation of traps on network
failure. This section shows how it is done with the approach of a
routing protocol. It adopts the "Hello protocol" and authentication
mechanism of the OSPF protocol, leaving behind the SPF part and
introducing new message types relevant to the VLSM tree.

The router at the root constructs the tree the way it appears in the
figure of section 4.1. Whenever a router adds a node (it may be a
customer network or another router) as a child, it sends an "Add
Node" message to its parent. The message is propagated to the root on
a hop-by-hop basis. On getting an "Add Node" message, the root traces
the tree, identifies the node with the "Router ID" specified in the
message and adds a node underneath it. Similarly, whenever a node
gets deleted, a "Delete Node" message is propagated to the root. On
getting a "Delete Node" message, the root deletes the entire sub-tree
under that node in the tree. Whenever a link goes down, a "Link Down"
message is propagated to the root. On receiving a "Link Down"
message, the root marks the link status as not active.
Whenever a link comes up, on receiving a "Link Up" message, the root
sends a "Get Subtree" message to the router that sent the "Link Up"
message and sets the link status as active. This is to get the
up-to-date status of the subtree whose link went down. The "Get
Subtree" exchange transfers the database of the entire subtree in a
recursive manner. The root can also send a "Get Subtree" message to
all of its children to build the entire tree at the time of
transition from the old protocol to the new one, or whenever
required.

4.3.1. VLSM tree routing protocol messages

The protocol maintains the same message format as the OSPF protocol
so that existing source code can be directly ported. This section
describes the new message types along with the Hello message of OSPF.
Please refer to section A.3.1 of the OSPF specification [19] for the
OSPF message format. Every message starts with a standard 24 byte
header.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version #   |     Type      |         Packet length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Router ID                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Area ID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            AuType             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Authentication                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Authentication                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Version #
   The version number. This specification documents version 1 of the
   protocol.

Type
   The message types are as follows.

      Type   Description
      ________________________________
      1      Hello
      2      Add Node
      3      Delete Node
      4      Link Down
      5      Link Up
      6      Get Subtree
      7      Acknowledgment

Packet length
   The length of the protocol packet in bytes. This length includes
   the standard header.

Router ID
   The Router ID of the packet's source.

Area ID
   This is not relevant here but has been retained to make use of
   existing OSPF source code with the least modification.

Checksum
   The standard IP checksum of the entire contents of the packet,
   starting with the packet header but excluding the 64-bit
   authentication field. This checksum is calculated as the 16-bit
   one's complement of the one's complement sum of all the 16-bit
   words in the packet, excepting the authentication field. If the
   packet's length is not an integral number of 16-bit words, the
   packet is padded with a byte of zero before checksumming. The
   checksum is considered to be part of the packet authentication
   procedure; for some authentication types the checksum calculation
   is omitted.

AuType
   Identifies the authentication procedure to be used for the packet.
   Authentication is discussed in Appendix D of the OSPF
   specification [19].

Authentication
   A 64-bit field for use by the authentication scheme. See Appendix
   D of the OSPF specification [19] for details.

4.3.1.1. The Hello packet

The Hello packet is the same as defined in the OSPF protocol. Please
refer to Section A.3.2 of the OSPF specification [19] for details.

4.3.1.2. The Add Node packet

An "Add Node" packet is generated when a router adds a port to a
customer network or attaches a new router. A node can be a customer
network or a router. The message is transported to the root on a
hop-by-hop basis. The receiving router sends back an "Acknowledgment"
message by changing the "Type" field to Acknowledgment. The "Sequence
Number" and "Router ID" fields are verified when the acknowledgment
is received. On receiving an "Add Node" message, the root adds a new
node to the tree under the node designated by "Router ID".
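
The 24 byte header and the checksum rule of section 4.3.1 can be
sketched as follows. This is a minimal illustration, not an
implementation of the protocol: the function names, the default
Area ID of 0 and the use of AuType 0 with an all-zero authentication
field are assumptions made for the example, and the message body is
treated as opaque bytes.

```python
import struct

HEADER_FMT = "!BBHIIHH"   # version, type, length, router id, area id,
                          # checksum, AuType (16 bytes before the auth field)


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad with a byte of zero
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_packet(msg_type, router_id, body=b"", version=1,
                 area_id=0, au_type=0, auth=b"\x00" * 8):
    """Build header + body; the 64-bit authentication field is not summed."""
    length = 24 + len(body)                   # length includes the header
    head = struct.pack(HEADER_FMT, version, msg_type, length,
                       router_id, area_id, 0, au_type)
    csum = internet_checksum(head + body)
    head = struct.pack(HEADER_FMT, version, msg_type, length,
                       router_id, area_id, csum, au_type)
    return head + auth + body


def verify_packet(packet: bytes) -> bool:
    """Recompute the sum over everything except bytes 16..23 (the auth field)."""
    return internet_checksum(packet[:16] + packet[24:]) == 0
```

A receiver that sums the packet with the transmitted checksum in
place, again skipping the authentication field, obtains zero for an
undamaged packet, which is what `verify_packet` checks.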

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version #   |       2       |         Packet length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Router ID                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Area ID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            AuType             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Authentication                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Authentication                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Node Type           |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Node ID                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Node Type
   Node type is Customer Network (1) or Router (2).

Sequence Number
   Whenever a router generates an "Add Node" message it uses a
   Sequence Number. Usually it increments the Sequence Number on
   completion of the transaction.

Node ID
   Node ID is the router ID of the domain associated with the
   router/customer network.

4.3.1.3. The Delete Node packet

A "Delete Node" message is generated by a router when a node gets
deleted. The message is transported to the root on a hop-by-hop
basis. On receiving a "Delete Node" message, the root deletes the
node (i.e. the entire subtree) under the node designated by "Router
ID". All the fields of a "Delete Node" packet are the same as those
of an "Add Node" packet, apart from the Type field (3).

4.3.1.4. The Link Down packet

A "Link Down" message is generated once a router fails to get "Hello"
from its neighbor and declares the link to be inactive. The message
is transported to the root on a hop-by-hop basis. On receiving a
"Link Down" message, the root marks the link in the tree as inactive.
All the fields of a "Link Down" packet are the same as those of an
"Add Node" packet, apart from the Type field (4).

4.3.1.5. The Link Up packet

A "Link Up" message is generated once a router starts getting "Hello"
messages from a neighbor which was marked as inactive. The message is
transported to the root on a hop-by-hop basis. On receiving a "Link
Up" message, the root sends a "Get Subtree" message to the router
designated by "Router ID". After getting the subtree database, the
root marks the link as active. All the fields of a "Link Up" packet
are the same as those of an "Add Node" packet, apart from the Type
field (5).

4.3.1.6. The Get Subtree packet

A "Get Subtree" packet is generated to get the database of a subtree.
The database of a subtree is expressed recursively in the following
manner.

   Add Router ID of the root of the subtree (32 bits in IPv4) +
   Add Number of children of the subtree (16 bits) +
   for each child of the subtree
   {
      Add Type of the child (Customer Network/Router) (16 bits) +
      Add Router ID of the child (32 bits in IPv4)
   }
   for each child as a router of the subtree
      call Get Subtree

Once router R1 (as master) sends a "Get Subtree" packet to router R2
(as slave), router R2 collects database information recursively from
all of its child routers and then sends the database information to
R1 (the master). The exchange of the database works in the same way
as the "Database Description" packet of OSPF (see section A.3.3 of
[19]). The format of the "Get Subtree" packet is the same as the
"Database Description" packet of OSPF with the "Type" field set to 6.

The amount of time that R2 takes to collect database information
depends on its position in the tree. If it is at the bottom-most
position of the tree, it takes the least amount of time, whereas if
it is just one level below the root of the tree, it takes the maximum
amount of time.
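
The recursive database layout of section 4.3.1.6 can be sketched in
Python as follows. The sketch is illustrative only: the in-memory
representation (a dictionary from a router ID to a list of
(type, router ID) pairs) and the function names are assumptions, and
the child type values 1 and 2 are borrowed from the Node Type field
of the "Add Node" packet. Fragmentation into "Database Description"
style packets is not modelled.

```python
import struct

CUSTOMER_NETWORK, ROUTER = 1, 2   # assumed: same values as Node Type


def serialize_subtree(tree, root_id):
    """Encode the subtree rooted at root_id in the recursive layout."""
    children = tree.get(root_id, [])
    # Router ID of the subtree root (32 bits) + number of children (16 bits)
    out = struct.pack("!IH", root_id, len(children))
    for child_type, child_id in children:
        # Type of the child (16 bits) + Router ID of the child (32 bits)
        out += struct.pack("!HI", child_type, child_id)
    for child_type, child_id in children:
        if child_type == ROUTER:          # recurse only into routers
            out += serialize_subtree(tree, child_id)
    return out


def parse_subtree(data, offset=0):
    """Decode one subtree; returns (tree dictionary, next unread offset)."""
    root_id, count = struct.unpack_from("!IH", data, offset)
    offset += 6
    children = [struct.unpack_from("!HI", data, offset + 6 * i)
                for i in range(count)]
    offset += 6 * count
    tree = {root_id: children}
    for child_type, child_id in children:
        if child_type == ROUTER:
            subtree, offset = parse_subtree(data, offset)
            tree.update(subtree)
    return tree, offset
```

Because every router child is followed later by its own record, the
root can rebuild the whole image of the tree from a single byte
stream, which is what `parse_subtree` does in this sketch.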
So, to get the first packet, R1 may need to wait for the maximum
amount of time; subsequent packets will arrive at regular intervals.
If the timer expires, R1 sends the same packet to R2 again. R2
ignores the packet if it has already started processing. The question
arises of how many times R1 will send the same packet before getting
the first response. The answer is that if there is no link failure,
it will eventually get a response, and if there is a link failure, it
will receive a "Link Down" message and halt the operation.

4.3.1.7. The Acknowledgment packet

An "Acknowledgment" packet is sent to the sender to acknowledge that
an "Add Node"/"Delete Node"/"Link Up"/"Link Down" message has been
received. All the fields of an "Acknowledgment" packet are the same
as those of an "Add Node" packet, apart from the Type field (7).

4.4. IP VPN with MPLS inside VLSM tree

This section describes how to make IP VPN work inside a VLSM tree
without using BGP.

RFC4364 [7] describes "IP VPN" with BGP/MPLS. To support VPN, PE
routers maintain a per-site forwarding table. When a packet arrives
from an associated CE router, the PE router consults this forwarding
table to forward the packet. If the packet is supposed to be
forwarded to another site of the VPN through the backbone, it uses a
two-level label stack. The upper label is used to forward the packet
from the ingress PE router to the egress PE router, whereas the inner
label is used by the egress PE router to identify the associated CE
router to which the packet is to be forwarded. BGP is used by the
Service Provider to exchange the routes of a particular VPN among the
PE routers that are attached to that VPN.

Configuration takes place on the PE routers on both sides of the LSP.
The simplest way to achieve this is to configure these attributes
manually on the PE routers. In order to have dynamic allocation of
the inner label, MPLS signaling protocols (in place of BGP) need to
be extended.
Allocation of the inner label has to be done by the egress PE router.
The same message that is used for the assignment of the upper label
may be used for the assignment of the inner label. Inside the
forwarding table, each entry contains the forwarding destination
address for a set of destination addresses (NetAddress/NetMask) of
the IP packets received from the ingress CE router. While
establishing the inner label, the ingress PE router needs to send
these attributes with the signaling message, and the egress PE router
needs to validate them before assigning the label.

4.4.1. Extension to RSVP-TE to support IP VPN inside VLSM tree

This section describes an extension to RSVP-TE [17] to support
dynamic allocation of the inner label of the two-level label stack
used to support VPN services.

In order to establish an LSP using RSVP-TE, the ingress PE router
sends a Path message to the egress PE router. The Path message is
augmented with a LABEL_REQUEST object. Labels are allocated
downstream and distributed (propagated upstream) by means of the RSVP
Resv message. For this purpose, the RSVP Resv message is extended
with a special LABEL object.

In order to establish the inner label for VPN support, the Path
message is augmented with a VPN_ATTRIBUTE object. Similarly, the RSVP
Resv message is extended with a VPN_LABEL object. When an egress PE
router receives a Path message, it checks for the presence of a
VPN_ATTRIBUTE object. On finding this object, the egress PE router
checks the viability of assigning a VPN label by comparing the
parameters from the VPN_ATTRIBUTE object with the attributes that are
already configured on the egress PE router. If the test is positive,
it assigns a VPN label, does the rest of the processing of LSP label
assignment and sends the RSVP Resv message, extended with the
VPN_LABEL object, towards the ingress PE router.
On receiving the Resv message with the VPN_LABEL object, the ingress
PE router assigns the VPN label along with the rest of the processing
of the Resv message and completes the operation.

The VPN_ATTRIBUTE and VPN_LABEL objects are described below.

VPN_LABEL class=, C-Type=1

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         (inner label)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VPN_ATTRIBUTE class=, C-Type=1

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Global Unicast Address of Ingress CE Router          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Global Unicast Address of Egress CE Router           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Net Address of Destination IP Packet              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Net Mask of Destination IP Packet               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The format of the Path message is as follows:

   ::= [ ] [ ] [ ] [ ] [ ... ]

   ::= [ ] [ ]

The format of the Resv message is as follows:

   ::= [ ] [ ] [ ] [ ... ] [ ]