from http://www.my.kernel.org/pub/ietf/concluded-wg-ietf-mail-archive/rolc/rolc.arc

Routing over Large Clouds Working Group                    Juha Heinanen
Request for Comments: DRAFT                              Telecom Finland
Expires February 8, 1994                                 Ramesh Govindan
                                                                Bellcore
                                                          August 8, 1993

               NBMA Next Hop Resolution Protocol (NHRP)

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a ``working draft'' or ``work in progress.''

Please check the 1id-abstracts.txt listing contained in the internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the current status of any Internet Draft.

Abstract

This document describes the NBMA Next Hop Resolution Protocol (NHRP). NHRP can be used by a source terminal (host or router) connected to a Non-Broadcast, Multi-Access link layer (NBMA) network to find out the IP and NBMA addresses of the "NBMA next hop" towards a destination terminal. The NBMA next hop is the destination terminal itself, if the destination is connected to the NBMA network. Otherwise, it is the egress router from the NBMA network that is "nearest" to the destination terminal. Although this document focuses on NHRP in the context of IP, the technique is applicable to other network layer protocols as well.

1. Introduction

The NBMA Next Hop Resolution Protocol (NHRP) allows a source terminal (a host or router), wishing to communicate over a Non-Broadcast, Multi-Access link layer network (called NBMA for short), to find out the IP and NBMA addresses of the "NBMA next hop" towards a destination terminal. The "NBMA next hop" is the destination terminal itself, if the destination is connected to the NBMA network. Otherwise, it is the egress router from the NBMA network that is nearest to the destination terminal.

Once the NBMA next hop has been resolved, the source may either start sending IP packets to the destination (in a connectionless NBMA network such as SMDS) or may first establish a connection to the destination with the desired bandwidth and QOS characteristics (in a connection oriented NBMA network such as ATM).

An NBMA network can be non-broadcast either because it technically doesn't support broadcasting (e.g. an X.25 network) or because broadcasting is not feasible for one reason or another (e.g. an SMDS broadcast group or an extended Ethernet would be too large).

2. Protocol Overview

In this section, we briefly describe how a source S uses NHRP to determine the "NBMA next hop" to destination D.

S first determines the next hop to D through normal routing processes. If this next hop is reachable through its NBMA interface, S formulates an NHRP request containing the source and destination IP addresses, the source NBMA address, and QOS information. S then forwards the request to an entity called the route server (RS).

RSs are configured to cooperatively satisfy NHRP requests. Each RS "serves" a pre-configured set of terminals and peers with a pre-configured set of RSs. An RS exchanges routing information with its peers (and possibly with the terminals it serves), using regular routing protocols.
(However, an RS, unless it is also an egress/ingress router, need not necessarily be able to switch regular IP packets.) This exchange is used to construct a forwarding table per QOS in every RS. The forwarding table determines the next hop from the RS for an NHRP request with the corresponding QOS.

After receiving an NHRP request, the RS checks if it "serves" D. If so, the RS uses ARP [1] to find out D's NBMA address. For the case of an ATM network, the ARP operation is described in [2]. The RS then either forwards the NHRP request to D or generates a positive NHRP reply on its behalf. The reply contains the IP and NBMA addresses of D (which is S's NBMA next hop) and is sent back to S. NHRP replies usually traverse the same sequence of RSs as the NHRP request (in reverse order, of course).

If the RS does not serve D, it extracts from its forwarding table the next hop towards D. If no such next hop entry is found, the RS generates a negative NHRP reply. If the next hop is behind the RS's NBMA interface, the RS forwards the NHRP request to the next hop. If the next hop is behind some other interface, the RS may be willing to act as an egress router for traffic bound to D. In that case, the RS generates a positive NHRP reply containing its own IP and NBMA address (i.e., the RS is the NBMA next hop from S to D).

An RS receiving an NHRP reply may cache the NBMA next hop information contained therein. To a subsequent NHRP request, this RS might respond with the cached, non-authoritative, NBMA next hop. If a communication attempt based on non-authoritative information fails, a source terminal can choose to send an authoritative NHRP request. RSs never respond to authoritative NHRP requests with cached information.

NHRP provides a mechanism to aggregate NBMA next hop information in RS caches. Suppose that RS X is the NBMA next hop from S to D.
Suppose further that X is an egress router for all terminals sharing an IP address prefix with D. When X generates an NHRP reply in response to a request, it may replace the IP address of D with this prefix. The prefix to egress router mapping in the reply is cached in all RSs on the path of the reply. A subsequent (non-authoritative) NHRP request for some destination that shares an IP address prefix with D can be satisfied with this cached information.

3. Configuration

Terminals

In order to participate in NHRP, a terminal connected to an NBMA needs to be configured with the IP address(es) of its RS(s). The RS(s) may be physically located on the terminal's default or peer routers. If the terminal is attached to several link layer networks, it may also need to be configured to receive routing information from its RS(s) so that the terminal can determine which IP networks are reachable through the NBMA.

Route Servers

An RS is configured with a set of IP address prefixes that correspond to the IP addresses of the terminals it is serving. Moreover, the RS must be configured to exchange routing information with its peer RSs (if any). If a served terminal is attached to several link layer networks, the RS may also need to be configured to advertise routing information to such terminals.

If an RS is acting as an egress router for terminals connected to other link layer networks, the RS must, in addition to the above, be configured to exchange routing information between the NBMA and the other link layer networks.

In all cases, routing information is exchanged using regular intra- and/or inter-domain routing protocols such as OSPF, Dual IS-IS, BGP, or IDRP.

4. Packet Formats

NHRP packets are carried either directly over the NBMA, encapsulated in IP with a separate protocol number, or carried as ICMP messages.
Regardless, NHRP request and reply packets contain the following fields:

   nhrp$op     1 byte        Operation code
   nhrp$hc     1 byte        Hop count
   nhrp$lnk    2 bytes       Link layer type
   nhrp$net    2 bytes       Network layer type
   nhrp$sll    1 byte        Length of source link layer address
   nhrp$sla    sll/8 bytes   Source link layer address
   nhrp$snl    1 byte        Length of source network layer address
   nhrp$sna    snl/8 bytes   Source network layer address
   nhrp$dnl    1 byte        Length of destination network layer address
   nhrp$dna    dnl/8 bytes   Destination network layer address
   nhrp$qosl   1 byte        Length of QOS information
   nhrp$qos    qosl/8 bytes  QOS information
   nhrp$dll    1 byte        Length of destination link layer address
   nhrp$dla    dll/8 bytes   Destination link layer address
   nhrp$onl    1 byte        Length of originator network layer address
   nhrp$ona    onl/8 bytes   Originator network layer address

The Operation code indicates the type of the message. The assigned values are:

   NHRP Request                                  = 1
   NHRP Request for Authoritative Information    = 2
   NHRP Positive, Authoritative Reply            = 3
   NHRP Positive, Non-Authoritative Reply        = 4
   NHRP Negative, Authoritative Reply            = 5

If ICMP is used to carry NHRP requests and replies, then the operation code determines the ICMP code.

The Hop count indicates the maximum number of RSs that a request or reply is allowed to pass before being discarded.

The possible values for the Link layer type and Network layer type fields are the same as for the Hardware type and Protocol type of ARP [1] and may be found in the current Assigned Numbers RFC.

All Length fields indicate the length of the corresponding entity in bits. An empty address or QOS field has a length of 0.

The QOS information field is network layer specific and is used to select a forwarding table during query/request forwarding. For IP, this field contains the desired TOS value.

In requests and negative replies, the Destination link layer address is always empty.
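
The field layout above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not part of the protocol definition: the names NhrpPacket, encode, and decode are hypothetical, the 2-byte type fields are assumed to be in network byte order (the draft does not say), and every variable-length field is assumed to be a whole number of bytes, so a prefix whose length is not a multiple of 8 bits is not handled.

```python
import struct
from dataclasses import dataclass

# Operation codes as assigned in the text above.
NHRP_REQUEST = 1
NHRP_REQUEST_AUTH = 2
NHRP_POS_AUTH_REPLY = 3
NHRP_POS_NONAUTH_REPLY = 4
NHRP_NEG_AUTH_REPLY = 5


@dataclass
class NhrpPacket:
    """Hypothetical in-memory form of an NHRP request or reply."""
    op: int            # nhrp$op
    hc: int            # nhrp$hc
    lnk: int           # nhrp$lnk
    net: int           # nhrp$net
    sla: bytes = b""   # source link layer address
    sna: bytes = b""   # source network layer address
    dna: bytes = b""   # destination network layer address
    qos: bytes = b""   # QOS information (TOS value for IP)
    dla: bytes = b""   # destination link layer address
    ona: bytes = b""   # originator network layer address


def _put(field: bytes) -> bytes:
    """Emit a 1-byte length in bits followed by the field itself."""
    bits = len(field) * 8
    if bits > 255:
        raise ValueError("field does not fit a 1-byte length in bits")
    return bytes([bits]) + field


def encode(p: NhrpPacket) -> bytes:
    # Assumption: big-endian ("network") order for the 2-byte type fields.
    head = struct.pack("!BBHH", p.op, p.hc, p.lnk, p.net)
    body = (p.sla, p.sna, p.dna, p.qos, p.dla, p.ona)
    return head + b"".join(_put(f) for f in body)


def decode(data: bytes) -> NhrpPacket:
    op, hc, lnk, net = struct.unpack_from("!BBHH", data, 0)
    pos, fields = 6, []
    for _ in range(6):  # sla, sna, dna, qos, dla, ona in wire order
        if pos >= len(data):
            raise ValueError("truncated packet")
        bits = data[pos]
        if bits % 8:
            raise ValueError("this sketch only handles whole-byte fields")
        end = pos + 1 + bits // 8
        if end > len(data):
            raise ValueError("truncated packet")
        fields.append(data[pos + 1:end])
        pos = end
    return NhrpPacket(op, hc, lnk, net, *fields)


# A request: the destination link layer and originator fields stay empty.
request = NhrpPacket(
    op=NHRP_REQUEST, hc=8,
    lnk=0,                         # placeholder, not a real assigned value
    net=0x0800,                    # ARP protocol type for IP
    sla=bytes(8),                  # hypothetical 8-byte NBMA address
    sna=bytes([192, 0, 2, 1]),
    dna=bytes([198, 51, 100, 7]),
    qos=b"\x00",                   # TOS value
)
wire = encode(request)
```

Because an empty field is encoded as a single zero length byte, the request above ends in two zero bytes (for nhrp$dll and nhrp$onl), which matches the rule that the Destination link layer address is empty in requests.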
In a positive NHRP reply originating from an egress router, the Destination network layer address may be a prefix of the requested Destination network layer address.

Positive replies always contain their originator's IP address. If the originator's IP address and the destination's IP address differ, the source terminal may assume that the reply was generated by an egress router.

An RS is not allowed to reply to an NHRP Request for Authoritative Information with cached information, but may do so for an NHRP Request. Replies based on cached information carry a different operation code from those based on authoritative information.

5. Protocol Operation

The external behavior of an RS may be described in terms of two procedures (processRequest and processReply) operating on two tables (forwardingTable and cacheTable). In an actual implementation, the code and data structures may be realized differently.

Each RS has, for each supported QOS, a forwardingTable consisting of entries with the fields networkLayerAddrPrefix, outIf, directlyConnected?, and outIfAddr.

The networkLayerAddrPrefix field identifies a set of network layer addresses known to the RS. The outIf field denotes either the served NBMA network interface or some other link layer network interface.

If outIf denotes the served NBMA interface, then two possibilities exist:

(1) The RS is itself serving the networkLayerAddrPrefix. This is indicated by a true value in the directlyConnected? field and the outIfAddr field has no meaning. Such a forwardingTable entry has been created by manual configuration.

(2) Some other RS is serving the networkLayerAddrPrefix. This is indicated by a false value in the directlyConnected? field. The outIfAddr field now contains the NBMA address of the next hop RS. Such a forwardingTable entry is a result of network layer address prefix information exchange with one of the RS's peers.
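
As an illustration of the forwardingTable entries described so far, the following Python sketch models an entry and a "best match" lookup. It rests on assumptions that the draft does not state: "best match" is interpreted as longest-prefix match, addresses are IPv4, and the names ForwardingEntry and match_forwarding_table, the interface name "nbma0", and the NBMA address bytes are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ForwardingEntry:
    prefix: ipaddress.IPv4Network        # networkLayerAddrPrefix
    out_if: str                          # outIf
    directly_connected: bool             # directlyConnected?
    out_if_addr: Optional[bytes] = None  # outIfAddr; unused when directly connected


def match_forwarding_table(table: List[ForwardingEntry],
                           addr: str) -> Optional[ForwardingEntry]:
    """Return the entry with the longest prefix covering addr, or None.

    Assumption: the draft's "best match" means longest-prefix match.
    """
    ip = ipaddress.ip_address(addr)
    candidates = [e for e in table if ip in e.prefix]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.prefix.prefixlen)


table = [
    # Case (1): this RS serves the prefix itself (manual configuration).
    ForwardingEntry(ipaddress.ip_network("192.0.2.0/24"), "nbma0", True),
    # Case (2): a peer RS serves a more specific prefix; outIfAddr holds
    # the (hypothetical) NBMA address of that next hop RS.
    ForwardingEntry(ipaddress.ip_network("192.0.2.128/25"), "nbma0", False,
                    bytes.fromhex("0102030405060708")),
]

served = match_forwarding_table(table, "192.0.2.5")     # case (1) entry
via_peer = match_forwarding_table(table, "192.0.2.200")  # case (2) entry
unknown = match_forwarding_table(table, "203.0.113.1")   # no entry
```

Since the draft keeps one forwardingTable per supported QOS, a fuller model would hold one such list per TOS value and select the list from the QOS field of the request before calling the lookup.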
If outIf denotes an interface other than the served NBMA interface, then the RS may act as an egress router for the terminals sharing the networkLayerAddrPrefix. To do this it must be capable of switching IP packets between the NBMA and the other link layer network. Again two possibilities exist:

(1) The RS can itself resolve the link layer address corresponding to the networkLayerAddrPrefix. This is indicated by a true value in the directlyConnected? field and the outIfAddr field has no meaning. Such a forwardingTable entry has been created by manual configuration.

(2) The networkLayerAddrPrefix is behind an RS in another NBMA or behind some other router. This is indicated by a false value in the directlyConnected? field. The outIfAddr field now contains a link layer address of this other router or RS. Such a forwardingTable entry is a result of networkLayerAddrPrefix information exchange with one of the RS's peer routers or RSs in another NBMA.

The protocol used to exchange networkLayerAddrPrefix information among the RSs can be any regular IP intra- or inter-domain routing protocol, such as OSPF, Dual IS-IS, BGP, or IDRP.

In addition to the forwardingTable, each RS has for each supported QOS a cacheTable consisting of entries with the fields networkLayerAddrPrefix, networkLayerAddr, and linkLayerAddr.

The entries in this table are learned from NHRP requests and replies passing through the RS. The networkLayerAddrPrefix field identifies a set of IP addresses sharing a common NBMA address that is stored in the linkLayerAddr field. The networkLayerAddr field identifies the IP address of the originator of the information. Its value differs from networkLayerAddrPrefix only in case the originator is an egress router. The cacheTable entries could also include a timeStamp field to be used to age cacheTable entries after a certain hold period.

The following pseudocode defines how NBMA NHRP requests and replies are processed by an RS.

   procedure processRequest(request);
     addCacheTableEntry(request.sna, request.sna, request.sla);
     let bestMatch == matchForwardingTable(request.dna) do
       if bestMatch then
         if bestMatch.directlyConnected? then
           if nbmaIf?(bestMatch.outIf) then
             let nbmaAddr == arp(request.dna) do
               if nbmaAddr then
                 genPosAuthReply(request, request.dna, nbmaAddr)
               else
                 genNegReply(request)
               end
             end
           else
             genPosAuthReply(replaceDna(request,
                                        bestMatch.networkLayerAddrPrefix),
                             selfNetworkLayerAddr, selfLinkLayerAddr)
           end
         else
           if not requestForAuthInfo?(request) then
             let cacheMatch == matchCacheTable(request.dna) do
               if cacheMatch then
                 genPosNonAuthReply(request, cacheMatch.networkLayerAddr,
                                    cacheMatch.linkLayerAddr);
                 return;
               end
             end
           end;
           forward(request, bestMatch.outIf, bestMatch.outIfAddr)
         end
       else
         genNegReply(request)
       end
     end
   end

   procedure processReply(reply);
     if posReply?(reply) then
       addCacheTableEntry(reply.dna, reply.ona, reply.dla)
     end;
     if reply.sna == selfNetworkLayerAddr then
       "reply is to the RS itself that is also acting as a terminal"
     else
       let bestMatch == matchForwardingTable(reply.sna) do
         if bestMatch then
           if nbmaIf?(bestMatch.outIf) then
             if bestMatch.directlyConnected? then
               let nbmaAddr == arp(reply.sna) do
                 if nbmaAddr then
                   forward(reply, nbmaIf, nbmaAddr)
                 end
               end
             else
               forward(reply, bestMatch.outIf, bestMatch.outIfAddr)
             end
           else
             "a request should never originate from another NBMA"
           end
         end
       end
     end
   end

The semantics of the procedures and constants used in the pseudocode are explained below.

addCacheTableEntry(networkLayerAddrPrefix, networkLayerAddr, linkLayerAddr) adds a new entry to the cacheTable or overwrites an existing entry whose networkLayerAddrPrefix field is equal to networkLayerAddrPrefix.
A new entry is not added if matchCacheTable(networkLayerAddrPrefix) returns an entry whose linkLayerAddr field is equal to linkLayerAddr.

matchForwardingTable(networkLayerAddrPrefix) returns a forwardingTable entry whose networkLayerAddrPrefix field is the best match for networkLayerAddrPrefix or false if no match is found.

nbmaIf?(outIf) tests if outIf is the interface of the served NBMA.

arp(networkLayerAddr) performs ARP [1] on networkLayerAddr and returns either the NBMA address corresponding to the networkLayerAddr or false if no NBMA address is found.

matchCacheTable(networkLayerAddrPrefix) returns a cacheTable entry whose networkLayerAddrPrefix field is the best match for networkLayerAddrPrefix or false if no match is found.

genPosAuthReply(request, networkLayerAddr, linkLayerAddr) and genPosNonAuthReply(request, networkLayerAddr, linkLayerAddr) respectively generate positive authoritative and non-authoritative replies by copying the request, rewriting the Operation code field, initializing the Hop count field, filling in the Length of destination link layer address and Destination link layer address fields based on linkLayerAddr, and filling in the Length of originator network layer address and Originator network layer address fields based on networkLayerAddr.

genNegReply(request) generates a negative, authoritative reply by copying request, rewriting the Operation code field, and initializing the Hop count field.

replaceDna(request, networkLayerAddrPrefix) returns an NHRP request where the Destination network layer address field of request is replaced by networkLayerAddrPrefix.

selfNetworkLayerAddr and selfLinkLayerAddr denote the egress router's own IP and NBMA address in the served NBMA.

requestForAuthInfo?(request) tests if request is a Request for Authoritative Information.
forward(request, outIf, outIfAddr) decrements the Hop count field of request and forwards request to the address outIfAddr of the interface outIf provided that the value of the Hop count field remains positive.

posReply?(reply) tests if reply is a Positive Authoritative or Positive Non-Authoritative Reply.

authoritativeReply?(reply) tests if reply is a Positive, Authoritative Reply.

nbmaIf denotes the NBMA interface of the RS.

Similar to RSs, each terminal participating in NHRP has a forwardingTable and a cacheTable. The forwardingTable is the regular forwarding table that the terminal is using for its IP routing. The terminal must, of course, be able to generate NHRP requests to its RS(s) in case the forwardingTable shows that a particular destination address is behind the NBMA interface and to process replies to these requests. The cacheTable is for caching the replies in order to avoid repeated requests for the same destination addresses.

6. Discussion

The result of an NHRP request depends on how routing is configured among the RSs. If the destination terminal is directly connected to the NBMA and the RSs always prefer NBMA routes over routes via other link layer networks, the NHRP replies always return the NBMA address of the destination terminal itself rather than the NBMA address of some egress router. For destinations outside the NBMA, egress routers and routers in the other link layer networks should exchange routing information so that the optimal egress router is always found.

In addition to RSs, an NBMA terminal could also be associated with one or more regular routers that could act as "connectionless servers" for the terminal. Then the terminal could choose to resolve the NBMA next hop or just send the IP packets to one of the terminal's connectionless servers.
The latter option may be desirable if communication with the destination is short-lived and/or doesn't require much network resources. The connectionless servers could, of course, be physically integrated in the RSs by augmenting them with IP switching functionality.

NHRP supports portability of NBMA terminals. A terminal can be moved anywhere within the NBMA and still keep its original IP address as long as its RS(s) remain the same. Requests for authoritative information will always return the correct link layer address.

References

[1] Address Resolution Protocol, David C. Plummer, RFC 826.

[2] Classical IP and ARP over ATM, Mark Laubach, Internet Draft.

Acknowledgements

We would like to thank John Burnett of Adaptive, Dennis Ferguson of ANS, Joel Halpern of Network Systems, and Paul Francis of Bellcore for their valuable insight and comments to earlier versions of this draft.

Authors' Addresses

   Juha Heinanen                          Ramesh Govindan
   Telecom Finland,                       Bell Communications Research
   PO Box 228,                            MRE 2P-341, 445 South Street
   SF-33101 Tampere,                      Morristown, NJ 07960
   Finland

   Phone: +358 49 500 958                 Phone: +1 201 829 4406
   Email: Juha.Heinanen@datanet.tele.fi   Email: rxg@thumper.bellcore.com