Internet Draft
Yasuhiro Katsube
Ken-ichi Nagami
Hiroshi Esaki
(Toshiba R&D Center)
Sept. 6th, 1995

Router Architecture Extensions for ATM : Overview

Status of this memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet-Draft, please check the "1id-abstract.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This memo describes a new internetworking architecture which makes better use of the properties of ATM. IP datagrams are transferred along a hop-by-hop path via routers, similar to the Classical IP Model [RFC1577], but datagram assembly/disassembly and IP header processing are not necessarily carried out at individual routers in the proposed architecture.

A concept of "Cell Switch Router (CSR)" is introduced as a new kind of internetworking equipment, which has ATM cell switching capabilities in addition to conventional IP datagram forwarding. A CSR can concatenate one incoming VC and another outgoing VC (or VCs) in order to provide a given communication with ATM level connectivity even when its endpoints do not share a common IP address prefix.

The proposed architecture can provide applications with the desired QoS and as much bandwidth as current ATM switch networks provide while retaining the current router-based internetworking concept.

Katsube, et al. Expires March 6th, 1996 [Page 1]

1. Introduction

The Internet is growing both in its size and its traffic volume. In addition, recent applications often require guaranteed bandwidth and QoS rather than best effort. Such changes make the current hop-by-hop datagram forwarding paradigm inadequate and accelerate investigation of new internetworking architectures.

Roughly two distinct approaches can be seen as possible solutions: the use of ATM to convey IP datagrams, and the revision of IP to support the flow concept and resource reservation. Integration or interworking of these approaches will be necessary to provide end hosts with high throughput and QoS guaranteed internetworking services over any datalink platform, not only ATM.

The new internetworking architecture proposed in this draft is based on the "Cell Switch Router (CSR)", which has the following properties.

- It makes the best use of ATM's properties while retaining the current router-based internetworking and routing architecture.

- It takes into account interoperability with future IP that supports the flow concept and resource reservation.

Section 2 of this draft explains the background and motivations of our proposal. Section 3 describes an overview of the proposed internetworking architecture and several of its notable features. Section 4 discusses control architectures for CSR, which will need to be further investigated.

2. Background and Motivations

It is considered that the current hop-by-hop best effort datagram forwarding paradigm will not be adequate to support a future large scale Internet which accommodates a huge amount of traffic with certain desired QoS.

Two major schools of investigation can be seen in the IETF whose main purpose is to improve the ability of the Internet with regard to its throughput and QoS. One is to utilize ATM technology as much as possible, and the other is to introduce the concept of resource reservation and flow into IP.
1) Utilization of ATM

Although the basic properties of ATM (the need for connection setup, the need for a traffic contract, etc.) are not necessarily suited to conventional IP datagram transmission, its excellent throughput and delay characteristics lead us to investigate the realization of IP datagram transmission over ATM.

A typical internetworking architecture specified by the IETF IPoverATM WG is the "Classical IP Model" [RFC1577]. This model allows direct ATM connectivity only between nodes that share the same IP address prefix. IP datagrams should traverse routers whenever they go beyond IP subnet boundaries even though their source and destination are accommodated in the same ATM cloud. Although an ATMARP is introduced which is not based on legacy datalink broadcast but on centralized ATMARP servers, this model does not require drastic changes to the legacy internetworking architectures with regard to the IP datagram forwarding process. This model still has problems of limited throughput and large latency, compared with the capability of ATM, due to IP header processing at every router. This will become more critical when multimedia applications that require much larger bandwidth and smaller latency become dominant in the near future.

Another internetworking model is currently under discussion in the IETF ROLC WG [KAT95] and the ATM Forum MPOA (Multiprotocol-over-ATM) SWG. The model, which we call the "NHRP (Next Hop Resolution Protocol) Model" here, aims at resolving the throughput and latency problems in the Classical IP Model and making the best use of the capability of ATM. ATM connections can be directly established from an ingress point to an egress point of an ATM cloud even when they do not share the same IP address prefix.
In order to enable this, the entity of a Next Hop Server [KAT95] (or Route Server) is introduced, which can find the egress point of the ATM cloud nearest to the given destination and resolve its ATM address. A sort of query/response protocol between the server(s) and clients, and possibly between server and server, will be specified. After the ATM address of a desired egress point is resolved, the client establishes a direct ATM connection to that point through ATM signaling procedures [ATM3.1]. The IP datagram forwarding function and the routing protocol processing function, both of which are provided by conventional routers, are distributed to the ATM cloud and the server(s) respectively.

Once a direct ATM connection has been set up through this procedure, IP datagrams do not have to experience hop-by-hop IP processing but can be transmitted over the direct ATM connection. Therefore, high throughput and low latency communications become possible even if they go beyond IP subnet boundaries. In this model, ATM is utilized not only as a datalink function but also as a replacement for the current hop-by-hop IP forwarding function.

However, it should be noted that the provision of such direct ATM connections does not mean the disappearance of legacy routers which interconnect distinct ATM-based IP subnets. For example, the hop-by-hop IP datagram forwarding function would still be required in the following cases:

- When you want to transmit IP datagrams before a direct ATM connection from an ingress point to an egress point of the ATM cloud is established

- When you neither require a certain QoS nor transmit a large amount of IP datagrams for some communication [REK95-1]

- When the direct ATM connection is not allowed for security or policy reasons

2) IP level resource reservation and flow support

Apart from investigation of specific datalink technologies such as ATM, resource reservation technologies for desired IP level flows have been studied and are still under discussion. Typical examples are RSVP [RSVP95] and STII [STII95].

RSVP itself is not a connection oriented technology since datagrams can be transmitted regardless of the result of the resource reservation process. After a resource reservation process from a receiver (or receivers) to a sender (or senders) is successfully completed, RSVP-capable routers along the path of the flow reserve their resources for datagram forwarding according to the requested flow spec.

STII is regarded as a connection oriented IP which requires a connection setup process from a sender to a receiver (or receivers) before transmitting datagrams. STII-capable routers along the path of the requested connection reserve their resources for datagram forwarding according to its flow spec.

Neither RSVP nor STII restricts the underlying datalink networks since their primary purpose is to let routers provide each IP flow with the desired forwarding quality (by controlling their datagram scheduling rules). Since various datalink networks will coexist with ATM datalinks in the future, these IP level resource reservation technologies would be necessary in order to provide end-to-end IP flows with the desired bandwidth and QoS.

Taking this background into consideration, we should be aware of several issues which motivate our proposal.
- The ATM specific internetworking architecture proposed as the NHRP model does not take into account interoperability with IP level resource reservation or connection setup protocols. In particular, operating RSVP in the NHRP-based ATM cloud seems to require much effort since RSVP is a soft-state receiver-oriented protocol with multicast capability as a default, while ATM with NHRP is hard-state and sender-oriented and does not support multicast yet.

- Although RSVP or STII-based routers will provide each IP flow with a desired bandwidth and QoS, they have some inherent throughput limitations due to the processor-based IP forwarding mechanism compared with the switching mechanism of ATM.

The main objective of our proposal is to resolve the above issues. The proposed internetworking architecture makes the best use of the properties of ATM by extending legacy routers to handle future IP features such as flow support and resource reservation with the help of ATM's cell switching capabilities.

3. Internetworking Architecture Based On Cell Switch Router (CSR)

3.1 Overview

The Cell Switch Router (CSR) is a key network element of the proposed internetworking architecture. The CSR provides a cell switching function in addition to conventional IP datagram forwarding. Communications with high throughput and small latency, which are native properties of ATM, become possible by using this cell switching function even when the communications pass through IP subnetwork boundaries.

In an ATM Internet composed of CSRs, VPI/VCI-based cell switching which bypasses datagram assembly/disassembly and IP header processing is possible at every CSR for communications which warrant it (e.g., communications which require a certain amount of bandwidth and QoS), while hop-by-hop datagram forwarding based on the IP header is also possible at every CSR for other conventional communications.
By using such cell-level switching capabilities, the CSR is able to concatenate incoming and outgoing ATM VCs, although the concatenation in this case is controlled outside the ATM cloud (ATM's control/management-plane) unlike in conventional ATM switch nodes. By carrying out such VPI/VCI concatenations at multiple CSRs consecutively, a native ATM pipe composed of multiple ATM VCs, each of which connects adjacent CSRs (or a CSR and hosts/routers), can be provided. We call such an ATM pipe an "ATM Bypass-pipe" to differentiate it from an "ATM VCC (VC connection)" provided by a single ATM datalink cloud through ATM signaling.

Example network configurations based on CSRs are shown in figure 1. An ATM datalink network may be a large cloud which accommodates multiple IP subnets X, Y and Z. Alternatively, several distinct ATM datalinks may each accommodate a single IP subnet (X, Y and Z respectively) [OHTA95]. The latter configuration would be straightforward in discussing CSR, but CSR is applicable to the former configuration as well. In addition, CSR would be applicable as a router which interconnects multiple NHRP-based ATM clouds.

Two different kinds of ATM VCs are defined between adjacent CSRs or between a CSR and ATM-attached hosts/routers.

1) Default-VC

It is a general purpose VC used by any communications which select the conventional hop-by-hop IP routed path. All incoming cells received from this VC are assembled into IP datagrams and handled based on their IP headers. VCs set up in the Classical IP Model are classified into this category.

2) Dedicated-VC

It is used by specific communications which are specified by, for example, any combination of the destination IP address/port, the source IP address/port or the IPv6 flow label. It can be concatenated with other Dedicated-VCs and can constitute an ATM Bypass-pipe for those communications.
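The distinction between the two VC kinds can be sketched as a small matching routine at a Bypass-capable sender. This is only an illustration under stated assumptions: the names `Datagram`, `FlowSpec`, `select_vc` and `DEFAULT_VC` are hypothetical and do not come from this draft, and fields left unset in a flow specification are treated as wildcards.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Datagram:
    # Only the header fields the draft names as possible identifiers.
    src_ip: str
    dst_ip: str
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    flow_label: Optional[int] = None  # IPv6 only


@dataclass(frozen=True)
class FlowSpec:
    """Hypothetical description of the communications bound to one
    Dedicated-VC. A field left as None matches any value."""
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    flow_label: Optional[int] = None

    def matches(self, d: Datagram) -> bool:
        pairs = (
            (self.src_ip, d.src_ip),
            (self.dst_ip, d.dst_ip),
            (self.src_port, d.src_port),
            (self.dst_port, d.dst_port),
            (self.flow_label, d.flow_label),
        )
        return all(want is None or want == got for want, got in pairs)


DEFAULT_VC = "default-vc"  # assumed label for the hop-by-hop VC


def select_vc(d: Datagram, dedicated: List[Tuple[FlowSpec, str]]) -> str:
    """Return the first Dedicated-VC whose flow spec matches the datagram,
    or the Default-VC when none does."""
    for spec, vc in dedicated:
        if spec.matches(d):
            return vc
    return DEFAULT_VC
```

For example, with a single entry `(FlowSpec(dst_ip="Z.1", dst_port=5004), "dedicated-vc-1")`, a datagram to Z.1 port 5004 would be sent on that Dedicated-VC and everything else on the Default-VC.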
Ingress/egress nodes of the Bypass-pipe can be either CSRs or ATM-attached routers/hosts, both of which understand a Bypass-pipe control protocol (we call them "Bypass-capable nodes"). On the other hand, intermediate nodes of the Bypass-pipe should be CSRs since they need to have cell switching capabilities as well as to understand the Bypass-pipe control protocol.

The route for a Bypass-pipe is determined, when it is set up, based on the IP routing table in each CSR. In figure 1, IP datagrams from source host or router X.1 to destination host or router Z.1 are transferred over the route X.1 -> CSR1 -> CSR2 -> Z.1 regardless of whether the communication is on a hop-by-hop basis or a Bypass-pipe basis. Routes for the individual Dedicated-VCs which constitute the Bypass-pipe X.1 --> Z.1 (X.1 -> CSR1, CSR1 -> CSR2, CSR2 -> Z.1) would be determined based on an ATM routing protocol [IISP][PNNI], and would be independent of IP level routing.

An example of the IP datagram transmission mechanism is as follows.

o The host/router X.1 checks an identifier of each IP datagram, which may be "destination IP address (prefix)", "source/destination IP address (prefix) pair", "destination IP address and port", "source IP address and Flow label (in IPv6)", and so on. Based on one of those identifiers, it determines over which VC the datagram should be transmitted.

o The CSR1/2 checks the VPI/VCI value of each incoming cell. When the mapping from the incoming interface/VPI/VCI to an outgoing interface/VPI/VCI is found in an ATM routing table, the cell is directly forwarded to the specified interface through the ATM switch module. When the mapping is not found in the ATM routing table (or the table shows an IP module as the output interface), the cell is assembled into an IP datagram, which is then forwarded to an appropriate outgoing interface/VPI/VCI based on an identifier of the datagram.
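The per-cell decision at a CSR described above can be sketched as a table lookup with a fallback to the IP module. This is a simplified illustration, not a specified implementation: the class `Csr`, its method names and the table layouts are assumptions, and the destination address is passed in directly, whereas a real CSR would learn it only after reassembling the datagram.

```python
from typing import Dict, Optional, Tuple

# (interface, VPI, VCI); the tuple layout is an assumption of this sketch.
VcKey = Tuple[str, int, int]

IP_MODULE = None  # table value meaning "hand the cells to the IP module"


class Csr:
    def __init__(self) -> None:
        # ATM routing table: incoming VC -> outgoing VC, or IP_MODULE.
        self.atm_table: Dict[VcKey, Optional[VcKey]] = {}
        # Simplified IP routing table: destination -> outgoing Default-VC.
        self.ip_routes: Dict[str, VcKey] = {}

    def concatenate(self, incoming: VcKey, outgoing: VcKey) -> None:
        """Bind an incoming Dedicated-VC to an outgoing one (cut-through)."""
        self.atm_table[incoming] = outgoing

    def forward(self, incoming: VcKey, dst_ip: str) -> Tuple[str, VcKey]:
        """Return how the traffic is handled and on which VC it leaves."""
        outgoing = self.atm_table.get(incoming, IP_MODULE)
        if outgoing is not IP_MODULE:
            # Mapping found: switch at cell level, no reassembly.
            return ("cell-switched", outgoing)
        # No mapping (or IP module listed): reassemble and route by IP header.
        return ("ip-forwarded", self.ip_routes[dst_ip])
```

In this sketch a Default-VC simply has no entry in `atm_table`, so its cells always take the IP path, while a Dedicated-VC starts on the IP path and moves to the cell-switched path once `concatenate` has been called for it.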
        IP subnet X              IP subnet Y              IP subnet Z
  <--------------------->    <----------------->    <--------------------->

+-------+   Default   +-------+    Default    +-------+   Default   +-------+
|       |     -VC     | CSR 1 |      -VC      | CSR 2 |     -VC     |       |
| Host  +=============+       +===============+       +=============+ Host  |
| X.1   +-------------+++++++++---------------+++++++++-------------+ Z.1   |
|       +-------------+++++++++---------------+++++++++-------------+       |
|       +-------------+++++++++---------------+++++++++-------------+       |
|       |Dedicated    |       |   Dedicated   |       |Dedicated    |       |
+-------+    -VCs     +-------+     -VCs      +-------+    -VCs     +-------+

        <----------------------------------------------------------->
                                 Bypass-pipe

Figure 1 Internetworking Architecture based on CSR

3.2 Features

The main feature of the proposed CSR-based internetworking architecture is the same as that of the NHRP-based architecture in the sense that they both provide direct ATM level connectivity beyond IP subnet boundaries. There are, however, several notable differences between the CSR-based architecture and the NHRP-based one, as follows.

1) Relationship between IP routing and ATM routing

In the NHRP model, an egress point of the ATM network is first determined in the next hop resolution phase based on IP level routing information. Then the actual route for an ATM-VC to the obtained egress point is determined in the ATM connection setup phase based on ATM level routing information. Both kinds of routing information would be calculated according to factors such as network topology and available bandwidth for the large ATM cloud. The ATM routing will be based on IISP [IISP] or PNNI phase 1 [PNNI] while the IP routing for ATM clouds is being investigated at the IDR WG [REK95-2]. We need to manage two different routing protocols over the large ATM cloud until Integrated-PNNI [IPNNI], which takes both ATM level metrics and IP level metrics into account, is phased in.
In the CSR model, IP level routing determines an egress point of the ATM cloud and also determines the inter-subnet level path to that point, which shows which CSRs it should pass through. ATM level routing determines the intra-subnet level path for ATM-VCs (both Dedicated-VC and Default-VC) only between adjacent nodes (CSRs or ATM-attached hosts/routers). Since the roles of routing are hierarchically subdivided into the IP level (router level) and the ATM level (ATM SW level), ATM routing does not have to cover the entire ATM cloud but only individual IP subnets, independently of each other. This will decrease the amount of information handled by the ATM routing protocol.

2) Dynamic routing and redundancy support

A CSR-based network can dynamically change routes for Bypass-pipes when related IP level routing information changes. Ingress points of these Bypass-pipes (ATM-attached sender hosts, routers, or CSRs) do not have to be aware of such dynamic changes of routes since intermediate CSRs affected by the IP routing changes can follow them and change routes for related Bypass-pipes by themselves.

The same applies when an error or outage happens in any ATM nodes/links/routers on the route of a Bypass-pipe. CSRs that have noticed such an error or outage would change routes for related Bypass-pipes by themselves.

3) Interoperability with IP level connection control / resource reservation protocols

IP level connection control / resource reservation protocols such as RSVP or STII should operate over ATM LANs as well as legacy LANs. When an ATM LAN is a large cloud which accommodates a huge number of IP subnets and routers, routers at boundaries between the ATM cloud and legacy LANs should be a branch point of an IP multicast tree with a huge number of leaves. Such border routers, in the case of RSVP, should send or receive a huge number of control messages to maintain the soft states of IP flows.
The CSR-based internetworking architecture, which keeps subnet-by-subnet internetworking, does not impose such a huge message processing load on individual routers. RSVP or STII can operate over CSR-based networks as they are.

4. Control Architecture for CSR

Several issues that need further investigation in order to design a control architecture for CSR are discussed in this section.

4.1 Network Reference Model

To help in understanding the discussion in this section, the following network reference model is assumed. Source hosts S1, S2, and destination hosts D1, D2 are attached to Ethernet, while S3 and D3 are attached to ATM. Routers R1 and R5 are attached to Ethernet only, while R2, R3 and R4 are attached to ATM. The ATM datalinks for subnet #3 and subnet #4 can either be physically separated datalinks or be the same datalink. In other words, R3 can be either a one-port or a multi-port router.

  Ether    Ether        ATM          ATM        Ether    Ether
    |        |        +-----+      +-----+        |        |
    |        |        |     |      |     |        |        |
S1--|   S2---|   S3---|     |      |     |---D3   |---D2   |--D1
    |        |        |     |      |     |        |        |
    |---R1---|---R2---|     |--R3--|     |---R4---|---R5---|
    |        |        |     |      |     |        |        |
    |        |        +-----+      +-----+        |        |
 subnet   subnet      subnet       subnet      subnet   subnet
   #1       #2          #3           #4          #5       #6

Figure 2 Network Reference Model

Bypass-pipes can be set up along [S3 or R2]-->R3-->[D3 or R4]. This means that S3, D3, R2, R3 and R4 need to speak the Bypass-pipe control protocol described later, and that R3 needs to be a CSR. We use the term "Bypass-capable nodes" for hosts/routers which can speak the Bypass-pipe control protocol but are not necessarily CSRs.

As shown in this reference model, a Bypass-pipe can be set up from host to host (S3-->R3-->D3), router to host (R2-->R3-->D3), host to router (S3-->R3-->R4), and router to router (R2-->R3-->R4).
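The four Bypass-pipes of the reference model follow directly from combining each possible ingress node with each possible egress node through the single CSR. A minimal sketch, where the list names and the function `bypass_pipes` are hypothetical:

```python
from itertools import product

# Node roles taken from the reference model in figure 2.
INGRESS = ["S3", "R2"]  # Bypass-capable nodes on ATM subnet #3
CSRS = ["R3"]           # the only intermediate node, which must be a CSR
EGRESS = ["D3", "R4"]   # Bypass-capable nodes on ATM subnet #4


def bypass_pipes():
    """List every ingress -> CSR -> egress combination as a path string."""
    return ["-->".join(path) for path in product(INGRESS, CSRS, EGRESS)]


print(bypass_pipes())
# ['S3-->R3-->D3', 'S3-->R3-->R4', 'R2-->R3-->D3', 'R2-->R3-->R4']
```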
4.2 Purposes of Bypass-pipe Setup

Before discussing the Bypass-pipe control architecture, we need to think about the purposes (or triggers) for setting up a Bypass-pipe, which may affect the concrete architecture. The following two purposes for Bypass-pipe setup are assumed at present:

a) Provision of specific bandwidth/QoS requested by hosts

This indicates cases in which sender or receiver end hosts request specific bandwidth/QoS for communications between their source and destination ports.

Bandwidth/QoS requests from a host application will be transferred to a Bypass-pipe control entity via a generic IP-level resource reservation module (e.g., RSVP or STII), or transferred directly to a Bypass-pipe control entity. The former case would be more general than the latter since an RSVP or STII entity can provide applications with bandwidth/QoS request interfaces which are not specific to ATM but applicable to any datalink.

An RSVP or STII entity in the host or router, which has received a bandwidth/QoS request from applications or adjacent nodes, would just ask its own "resource management entity" to realize the request. Then the resource management entity may choose to accommodate the requested IP flow (identified by the destination IP address and port number, for instance) in an existing ATM-VC or choose to allocate a new Dedicated-VC for the requested IP flow. When both an incoming VC and an outgoing VC are dedicated to a requested IP flow, those VCs can be concatenated at the CSR (ATM cut-through) to constitute a Bypass-pipe. In order to enable that, information which describes the mapping relationship between VC and IP flow needs to be exchanged between adjacent nodes as discussed in Appendix A.

b) Mitigation of IP processing burden at intermediate routers

This indicates cases in which either routers or hosts decide on their own to initiate a Bypass-pipe setup procedure for an IP flow bound for a specific destination host or network. For example,

- A router/host (sender) initiates Bypass-pipe setup procedures based on the measurement of IP datagrams transmitted toward a certain destination host or network.

- A router/host (sender) initiates Bypass-pipe setup procedures when it detects a TCP SYN flag.

Other triggers for the initiation of the Bypass-pipe setup procedure may be possible. In any case, the purpose of Bypass-pipe setup in each of these cases is to reduce the IP processing burden at intermediate CSRs as well as to provide a communication path with low latency, rather than to provide end host applications with specific bandwidth/QoS.

For example, when R2 in figure 2 detects a large amount of datagrams bound for IP subnet #6, it may initiate Bypass-pipe setup with its target destination set to subnet #6. The resulting Bypass-pipe will finally be R2-->R3-->R4 in this case. Whether the use of this Bypass-pipe is limited to communication bound for subnet #6 or is open to communications bound for other networks (e.g., subnet #5) depends on whether the information about the Bypass-pipe is advertised as routing information, and requires further study. Other examples of b) are when R2 detects a TCP SYN flag whose destination is D1, when S3 transmits a TCP SYN flag whose destination is D2, when S3 transmits a large amount of datagrams bound for D3, and so on.

Bypass-pipe provision for the purpose of a) will surely be beneficial in the near future when the related IP-level resource reservation protocols (RSVP/STII) become available and the definitions of the individual service classes offered to applications become clear.
On the other hand, Bypass-pipe setup for the purpose of b) may be beneficial right now since b) does not require the availability of IP-level resource reservation protocols.

4.3 Desirable Characteristics of Bypass-pipe

Before discussing variations of the Bypass-pipe control architecture, several important characteristics of the Bypass-pipe which would be required by end users are itemized.

o Support of both unicast and multicast communication

o Dynamic change of Bypass-pipe topology in response to dynamic change of network status (e.g., routing change) and end-user status (e.g., addition/deletion of multicast group member)

o Small amount of processing overhead for control

Other desirable characteristics may be added in the future. Variations of the Bypass-pipe control architecture are discussed in 4.4, taking the above requirements into consideration.

4.4 Variations of Bypass-pipe Control Architecture

A number of variations of the Bypass-pipe control architecture are introduced, and evaluated from the viewpoint of the end users' requirements itemized in 4.3. Items which are related to architectural variations are:

o Ways of providing Dedicated-VCs

o Channels for Bypass-pipe control message transfer

o Initiator of Bypass-pipe setup procedure

o Management of Bypass-pipe state

Each of these items is discussed below.

4.4.1 Ways of Providing Dedicated-VCs

There are roughly three alternatives regarding the way of providing Dedicated-VCs in individual IP subnets as components of a Bypass-pipe.

a) On-demand SVC setup

Dedicated-VCs are set up in individual IP subnets through the ATM signaling procedure each time you want to set up a Bypass-pipe. Each Dedicated-VC is released when the corresponding Bypass-pipe is released.

b) Picking up one from a bunch of (semi-)PVCs

Several VCs are set up beforehand between CSR and CSR, or between a CSR and other ATM-attached nodes (hosts/routers), in each IP subnet. An unused VC is picked up as a Dedicated-VC from these PVCs in each IP subnet when a Bypass-pipe is set up. Each VC returns to an idle state when the corresponding Bypass-pipe is released. A sort of "Unused VC list" will be managed by the peer nodes which share these PVCs.

c) Picking up one VCI in PVP/SVP

PVPs or SVPs are set up between CSR and CSR, or between a CSR and other ATM-attached nodes (hosts/routers), in each IP subnet. PVPs would be set up as part of a router/host initialization procedure, while SVPs would be set up through ATM signaling when the first VC (either Default- or Dedicated-) setup request is initiated by either of the peer nodes. Then, an unused VCI value is picked up as a Dedicated-VC in the PVP/SVP in each IP subnet when a Bypass-pipe is set up. Each VC returns to an idle state when the corresponding Bypass-pipe is released. A sort of "Unused VC list" will be managed by the peer nodes which share the PVP/SVP. The SVP can be released through ATM signaling when no VCI value is in the active state.

The best choice will be a) with regard to efficient network resource usage. However, you may have to go through three steps in this case: ATMARP (for unicast [RFC1577] or multicast [ARM95] in each IP subnet), SVC setup (in each IP subnet) and Bypass-pipe setup. Whether a) is a practical choice or not will depend on whether you can tolerate the larger Bypass-pipe setup time due to the three-step procedure mentioned above, or whether you can send datagrams over Default-VCs in a hop-by-hop manner while waiting for the Bypass-pipe to be set up.

In the case of b) or c), the Bypass-pipe setup time will be improved since the SVC setup step can be skipped.
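The "Unused VC list" mentioned for alternatives b) and c) can be sketched as a small pool that hands out idle VCs and takes them back on release. The draft does not specify this data structure or how the peer nodes keep their copies consistent, so the class `VcPool` and its methods are purely illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional


class VcPool:
    """Hypothetical bookkeeping for pre-established VCs (or VCI values
    inside a PVP/SVP) shared by two peer nodes in one IP subnet."""

    def __init__(self, vcis: Iterable[int]) -> None:
        self._unused: List[int] = list(vcis)  # the "Unused VC list"
        self._in_use: Dict[int, str] = {}     # VCI -> Bypass-pipe id

    def allocate(self, pipe_id: str) -> Optional[int]:
        """Pick an unused VC as a Dedicated-VC; None if all are busy."""
        if not self._unused:
            return None
        vci = self._unused.pop(0)
        self._in_use[vci] = pipe_id
        return vci

    def release(self, pipe_id: str) -> List[int]:
        """Return the VCs of a released Bypass-pipe to the idle state."""
        freed = [v for v, p in self._in_use.items() if p == pipe_id]
        for v in freed:
            del self._in_use[v]
            self._unused.append(v)
        return freed

    def idle(self) -> bool:
        """True when no VCI is active, i.e. an SVP could be released."""
        return not self._in_use
```

When `allocate` returns None, a node in this sketch would keep sending datagrams over the Default-VC hop by hop, which matches the fallback behaviour described above.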
In b), each node (CSR or ATM-attached host/router) should specify some traffic descriptors even for unused VCs, and the ATM datalink should reserve the required resources (such as VCI values and bandwidth) for them. In addition, the ATM datalink may have to carry out UPC functions for those unused VCs. Such a burden would be reduced when you use UBR-PVCs and set the peak cell rate for each of them equal to the link rate, but bandwidth/QoS for the Bypass-pipe is not provided in this case.

In c), on the other hand, the traffic descriptors which each node should specify to the ATM datalink are not those of each VC but those of the VP only. Resource reservations for individual VCs will be carried out not as a function of the ATM datalink but of each CSR or ATM-attached host/router if necessary. The only function which needs to be provided by the ATM datalink is control of the VPs' bandwidth, such as UPC and dynamic bandwidth negotiation if it is available.

4.4.2 Channels for Bypass-pipe Control Message Transfer

There are several alternatives regarding the channels for managing (setting up, releasing, and possibly changing the route of) a Bypass-pipe. This subsection explains these alternatives and discusses their properties. Three alternatives are discussed: inband control messages, outband control messages, and the use of ATM signaling.

i) Inband Control Message

When setting up a Bypass-pipe, control messages are transmitted over a Dedicated-VC which will eventually be used as a component of the Bypass-pipe. These messages are handled at each CSR and forwarded over a Dedicated-VC along the selected route (based on the IP routing table) for the requested Bypass-pipe.

Unlike the outband message protocol described in ii), each message does not have to indicate the Dedicated-VC which will be used since the message itself is carried over that VC.
The inband control message can be either a "datagram dedicated to Bypass-pipe control" or an "actual IP datagram" sent by a user application. In the former case, actual IP datagrams can be transmitted over the Bypass-pipe after it has been set up. In the latter case, on the other hand, the first (or first several) IP datagram(s) received from an unused Dedicated-VC are analyzed at the IP level and transmitted toward the appropriate next hop over an unused Dedicated-VC. Then the incoming Dedicated-VC and the outgoing Dedicated-VC are concatenated to construct a Bypass-pipe. The latter case requires a quick concatenation action (setting up the mapping relationship between incoming and outgoing VC) in order to avoid loss or misordering of cells received during the process of Dedicated-VC concatenation.

In inband control, Bypass-pipe control messages transmitted after a Bypass-pipe has been set up cannot be identified at intermediate CSRs since those messages are forwarded at the cell level there. For example, Bypass-pipe release messages may be issued when you finish transmitting datagrams, or Bypass-pipe control messages may be transmitted periodically when the protocol is based on soft-state operation. As a possible solution for this issue, intermediate CSRs can identify Bypass-pipe control messages by a marking in the cell headers, e.g., the PTI bit which indicates an F5 OAM cell. With regard to Bypass-pipe release, an explicit release message may not be necessary if individual CSRs monitor the amount of traffic over each Dedicated-VC and delete the concatenation information for an inactive Bypass-pipe.

ii) Outband Control Message

When a Bypass-pipe is set up or released, control messages are transmitted over VCs which are different from the Dedicated-VCs used as components of the Bypass-pipe.

Unlike the inband message protocol described in i), each message has to indicate which Dedicated-VCs the message is intended to control.
Therefore, an identifier that uniquely discriminates a VC, other than the VPI/VCI (which is not identical at both endpoints of the VC), needs to be defined and assigned at the VC initiation phase. However, the issue of control message transmission after a Bypass-pipe has been set up, which arises in the inband case, does not exist.

Four alternatives are possible regarding how to convey Bypass-pipe control messages hop-by-hop over ATM datalink networks.

1) Defines a VC for Bypass-pipe control messages only.

2) Uses the Default-VC and discriminates Bypass-pipe control messages from user datagrams by an LLC/SNAP value in RFC1483 encapsulation.

3) Uses the Default-VC and discriminates Bypass-pipe control messages from user datagrams by a protocol field value in the IP header.

4) Uses the Default-VC and discriminates Bypass-pipe control messages from user datagrams by a port ID in the UDP frame.

When we take into account interoperability with Bypass-incapable routers, 1) will not be a good choice. Whether we select 2) or 3)/4) depends on whether we should consider multiprotocol rather than IP only.

In the case of IP multicast, point-to-multipoint VCs in individual subnets are concatenated at CSRs consecutively in order to constitute an end-to-end multicast tree. The above four alternatives may require the same number of point-to-multipoint Default-VCs as the number of requested point-to-multipoint Dedicated-VCs in the multicast case. A fifth alternative, which can reduce the number of VCs necessary to convey control messages in a multicast environment, is:

5) Defines a point-to-multipoint VC whose leaves are members of multicast group 224.0.0.1. All nodes which are members of at least one multicast group should become leaves of this point-to-multipoint VC. Each upstream node may become a root of the point-to-multipoint VC, or a sort of multicast server to which each upstream node transmits cells over a point-to-point VC may become its root.
In any case, Bypass-pipe control messages for every multicast group are transmitted to all nodes which are members of any of the groups. When a downstream node receives control messages that are not related to a multicast group it belongs to, it should discard them by referring to the destination group address in their IP header. A downstream node would still need to use a point-to-point VC to send control messages upstream.

iii) Use of ATM Signaling Message

Supposing that ATM signaling messages can convey the IP addresses (and possibly port IDs) of source and destination, ATM signaling messages may also be usable as Bypass-pipe control messages. In that case, an ATM connection setup message indicates the setup of a Dedicated-VC to the ATM address of the desired next-hop IP node, and also indicates the setup of a Bypass-pipe to the IP address (and possibly port ID) of the target destination node. Information elements for the Dedicated-VC setup (ATM address of the next-hop node, bandwidth, QoS, etc.) are handled at ATM nodes, while information elements for the Bypass-pipe setup (source and destination IP addresses, possibly their port IDs, or flow label for IPv6, etc.) are transferred transparently to the next-hop IP node. The next-hop IP node accepts the Dedicated-VC setup and handles these IP-level information elements. It then transmits an ATM signaling message to the ATM network in order to forward the Bypass-pipe setup request to the following next-hop IP node and to request a Dedicated-VC setup to that node.

ATM signaling messages can be transferred from receiver to sender as well as from sender to receiver. In the unicast case, this is possible by setting a zero Forward Cell Rate and a non-zero Backward Cell Rate in the ATM traffic descriptor information element; in the multicast case, it will be possible when Leaf Initiated Join capabilities become available.
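The division of labor between ATM nodes and next-hop IP nodes described above can be sketched as follows. This is only an illustrative model under stated assumptions: the class name, the field names, the address strings, and the lookup function are hypothetical and are not defined by this memo or by any ATM signaling specification.

```python
from dataclasses import dataclass


@dataclass
class SetupMessage:
    """Hypothetical setup message carrying both kinds of information elements."""
    # ATM-level elements (Dedicated-VC setup), handled at ATM nodes
    called_atm_address: str
    bandwidth_kbps: int
    # IP-level elements (Bypass-pipe setup), carried transparently by ATM nodes
    src_ip: str
    dst_ip: str


def atm_switch_forward(msg):
    """An ATM node reads only the ATM-level elements and passes the message on unchanged."""
    return msg.called_atm_address, msg


def next_hop_ip_node(msg, next_hop_atm_address_for):
    """A next-hop IP node handles the IP-level elements and re-issues the request.

    `next_hop_atm_address_for` is an assumed lookup from destination IP address
    to the ATM address of the following next-hop IP node; it returns None when
    no further hop is needed.
    """
    next_atm = next_hop_atm_address_for(msg.dst_ip)
    if next_atm is None:
        return None
    # New Dedicated-VC target, same Bypass-pipe (IP-level) request
    return SetupMessage(next_atm, msg.bandwidth_kbps, msg.src_ip, msg.dst_ip)
```

In this sketch the IP-level fields are copied unchanged from hop to hop, while the called ATM address is rewritten at every IP node, which mirrors the separation of Dedicated-VC setup and Bypass-pipe setup in the text.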
Issues in this method are:

- Information elements which specify IP-level (and port-level) information, e.g., B-HLI or B-UUI, need to be defined as part of an ATM signaling specification.

- It would be difficult to support soft-state operation since ATM signaling is inherently a hard-state protocol.

The latter issue arises when a router receives an RSVP message from a non-ATM subnet and sends it to an ATM subnet, or vice versa (e.g., R2 or R4 in Figure 2). When we consider a flow from S1 to D1 in Figure 2, RSVP Path messages are received periodically at R2 from R1, and RSVP Resv messages are received periodically at R4 from R5. ATM signaling, however, is not capable of transmitting messages periodically, except through the User Information message, which is supported as an optional service depending on the ATM service provider. When such a service is not supported by the ATM platform, R2 and R4 should have some capability to translate between the soft-state RSVP protocol and the hard-state ATM signaling protocol.

4.4.3 Initiator of Bypass-pipe Setup Procedure

The initiator of the Bypass-pipe setup procedure can be either the sender or the receiver. In other words, a Bypass-pipe can be set up from the sender side to the receiver side (in the same direction as the information flow) or from the receiver side to the sender side (in the opposite direction).

a) Sender-Initiated Setup

In the case that Bypass-pipe setup is initiated as a result of monitoring of datagrams at hosts or routers as described in 4.2 a) (e.g., measurement of datagrams, detection of a TCP SYN flag, etc.), setup from the sender side to the receiver side would be natural. Since the purpose of setup in this case is not the provision of a specific bandwidth/QoS but the provision of direct ATM connectivity of whatever kind, heterogeneity of receivers does not have to be taken into account, which makes sender-initiated control much easier.
Such a Bypass-pipe may have a UBR attribute, or a CBR/VBR/ABR attribute whose desirable bandwidth/QoS would be determined based on measurement of the actual traffic. The route of the Bypass-pipe can easily follow the routing information when the setup message is transmitted from the sender side to the receiver side, irrespective of whether the communication is unicast or multicast.

In the case that Bypass-pipe setup is initiated as a result of a specific bandwidth/QoS request from sender hosts with STII, setup will also be carried out from the sender side to the receiver side. As STII itself does not have the capability to exchange mapping information between a reserved IP flow and an ATM connection identifier, additional messages in cooperation with the STII sequence would be required.

b) Receiver-Initiated Setup

In the case that Bypass-pipe setup is initiated as a result of a specific bandwidth/QoS request from receiver hosts with RSVP, setup will be carried out from the receiver side to the sender side. As RSVP itself does not have the capability to exchange mapping information between a reserved IP flow and an ATM connection identifier, additional messages in cooperation with the RSVP sequence would be required.

The route of the Bypass-pipe follows the routing information, since RSVP Path messages determine the route of datagrams based on the routing information.

Although heterogeneity of receivers is taken into account in RSVP, how to use ATM connections in such a heterogeneous and dynamically variable environment with regard to bandwidth/QoS requires further study.

4.4.4 Management of Bypass-pipe State

A Bypass-pipe can be managed with either hard-state or soft-state operation.

a) Hard-State Management

In hard-state management, hosts/routers initiate Bypass-pipe control messages only when they want to change their state or are notified to change their state.
The initiator should wait for an acknowledgement of the message before it changes its internal state. Handling such acknowledgements for multicast Bypass-pipe control may result in a large number of acknowledgement messages being received simultaneously. Although the average message processing overhead is small, mechanisms to detect changes in network status (e.g., routing information) or users (e.g., multicast group membership) and to reconfigure the Bypass-pipe in response to those changes should be investigated.

b) Soft-State Management

In soft-state management, hosts/routers initiate Bypass-pipe control messages periodically in order to maintain their state, as well as when their state has changed. The initiator does not have to wait for an acknowledgement of the message before it changes its internal state. Although the average message processing overhead is larger than in hard-state management, the Bypass-pipe can be automatically reconfigured in response to any changes in network status or users by virtue of periodic control messages such as the RSVP Path message.

5. Security Considerations

Security issues are not discussed in this memo.

6. Summary

The basic concept of the Cell Switch Router (CSR) is clarified and a control architecture for the CSR is discussed. A number of methods to control Bypass-pipes are possible, each of which has its own advantages and disadvantages. Further investigation and discussion will be necessary to design a control protocol, which may depend on user requirements.

7. References

[ARM95] G. Armitage, "Support for Multicast over UNI 3.1 based ATM Networks", IETF Internet Draft (work in progress), draft-ietf-ipatm-ipmc-06.txt, Aug. 1995.

[ATM3.1] The ATM-Forum, "ATM User-Network Interface Specification, v.3.1", Sept. 1994.

[GOTO95] Y. Goto, "Session Identity Notification Protocol (SINP)", IETF Internet Draft (work in progress), draft-goto-sinp-00.txt, July 1995.
[IISP] The ATM-Forum, "Interim Inter-switch Signaling Protocol (IISP) Protocol Specification, Version 1.0", Dec. 1994.

[IPNNI] R. Callon, "Integrated PNNI for Multi-Protocol Routing", The ATM Forum Contribution No. 94-0789, Sept. 1994.

[KAT95] D. Katz and D. Piscitello, "NBMA Next Hop Resolution Protocol (NHRP)", IETF Internet Draft (work in progress), draft-ietf-rolc-nhrp-04.txt, May 1995.

[KATSUBE95] Y. Katsube and K. Nagami, "Mapping of IP Flow to Datalink Layer Connection", IETF Internet Draft (work in progress), draft-katsube-flow-mapping-dl-conn-00.txt, May 1995.

[OHTA95] M. Ohta, et al., "Conventional IP over ATM", IETF Internet Draft (work in progress), draft-ohta-ip--over-atm-02.txt, Mar. 1995.

[PNNI] The ATM-Forum, "P-NNI Draft Specification R11", Aug. 1995.

[REK95-1] Y. Rekhter and D. Kandlur, "IP Architecture Extensions over ATM", IETF Internet Draft (work in progress), draft-rekhter-ip-atm-architecture-01.txt, July 1995.

[REK95-2] Y. Rekhter, et al., "Inter-Domain Routing over ATM networks", IETF Internet Draft (work in progress), draft-rekhter-idr-over-atm-00.txt, Feb. 1995.

[RFC1483] J. Heinanen, "Multiprotocol Encapsulation over ATM Adaptation Layer 5", IETF RFC 1483, July 1993.

[RFC1577] M. Laubach, "Classical IP and ARP over ATM", IETF RFC 1577, Oct. 1993.

[RSVP95] R. Braden, et al., "Resource ReSerVation Protocol (RSVP), Version 1 Functional Specification", IETF Internet Draft (work in progress), draft-ietf-rsvp-spec-07.ps/txt, July 1995.

[STII95] L. Delgrossi and L. Berger, "Internet STream Protocol Version 2 (STII)", Internet Draft (work in progress), draft-ietf-st2-spec-04.txt, Aug. 1995.

8. Authors' Address

Yasuhiro Katsube
R&D Center, Toshiba
1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki 210 Japan
Phone : +81-44-549-2238
Email : katsube@isl.rdc.toshiba.co.jp

Ken-ichi Nagami
R&D Center, Toshiba
1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki 210 Japan
Phone : +81-44-549-2238
Email : nagami@isl.rdc.toshiba.co.jp

Hiroshi Esaki
R&D Center, Toshiba
801 Schapiro Research Building, c/o CTR, Columbia Univ.
530 West, 120th St., New York, NY 10027
Phone : 212-854-2365
Email : hiroshi@ctr.columbia.edu

Appendix A. Example of Bypass-pipe Control Protocol
----------------------------------------------------

A Bypass-capable node (host or router) will initiate the Bypass-pipe setup procedure when:

- It (router) has received IP-level resource reservation messages from its upstream (STII) or downstream (RSVP) node, or it (host) has received IP-level resource reservation primitives from its own upper layer entity.

- It (router) has detected more than a certain amount of IP-level traffic bound for a specific destination host or network, has detected a TCP SYN flag, and so on. Or it (host) has been transmitting more than a certain amount of IP-level traffic bound for a specific destination host or network, is transmitting a TCP SYN flag, and so on.

The former case corresponds to purpose a) in 4.2, while the latter case corresponds to b).

Below is an example of a Bypass-pipe control protocol for the former case. In the case that the Bypass-pipe is set up as a means to satisfy an IP-level bandwidth/QoS request, the protocol sequence would be triggered by the reception of IP resource reservation messages or primitives such as those of RSVP or STII. Since current RSVP/STII messages do not convey any information on the underlying datalink network, additional information should be exchanged in order to make it possible to constitute Bypass-pipes.
The information which each CSR needs in order to concatenate incoming and outgoing VCs at the ATM level is the mapping between the IP flow for which bandwidth/QoS is requested and the ATM VC that accommodates that IP flow. For example, when a CSR decides to assign the requested IP flow to an outgoing VC dedicated to that flow and learns that the flow is carried exclusively on an incoming VC from its previous node (either host or router), these incoming and outgoing VCs can be concatenated at the CSR. Even when both the incoming and the outgoing VCs accommodate plural IP flows, they may be concatenated at the CSR provided that each VC accommodates the same IP flows.

Protocol examples for the exchange of mapping information between IP flow and ATM VC in cooperation with RSVP are shown in Figure A1. The double-line arrow designated as "Bypass" signifies such mapping information exchange. The actual protocol for the information exchange may be of the query/response type or the notification/ack type. More detailed descriptions of this sort of information exchange have been proposed in [GOTO95] and [KATSUBE95], although they need further investigation. When SVCs are used as components of the Bypass-pipe, the SVC setup procedure through ATM signaling is required before "Bypass" messages are exchanged.

Whether the "Bypass" message is transmitted over a Dedicated-VC which will be concatenated (inband) or over a distinct VC (outband) depends on:

- whether the message should be transmitted periodically (soft-state) after concatenation has been carried out, and

- whether the Bypass-pipe release should be carried out by a message or by a sort of timeout mechanism.
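The concatenation condition stated earlier in this appendix (an incoming and an outgoing VC may be concatenated when both accommodate the same IP flows) can be sketched as below. This is a minimal sketch under assumptions: the class names, the flow identifier fields, and the VC identifier strings are hypothetical, and the memo does not define how the mapping information is encoded.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """Illustrative IP flow identifier; the fields are assumptions."""
    src: str
    dst: str


@dataclass
class CellSwitchRouter:
    # VC identifier -> set of IP flows the VC is known to accommodate
    incoming: dict = field(default_factory=dict)
    outgoing: dict = field(default_factory=dict)
    # incoming VC -> outgoing VC: the cell-level concatenation table
    concatenated: dict = field(default_factory=dict)

    def learn_incoming(self, vc, flows):
        """Record a mapping notified by the previous node (a "Bypass" exchange)."""
        self.incoming[vc] = frozenset(flows)

    def assign_outgoing(self, vc, flows):
        """Record which flows this CSR places on an outgoing VC."""
        self.outgoing[vc] = frozenset(flows)

    def try_concatenate(self, in_vc, out_vc):
        """Concatenate only when both VCs accommodate exactly the same flows."""
        in_flows = self.incoming.get(in_vc)
        out_flows = self.outgoing.get(out_vc)
        if in_flows and in_flows == out_flows:
            self.concatenated[in_vc] = out_vc
            return True
        return False
```

A VC pair whose flow sets differ, or for which no mapping has been learned yet, is left to conventional IP forwarding in this sketch.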
It should be noted that the information exchange for constituting the Bypass-pipe must not make the success of IP resource reservation depend on the success of Bypass-pipe setup, since what end hosts are requesting is not the provision of a Bypass-pipe but the provision of end-to-end bandwidth/QoS. Some routers may try to utilize ATM cut-through forwarding with a Bypass-pipe to realize the bandwidth/QoS request, while other routers may not have cut-through capabilities and instead utilize conventional IP forwarding (scheduling).

  Ethernet        ATM            ATM         Ethernet
     ||        +--------+     +--------+        ||
     ||  S3----|        |     |        |----D3  ||
     ||        |        |-R3--|        |        ||
S2 --||----R2--|        |     |        |--R4----||-- D2
     ||        +--------+     +--------+        ||
   subnet        subnet         subnet        subnet
     #2            #3             #4            #5

| |  Path  | |    Path    | |    Path    | |  Path  | |
| |------->| |----------->| |----------->| |------->| |
| |        | |            | |    Resv    | |  Resv  | |
| |        | |            | |<-----------| |<-------| |
| |        | |    Resv    | |   Bypass   | |        | |
| |        | |<-----------| |<==========>| |        | |
| |  Resv  | |   Bypass   | |            | |        | |
| |<-------| |<==========>| |            | |        | |
| |        | |            | |            | |        | |

Figure A1 Protocol Example Utilizing RSVP

When a protocol based on RSVP is used, the route of the Bypass-pipe may vary dynamically in accordance with the route of RSVP Path messages, which is based on the IP routing table. The datagram forwarding mechanism for the case where the route of the Bypass-pipe changes at an intermediate CSR should be carefully designed so that cells are not discarded or misordered due to the change in the mapping between incoming and outgoing VCs.
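As discussed in 4.4.4 b), soft-state operation lets a Bypass-pipe follow such route changes through periodic control messages. One way this might look at a single CSR is sketched below: concatenation entries are refreshed by periodic messages and dropped when the refresh is overdue. The class name, the explicit `now` argument, and the lifetime value are assumptions made for illustration; the memo does not specify timers, and this sketch does not address cell loss or misordering during the change.

```python
class SoftStateTable:
    """Concatenation entries that expire unless refreshed periodically."""

    def __init__(self, lifetime):
        self.lifetime = lifetime   # seconds an entry survives without a refresh
        self._expires = {}         # (in_vc, out_vc) -> expiry time

    def refresh(self, in_vc, out_vc, now):
        """Install or refresh an entry on receipt of a periodic control message."""
        self._expires[(in_vc, out_vc)] = now + self.lifetime

    def expire(self, now):
        """Remove entries whose refresh is overdue and return them."""
        stale = [key for key, t in self._expires.items() if t <= now]
        for key in stale:
            del self._expires[key]
        return stale

    def active(self):
        """Return the currently installed (in_vc, out_vc) pairs, sorted."""
        return sorted(self._expires)
```

For example, if refreshes for the pair ("in-1", "out-7") stop arriving after a route change and refreshes for ("in-1", "out-9") start, a later call to `expire` removes the old pair while the new one remains active.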