Network Working Group                                     Toshiyuki Kubo
Internet Draft                                           Katsuyoshi Iida
Expiration Date: August 2003    Nara Institute of Science and Technology
                                                           February 2003


                Path based route identification in SCTP

                   draft-toshiyuki-path-sctp-00.txt

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of
Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Abstract

Although SCTP provides a multihoming function, a single point of
failure may still occur because SCTP does not specify how the source
address is selected. RFC2960 states that "a host should pick the most
divergent source-destination pair," but it also leaves source address
selection as an implementation matter. In RFC3257, Coene et al. state
that a host must take care in selecting its source address. In this
document, we propose a method for selecting the outgoing interface
and the corresponding source address in SCTP.

Toshiyuki, et al.                                               [Page 1]

Internet Draft                                             February 2003

Table of Contents

Section 1.  Introduction
Section 2.  Single point of failure in SCTP
Section 3.  Path based route identification
Section 3.1 Basics
Section 3.2 Path parameters notification
Section 3.3 Path choosing algorithm
Section 4.  Security Considerations
Section 5.  Acknowledgments
Section 6.  References
Section 7.  Authors Information

1. Introduction

Although SCTP [RFC2960] provides a multihoming function, a single
point of failure may still occur because SCTP does not specify how
the source address is selected. RFC2960 states that "a host should
pick the most divergent source-destination pair," but it also leaves
source address selection as an implementation matter. In [RFC3257],
Coene et al. state that a host must take care in selecting its source
address.

To avoid this problem, Coene et al. show an example routing table
configuration [SCTP-MULTIHOME] as a solution to the "single point of
failure". We explain it using Fig. 1.1. The figure contains two
endpoints, Endpoint A and Endpoint B. Each endpoint has two
interfaces and hence two addresses. The routing tables that avoid the
single point of failure are given in Tab. 1.1. Each endpoint's
routing table contains a route to every network of its peer.

   +------------+         *~~~~~~~~~*          +------------+
   | Endpoint A |         *  Cloud  *          | Endpoint B |
   |        1.2 +---------+ 1.1 3.1 +----------+ 3.2        |
   |            |         |         |          |            |
   |        2.2 +---------+ 2.1 4.1 +----------+ 4.2        |
   |            |         *         *          |            |
   +------------+         *~~~~~~~~~*          +------------+

Figure 1.1: Two hosts with redundant networks.

Table 1.1: Routing tables in two hosts to avoid single point of
           failure

          Endpoint A                     Endpoint B
   Destination     Gateway        Destination     Gateway
   ------------------------       ------------------------
       3.0           1.1              1.0           3.1
       4.0           2.1              2.0           4.1

This solution has two drawbacks. First, the routing table settings
depend on each SCTP association. This means that we need different
routing settings for different SCTP associations. Figure 1.2 and Tab.
1.2 illustrate this difficulty. There are two associations: Endpoint
A <-> Endpoint B and Endpoint A <-> Endpoint C. Here, Endpoint A is
required to have different routing entries for each peer endpoint.
This method is therefore not practical, because the routing table
must be set up before each association is established.

   +------------+         *~~~~~~~~~*          +------------+
   | Endpoint A |         *  Cloud  *          | Endpoint B |
   |        1.2 +---------+ 1.1 3.1 +----------+ 3.2        |
   |            |         |         |          |            |
   |        2.2 +---------+ 2.1 4.1 +----------+ 4.2        |
   |            |         * 5.1 6.1 *          |            |
   +------------+         *~~~+~~+~~*          +------------+
                              |  |
                              |  |
                         +----+--+----+
                         |   5.2 6.2  |
                         |            |
                         |            |
                         | Endpoint C |
                         |            |
                         +------------+

Figure 1.2: Three hosts with redundant networks.

Table 1.2: Routing tables in three hosts to avoid single point of
           failure.

          Endpoint A                     Endpoint B
   Destination     Gateway        Destination     Gateway
   ------------------------       ------------------------
       3.0           1.1              1.0           3.1
       4.0           2.1              2.0           4.1
       5.0           1.1
       6.0           2.1

          Endpoint C
   Destination     Gateway
   ------------------------
       1.0           5.1
       2.0           6.1

Second, when the number of interfaces of an endpoint differs from
that of its peer in an association, the number of routes that can be
used is the smaller of the two interface counts. Therefore, we cannot
take advantage of the redundancy of the endpoint with the larger
number of interfaces.

In this document, we propose a method for selecting the outgoing
interface and the corresponding source address in SCTP. The main
feature of our proposal is simultaneous address switching: when an
endpoint switches its destination address, it changes its source
address at the same time. We call a combination of a source address
and a destination address a "path".

In order to implement the path based route identification, we need to
modify both IP routing and the route identification mechanism in SCTP.
However, the required modification of IP routing is very simple: an
endpoint must have multiple default routes, one for each outgoing
interface. This avoids the impracticality mentioned above. Meanwhile,
a current SCTP endpoint identifies a route by "a destination
address". In our proposal, an endpoint instead identifies a route by
"a path", which represents a route to be used in failover. This gives
us correct data paths without complicated routing setup. Moreover,
each endpoint maintains, for each path, a set of parameters that
represent the characteristics of the route. The parameters include
the error counter, cwnd, and RTO.

The advantages of our scheme are as follows.

o  It can avoid the "single point of failure".

o  Routing setup for each association establishment is not needed,
   because multiple default routes are used.

o  Since the number of paths is the larger of the two endpoints'
   interface counts, we can take full advantage of the redundancy of
   both endpoints. This is particularly useful for the two-to-one
   association, which consists of an endpoint with two interfaces and
   an endpoint with one. With the original SCTP, such an association
   has no redundancy.

o  Since we do not establish full-mesh paths, the increase in the
   amount of state is small.

Our proposal is useful in scenarios such as the following.

o  VPN. A VPN needs a long-term association. For example, endpoint A
   subscribes to ISP X (with address x.a) and ISP Y (with y.a), and
   endpoint B also subscribes to ISP X (with x.b) and ISP Y (with
   y.b). In this scenario, two paths are enough, namely
   (x.a <-> x.b) and (y.a <-> y.b). This avoids using meaningless
   routes such as (x.a <-> y.b) and (x.b <-> y.a).

o  Mobile scenario. Suppose the two-to-one topology, where the
   endpoint with two interfaces is a mobile station. In this example,
   two different error counters are needed on the mobile station
   because it has two different wireless interfaces.
   Our solution provides a separate error counter for each outgoing
   interface, which the original SCTP cannot provide.

2. Single point of failure in SCTP

One of the reasons behind the SCTP standardization is to increase
redundancy at the transport layer. However, there are many situations
that the current SCTP cannot handle. In this section, we explain
these situations and the kind of "single point of failure" we focus
on.

   +------------+         *~~~~~~~~~~~~~*          +------------+
   | Endpoint A |         *    Cloud    *          | Endpoint B |
   |        1.2 +---------+ 1.1<--+     |          |            |
   |            |         |       |->3.1+----------+ 3.2        |
   |        2.2 +---------+ 2.1<--+     |          |            |
   |            |         *             *          |            |
   +------------+         *~~~~~~~~~~~~~*          +------------+

Figure 2.1: Two-to-one topology.

Table 2.1: Typical routing tables in two-to-one topology.

          Endpoint A                     Endpoint B
   Destination     Gateway        Destination     Gateway
   ------------------------       ------------------------
     default         1.1            default         3.1
       1.0          direct            3.0          direct
       2.0          direct

In Fig. 2.1, we illustrate an association between Endpoint A and
Endpoint B, where Endpoint A has two interfaces with addresses 1.2
and 2.2, and Endpoint B has one interface with address 3.2. In other
words, Endpoint A is multihomed, but Endpoint B is not. We call this
configuration the "two-to-one topology". Here, the link between 3.1
and 3.2 is a single point of failure, but this single point of
failure is natural and unavoidable. Therefore, we do not consider it
in the remainder of this document.

We then explain the kind of "single point of failure" we focus on,
using the two-to-one topology as an example. In this example, our
target "single point of failure" is the link between 1.1 and 1.2. To
explain this, we first show typical routing tables of both endpoints
in Tab. 2.1, since the single point of failure we focus on is
strongly related to IP routing.
According to Endpoint A's routing table, packets from Endpoint A go
out at 1.2, although Endpoint A may receive packets at both 1.2 and
2.2. Thus, only one route is actually used in the direction from
Endpoint A to Endpoint B, even though an alternate route exists in
the same direction. In this way, the link between 1.1 and 1.2 becomes
a "single point of failure", although it is avoidable. When the link
goes down, all packets from Endpoint A to Endpoint B are lost, and
the association fails as a consequence.

Note that a characteristic of the "single point of failure" we focus
on is that a HEARTBEAT and its HEARTBEAT-ACK form a triangle:
3.2 -> 2.2 -> 1.2 -> 3.2. This happens when Endpoint B chooses 2.2 as
the destination of a HEARTBEAT.

Moreover, the "single point of failure" we focus on also occurs in
many other topological configurations, e.g., the two-to-two topology.
Among them, the two-to-one topology is the simplest configuration.

3. Path based route identification

3.1 Basics

We propose a path based route identification in SCTP that avoids the
"single point of failure" without a routing table setup for each
association. The proposal consists of modifications to both the IP
layer and the SCTP layer.

There is no solid definition of the term "path" in the current SCTP
literature, although the term appears in many documents. Therefore,
we define "path" first. Our definition of "path" is a combination of
a source address and a destination address. This enables us to
differentiate two different source addresses even when they are used
with the same destination address.

For example, in Fig. 2.1, there exist two paths, (1.2 - 3.2) and
(2.2 - 3.2). As a consequence, Endpoint A with path based route
identification continues to deliver packets successfully even when
the link between 1.1 and 1.2 fails.
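
The definition above can be illustrated with a minimal Python sketch,
which is not part of the proposal itself. It keys per-path state on
the (source, destination) pair, so the two paths of Fig. 2.1 are
tracked separately even though they share the destination 3.2. The
names Path, PathState and PathTable are hypothetical, cwnd and RTO
are omitted for brevity, and the failure rule (a path becomes
inactive once its error counter exceeds a threshold) is an assumption
modeled on Path.Max.Retrans in RFC 2960.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A path as defined in this section: one source and one destination."""
    source: str
    destination: str


@dataclass
class PathState:
    """Per-path parameters (hypothetical layout; cwnd and RTO omitted)."""
    error_count: int = 0
    active: bool = True


class PathTable:
    """Tracks state per (source, destination) pair, not per destination."""

    def __init__(self, paths, max_retrans=5):
        # Assumption: threshold in the spirit of Path.Max.Retrans (RFC 2960).
        self.max_retrans = max_retrans
        # Dict order is insertion order, used here as the preference order.
        self.states = {path: PathState() for path in paths}

    def report_timeout(self, path):
        """Count one failed transmission on the given path."""
        state = self.states[path]
        state.error_count += 1
        if state.error_count > self.max_retrans:
            state.active = False

    def current(self):
        """Return the first active path, or None if every path has failed."""
        for path, state in self.states.items():
            if state.active:
                return path
        return None


# Two-to-one topology of Fig. 2.1: both paths share destination 3.2.
primary = Path("1.2", "3.2")
alternate = Path("2.2", "3.2")
table = PathTable([primary, alternate], max_retrans=2)

for _ in range(3):  # the link between 1.1 and 1.2 fails
    table.report_timeout(primary)
```

After the three timeouts, table.current() returns the alternate path
(2.2 - 3.2). A table keyed on the destination address alone would
hold a single entry for 3.2 and could not express this switch.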

To implement the path based route identification in SCTP, IP routing
must be modified, because traditional IP routing cannot differentiate
two source addresses that are used with the same destination address.
This can be achieved by the source address attaching enhancement of
the routing table, which is illustrated in Tab. 3.1 for the
two-to-one topology of Fig. 2.1.

                  Endpoint A
   Source       Destination       Gateway
   ----------------------------------------
    1.2           default           1.1
    2.2           default           2.1
    1.2             1.0            direct
    2.2             2.0            direct

                  Endpoint B
   Source       Destination       Gateway
   ----------------------------------------
    3.2           default           3.1
    3.2             3.0            direct

Table 3.1. Source address attached routing tables in two-to-one
           topology.

When Endpoint A's SCTP stack selects the path (1.2 -> 3.2), it uses
the entry in the first line. Otherwise, it uses the entry in the
second line.

We note that a "path" is bidirectional, i.e., an association
maintains both directions of each path. Thus, the numbers of routes
in the two directions always agree.

Finally, for those who would like to implement this, implementations
of the source address attaching enhancement of IP routing exist
[STAR]. Also see our paper that deals with the implementation
[PATH-MANAGEMENT].

In summary, we have described the IP layer modification that is
required to avoid the "single point of failure". We explain the SCTP
layer modification in the rest of this section.

3.2 Path parameters notification

Next, we propose a minor modification of the INIT-ACK phase to
realize the path based route identification in SCTP. To keep the path
parameters consistent in both endpoints, the endpoint that receives
the INIT chunk first creates paths, each of which combines one of its
own addresses with one of the addresses contained in the INIT chunk.
Then it puts the path parameters into the INIT-ACK chunk.
More details about path creation are described in Section 3.3.

Figure 3.1 illustrates the proposed path parameters that are attached
to the INIT-ACK chunk. There are two kinds of path parameters, one
for IPv4 and one for IPv6. Here we define the "initiator" endpoint as
the endpoint that sends the INIT chunk, and the "responder" endpoint
as the endpoint that receives the INIT chunk. As described in the
previous section, a path is a combination of a source address and a
destination address, i.e., a combination of one of the initiator
addresses and one of the responder addresses.

(a) IPv4 Path Parameter (0xb000)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type = 0xb000         |          Length = 12          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Initiator Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Responder Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IPv4 Initiator Address: 32 bits (unsigned integer)

   IPv4 Responder Address: 32 bits (unsigned integer)

(b) IPv6 Path Parameter (0xb001)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type = 0xb001         |          Length = 36          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Initiator Address                       |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Responder Address                       |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IPv6 Initiator Address: 128 bits (unsigned integer)

   IPv6 Responder Address: 128 bits (unsigned integer)

Figure 3.1. Proposed path parameters for IPv4 and IPv6.

To deliver path information from the responder to the initiator, we
introduce a new parameter, called the path parameter, which carries
one pair of an initiator address and a responder address.
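
As an illustration of the layout in Figure 3.1, the following Python
sketch encodes and decodes one path parameter. It is a sketch under
stated assumptions, not a reference implementation: the function
names are hypothetical, fields are assumed to be in network byte
order as in other SCTP parameters, the example addresses are
documentation addresses rather than addresses from this document, and
a path that mixes IPv4 and IPv6 is rejected because neither parameter
type can carry it.

```python
import ipaddress
import struct

IPV4_PATH_PARAM = 0xB000  # type values taken from Figure 3.1
IPV6_PATH_PARAM = 0xB001
HEADER_LEN = 4            # 16-bit Type followed by 16-bit Length


def encode_path_parameter(initiator, responder):
    """Build one path parameter from two textual IP addresses."""
    a = ipaddress.ip_address(initiator)
    b = ipaddress.ip_address(responder)
    if a.version != b.version:
        # Assumption: each parameter type carries a single address family.
        raise ValueError("initiator and responder must share an IP version")
    ptype = IPV4_PATH_PARAM if a.version == 4 else IPV6_PATH_PARAM
    body = a.packed + b.packed
    # Length covers the header too: 12 for IPv4, 36 for IPv6.
    return struct.pack("!HH", ptype, HEADER_LEN + len(body)) + body


def decode_path_parameter(data):
    """Return (initiator, responder) as strings from one path parameter."""
    ptype, length = struct.unpack("!HH", data[:HEADER_LEN])
    if ptype == IPV4_PATH_PARAM:
        size, cls = 4, ipaddress.IPv4Address
    elif ptype == IPV6_PATH_PARAM:
        size, cls = 16, ipaddress.IPv6Address
    else:
        raise ValueError("not a path parameter")
    if length != HEADER_LEN + 2 * size or len(data) < length:
        raise ValueError("malformed path parameter")
    first = data[HEADER_LEN:HEADER_LEN + size]
    second = data[HEADER_LEN + size:HEADER_LEN + 2 * size]
    return str(cls(first)), str(cls(second))


v4 = encode_path_parameter("192.0.2.1", "198.51.100.1")
v6 = encode_path_parameter("2001:db8::1", "2001:db8::2")
```

Here v4 is 12 octets long and v6 is 36 octets long, which matches the
Length fields shown in Figure 3.1.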
If the responder creates four paths, four path parameters are
included in total.

When the responder endpoint sends out the INIT-ACK chunk, it does not
store any path related information in its memory, in order to protect
itself against DoS attacks in the same way as the original SCTP.
Therefore, the responder endpoint saves this information in the
cookie data. More precise descriptions of our modifications are shown
in Fig. 3.2 in BNF notation.

   ::= | ::= | | | ::= ::= ::= ::= ::= | | ::= | ::=

Figure 3.2. BNF notation of the proposed INIT-ACK chunk

Note that the initiator endpoint can piggy-back user data onto the
COOKIE-ECHO chunk, again in the same way as the original SCTP. This
is the reason why we let the responder create the path parameters
when it receives the INIT chunk.

3.3 Path choosing algorithm

Theoretically, with full-mesh paths, the maximum number of paths is
n * m, where n is the number of interfaces of the initiator and m is
that of the responder. Therefore, the memory available for
maintaining paths may be exhausted if we choose full-mesh paths with
large n and/or large m. Moreover, the full mesh gives excessive
redundancy. For example, in the two-to-two topology, the number of
paths becomes four. Since one alternative path is enough for
redundancy, the full mesh seems to be excessive.

Here, we propose a path choosing algorithm, which chooses a subset of
the full-mesh paths. Our algorithm is very simple but effective in
increasing network redundancy with less memory consumption. We call
it a "path choosing algorithm" because the full-mesh paths are
defined naturally, and all we do is choose some of them.

We first define some symbols. Let a_1, a_2, ..., a_n be the initiator
addresses in the order written in the INIT chunk.
Let b_1, b_2, ..., b_m be the responder addresses in the order
written in the path parameters explained in Section 3.2. Our
algorithm is then as follows.

1) Choose paths (a_1, b_1), (a_2, b_2), ..., (a_l, b_l), where
   l = n if (n < m), l = m if (m < n), otherwise l = m = n.

2) If (n < m), choose paths (a_n, b_(n+1)), (a_n, b_(n+2)), ...,
   (a_n, b_m).

3) If (m < n), choose paths (a_(m+1), b_m), (a_(m+2), b_m), ...,
   (a_n, b_m).

In our algorithm, the number of chosen paths is equal to the larger
of n and m. This is different from the original SCTP. In the original
SCTP, the number of unidirectional routes from the initiator to the
responder may differ from that in the reverse direction. Therefore,
the number of bidirectional routes is equal to the smaller of n and
m. This is another way of showing that our proposal gives better
redundancy than the original SCTP. For example, in the two-to-one
topology, the original SCTP has no redundancy, whereas our proposal
establishes two bidirectional routes.

In summary, the numbers of bidirectional routes of the algorithms
compare as

   (original SCTP) <= (our proposal) <= (full-mesh path),

i.e., min(n, m) <= max(n, m) <= n * m, with equality on the left only
when n = m. This means that the full-mesh path gives the maximum
redundancy but is costly. Our algorithm, on the other hand, pays a
small cost to attain an appropriate level of redundancy.

One drawback of our algorithm is that it is difficult to avoid
undesirable combinations of initiator address and responder address.
Consider the two-to-two topology (Fig. 1.1), and suppose that 1.2 and
4.2 belong to the same ISP X, and that 2.2 and 3.2 belong to the same
ISP Y. Since the algorithm fixes which paths are chosen, we cannot
avoid establishing a path between an address of ISP X and an address
of ISP Y. This is not desirable. Therefore, a more sophisticated path
choosing algorithm is needed to meet this requirement.
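
The three steps of the algorithm in Section 3.3 can be sketched in
Python as follows. This is a minimal illustration rather than the
authors' implementation: the function name choose_paths is
hypothetical, addresses are plain strings in the abbreviated notation
of the figures, and rejecting an empty address list is an added
assumption.

```python
def choose_paths(initiator_addrs, responder_addrs):
    """Choose max(n, m) paths out of the n * m full-mesh paths.

    initiator_addrs: a_1 .. a_n in the order written in the INIT chunk.
    responder_addrs: b_1 .. b_m in the responder's order.
    Returns a list of (initiator address, responder address) pairs.
    """
    n, m = len(initiator_addrs), len(responder_addrs)
    if n == 0 or m == 0:
        # Assumption: every endpoint has at least one address.
        raise ValueError("each endpoint needs at least one address")

    # Step 1: pair addresses index by index up to the shorter list.
    shorter = min(n, m)
    paths = [(initiator_addrs[i], responder_addrs[i]) for i in range(shorter)]

    # Step 2 (n < m): reuse the last initiator address a_n for
    # b_(n+1) .. b_m. The slice is empty when n >= m.
    paths += [(initiator_addrs[n - 1], b) for b in responder_addrs[n:]]

    # Step 3 (m < n): reuse the last responder address b_m for
    # a_(m+1) .. a_n. The slice is empty when m >= n.
    paths += [(a, responder_addrs[m - 1]) for a in initiator_addrs[m:]]

    return paths


# Two-to-one topology of Fig. 2.1 with Endpoint A as the initiator.
two_to_one = choose_paths(["1.2", "2.2"], ["3.2"])

# Two-to-two topology of Fig. 1.1.
two_to_two = choose_paths(["1.2", "2.2"], ["3.2", "4.2"])
```

For the two-to-one topology this yields the paths (1.2, 3.2) and
(2.2, 3.2). For the two-to-two topology it yields (1.2, 3.2) and
(2.2, 4.2), which is two of the four full-mesh paths.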

4. Security Considerations

There are no security considerations.

5. Acknowledgments

We would like to express our gratitude to Mr. Randall Stewart of
Cisco Systems for his valuable comments on source address selection.
We also thank Associate Professor Youki Kadobayashi of NAIST for his
comments on the path choosing algorithm and for his tremendous
encouragement. We are deeply indebted to Professor Suguru Yamaguchi
of NAIST for his untiring support of this project. Great thanks go to
our project members, Mr. Shigeru Kashihara, Mr. Takashi Nishiyama and
Mr. Kosuke Hata of NAIST, for their assistance with our development
and for their stimulating discussions.

6. References

[RFC2960]
   R. Stewart, Q. Xie, K. Morneault, C. Sharp, "Stream Control
   Transmission Protocol", RFC 2960, October 2000.

[RFC3257]
   L. Coene, "SCTP Applicability Statement", RFC 3257, April 2002.

[SCTP-MULTIHOME]
   Coene, et al., "Multihoming issues in the Stream Control
   Transmission Protocol", February 2002.

[STAR]
   T. Kaji, "Development and Implementation of routing strategy
   switching mechanism for regional IX" (in Japanese), Master's
   thesis, Nara Institute of Science and Technology, March 1998.

[PATH-MANAGEMENT]
   T. Kubo, S. Kashihara, K. Iida, Y. Kadobayashi, S. Yamaguchi,
   "Path Management of SCTP to Eliminate Single Point of Failure in
   Multihoming", In Proc. IEEE 5th International Conference on
   Advanced Communication Technology (ICACT2003), Phoenix Park,
   Korea, January 2003.
   http://sctp.aist-nara.ac.jp/icact2003.pdf

7. Authors Information

Toshiyuki Kubo
Graduate School of Information Science,
Nara Institute of Science and Technology (NAIST)
8916-5 Takayama, Ikoma 630-0192, Japan.
Tel: +81-743-72-5216, Fax: +81-743-72-5219
E-mail: toshi-ku@is.aist-nara.ac.jp

Katsuyoshi Iida
Graduate School of Information Science,
Nara Institute of Science and Technology (NAIST)
8916-5 Takayama, Ikoma 630-0192, Japan.
Tel: +81-743-72-5213, Fax: +81-743-72-5219
E-mail: katsu@is.aist-nara.ac.jp

Full Copyright Statement

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.