Network Working Group                                 Robert E. Gilligan
Internet Draft                                          Rajkumar Velpuri
draft-gilligan-iscsi-fault-tolerance-00.txt               Intransa, Inc.
Expires: October 2003                            Lakshmi Ramasubramanian
                                                            Alan Warwick
                                                         Microsoft Corp.
                                                        Matthew W. Baker
                                                             Intel Corp.
                                                              April 2003

     iSCSI Implementation Guidelines for Fault Tolerance and Load
                 Balancing using Temporary Redirection

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This document outlines an approach for achieving fault tolerance and
   load balancing in iSCSI using the iSNS discovery mechanism or the
   iSCSI discovery session, together with the temporary redirection
   mechanism.  This approach requires no change to iSCSI or other
   protocols that initiators and targets implement.  However, the
   manner in which initiators perform target discovery, support the
   temporary redirection mechanism, and recover from failed iSCSI
   sessions affects their ability to support this approach.  This paper
   provides implementation guidelines for iSCSI initiators to follow to
   support this form of fault tolerance and load balancing.

1. Introduction

   iSCSI can be used in a variety of configurations, including those in
   which the target is implemented as a distributed collection of
   nodes, or as a single node with multiple network interfaces.  While
   fault tolerance and load balancing are not directly addressed by the
   protocol, the protocol does have two features that can be used to
   help build a fault tolerant solution: the target discovery process
   and the temporary redirection function.  The iSCSI protocol's login
   response process includes temporary redirection as a required
   feature.  However, the recovery behavior after a redirected session
   fails or is explicitly terminated by an asynchronous event is not
   specified.

   This paper details the behavior that iSCSI initiators may implement
   to enable a fault tolerant solution based on iSNS discovery or the
   discovery session and the temporary redirection features.  This
   approach can be used to provide load balancing features as well.
   The initiator behavior outlined here is fully compatible and
   compliant with the iSCSI specification.

   The paper first gives an overview of the approach, then details some
   examples of the sequence of events that occur when this approach is
   employed, and finally summarizes the features initiators should
   implement to support fault tolerant and load balancing systems based
   on temporary redirection.

2. Overview of the Approach

   The solution relies on the initiator to establish a full feature
   phase session in a procedure that may take up to three steps, then
   to back up and repeat those steps in reverse order to recover the
   session if the underlying TCP connection fails or is terminated.

   The initiator discovers the address or addresses of a target either
   by querying an iSNS server or by performing a discovery session to
   the "portal" configured for the target system.  A portal, in iSCSI
   terminology, refers to an IP address and TCP port number pair.
   We term the address that an initiator connects to in order to
   perform a discovery session the "discovery target portal."  Both of
   these processes (iSNS or discovery session) return a set of portals
   for the target that we term the "initial target portals."

   Next, the initiator initiates a session to the target by trying to
   open a TCP connection to each of the initial target portals in
   sequence until one succeeds.  Once connected, the initiator logs in.
   In its login response, the target may direct the initiator to a new
   portal via the iSCSI temporary redirection function.  We term this
   new portal the "redirect portal" for the target.  The initiator then
   closes the initial connection and attempts to initiate a session
   with the redirect portal.  The negotiation of an iSCSI session to
   the redirect portal is the final step leading to a normal, full
   feature session between the initiator and the target, allowing data
   to flow.

   If the connection fails, or is terminated by the target by an
   asynchronous logout message, the initiator performs a sequence of
   actions to attempt to recover the session.  Since this is session
   recovery, the initiator performs these recovery actions no matter
   what recovery level has been negotiated.  The initiator first
   attempts to re-connect to one of the initial target portals learned
   from the iSNS query or during the discovery session, again trying
   each in sequence until one succeeds.  If this fails, the initiator
   repeats the discovery phase, either re-connecting to the discovery
   target portal and re-running the discovery session, or re-running
   the iSNS discovery procedure.  After repeating the discovery
   process, the initiator follows the procedure it used at the time of
   initial connection, eventually connecting to a new redirect portal
   for the target and recovering the full feature session.
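
   The establishment and recovery procedure described above can be
   sketched in a few lines of Python.  This is a minimal illustration
   under stated assumptions, not a definitive implementation: the
   names Portal, LoginResult, login_via_initial_portals and
   establish_or_recover are hypothetical, the discover and login
   callbacks stand in for a real iSNS query or discovery session and
   for the TCP connect plus iSCSI login, and the choice to move on to
   the next initial target portal when a login at the redirect portal
   fails is an assumption that the text does not specify.  The
   addresses in the demonstration are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Portal = Tuple[str, int]  # (IP address, TCP port), as defined in the text


@dataclass
class LoginResult:
    """Outcome of one login attempt (hypothetical type for this sketch)."""
    accepted: bool
    redirect: Optional[Portal] = None  # set on "target moved temporarily"


def login_via_initial_portals(
    initial_portals: List[Portal],
    login: Callable[[Portal], Optional[LoginResult]],
) -> Optional[Portal]:
    """Try each initial target portal in order; follow one temporary redirect.

    `login` is a hypothetical callback that opens a TCP connection and logs
    in; it returns None if the connection could not be opened.
    Returns the portal that carries the full feature session, or None.
    """
    for portal in initial_portals:
        result = login(portal)
        if result is None:
            continue  # TCP connect failed; try the next initial portal
        if result.redirect is not None:
            # Close the initial connection, then log in at the redirect portal.
            redirected = login(result.redirect)
            if redirected is not None and redirected.accepted:
                return result.redirect
            continue  # assumption: fall through to the next initial portal
        if result.accepted:
            return portal  # the target chose not to redirect
    return None


def establish_or_recover(
    discover: Callable[[], List[Portal]],
    login: Callable[[Portal], Optional[LoginResult]],
    cached_initial_portals: Optional[List[Portal]] = None,
) -> Tuple[Optional[Portal], List[Portal]]:
    """Initial establishment (no cache) or recovery (cached portals first)."""
    if cached_initial_portals:
        session_portal = login_via_initial_portals(cached_initial_portals, login)
        if session_portal is not None:
            return session_portal, cached_initial_portals
    # No cache, or every cached initial portal failed: re-run discovery.
    portals = discover()
    return login_via_initial_portals(portals, login), portals


if __name__ == "__main__":
    # Simulated target: portal A redirects to node B, or to C once B is down.
    A, B, C = ("192.0.2.1", 3260), ("192.0.2.11", 3260), ("192.0.2.12", 3260)
    down = set()

    def login(portal: Portal) -> Optional[LoginResult]:
        if portal in down:
            return None
        if portal == A:
            return LoginResult(False, redirect=C if B in down else B)
        return LoginResult(True)

    session, initial = establish_or_recover(lambda: [A], login)
    print("initial session on", session)
    down.add(B)  # node B fails; recovery goes back to A, not to B
    session, initial = establish_or_recover(lambda: [A], login, initial)
    print("recovered session on", session)
```

   Because recovery always restarts from the initial target portals and
   never from the remembered redirect portal, the simulated target in
   the demonstration is free to steer the recovered session to node C
   once node B has failed.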

   Fault tolerance in this scheme is achieved by allowing the target
   system to direct the initiator to a specific network interface
   within a multi-homed iSCSI target system, or a specific network node
   in a clustered or distributed iSCSI target system, for the duration
   of the session.  In the event of failure, the target system can
   direct the initiator to a different, healthy interface or node,
   allowing the session to be recovered.  The same mechanism provides
   load balancing by allowing the target to instruct the initiator to
   establish a new session with a less heavily loaded node.

   This solution relies on initiator support of the iSCSI temporary
   redirect function.  Additionally, the initiator must return to one
   of the initial target portals in the event that a connection fails
   or is terminated via an asynchronous logout message, and repeat the
   discovery process if that fails.

   The iSCSI specification does not dictate the precise recovery
   behavior for sessions established following a temporary redirection
   by a target.  Some initiator implementations re-connect to the same
   redirect portal after a connection to that portal fails.  However,
   this behavior would defeat the fault tolerant and load balancing
   solution outlined here, and violates the spirit of the temporary
   redirection function.  By returning to the redirect portal, the
   initiator is treating redirection as greater than temporary, but
   less than permanent.

3. Example Scenarios

   To further illustrate how this mechanism works, this section
   presents the typical sequence of events that occurs when a session
   is begun, when a target node or interface serving the session fails,
   and when a target decides to move an initiator to a different target
   node for load balancing.

   Initial connection sequence of events:

   1. The initiator performs the discovery procedure by using iSNS or
      executing a discovery session.

   2. If iSNS is used, the initiator queries the iSNS server, which
      returns a set of portals for the target.

   3. If the discovery session is used, the initiator opens a TCP
      connection to the discovery target portal, logs in and issues the
      "SendTargets" command.  The target responds with a list of target
      names and their associated portals.  The initiator or user
      selects the portals associated with the specific target it is
      interested in establishing a session with.  The initiator
      terminates the discovery session and closes the associated TCP
      connection.

   4. Whichever discovery procedure is used, the initiator remembers
      the portals for this target as the "initial target portals".

   5. The initiator iterates through the initial target portals list
      until it succeeds in opening a TCP connection to one of them.

   6. The initiator then logs into the target, which may respond with a
      "target moved temporarily" redirect response, listing the
      redirect portal for the target.  The initiator remembers this as
      the "redirect portal."  The initiator then closes the TCP
      connection.

   7. The initiator then opens a TCP connection to the redirect portal
      and logs in.  The target accepts this login and the session
      proceeds to full feature phase.

   8. Data flow begins.

   Target node or interface failure sequence of events:

   1. The initiator has an iSCSI session established and a TCP
      connection open to the redirect portal.  A full feature session
      is in progress and data is flowing.

   2. The target fails.

   3. The initiator detects the failure of the TCP connection with the
      target.

   4. The initiator iterates through the list of initial target portals
      learned in the discovery process until it succeeds in opening a
      TCP connection to one of them.

   5. If the initiator succeeds in connecting to one of the initial
      target portals, it executes steps 6 and 7 in the "Initial
      connection sequence of events" list.

   6. If the initiator fails to connect to any of the initial target
      portals, it repeats steps 1 through 7 in the "Initial connection
      sequence of events" list.

   7. Data flow resumes.

   Overloaded target node sequence of events:

   1. The initiator has a full feature iSCSI session established, and
      the associated TCP connection open, to the redirect portal.  Data
      is flowing.

   2. The target terminates the session with an asynchronous logout
      message.  The initiator closes the TCP connection.

   3. If the asynchronous logout PDU is type 1 (target requests
      logout), the initiator logs out, closes the TCP connection, and
      proceeds to step 4 in the "Target node or interface failure
      sequence of events" list.

   4. If the asynchronous logout PDU is type 2 (target will drop
      connection), then Parameter2 (Time2Wait) specifies the time in
      seconds that the initiator should wait before attempting to
      re-login.  The initiator should wait this time, then proceed to
      step 4 in the "Target node or interface failure sequence of
      events" list.  The distinguished value of 0xFFFF may be used as
      an indication that the initiator should not re-login without the
      intervention of the administrator on the initiator.  (The
      protocol provides no other way for the target to signal to the
      initiator that it does not wish it to re-connect.)

   In these scenarios, the initiator is called on to determine when a
   TCP connection with a target has failed, and also when an attempt to
   open a new TCP connection to a target has failed.  For both of these
   determinations, the iSCSI layer in the initiator can simply rely on
   the underlying TCP layer's retransmission abort timeout mechanism,
   or it could implement timeouts of its own.  The approach used, and
   the timeout values that the initiator selects, are highly
   implementation-dependent.  For example, some implementations allow
   applications to select the TCP abort timeout, while others do not.
   No matter what approach is taken, implementations may wish to make
   these two failure-determining timeout values configurable so that
   administrators may tune the system for operation in different
   environments.

4. Summary of guidelines for initiators

   To summarize, the mechanisms that initiators should implement to
   support this approach for fault tolerance and load balancing are:

   -  Provide a target discovery mechanism by implementing either iSNS
      or the iSCSI target discovery session.

   -  Accept and act upon the iSCSI temporary redirect login response.

   -  If a session TCP connection to a redirect portal fails, try to
      re-connect to the initial target portals.

   -  If a session is terminated by the target with an asynchronous
      logout message, try to re-connect to the initial target portals.

   -  If attempts to connect to the initial target portals fail, re-run
      the discovery mechanism.

5. Acknowledgements

   Thanks to Bill Nowicki of Intransa, and Hari Mudaliar of Adaptec for
   their helpful comments on early revisions of this paper.

6. Security considerations

   The initiator must perform authentication on every login.

7. References

   [ISCSI]  J. Satran, K. Meth, C. Sapuntzakis, M. Chadalapaka,
            E. Zeidner, "iSCSI", draft-ietf-ips-iscsi-20.txt, Work in
            progress.

   [ISNS]   Josh Tseng, Kevin Gibbons, Franco Travostino, Curt Du
            Laney, Joe Souza, "Internet Storage Name Service (iSNS)",
            draft-ietf-ips-isns-17.txt, Work in progress.

Authors' Addresses

   Robert E. Gilligan
   Intransa, Inc.
   2870 Zanker Road
   San Jose, CA 95134
   Email: gilligan@intransa.com
   Phone: 408-678-8647

   Rajkumar Velpuri
   Intransa, Inc.
   2870 Zanker Road
   San Jose, CA 95134
   Email: Rajkumar.velpuri@intransa.com
   Phone: 408-678-8641

   Lakshmi Ramasubramanian
   Microsoft Corp.
   One Microsoft Way
   Redmond, WA 98052
   Email: nramas@microsoft.com
   Phone: 425-703-7559

   Alan Warwick
   Microsoft Corp.
   One Microsoft Way
   Redmond, WA 98052
   Email: alanwar@microsoft.com
   Phone: 425-706-0230

   Matthew W. Baker
   Intel Corporation
   1501 South Mopac Expressway, #400
   Austin, TX 78746
   Email: matt.w.baker@intel.com
   Phone: 512-732-1306