Network Working Group                                         Y. Rekhter
Request for Comments: DRAFT       T.J. Watson Research Center, IBM Corp.
                                                                  Editor
                                                                10/21/92
Version 3.6

IP and ARP on Fibre Channel (FC)

Status of this Memo

This document specifies a standard method of encapsulating the Internet Protocol (IP) [1] datagrams and Address Resolution Protocol (ARP) [2] requests and replies on FC hardware and protocols [3]. This RFC specifies an IAB standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "IAB Official Protocol Standards" for the standardization state and status of this protocol. Distribution of this document is unlimited.

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".

Expiration Date March 1993

1 Acknowledgements

This document would not exist without significant contributions of Bryan Cook (IBM Corp.), Martin Sachs (IBM Corp.), and Beth Vanderbeck (IBM Corp.). We would also like to thank Greg Nordstrom (IBM Corp.), Jerry Rouse (IBM Corp.), Paul Griffiths (IBM Corp.), and Lansing Sloan (LLNL) for their review and constructive comments.

Certain parts of this document were taken from "IP and ARP on HIPPI" [5] written by J. Renwick and A. Nicholson.

2 Introduction

Fibre Channel [3] describes the point-to-point physical interface, transmission protocol, and signaling protocol of a high-performance serial link for support of the higher level protocols associated with IP, IPI, SCSI and others.

The Fibre Channel is logically a bidirectional point-to-point serial data channel, optimized for transfers of large blocks of data. Physically, the Fibre Channel can be an interconnection of multiple communication points, called N_Ports, interconnected by a switched network, called a Fabric, or a point-to-point link.

Fibre Channel is structured as a set of hierarchical functions grouped into several levels. These levels are organized as follows:

- FC-0 defines the physical portions of the Fibre Channel.
- FC-1 defines the transmission protocol.
- FC-2 defines the signaling protocol, which includes the frame structure and byte sequence.
- FC-3 defines a set of services which are common across multiple ports of a node.
- FC-4 defines the channel protocol, or mapping between the lower levels of the Fibre Channel and Upper Level Protocols (ULPs).

Of these levels, FC-0, FC-1, and FC-2 are integrated into the FC-PH document [3]. The reader of this document is assumed to be familiar with the relevant parts of the FC-PH document.

A Fibre Channel Node may support one or more N_Ports and one or more FC-4s. Each N_Port contains FC-0, FC-1 and FC-2 functions. FC-3 optionally provides the common services to multiple N_Ports and FC-4s. A single N_Port may support one or more FC-4s.

Although the FC-4 defined by this document can be used by other protocol stacks, for convenience, we refer to it herein as the IP FC-4. The IP FC-4 defines a mapping between two particular Upper Level Protocols (ULPs), IP and ARP, and the lower levels of the Fibre Channel.

3 Scope

The document focuses solely on the issues related to running IP and ARP as ULPs over FC.

Within the scope of this document are:

- mechanisms to exchange IP and ARP packets
- constraints on FC-2 Frame Header parameters
- mechanisms to resolve IP address to physical address mapping
- mechanisms to ensure fair access to resources (N_Ports)

All other issues are outside the scope of this document. In particular, the following issues are not discussed in this document:

- vendor dependent solutions for ARP server
- supporting IP multicast over FC
- network configuration and management
- interaction with other FC-4s (e.g. SCSI) running over the same N_Port
- IEEE 802 MAC Layer Bridging
- full support for IEEE 802.2 LLC

4 Definitions

Class 1 service:
   A service which establishes a dedicated connection between communicating N_Ports.

Class 2 service:
   A service that multiplexes frames at frame boundaries to or from one or more N_Ports with acknowledgement provided.

Class 3 service:
   A service that multiplexes frames at frame boundaries to or from one or more N_Ports without acknowledgement.

Destination Identifier:
   The address identifier used to indicate the targeted destination of the transmitted frame.

Destination N_Port:
   The N_Port to which a frame is targeted.

Exchange:
   The basic mechanism which transfers information consisting of one or more related Information Units. An Exchange may span multiple Class 1 Dedicated Connections. The Exchange is identified by an Originator Exchange Identifier (OX_ID) and a Responder Exchange Identifier (RX_ID).

Exchange Identifier:
   A generic reference to OX_ID or RX_ID (see Exchange).

Fabric:
   The entity which interconnects various N_Ports attached to it and is capable of routing frames by using only the Destination Identifier in the FC-2 frame header.

Information Unit:
   An organized collection of one or more Information Categories which an Upper Layer Protocol identifies to FC-PH.

Link_Control_Facility:
   A link hardware facility which attaches to an end of a link and manages transmission and reception of data. It is contained within each N_Port.

Node:
   A collection of one or more N_Ports controlled by a level above FC-2. A node may be an Internet host or a router.

N_Port:
   A hardware entity which includes a Link_Control_Facility. It may act as an Originator, a Responder, or both.

N_Port Identifier:
   A Fabric unique address identifier by which an N_Port is uniquely known. The identifier is used in the Source Identifier and Destination Identifier fields of a frame.

Originator:
   The logical function associated with an N_Port responsible for originating an Exchange.

Process_Associator:
   A value used in the Association_Header to identify a process within a node. Process_Associator is the mechanism by which a process is addressed by another communication process.

Responder:
   The logical function associated with an N_Port responsible for supporting the Exchange initiated by the Originator in another N_Port.

Source_Identifier:
   The address identifier used to indicate the source of the transmitted frame.

5 Design Objectives

This document describes the specific feature sets of a Fibre Channel that must be used, so that any conformant Fibre Channel Node implementation has some assurance of being able to interoperate at the IP level with any other conformant implementation.

6 FC-2 Frame Header Parameters

This document places the following constraints on the value of the FC-2 Frame Header fields when used by IP FC-4 (both by IP and by ARP).

- Routing Bits of the R_CTL field must indicate Device_Data frame (0000).
- Information Category of the R_CTL field must indicate Unsolicited Data (0100).
- The TYPE field must indicate IEEE 802.2 LLC/SNAP Encapsulation (0000 0101).
- The DF_CTL field shall indicate the presence of the Network_Header.
- The Abort Sequence Condition of the F_CTL field shall indicate in the first data frame of an Exchange "Abort, Discard policy requested".

Use of the IEEE 802.2 LLC/SNAP Encapsulation for IP and ARP prescribed by this document shall not be viewed as a hindrance for using the same encapsulation technique by other protocol stacks (e.g. IPX, AppleTalk).

A conformant implementation is required to send the Network_Header.

When sending an ARP packet the source and destination network addresses in the Network_Header may be in IEEE or CCITT format. If neither IEEE nor CCITT formats are required, as determined by the sender, the Destination Network_Address_Authority and Source Network_Address_Authority fields of the Network_Header shall be set to zeros.

When sending an IP packet the source and destination network addresses in the Network_Header may be in IEEE, CCITT, or IP format. If neither IEEE nor CCITT formats are required, as determined by the sender, the IP format shall be used.

The procedures for determining whether IEEE or CCITT formats are required (either for IP or for ARP packets) are outside the scope of this document.

A recipient may ignore the content of the Network_Header.

If a node sends IP packets to an N_Port, and the address resolution procedure for that N_Port indicates that the N_Port has a non-null Initial Process Associator (see Section 9), the node is required to use the Association Header on the first Information Unit of an Exchange with the value of the Responder Process Associator field of the Association Header being set to the value of the Initial Process Associator. Content of the other fields in the Association Header is outside the scope of this document.

A conformant implementation may also send the Expiration/Security Header. The content of this header is outside the scope of this document.

A conformant implementation shall not send the Device Header.
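
The R_CTL and TYPE constraints listed at the start of this section can be illustrated with a small check. This is only a sketch under stated assumptions: the function names are hypothetical, and the placement of the Routing Bits in the upper four bits of the R_CTL octet (with the Information Category in the lower four) is an assumption of this example, not something this document specifies. The DF_CTL and F_CTL constraints are not modelled.

```python
# Values taken from the constraint list in this section.
R_CTL_ROUTING_DEVICE_DATA = 0b0000    # Routing Bits: Device_Data frame
R_CTL_INFO_UNSOLICITED_DATA = 0b0100  # Information Category: Unsolicited Data
TYPE_LLC_SNAP = 0b00000101            # TYPE: IEEE 802.2 LLC/SNAP Encapsulation


def make_r_ctl(routing: int, info_category: int) -> int:
    """Combine the two 4-bit R_CTL subfields into one octet.

    Assumption: Routing Bits occupy the high nibble.
    """
    return ((routing & 0x0F) << 4) | (info_category & 0x0F)


def conforms_to_ip_fc4(r_ctl: int, type_field: int) -> bool:
    """Return True if R_CTL and TYPE carry the values required for IP FC-4."""
    routing = (r_ctl >> 4) & 0x0F
    info_category = r_ctl & 0x0F
    return (
        routing == R_CTL_ROUTING_DEVICE_DATA
        and info_category == R_CTL_INFO_UNSOLICITED_DATA
        and type_field == TYPE_LLC_SNAP
    )


r_ctl = make_r_ctl(R_CTL_ROUTING_DEVICE_DATA, R_CTL_INFO_UNSOLICITED_DATA)
print(conforms_to_ip_fc4(r_ctl, TYPE_LLC_SNAP))
```

A receiver could apply such a check before handing a frame to the IP FC-4; frames with any other R_CTL or TYPE value belong to some other use of the N_Port.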

Setting of all other parameters in the FC-2 Frame Header is outside the scope of this document.

7 Initializing IP Packet Exchange

In order for a node attached to a Fabric to be able to send or receive IP and/or ARP packets, the node shall establish its operating environment with a Fabric, if present, and other destination N_Ports with which it communicates. This is accomplished via Fabric Login and destination N_Port Login procedures. Either implicit or explicit Login procedure is acceptable.

The procedures for a node to obtain its N_Port Identifier(s) (N_Port ID(s)) are outside the scope of this document.

Setting of all Common Login Service Parameters is outside the scope of this document.

Setting of all N_Port Login Service Parameters for Fabric Login is outside the scope of this document.

Setting of all N_Port Login Service Parameters for N_Port Login is outside the scope of this document.

8 Sending IP and ARP packets

After a node has successfully completed the Fabric Login procedure and the Destination N_Port Login procedure, the node shall check the responses to the Fabric Login and the N_Port Login. If the responses indicate that the node can communicate with the Fabric and with the Destination N_Port, the node may send IP and/or ARP packets to the node associated with the Destination N_Port.

A node sends an IP or an ARP packet by forming an Information Unit that consists of the IEEE 802.2 LLC and SNAP headers followed by the IP (ARP) packet itself. There is a one-to-one mapping between an IP (ARP) packet and an Information Unit.

The fields in the LLC header shall contain the following values.

- SSAP (8 bits) shall contain 170 (decimal).
- DSAP (8 bits) shall contain 170 (decimal).
- CTL (8 bits) shall contain 3 (Unnumbered Information).

The fields in the SNAP header shall contain the following values.

- Organization Code (24 bits) shall be zero.
- EtherType (16 bits) shall be set as defined in Assigned Numbers [4] (IP = 2048, ARP = 2054, RARP = 32821).

The base relative offset for each Information Unit shall be zero.

8.1 Use of Exchanges

Interchange of the IP (ARP) packets (in the form of Information Units) between a pair of N_Ports is coordinated via Exchanges.

To improve performance this document specifies that an Exchange shall be used in a unidirectional mode (this is because an Exchange is inherently half-duplex). Only the Exchange Originator is allowed to send IP and/or ARP packets. Thus, to support bidirectional IP traffic between a pair of N_Ports, separate Exchanges are required in each direction. Each N_Port shall originate one or more Exchanges which it uses to send IP and/or ARP packets to the other N_Port.

Support for multiple concurrent Exchanges is optional. A possible scheme that utilizes multiple concurrent exchanges is described in Appendix A.

An Exchange used for IP and/or ARP packets shall be used solely for IP and/or ARP packets.

8.2 Errors and Exception Conditions at the Exchange Responder

If the Stop Sequence protocol is used during IP communication, the Information Unit may be discarded by the N_Port or may be passed to the IP FC-4 with an indication that it is damaged. The Sequence Recipient provides no indication to the Sequence Initiator as to why the sequence was stopped.

Whenever the Sequence Recipient terminates an Information Unit with the Abort Sequence condition, the Sequence Recipient shall return initiative for this Exchange to the Sequence Initiator in the BA_ACC response to Abort Sequence.

8.3 Errors and Exception Conditions at the Exchange Originator

If the sending N_Port receives an ACK with the Stop-Sequence indication, it performs the Stop-Sequence protocol defined in [3]. The Information Unit may be retransmitted. The sending N_Port retains Sequence Initiative for this Exchange.

Whenever the sending N_Port receives the BA_ACC response to its Abort-Sequence (due to performing the Abort-Sequence protocol) the sending N_Port retains Sequence Initiative for this Exchange. The damaged Information Unit may be retransmitted.

9 Address Resolution Procedures

An IP address may correspond to a single N_Port or to a group of N_Ports attached to a single node. In the latter case the group of N_Ports associated with a given IP address is required to be attached to the same region (see Section 12).

The method by which a node obtains its own IP address(es) is outside the scope of this document.

Conceptually a hardware address used within the context of address resolution is formed by a <N_Port Identifier, Initial Process Associator> tuple. The address resolution procedure provides a mapping between such tuples and IP addresses. Multiple tuples may be associated with a single IP address. Each tuple forms an indivisible unit of information. That is, when sending FC-2 Frames an N_Port shall not use an N_Port Identifier from one tuple and an Initial Process Associator from another tuple. If an IP address is associated with multiple tuples, then procedures for deciding what tuple to use are a local matter.

For the purpose of determining the destination tuple(s) associated with the destination IP address, or the next hop IP address (if the destination is on a different subnet), a node may use the techniques described in Section 9.1 and in Section 9.2.

9.1 Local Mapping

A node shall provide the capabilities to maintain local mapping between IP addresses and tuples of other nodes attached to the Fabric. The source of the information for constructing such a mapping is outside the scope of this document.

A conformant implementation is required to support local mapping capabilities.

9.2 ARP Server

This section describes how the mapping may be realized by using ARP [2].

This document assumes that each region (see Section 12) has an entity that is capable of performing the mapping. Such an entity is referred to as an ARP Server.

The ARP Server maintains mapping between tuples and IP addresses. The ARP request shall contain all the tuples associated with the originator IP address. Upon receipt of the ARP request, the ARP Server constructs the ARP Reply and sends it back to the node. The ARP Reply shall contain all the tuples associated with the requested IP address.

To provide the ARP Server with the information about mapping between tuples and IP addresses of the nodes, a node shall register with the ARP Server its IP address(es) and the tuples associated with that IP address(es) immediately upon successful completion of the Fabric Login Procedure and Login with the ARP Server.

The registration of an IP address shall be accomplished by sending an ARP Request to the ARP Server. The ARP Request shall contain the requester's own IP address in the ar$tpa field. The requester shall retransmit this ARP Request until it receives an ARP Reply that contains all the tuples carried in the ARP Request. If an N_Port requires the Initial Process Associator to be used for the demultiplexing of incoming IP data, then in the ARP registration request the Initial Process Associator part of the tuple shall be filled with the appropriate value. Otherwise, a null Initial Process Associator shall be used. The information in this request is intended to be registered by the ARP Server.

The order in which tuples are listed in the ARP Request/Reply packets is irrelevant.

When an N_Port is connected to another N_Port by a single dedicated link (point-to-point topology), the respective Nodes shall be able to perform the Address Resolution Protocol function, so that each Node shall be able to register with the Node at the other end of the link. An N_Port attached to a dedicated link shall ignore ARP Requests sent by the N_Port itself.
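
The registration rule described earlier in this section (retransmit the ARP Request until a Reply carries every tuple that was sent, with tuple order being irrelevant) can be sketched as follows. This is an illustration only: `send_request` is a hypothetical transport callback that returns the list of tuples found in the Reply, or `None` if no Reply arrived, and the `max_attempts` cap is an assumption of this sketch, since this document places no limit on retransmissions.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A tuple is modelled here as (N_Port Identifier, Initial Process Associator).
FcTuple = Tuple[int, bytes]


def reply_confirms_registration(
    request_tuples: Iterable[FcTuple], reply_tuples: Iterable[FcTuple]
) -> bool:
    """True if every tuple sent in the Request appears in the Reply.

    Sets are used because the order of tuples is irrelevant.
    """
    return set(request_tuples) <= set(reply_tuples)


def register(
    send_request: Callable[[List[FcTuple]], Optional[List[FcTuple]]],
    request_tuples: List[FcTuple],
    max_attempts: int = 5,  # assumption: this document specifies no cap
) -> int:
    """Retransmit the registration Request until it is confirmed.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        reply_tuples = send_request(request_tuples)
        if reply_tuples is not None and reply_confirms_registration(
            request_tuples, reply_tuples
        ):
            return attempt
    raise TimeoutError("ARP registration was not confirmed")
```

A Reply that lists only some of the registered tuples is treated the same as a lost Reply: the Request is sent again.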

The ARP Server shall use the well-known N_Port Identifier of hex "FFFFFC".

9.2.1 ARP Message Format

To provide alignment an N_Port ID (3 octets) is encoded as four octets with the leading octet filled with zeros.

ar$hrd (16 bits)
   shall contain %%TBD by IANA%%

ar$pro (16 bits)
   shall contain the IP protocol code 2048 (decimal).

ar$hln (8 bits)
   in ARP requests shall contain 12 * number of tuples associated with the IP address of the requester. In ARP replies it shall contain 12 * number of tuples associated with the IP address of the ARP Request target. Note that due to the size of the ar$hln field the number of tuples carried in a single ARP request/reply is limited to 21. For ARP requests and for ARP replies the value of this field is equal to the length (in octets) of the ar$sha and the ar$tha fields (both of these fields have the same length).

ar$pln (8 bits)
   shall contain 4.

ar$op (16 bits)
   shall contain 1 for requests, 2 for responses.

ar$sha (variable)
   in ARP requests shall contain the list of all the tuples associated with the requester. In ARP replies it shall contain the list of all the tuples associated with the target of the original ARP Request (as specified in the ar$tpa field of an ARP Request).

ar$spa (32 bits)
   in ARP requests shall contain the requester's IP address. In ARP replies it shall contain the IP address of the ARP Request target.

ar$tha (variable)
   in ARP requests shall be filled with zeros. In ARP replies it shall contain a tuple associated with the node which originated the ARP Request with the rest of the field, if any, being filled with zeros.

ar$tpa (32 bits)
   in ARP requests shall contain the IP address of the ARP Request target. In ARP replies it shall contain the IP address of the Node that originated the ARP Request.
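
The field layout above can be illustrated by a short encoder for an ARP Request. This is a sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the ar$hrd value is a placeholder because this document leaves it to be assigned by IANA, and the 8-octet width of the Initial Process Associator is inferred from the 12-octet tuple length minus the 4-octet N_Port ID encoding.

```python
import struct

TUPLE_LEN = 12                  # octets per tuple, from the ar$hln rule
MAX_TUPLES = 255 // TUPLE_LEN   # ar$hln is 8 bits wide, so at most 21 tuples
AR_PRO_IP = 2048
AR_OP_REQUEST = 1
AR_HRD_PLACEHOLDER = 0          # hypothetical: real value is TBD by IANA


def encode_tuple(n_port_id: int, initial_process_associator: bytes) -> bytes:
    """Encode one tuple as 4 + 8 octets.

    The 3-octet N_Port ID is padded with a leading zero octet. The 8-octet
    associator width is an inference (12 - 4), not stated in this document.
    """
    if not 0 <= n_port_id <= 0xFFFFFF:
        raise ValueError("N_Port ID must fit in 3 octets")
    if len(initial_process_associator) != 8:
        raise ValueError("associator is assumed to be 8 octets")
    return struct.pack("!I", n_port_id) + initial_process_associator


def build_arp_request(sender_tuples, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Build an ARP Request carrying all of the sender's tuples in ar$sha."""
    count = len(sender_tuples)
    if not 1 <= count <= MAX_TUPLES:
        raise ValueError("between 1 and 21 tuples fit in one ARP packet")
    if len(sender_ip) != 4 or len(target_ip) != 4:
        raise ValueError("IP addresses are 4 octets (ar$pln = 4)")
    hln = TUPLE_LEN * count
    sha = b"".join(encode_tuple(port, ipa) for port, ipa in sender_tuples)
    tha = bytes(hln)  # ar$tha is zero-filled in requests
    header = struct.pack(
        "!HHBBH", AR_HRD_PLACEHOLDER, AR_PRO_IP, hln, 4, AR_OP_REQUEST
    )
    return header + sha + sender_ip + tha + target_ip
```

For a registration request, `target_ip` would be the requester's own IP address, as required by Section 9.2.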

10 Fair Access and Resource Starvation

The following rules for Exchange management are intended to ensure frequent, fair access to a node for which multiple other nodes are contending.

An Exchange Originator or an Exchange Responder may terminate an Exchange for lack of resources (Exchange Status Blocks). The decision to terminate an Exchange is a local matter. The procedures for terminating an Exchange are defined in [3].

Appendix B contains guidelines for ensuring fair access to an N_Port, when the N_Port uses Class 1 service.

If an N_Port is concurrently used by several FC-4s (including IP FC-4), then providing fair access and avoiding resource starvation cannot be addressed by IP FC-4 means only.

11 MTU

Maximum Transmission Unit (MTU) is defined as the length of the IP packet, including IP header, but not including any overhead below IP. Conventional LANs have MTU sizes determined by physical layer specification. MTUs may be required simply because the chosen medium won't work with larger packets, or they may serve to limit the amount of time a node must wait for an opportunity to send a packet.

In IP FC-4 the transmission unit is the Information Unit, not the frame. The N_Port may transmit an Information Unit using multiple frames. The recipient N_Port reassembles the original Information Unit before passing it to the upper level. The maximum size of a single Information Unit is limited to 2^32 - 1, which imposes no practical limit for networking purposes. Even so, an N_Port needs an MTU so that maximum buffer sizes for Information Units can be determined.

The MTU for IP on Fibre Channel is 65280 (decimal) octets. This value was selected because it allows the IP packet to fit in one 64K octet buffer with up to 256 octets of overhead. It is also consistent with the MTU defined for HIPPI [5].

The maximum overhead is 8 octets at the present time; there are 248 octets of room for expansion.
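
The 8 octets of overhead are the LLC and SNAP headers defined in Section 8. As a sketch of how an Information Unit is formed and how the MTU bounds its size, the following uses the header values from Section 8; the function names are hypothetical and the MTU check is one possible way for a sender to enforce the limit.

```python
import struct

MTU = 65280            # octets, maximum IP packet size on Fibre Channel
ETHERTYPE_IP = 2048
ETHERTYPE_ARP = 2054


def llc_snap_header(ethertype: int) -> bytes:
    """Return the 8-octet IEEE 802.2 LLC/SNAP header.

    DSAP = 170, SSAP = 170, CTL = 3, Organization Code = 0 (3 octets),
    followed by the 16-bit EtherType.
    """
    return struct.pack("!BBB3sH", 170, 170, 3, b"\x00\x00\x00", ethertype)


def build_information_unit(packet: bytes, ethertype: int = ETHERTYPE_IP) -> bytes:
    """Prefix the packet with LLC/SNAP; one packet maps to one Information Unit."""
    if len(packet) > MTU:
        raise ValueError("packet exceeds the 65280-octet MTU")
    return llc_snap_header(ethertype) + packet


largest = build_information_unit(bytes(MTU))
print(len(largest), 64 * 1024 - len(largest))
```

A maximum-size IP packet therefore yields an Information Unit of 65288 octets, leaving 248 octets unused in a 64K octet buffer.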

   IEEE 802.2 LLC/SNAP Headers            8 octets
   Maximum IP packet size (MTU)       65280 octets
                                      ------------
   Total                              65288 octets (64K - 248)

12 Forming an IP subnet

This document defines a region as a set of N_Ports attached to a common Fabric (if Fabric is present), such that any N_Port in the set can successfully complete the N_Port Login Procedure with all the other N_Ports in the set. If a Fabric is absent then a region is defined as a pair of N_Ports that have successfully completed the N_Port Login Procedure with each other.

All the N_Ports that form a single region constitute a single IP subnet. That is, all the nodes associated with the N_Ports forming a single region should be able to exchange IP packets with each other directly without any intervening routers.

For a set of N_Ports attached to a common Fabric not all of the N_Ports within the set may be able to communicate with each other. This may be due, for example, to different Classes of service supported by different ports within the set, thus resulting in potentially incompatible sets of the Login parameters. Therefore, a set of N_Ports attached to a common Fabric may consist of multiple regions.

If an N_Port and the Fabric to which the N_Port is attached support multiple Classes of service, then the set of N_Ports with which the N_Port can communicate may not have the transitivity property with respect to connectivity. For example, the set may have three different N_Ports, P1, P2, and P3, such that P2 can communicate with P1, P3 can communicate with P1, but P2 and P3 can't communicate with each other. Thus, an N_Port may belong to more than one region.

If an N_Port belongs to more than one region, then for each region in which the N_Port is intended to support IP the N_Port shall be assigned a distinct IP address.

A Node that is connected by N_Port(s) to, and supports IP addresses on, multiple regions may act either as a multihomed host or as a router. By default such a node shall act as a multihomed host.
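
The region definition and the P1, P2, P3 example above can be illustrated with a small check. This is only a sketch: `can_login` is a hypothetical predicate standing in for "the N_Port Login Procedure completes successfully", and how that is determined in practice is not addressed here.

```python
from itertools import combinations
from typing import Callable, Iterable


def is_region(ports: Iterable[str], can_login: Callable[[str, str], bool]) -> bool:
    """True if every N_Port in the set can log in with every other one."""
    return all(
        can_login(a, b) and can_login(b, a)
        for a, b in combinations(list(ports), 2)
    )


# The example from the text: P2 and P3 can each reach P1, but not each other.
reachable = {frozenset(("P1", "P2")), frozenset(("P1", "P3"))}


def can_login(a: str, b: str) -> bool:
    return frozenset((a, b)) in reachable


print(is_region(["P1", "P2"], can_login))        # True
print(is_region(["P1", "P3"], can_login))        # True
print(is_region(["P1", "P2", "P3"], can_login))  # False
```

In this example P1 belongs to two regions, {P1, P2} and {P1, P3}, and so would need a distinct IP address in each region in which it supports IP.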

Procedures for determining the set of N_Ports attached to a common Fabric that forms a region are outside the scope of this document.

Appendix A Using multiple concurrent exchanges for TCP demultiplexing

This appendix suggests one possible application of multiple concurrent exchanges between a pair of N_Ports. The suggested scheme is intended to provide an alternative mechanism to demultiplex incoming IP packets destined to different TCP ports within a node.

A pair of N_Ports residing on a common IP subnetwork may agree to maintain a separate pair of Exchanges for each TCP connection between the nodes connected via these two N_Ports. In such an environment an Exchange Responder may use X_ID information as an alternative mechanism to correlate incoming IP packets with one of its TCP connection control blocks. The procedure for establishing such an agreement is outside the scope of this document.

An implementation that uses X_ID information for demultiplexing of incoming IP packets shall be able to correctly operate in the presence of the gratuitous termination of Exchanges (such termination may happen, for instance, due to the lack of resources).

Appendix B Fair access with Class 1 service

The following approach for Class 1 connection management is suggested to ensure frequent, fair access to a node for which multiple other nodes are contending.

An Exchange Originator should use the Continue Sequence Condition bits to indicate to the Exchange Responder whether the Originator has more IP/ARP packets to send. The Responder may use this information when making a decision about terminating a Class 1 connection.

An Exchange Originator should attempt to terminate a Class 1 connection used solely by the IP FC-4 any time it does not have any additional IP or ARP packets to send or is unable to send more packets, e.g., due to flow or congestion control or excessive occurrences of the Stop_Sequence protocol.

Even in the presence of additional IP or ARP packets to send an Exchange Originator may terminate such a Class 1 connection after some upper bounded time interval. The suggested value of this interval is 500 milliseconds.

The purpose of this is to give each Exchange Originator a fair share of a common Exchange Responder's bandwidth. Without a limit, if there is a Responder that is constantly in demand by multiple Originators, the Originator that sends the most data per connection may effectively monopolize the Responder.

Appendix C Guidelines for using different Classes of service

If the Fabric, the Exchange Originator, and the Exchange Responder all support Class 1, Class 2 and/or Class 3 service, then the Originator is allowed to send data over any Class supported by all. To improve performance it is suggested to use Class 1 for sending long Information Units, and use Class 2 or 3 for sending the rest, unless there is already an established Class 1 connection that can be used.

Use of Class 3 service for sending IP packets may have certain undesirable performance implications.

References

[1] Postel, J., "Internet Protocol", RFC 791, USC/Information Sciences Institute, September 1981.

[2] Plummer, D., "An Ethernet Address Resolution Protocol - or - Converting Network Protocol Addresses to 48.bit Ethernet Address for Transmission on Ethernet Hardware", RFC 826, MIT, November 1982.

[3] "Fibre Channel - Physical and Signaling Interface (FC-PH)", Rev 3.0, ANSI, June 1992.

[4] Reynolds, J.K., Postel, J., "Assigned Numbers", RFC 1060, USC/Information Sciences Institute, March 1990.

[5] Renwick, J., Nicholson, A., "IP and ARP on HIPPI", Internet Draft, June 1992.

Editor's Address

Yakov Rekhter
T.J. Watson Research Center, IBM Corporation
P.O. Box 218
Yorktown Heights, NY 10598
Phone: (914) 945-3896
email: yakov@watson.ibm.com