Internet Draft                                  Tom Talpey (NetApp)
May, 2002                                       David Robinson (Sun)
                                                Robert Teisberg (HP)
                                                Jim Wendt (HP)
Document: draft-talpey-rdma-over-ip-requirements-00.txt
Expires: November 2002

RDMA over IP (ROI) Requirements

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

This draft defines terminology and requirements to be used in conjunction with the RDMA over IP (ROI) effort.

Talpey, et al              Expires November 2002              [Page 1]
Internet-Draft         RDMA over IP Requirements              May 2002

Table Of Contents

1. Introduction
      Overview
      Authors' Note
      Document Conventions
2. Terminology
      General Terms
      Direct Data Placement Terms
      Remote Direct Memory Access Terms
      Memory Management Terms
      Terminology Note
3. Requirements
      Implementation Goals
      Transport Requirements
      Direct Data Placement Requirements
      Remote Direct Memory Access Requirements
      Upper Layer Protocol Requirements
      Security Requirements
4. Acknowledgements
5. References
Authors' Addresses
Full Copyright Statement

1. Introduction

This document defines terminology and presents requirements for Remote Direct Memory Access over the Internet Protocol Suite. The concept is subdivided into two primary components, referred to as Remote Direct Memory Access, or RDMA, and Direct Data Placement, or DDP. This document considers the specification of both RDMA and DDP protocols for the Internet Protocol family, to be called "RDMA over IP", or ROI. Section 2 defines many important terms.

The goal of the DDP protocol is to allow the efficient placement of data into buffers designated by Upper Layer Protocols (ULP). Efficiency may be characterized by the minimization of the number of transfers of the data over the receiver's system buses.

The goal of the RDMA protocol is to provide the semantics to enable Remote Direct Memory Access between ROI peers in a way consistent with application requirements. The RDMA protocol is not an application protocol in itself, but provides facilities immediately useful to existing and future networking, storage, and other application protocols. [SDP] [DAFS] [VI] [IB] [MYR] [SVR] [FIBRE]

The DDP and RDMA protocols work together to achieve their respective goals.
RDMA provides facilities to a ULP for identifying buffers, controlling the transfer of data between ULP peers, and providing completion notifications to the ULP. RDMA uses the features of DDP to steer payloads to specific buffers at the Data Sink. ULPs that do not require the features of RDMA may be layered directly on top of DDP.

The DDP and RDMA protocols are transport independent. The following figure shows the relationship between RDMA, DDP, Upper Layer Protocols and Transport.

   +-------------------+--------------+----------------+
   |        ULP        |     ULP      |      ULP       |
   +-----+-------------+-------------------------------+
   |     |             |             RDMA              |
   |     |             +-------------------------------+
   |     |                     DDP                     |
   |     +--------------------+------------------------+
   |        Transport         |       Transport        |
   +--------------------------+------------------------+

1.1. Overview

Several performance trends are at work in networked systems. Moore's law describing CPU performance trends is well known. The nearly parallel trend in network link bandwidth differs from Moore's law primarily in lacking a catchy name. Today's inexpensive network adapters running at 1Gbps succeed the 100Mbps adapters of the 1990s and the 10Mbps adapters of the 1980s in a so far unbroken sequence. [ROM] [HP97] [STREAM]

Less remarked but painfully familiar to CPU and system designers is the trend in memory performance. Memory speeds have been improving along with CPU and network speeds, but at a much slower pace. [HP97]

In a conventional implementation of an Internet protocol stack, incoming link-layer frames are deposited by hardware into buffers owned by the operating system. Software in the host CPU parses headers until the data can be associated with a specific application buffer, at which time the payload is copied to the buffer. This means that each byte of incoming payload crosses the memory bus at least three times; once when the containing frame is received, and twice when the payload is copied to the application's buffer.
Furthermore, the copy indirectly causes additional memory traffic as cache lines are flushed and reloaded.

Network Interface Controllers (NICs) that offload protocol processing only up through the transport layer cannot address the memory bandwidth problem caused by copying because the information needed to place the payload is not known to the transport layer. While the problem can be solved one Upper Layer Protocol (ULP) at a time by implementing the ULP in the NIC (this is being done now by several vendors for iSCSI [ISCSI1] [ISCSI2], for example), there are so many ULPs affected by the memory bandwidth problem that migrating ULP implementations into the NIC is economically infeasible. Neither a multitude of specialized NICs each implementing one ULP, nor a large, complex, expensive, multipurpose NIC implementing many ULPs is attractive to either vendors or end users.

The problem of memory bandwidth consumption due to copying of network payload can be solved by a common protocol that identifies the final destination of the payload. Such a protocol has come to be known as Direct Data Placement (DDP). Just as a network layer protocol such as IP can be thought of as steering data from a source node to a destination node, and a transport layer protocol such as SCTP [SCTP] as steering data from a source process to a destination process, so DDP steers data from a source buffer to a destination buffer. A protocol stack residing in a NIC and containing all layers up through DDP can place incoming payloads directly in the ULP's buffer with only one memory bus crossing and can do so for any ULP.

Another source of overhead in networked computing systems is context switches.
Just as many conventional peripheral devices use Direct Memory Access (DMA) to read and write buffers without interrupting ongoing processing, a process can use Remote Direct Memory Access (RDMA) to read and write buffers belonging to a process in a remote node without interrupting the remote process's (or unrelated processes') activity. This can eliminate yet another source of memory bus traffic and cache pollution.

Even when the network protocol stack is not offloaded to a peripheral device, RDMA provides benefits to applications by giving them a convenient way to identify both the source and the destination of data to be transferred. Complete control of an RDMA transfer resides in one peer, simplifying both the application protocol and the application's logic.

DDP therefore solves the problem of efficiently directing payloads to buffers, while RDMA enables simplified and more efficient application logic by giving applications a way to identify source and destination buffers, controlling the entire transfer from one end.

1.2. Authors' Note

In order to make a meaningful start on these requirements, the authors found it necessary to begin with some fundamental assumptions about the nature of the proposed solution. It has not been the intention of the authors to summarize discussion of any implementation, nor to capture all possible alternatives.

As such, this Draft only considers the case of layering the ROI solution atop IP Transport. We leave it to other Drafts to explore alternatives.

Initially, it is expected that ROI will target the Stream Control Transmission Protocol. This does not preclude consideration of TCP or other members of the Internet Protocol family, within the requirements stated. The ROI solution must ensure its portability among suitable IP transports.

The authors invite discussions of these requirements, and expect a lively debate!

1.3. Document Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" appearing in this document are to be interpreted as described in RFC2119. [RFCTERMS]

Also in this document, the following naming conventions for certain functional components have been adopted.

When referring to the RDMA over IP architecture, the term "ROI" is used.

When referring to all the ROI component protocols taken together, the term "ROI protocols" is used.

When referring to the RDMA protocol alone, or to a property specifically provided by the RDMA protocol when it is taken together with DDP, the term "RDMA protocol" is used.

When referring to the DDP protocol alone, exclusive of RDMA, the term "DDP protocol" is used.

These terms are described in greater detail in the following section.

2. Terminology

This section contains proposed terminology definitions for RDMA over IP (ROI) and serves to establish a common language for continued discussions and forthcoming documents.

2.1. General Terms

Data Sink
   The peer receiving a directly placed data payload. Note that the Data Sink can be required to both send and receive RDMA/DDP Messages to transfer a data payload.

Data Source
   The peer sending a directly placed data payload. Note that the Data Source can be required to both send and receive RDMA/DDP Messages to transfer a data payload.

Fabric
   The collection of links, switches, and routers that connect a set of Nodes with ROI protocol implementations.

LLP
   Lower Layer Protocol - The protocol layer beneath the protocol layer currently referenced. For example, for DDP, the LLP is SCTP, TCP, or another transport protocol. For RDMA, the LLP is DDP.

Local Peer
   The ROI protocol implementation on the local end of the connection. Used to refer to the local entity when describing a protocol exchange or other interaction between two Nodes.
NIC
   Network Interface Controller. In this context, this would be a NIC with ROI functionality.

NIC Driver
   Software that supports a NIC, and provides an appropriate OS interface as required by the OS.

Node
   A computing device attached to one or more links of a Fabric (network). A Node in this context does not refer to a specific application or protocol instantiation running on the computer. A Node may consist of one or more NICs installed in a host computer.

Remote Peer
   The ROI protocol implementation on the opposite end of the connection. Used to refer to the remote entity when describing protocol exchanges or other interactions between two Nodes.

ROI
   RDMA over IP. The set of wire protocols that provide Direct Data Placement and RDMA Operations to a ULP.

ULP
   Upper Layer Protocol - The protocol layer above the protocol layer currently referenced. The ULP for RDMA/DDP is expected to be an OS, Application, adaptation layer, or proprietary device. The ROI documents do not specify a ULP, but provide a set of semantics that allow a ULP to be designed to utilize ROI.

ULP Payload
   The ULP data that is contained within a single protocol segment or packet (e.g. an RDMA Operation as viewed within a DDP Segment).

2.2. Direct Data Placement (DDP) Terms

Data Delivery
   For DDP, delivery is defined as the process of informing the ULP or application that a particular DDP Message or Segment is available for use. This is specifically different from "Placement", which may generally occur in any order, while the order of "delivery" is strictly defined. See "Data Placement".

Data Placement
   For DDP, this term is specifically used to indicate the process of writing to a data buffer by a DDP implementation. DDP Segments carry Placement information which may be used by the receiving DDP implementation to perform Data Placement of the ULP payload. See "Data Delivery".
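
The difference between Data Placement and Data Delivery defined above can be illustrated with a small sketch. It is not taken from any ROI specification: the `Segment` fields, the `Receiver` class, and the assumption that the transport hands over each segment exactly once are all hypothetical, chosen only for illustration. Segments are written into the buffer as soon as they arrive, in any order, while completed messages are reported to the ULP strictly in message order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """Hypothetical DDP Segment: placement information plus payload."""
    msg_seq: int     # position of the owning DDP Message in the stream
    offset: int      # Target Offset within the destination buffer
    payload: bytes
    msg_len: int     # total length of the owning DDP Message, in bytes


@dataclass
class Receiver:
    """Places segments on arrival; delivers messages in sequence order."""
    buffer: bytearray
    next_to_deliver: int = 0
    placed: dict = field(default_factory=dict)     # msg_seq -> bytes placed
    complete: set = field(default_factory=set)     # fully placed messages
    delivered: list = field(default_factory=list)  # order seen by the ULP

    def receive(self, seg: Segment) -> None:
        # Placement: depends only on this segment's own offset and length.
        end = seg.offset + len(seg.payload)
        if seg.offset < 0 or end > len(self.buffer):
            raise ValueError("segment falls outside the buffer")
        self.buffer[seg.offset:end] = seg.payload

        # Assumes no duplicate segments, so a byte count detects completion.
        count = self.placed.get(seg.msg_seq, 0) + len(seg.payload)
        self.placed[seg.msg_seq] = count
        if count == seg.msg_len:
            self.complete.add(seg.msg_seq)

        # Delivery: notify the ULP only for the next message in sequence.
        while self.next_to_deliver in self.complete:
            self.delivered.append(self.next_to_deliver)
            self.next_to_deliver += 1
```

With this sketch, a segment of message 1 that arrives before message 0 is placed immediately, but `delivered` stays empty until message 0 has been completely placed.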
Placement Validation
   For DDP, the set of actions performed to validate the Placement information for a given DDP Segment.

Payload Validation
   For DDP, the set of actions performed to validate the integrity of the payload in a DDP Segment.

DDP Header
   The header present in all DDP Segments. The DDP Header contains control and Placement fields that are used to define the final Placement location for the ULP payload carried in a DDP Segment.

DDP Message
   A ULP-defined unit of data interchange, which is subdivided into one or more payloads carried in one or more, respectively, DDP Segments.

DDP Segment
   The smallest unit of data transfer for the DDP protocol. It includes a DDP Header and payload (if present). A DDP Segment is typically sized to make efficient use of the underlying transport.

DDP Stream
   A sequence of DDP Messages whose ordering is defined by the LLP. A DDP Stream may map to an SCTP stream, or other LLP-specific facility. Note that DDP provides no ordering guarantees between DDP Streams.

Direct Data Placement (DDP)
   A mechanism whereby ULP data contained within DDP Segments may be placed directly into its final destination in memory without processing of the ULP.

Steering Tag
   An identifier of a Memory Region on a Node, valid as defined within a protocol specification.

STag
   Steering Tag.

Target Offset
   The offset within a Memory Region. The offset is a Data Sink-supplied base value which is manipulated by the Data Source to direct data transfers within the Memory Region using ordinary address arithmetic.

TO
   Target Offset.

Decorated
   A DDP Segment that is accompanied by DDP Decoration is considered to be "decorated".

DDP Decoration
   The Placement information that accompanies a DDP Segment and facilitates Data Placement.
Undecorated
   A DDP Segment that is lacking DDP Decoration is considered to be "undecorated". Undecorated ULP data may still be accompanied by a header sufficient to distinguish it from Decorated data.

2.3. Remote Direct Memory Access (RDMA) Terms

Remote Direct Memory Access (RDMA)
   A method of accessing memory on a remote system in which the local system specifies the remote location of the data to be transferred.

RDMA Protocol
   A wire protocol that supports RDMA Operations to transfer ULP data between the Local Peer and the Remote Peer.

RDMA Stream
   An association between a pair of RDMA implementations, possibly on different Nodes, which transfer ULP data using RDMA Operations. There may be multiple RDMA Streams on a single Node.

RDMA Operation
   A sequence of RDMA messages, including control messages, to transfer data from a Data Source to a Data Sink.

RDMA Message
   A data transfer mechanism used to fulfill an RDMA Operation.

RDMA Buffer
   A region of memory used to send or receive data by RDMA. The region may be Tagged or Untagged (see Memory Management terms).

RDMA Read
   An RDMA Operation used by the Data Sink to transfer the contents of a source RDMA Buffer from the Remote Peer to the Local Peer. An RDMA Read Operation consists of a single RDMA Read Request message and at least one RDMA Read Response message.

RDMA Read Request
   An RDMA message used by the Data Sink to request the Data Source to transfer the contents of an RDMA Buffer. The RDMA Read Request Message describes both the Data Source and Data Sink RDMA Buffers.

RDMA Read Response
   An RDMA message used by the Data Source to transfer the contents of an RDMA Buffer to the Data Sink, in response to an RDMA Read Request. The RDMA Read Response message only describes the Data Sink RDMA Buffer.
RDMA Write
   An RDMA Operation that transfers the contents of a source RDMA Buffer from the Local Peer to a destination RDMA Buffer at the Remote Peer using RDMA. The RDMA Write message only describes the Data Sink RDMA Buffer.

Send
   An RDMA Operation that transfers the contents of a ULP Buffer from the Local Peer to an RDMA Buffer at the Remote Peer. The Send message does not specify the Data Sink RDMA Buffer.

RDMA Completion
   For RDMA, completion is defined as the process of informing the ULP or application that a particular RDMA Operation has completed. The completion semantic of each RDMA Operation is distinctly defined.

Completion
   RDMA Completion.

Fence
   To block the current RDMA Operation from executing until certain other RDMA Operations have completed.

Solicited Event
   A facility by which an RDMA Operation sender may cause an event to be generated at the recipient when a Send message is received.

2.4. Memory Management Terms

Advertisement
   The act of informing a Remote Peer of the availability of a local buffer. A Node exposes a registered Buffer for incoming read or write access by informing its ROI peer of the buffer identifiers (STag, base address, length). This advertisement of buffer information is not defined by ROI and is left to the ULP. A typical method would be for the Local Peer to embed the buffer's Steering Tag, address, and length in a Send message destined for the Remote Peer.

Tagged Buffer
   A buffer that is Advertised for RDMA access by the Remote Peer. A Tagged Buffer is manipulated by the Remote Peer by means of its associated Steering Tag, Target Offset, and length.

Untagged Buffer
   A ULP receive buffer used to receive incoming Remote Peer Send transfers. The buffer is referred to as untagged because the Data Source does not specify the final destination of the Send on the Data Sink.
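
As an illustration of the Advertisement described above, the following sketch packs a Tagged Buffer's identifiers into bytes that a ULP could carry in a Send message. ROI explicitly leaves this encoding to the ULP, so everything here is an assumption made for the example: the 32-bit STag, the 64-bit Target Offset, the 32-bit length, the network byte order, and the names `Advertisement`, `encode_advertisement` and `decode_advertisement`.

```python
import struct
from typing import NamedTuple

# Hypothetical layout (not defined by ROI): 32-bit STag, 64-bit Target
# Offset, 32-bit length, network byte order, no padding.
_ADVERT = struct.Struct("!IQI")


class Advertisement(NamedTuple):
    stag: int            # Steering Tag naming the Memory Region
    target_offset: int   # base offset supplied by the Data Sink
    length: int          # number of bytes made available


def encode_advertisement(ad: Advertisement) -> bytes:
    """Serialize the buffer identifiers for inclusion in a Send payload."""
    return _ADVERT.pack(ad.stag, ad.target_offset, ad.length)


def decode_advertisement(data: bytes) -> Advertisement:
    """Recover the buffer identifiers at the Remote Peer."""
    if len(data) != _ADVERT.size:
        raise ValueError("advertisement has unexpected length")
    return Advertisement(*_ADVERT.unpack(data))
```

Under these assumptions an advertisement occupies 16 bytes, and the Remote Peer recovers exactly the values the Local Peer supplied.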
Memory Registration
   The act of registering a host Memory Region for use by a ULP or application. The memory registration operation returns a Steering Tag.

Memory Region
   An area of registered memory, which can be accessed in a contiguous fashion by the DDP implementation. The Memory Region is thereby enabled for DDP local access and optional remote access. A Memory Region is identified by a Steering Tag and has an associated length. Note that the DDP implementation defines the mapping, and therefore the Memory Region may or may not be contiguous in any other address space.

ULP Buffer
   A buffer owned above the RDMA layer and exposed to the RDMA layer either as a Tagged Buffer or an Untagged Buffer.

2.5. Terminology Note

The following terms are not used in this document, to avoid the confusion or overlap noted.

Chunk
   Reserved for SCTP (use DDP Segment).

Frame
   Reserved for the Data Link Layer.

Sender/Receiver or Requestor/Responder
   Data Sink and Data Source are clearer and are preferred in DDP context.

Notification
   Use RDMA Completion in RDMA context.

3. Requirements

The following sections outline the requirements for ROI components.

3.1. Implementation Goals

ROI MUST enable Direct Data Placement and Remote Direct Memory Access semantics over existing Internet Protocols.

ROI MUST enable cost competitive solutions.

ROI MUST provide high bandwidth and bandwidth aggregation.

ROI MUST enable low host system overhead.

ROI SHOULD keep the protocol simple.

ROI SHOULD enable creation of optimized implementations. Targeted optimizations SHOULD include reducing memory bus crossings, reducing host-adapter interactions, and enabling parallel processing.

ROI SHOULD be specified as a layered implementation atop IP transport.

3.2. Transport Requirements

The following are requirements placed by ROI on any IP transport Lower Layer Protocol.

3.2.1. Layering

The ROI protocols MUST NOT require changes to any supported IP transport, nor require that new semantics be imposed.

The ROI protocols SHOULD NOT replicate services available at lower layers.

The ROI protocols SHOULD expose fundamental properties of the underlying IP transport to the ULP to the maximum extent possible, consistent with the explicit requirements of ROI.

- ROI over a connection-oriented transport MUST expose connection-oriented semantics to the ULP.

- ROI over a connectionless transport MUST expose connectionless semantics to the ULP.

- ROI over an explicitly unreliable protocol SHOULD expose unreliable semantics to the ULP. Certain ROI features MAY have the side effect of providing information to the ULP about datagram loss and MAY involve retries, but assuring reliable delivery of payload MUST NOT be their primary purpose.

ROI MUST use transport connections conservatively.

ROI MUST be designed to allow future substitution of transport protocols with minimal changes to ROI protocol operation, message structures and formats.

3.2.2. Network Infrastructure

ROI MUST function over a variety of IP network topologies (e.g. dedicated LAN, shared LAN, private WAN, public Internet).

ROI MUST be compatible with both IPv4 and IPv6.

ROI SHOULD NOT require changes to infrastructure beyond those already required by the supported IP transport.

ROI SHOULD function correctly through middleboxes (e.g. NATs, firewalls) to the extent that the supported IP transport allows. [MIDTAX]

3.2.3. Ordering and Reliability

The transport MAY support reliable operation.

The transport MUST detect duplicate transmissions and thereby deliver ROI Operations at most once, or signal an error.

The transport MAY support unordered delivery.
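
One way the at-most-once requirement could be met when delivery is unordered is to remember which operations have already been delivered. The sketch below is an illustration under stated assumptions, not a description of how SCTP, TCP or any other transport works: the `DuplicateFilter` class and the idea of a per-operation sequence number starting at zero are hypothetical.

```python
class DuplicateFilter:
    """Accepts each sequence number at most once, in any arrival order."""

    def __init__(self) -> None:
        self._floor = 0      # every number below this has been delivered
        self._seen = set()   # delivered numbers at or above the floor

    def accept(self, seq: int) -> bool:
        """Return True if seq should be delivered, False for a duplicate."""
        if seq < self._floor or seq in self._seen:
            return False
        self._seen.add(seq)
        # Advance the floor over any contiguous run so that the set only
        # holds numbers that arrived ahead of a gap.
        while self._floor in self._seen:
            self._seen.remove(self._floor)
            self._floor += 1
        return True
```

A caller would deliver an operation only when `accept` returns True, and could signal an error or silently drop the operation otherwise.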
ROI MUST provide support for the ordering of Completions within classes of RDMA Operations.

3.2.4. Connection model

The ROI protocols MUST specify a binding to connection-oriented transports.

The ROI protocols MAY specify a binding to datagram-oriented transports.

The ROI protocols are NOT REQUIRED to support broadcast or multicast operations.

3.2.5. Integrity

The ROI protocols MUST specify validation mechanisms that cover at least the DDP Placement information.

DDP Placement information MUST be validated prior to Data Placement.

The data MUST be validated by the transport before Data Delivery.

The ROI protocols SHOULD NOT guarantee stronger integrity than the underlying transport. The ROI protocols MAY provide stronger integrity guarantees by means of optional facilities.

3.2.6. DDP Transport Interaction

The DDP protocol SHOULD be capable of presenting data to the IP transport layer in DDP Segments that allow the transmission of the data within the IP transport layer's optimal segment and without requiring fragmentation and reassembly.

All DDP Headers and payload MUST appear as ordinary payload within IP transport segments. DDP Header information MAY also be duplicated in extensible IP transport headers as allowed by the respective standards.

DDP MAY also use information reported to it by the underlying IP transport.

3.2.7. Congestion control

Any IP transport protocol underlying the DDP protocol MUST support congestion control as described in RFC2914. [CONG]

The ROI protocols are NOT REQUIRED to provide congestion control. Providing it would duplicate the lower-layer mechanism.

The ROI protocols are NOT REQUIRED to implement flow control, as they will operate on top of transports with flow control and below applications with flow control.

3.3. Direct Data Placement Requirements

The following are requirements applicable to the DDP layer.

3.3.1. Transport

DDP MUST be supported over SCTP.

DDP MAY be defined over any IP transport meeting the requirements of section 3.2.

3.3.2. Placement

The DDP Segment MUST contain DDP Headers that facilitate Placement of the data into the destination buffers without interpretation of the Upper Layer Protocol.

DDP headers MUST be self-contained and self-describing.

DDP MUST enable efficient Direct Data Placement of incoming data.

DDP is NOT REQUIRED to provide ordering guarantees between DDP Streams.

3.3.3. Memory Model

The contents of all Untagged Buffers, and of writeable Tagged Buffers Advertised to the Remote Peer, that are passed to DDP by any ULP are indeterminate unless a successful Data Delivery occurs.

The Placement of data MUST NOT be dependent upon any previous or subsequent DDP Segments.

Access to Memory Regions MUST be available on a byte-level granularity, MUST be strictly bounds checked and MUST NOT permit "wrapping" or "overflow".

Memory Regions MUST support protection attributes specifying at least "read" and "write".

All Memory Region accesses MUST be checked for validity according to the protection attributes of the region.

3.3.4. Data Delivery

Send and RDMA Write Operations on a single DDP Stream MUST be delivered to the ULP in the sequence in which all such Operations were issued.

RDMA Read Operations on a single DDP Stream MUST be delivered to the ULP in the sequence in which they were issued.

Data Delivery of Send, RDMA Write and RDMA Read Operations SHOULD occur promptly.

3.3.5. Header Contents and Validation

The DDP protocol MUST support a validation method for DDP Headers and payload which encompasses any requirements made by its Upper Layer Protocols as well as facilities provided by its Lower Layer Protocols.

The DDP layer MUST signal to the ULP any unrecoverable transport error, including unrecoverable data corruption.

3.4. Remote Direct Memory Access Requirements

The following are requirements applicable to the RDMA layer.

3.4.1. Send

The RDMA protocol MUST support a Send Operation, capable of employing DDP to send a data payload to an Untagged Buffer at the Remote Peer and supporting a defined RDMA Completion ordering at both the Data Source and Data Sink.

3.4.2. Remote write

The RDMA protocol MUST support an RDMA Write Operation, capable of employing DDP to send a data payload to a Tagged Buffer at the Remote Peer and supporting a defined RDMA Completion ordering at the Data Source.

3.4.3. Remote read

The RDMA protocol MUST support an RDMA Read Operation, capable of employing DDP to retrieve a data payload from a Tagged Buffer at the Remote Peer and supporting a defined RDMA Completion ordering at the Data Sink.

3.4.4. Ordering and Completion

The RDMA protocol MUST provide ordering rules and error semantics for all its Operations.

The RDMA protocol MUST provide the ability to select whether RDMA Completions are required at the Data Sink.

The RDMA protocol MUST successfully perform each ULP-requested Operation in the prescribed order, or return an error. Successful ULP-requested Operations MUST be performed exactly once.

The RDMA protocol MUST support a mechanism for Solicited Events.

3.4.5. Memory Model

RDMA MUST provide a way for the ULP to specify that a particular remote peer has read, write or read-write access to an Advertised RDMA Buffer.
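
The bounds and protection checks called for in section 3.3.3, together with the per-peer access rights just described, can be sketched as follows. This is a minimal illustration under stated assumptions rather than a prescribed implementation: the `MemoryRegion`, `Access`, `AccessViolation` and `check_access` names are hypothetical, and a real implementation would look the region up by STag rather than receive it as an argument.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Access(Flag):
    """Protection attributes; the requirements name at least these two."""
    READ = auto()
    WRITE = auto()


@dataclass(frozen=True)
class MemoryRegion:
    stag: int        # Steering Tag returned by Memory Registration
    length: int      # size of the region in bytes
    access: Access   # rights granted to the Remote Peer


class AccessViolation(Exception):
    """Raised so the violation can be reported to the requesting peer."""


def check_access(region: MemoryRegion, stag: int, offset: int,
                 length: int, wanted: Access) -> None:
    """Validate one byte-granular access; raise AccessViolation if invalid."""
    if stag != region.stag:
        raise AccessViolation("STag does not name this Memory Region")
    if (region.access & wanted) != wanted:
        raise AccessViolation("requested access was not granted")
    if offset < 0 or length < 0:
        raise AccessViolation("negative offset or length")
    # Compare against the space remaining rather than computing
    # offset + length, so a fixed-width implementation cannot wrap.
    if offset > region.length or length > region.length - offset:
        raise AccessViolation("access exceeds region bounds")
```

The final comparison is written so that the same logic stays correct in a language with fixed-width integers, where adding the offset and length first could wrap around and pass the check.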
The RDMA protocol is NOT REQUIRED to include a means to communicate the granted access permissions to the remote peer.

The RDMA protocol MUST include a way to report an access violation to the end-point that requested the forbidden access.

RDMA MUST support byte-granularity specification of the base address and size of each Advertised RDMA Buffer.

An RDMA implementation MUST enforce the bounds and access permissions of each Advertised RDMA Buffer.

3.5. Upper Layer Protocol Requirements

The following are requirements applicable to the layers above RDMA.

Upper Layer Protocol implementations SHOULD NOT modify the contents of buffers passed to the RDMA and DDP layers until their Data Delivery is implied from an appropriate RDMA Completion, subject to the ordering rules. Modifying the contents of active buffers will result in undefined behavior.

Upper Layer Protocol implementations SHOULD choose a transport with appropriate semantics to support their needs, such as ordering and reliability. The ROI protocols are NOT REQUIRED to support any additional transport semantics on any Stream.

Upper Layer Protocol implementations SHOULD provide their own flow control.

Upper Layer Protocol implementations MUST be prepared to handle both local and remote errors on any request.

Upper Layer Protocol implementations MUST be prepared for certain errors to be returned by operations subsequent to the operation that encountered them. In this case, unsignaled operations MAY be left in an indeterminate state. In addition, the ROI implementation MAY have terminated the RDMA or DDP Stream.

3.6. Security Requirements

The following are requirements relevant to security.

The ROI protocols MUST be compatible with and be able to employ existing Internet security.

The ROI protocols are NOT REQUIRED to establish the security association between the Remote Peer and Local Peer.
The ROI protocols MUST rely upon supported IP transport Lower Layer Protocol implementations to support at least the following security properties.

   Integrity
   Encryption
   Authentication
   Confidentiality

The ROI protocols MUST address the security issues inherent in the Advertisement of Memory Regions, especially as they will allow or prevent access within the scope of a single RDMA Stream.

The ROI protocols MUST NOT permit ULPs or applications to access memory which has not explicitly been advertised to them by the Remote Peer.

The ROI protocols MUST require implementations to enforce all supported protection attributes for Memory Regions.

The ROI protocols MUST specify the protected scope of Advertised Memory Regions across all Remote Peers and all DDP Streams.

4. Acknowledgements

The authors gratefully acknowledge the previous work and valuable advice of Steph Bailey, David Black, Jeff Chase, Jeff Mogul, Jim Pinkerton, Renato Recio, Allyn Romanow and Costa Sapuntzakis, as well as the many others participating in the RDMA discussion to date.

5. References

[ROM] Allyn Romanow, Jeff Mogul, Tom Talpey, Steph Bailey, "RDMA over IP Problem Statement", Work In Progress, http://www.ietf.org/internet-drafts/draft-romanow-rdma-over-ip-problem-statment.txt

[HP97] J. L. Hennessy, D. A. Patterson, Computer Organization and Design, 2nd Edition, San Francisco: Morgan Kaufmann Publishers, 1997

[STREAM] The STREAM Benchmark Reference Information, http://www.cs.virginia.edu/stream/

[SCTP] R. Stewart et al., "Stream Control Transmission Protocol", Standards Track RFC, http://www.ietf.org/rfc/rfc2960

[ISCSI1] iSCSI Requirements, Informational Work In Progress, http://www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-reqmts-06.txt

[ISCSI2] iSCSI Specification, Standards Track Work In Progress, http://www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-12.txt

[MYR] Myrinet, http://www.myricom.com

[DAFS] Direct Access File System, http://www.dafscollaborative.org http://www.ietf.org/internet-drafts/draft-wittle-dafs-00.txt

[FIBRE] Fibre Channel Standard, http://www.fibrechannel.com/technology/index.master.html

[IB] InfiniBand Architecture Specification, Volumes 1 and 2, Release 1.0.a. http://www.infinibandta.org

[SDP] Sockets Direct Protocol, http://www.infinibandta.org

[SVR] Compaq Servernet, http://nonstop.compaq.com/view.asp?PAGE=ServerNet

[VI] Virtual Interface Architecture Specification Version 1.0, http://www.viarch.org/html/collateral/san_10.pdf

[CONG] S. Floyd, "Congestion Control Principles", Best Current Practice, http://www.ietf.org/rfc/rfc2914.txt

[RFCTERMS] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", Best Current Practice, http://www.ietf.org/rfc/rfc2119.txt

[MIDTAX] B. Carpenter, S. Brim, "Middleboxes: Taxonomy and Issues", Informational RFC, http://www.ietf.org/rfc/rfc3234.txt

Authors' Addresses

   David Robinson
   Sun Microsystems, Inc.
   901 San Antonio Road
   Palo Alto, CA 94303 USA
   Phone: +1 512 401-1757
   EMail: david.robinson@sun.com

   Tom Talpey
   Network Appliance
   375 Totten Pond Road
   Waltham, MA 02451 USA
   Phone: +1 781 768-5329
   EMail: thomas.talpey@netapp.com

   Robert R. Teisberg
   Hewlett Packard Corporation
   14231 Tandem Blvd.
   Austin, TX 78728 USA
   Phone: +1 512 432-8119
   EMail: robert.teisberg@hp.com

   Jim Wendt
   Hewlett Packard Corporation
   8000 Foothills Boulevard
   Roseville, CA 95747-5668 USA
   Phone: +1 916 785-5198
   EMail: jim_wendt@hp.com

Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.