                                                   S. Bailey (Sandburst)
Internet-draft
Expires: July 2002

             The Direct Data Placement Protocol (DDPP) Core
                     draft-bailey-roi-ddpp-core-00

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract

   This document defines the core of a Direct Data Placement Protocol
   (DDPP) to run on Internet Protocol-suite transport protocols.  The
   DDPP core is mapped to specific transport protocols in separate
   documents.

Table Of Contents

   1.   Introduction
   2.   DDP-Decorated Messages In DDPP
   2.1. Splitting DDP-Decorated Messages

Bailey                     Expires July 2002                    [Page 1]

Internet-Draft             DDPP Protocol Core                12 Feb 2002

   2.2. DDP-decoration Structure
   3.   Operation Ordering In DDPP
   3.1. Ordering On Reliable, Ordered Transports
   3.2. Ordering On Reliable, Unordered Transports
   3.3. Ordering On Unreliable, Ordered Transports
   3.4. Ordering On Unreliable, Unordered Transports
   4.   Transport Topology In DDPP
   5.   Negotiating DDPP
   6.   Security Considerations
   7.   IANA Considerations
        References
        Author's Address
        Full Copyright Statement

1. Introduction

   This document defines the core of a Direct Data Placement Protocol
   (DDPP) to run on Internet Protocol-suite transport protocols.  The
   DDPP core is mapped to specific transport protocols in separate
   documents.

   DDPP follows the architecture and terminology of `The Architecture
   of Direct Data Placement (DDP) And Remote Direct Memory Access
   (RDMA) On Internet Protocols' (DRARCH) [DRARCH].  A thorough
   understanding of DRARCH is necessary to understand this document.

2. DDP-Decorated Messages In DDPP

   DDP-decorated messages allow a receiving network interface to
   directly place the data in a client protocol buffer.

   A DDP-decorated message submitted to DDPP by a client protocol may
   be split into a group of smaller DDP-decorated messages which are
   each submitted to the transport.  Each DDP-decorated message
   submitted to the transport carries its own, complete DDP-decoration
   information.

2.1. Splitting DDP-Decorated Messages

   DDPP processes a client protocol request to send a DDP-decorated
   message of arbitrary length by potentially sending a group of
   smaller DDP-decorated messages with equivalent content.

   A group of DDP-decorated messages corresponding to a client protocol
   request:

   o  MAY be sent in any order.  For example, 1000 octets of
      DDP-decorated data could be sent as two messages, the first
      containing octets 500-999, and the second containing octets
      0-499.

   o  MUST request a reception indication in the last message with the
      client protocol-supplied message identifier, if the client
      protocol requested a reception indication.

   o  MUST NOT request a reception indication in any message other
      than the final one.

   DDPP mappings to unreliable or unordered transports MUST provide
   client protocols a way to ensure that a DDP-decorated message is
   sent atomically or not at all, when the client protocol requests
   this behavior; for example, by defining a DDP-decorated sending
   operation that returns an error if the message cannot be sent
   atomically.  DDPP on a reliable, ordered transport MAY also provide
   this capability.

2.2. DDP-decoration Structure

   There are two DDP-decoration elements which appear `on the wire': a
   buffer address, composed of a steering tag and a buffer offset; and
   notification information, composed of a notification request flag
   and a message identifier.  DDPP organizes these as:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |N|                     Message Identifier                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             STag                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                            Offset                             +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   N - Notify Flag: 1 bit (boolean flag)

      If set to 1, notify the client protocol of the reception of this
      DDP-decorated message.

   Message Identifier: 31 bits (unsigned integer)

      Passed to the client protocol when the Notify Flag is set.  The
      Message Identifier is opaque to DDPP and can be structured in
      any way by the client protocol.

      The Message Identifier field is ignored by DDPP on DDP-decorated
      messages without the Notify Flag set and may be set to any value
      by the sending DDPP implementation.  For example, if a
      DDP-decorated message is split into several smaller
      DDP-decorated messages, the Message Identifier field in each
      might contain the same value, even though the Notify Flag is
      only set in the last message.

   STag: 32 bits (unsigned integer)

      The steering tag identifying the destination buffer into which
      to place the contents of a DDP-decorated message.

   Offset: 64 bits (unsigned integer)

      The offset in the destination buffer at which to begin placing
      the contents of a DDP-decorated message.

   A DDPP transport mapping MAY arrange these components differently,
   but all four components MUST be present, or directly computable
   from information available in every transport message containing a
   DDP-decorated message.  The STag MUST be 32 bits, and the Offset
   MUST be 64 bits.  The Message Identifier MUST be at least 15 bits
   and SHOULD be at least 31 bits.

3. Operation Ordering In DDPP

   The ordering among:

   o  set()s,

   o  undecorated messages, and

   o  DDP-decorated message reception indications

   and their relationship to corresponding operations on the sender is
   defined in DDPP according to underlying transport characteristics:

   o  reliable or unreliable, and

   o  ordered or unordered.

   A primary principle in DDPP is that absolutely minimal restrictions
   are imposed on ordering among set()s.

   One view of the ordering rules of DDPP is that messages are passed
   to DDPP by the transport, and DDPP can accumulate and process these
   messages in any way, and in any order, as long as it conforms to
   the rules defined below.  While such accumulation and haphazard,
   nondeterministic processing of DDPP messages may seem unlikely in a
   real implementation, in fact, it does reflect the range of
   behaviors exhibited when considering a wide range of
   implementations.

   Such liberal rules permit very efficient implementations that do
   not violate transport semantics either in the transport interface
   to DDPP, or in the `pass-through' of transport semantics to the
   client protocol.
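
   The accumulate-then-process view can be illustrated with a minimal,
   non-normative sketch in Python.  It assumes that the Section 2.2
   layout is carried in network byte order directly ahead of the
   payload; the core leaves the actual arrangement to each transport
   mapping.  The names HEADER, encode, decode, and Receiver are
   hypothetical and exist only for this illustration.  The receiver
   performs the set()s for a batch of received messages in a shuffled
   order and delivers reception indications only after every set() in
   the batch has been performed.

```python
import random
import struct

# Assumption: N flag + 31-bit Message Identifier, 32-bit STag and
# 64-bit Offset, packed in network byte order (not fixed by the core).
HEADER = struct.Struct("!IIQ")
NOTIFY_BIT = 0x80000000
ID_MASK = 0x7FFFFFFF


def encode(notify, msg_id, stag, offset, payload):
    """Build one DDP-decorated message: decoration followed by payload."""
    word0 = (NOTIFY_BIT if notify else 0) | (msg_id & ID_MASK)
    return HEADER.pack(word0, stag, offset) + payload


def decode(message):
    """Return (notify, msg_id, stag, offset, payload) for one message."""
    word0, stag, offset = HEADER.unpack_from(message)
    payload = message[HEADER.size:]
    return bool(word0 & NOTIFY_BIT), word0 & ID_MASK, stag, offset, payload


class Receiver:
    """Hypothetical receiver that accumulates messages before placing them."""

    def __init__(self, buffers):
        self.buffers = buffers    # STag -> bytearray (registered buffers)
        self.pending = []         # received, not yet placed
        self.indications = []     # Message Identifiers delivered to the client

    def receive(self, message):
        self.pending.append(decode(message))

    def process(self, rng):
        batch, self.pending = self.pending, []
        order = list(range(len(batch)))
        rng.shuffle(order)        # set()s may be performed in any order
        for i in order:
            _, _, stag, offset, payload = batch[i]
            buf = self.buffers.get(stag)
            if buf is None or offset + len(payload) > len(buf):
                raise ValueError("set() outside a registered buffer")
            buf[offset:offset + len(payload)] = payload
        # All set()s for the batch are done; indications go out in
        # reception order.
        for notify, msg_id, *_ in batch:
            if notify:
                self.indications.append(msg_id)


if __name__ == "__main__":
    data = bytes(range(250)) * 4                  # 1000 octets
    rx = Receiver({7: bytearray(len(data))})
    # Split as in the Section 2.1 example: octets 500-999 first, then
    # 0-499, with the Notify Flag set only on the last message sent.
    rx.receive(encode(False, 42, 7, 500, data[500:]))
    rx.receive(encode(True, 42, 7, 0, data[:500]))
    rx.process(random.Random())
    print(bytes(rx.buffers[7]) == data, rx.indications)
```

   Because every set() in a batch completes before any reception
   indication is delivered, the client protocol never observes an
   indication for data that has not yet been placed, regardless of the
   order in which the set()s were performed.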

   Operation ordering in DDPP is defined in terms of:

   o  `submission' of messages to DDPP by the client protocol at the
      sender,

   o  `reception' of messages by DDPP from the transport at the
      receiver, and

   o  `delivery' of DDP-decorated message reception indications and
      undecorated messages to the client protocol at the receiver.

   For reception properties, DDP-decorated messages resulting from
   splitting a single client protocol message are all considered to be
   separate messages.

   A set() to buffer `b', address `a', with value `v' `corresponds' to
   a DDP-decorated message `m' if m also targets buffer b, address a
   with value v.  A set() which corresponds to a DDP-decorated message
   that has been submitted to DDPP by the client protocol is called a
   `corresponding set()'.

   Regardless of transport characteristics, DDPP:

   o  MUST only perform corresponding set()s,

   o  MAY perform a corresponding set() more than once,

   o  MAY perform corresponding set()s in any order,

   o  MUST perform set(a,v) for every (a,v) that corresponds to a
      received message `m' before m's reception indication (if any) is
      delivered,

   o  MUST only perform set()s on registered buffers.

   If the transport is ordered, DDPP:

   o  MUST only perform set()s that correspond to messages that follow
      all delivered reception indications and all delivered
      undecorated messages.

   If the transport is reliable, DDPP:

   o  MUST only perform set()s that correspond to messages for which a
      reception indication has not yet been delivered.

3.1. Ordering On Reliable, Ordered Transports

   On a reliable, ordered transport, DDPP:

   o  MUST NOT deliver a reception indication more than once,

   o  MUST NOT deliver a reception indication before all preceding
      reception indications and undecorated messages are delivered,

   o  MUST NOT deliver an undecorated message more than once,

   o  MUST NOT deliver an undecorated message before all preceding
      reception indications and undecorated messages are delivered,

   o  MUST perform set(a,v) for every (a,v) that corresponds to a
      received message before a subsequent reception indication or
      undecorated message is delivered.

   These rules allow subsequent reception indications and subsequent
   undecorated messages to act as implicit reception indications:
   delivery of a subsequent reception indication or subsequent
   undecorated message implies all set()s corresponding to preceding
   DDP-decorated messages have been performed.

   For a reliable, ordered transport, delivery of the reception
   indication on the last of a group of DDP-decorated messages sent in
   place of a single client protocol message is equivalent to delivery
   of a reception indication for a single DDP-decorated message
   carrying the same data.

3.2. Ordering On Reliable, Unordered Transports

   On a reliable, unordered transport, DDPP:

   o  MUST NOT deliver a reception indication more than once,

   o  MUST NOT deliver an undecorated message more than once.

3.3. Ordering On Unreliable, Ordered Transports

   On an unreliable, ordered transport, DDPP:

   o  MUST NOT deliver a reception indication more than once,

   o  MUST NOT deliver a reception indication before a preceding
      reception indication or undecorated message,

   o  MUST NOT deliver an undecorated message more than once,

   o  MUST NOT deliver an undecorated message before a preceding
      reception indication or undecorated message.

3.4. Ordering On Unreliable, Unordered Transports

   On an unreliable, unordered transport, in general, no additional,
   transport-dependent rules apply to DDPP.

   Particular unreliable, unordered transports may have additional
   characteristics that permit useful ordering properties.
   For example, a DDPP mapping to an unreliable datagram protocol on a
   network with a maximum datagram lifetime of `MDL' could define, as
   a function of MDL, the maximum time between submitting a
   DDP-decorated message, and a set() that corresponds to it.

   Unregistering a buffer is another way for a receiver to limit the
   maximum time between submitting a DDP-decorated message and a set()
   that corresponds to it.  However, if another buffer is registered
   subsequently with the same STag, set()s may be performed on the new
   buffer that were destined for the old one.  One possible way of
   preventing immediate reuse of STags is to give the client protocol
   some control over STags assigned to registered buffers.

4. Transport Topology In DDPP

   Transports support some combination of:

   o  single source, or multisource,

   o  single destination, or multidestination (multicast or anycast).

   No special considerations apply to DDPP on multisource transports.

   DDPP on multidestination transports must ensure that DDP-decorated
   messages destined for many receivers can be placed in the
   appropriate buffer on each receiver.  The two tools for doing this
   are:

   o  different receivers assigning the same buffer address (STag and
      Offset) when registering the buffer,

   o  senders sending several messages with the same contents and
      different buffer addresses.

   A DDPP multicast transport mapping could use either of these
   techniques, or both in combination.  However, if no receivers
   assign the same buffer address, there will be no economy of data
   transport compared to using a single destination transport.

   Any DDPP multicast transport mapping must carefully trade off the
   implementation restrictions resulting from requiring control of
   buffer address assignment, and the benefits of multicast data
   transport.
   For example, it might be reasonable to expect support for a small
   set of distinguished multicast buffer addresses by any
   multicast-capable DDPP implementation.  This would be analogous to
   the small set of distinguished multicast network addresses within
   the larger network address space.

   A DDPP anycast transport must ensure that all receivers assign the
   same buffer address, because the choice of destination may be
   beyond the control of the data source.

5. Negotiating DDPP

   Negotiating the use of DDPP is the sole responsibility of the
   client protocol.  Note that DDPP is a simplex protocol and MAY be
   enabled in only one direction by a pair of participants.

   Some client protocols (e.g., RDMA) MAY choose to require DDPP a
   priori, while others MAY define an in- or out-of-band negotiation
   process to dynamically enable DDPP per sender/receiver pair.

6. Security Considerations

   [TODO]

7. IANA Considerations

   [TODO]

8. References

   [DRARCH] Bailey, S., "The Architecture of Direct Data Placement
            (DDP) And Remote Direct Memory Access (RDMA) On Internet
            Protocols", February 2002.
            http://www.cs.uchicago.edu/~steph/draft-bailey-roi-ddp-rdma-arch-00.txt

Author's Address

   Stephen Bailey
   Sandburst Corporation
   600 Federal Street
   Andover, MA  01810
   USA

   Email: steph@sandburst.com

Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.