Network Working Group                                            D. Otis
Internet Draft                                                  SANlight
Document: draft-otis-sctp-ddp-01.txt
Expires: October, 2002                                    April 11, 2002


                          SCTP DDP Adaptation


Status of this Memo

   This document is an internet-draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   In many applications, direct placement of data avoids the overhead
   of multiple copies or excessive context switching and is a desirable
   feature.  To accomplish this using an Internet Protocol, a direct
   placement adaptation layer is defined within this document.  This
   adaptation exists as a small shim sitting above SCTP.  This shim
   fragments messages for path compliance, prefixes the placement
   header, and may place data directly into a user buffer.  The
   ultimate goal is to have placement performed by the network
   interface card, where this shim coordinates such placement while
   proper network layering is maintained.  As SCTP was not designed to
   directly handle offset-based fragmentation, the shim handles message
   fragmentation to introduce the offsets.  The shim also determines
   message reception signaling, which becomes necessary because
   immediate placement requires unordered delivery.

Table of Contents

   1. Introduction
      1.1 Conventions
   2. Adaptation Layer Formats
      2.1 Adaptation Layer Indicator
      2.2 DATA Chunk Format
   3. Procedures
      3.1 Association Initialization
      3.2 DDP Data Placement
   4. IANA considerations
   5. Security Considerations
   6. Acknowledgments
   7. Authors' Addresses
   8. References
   9. Full Copyright Statement

1. Introduction

   To reduce the overhead of multiple copies or excessive context
   switching, a direct placement adaptation layer is defined within
   this document.  A small shim sitting directly above SCTP enables
   data objects (messages) to be directly placed into user buffers
   without assembly buffering by means of an offset within each message
   fragment.  This shim also provides a means of signaling message
   boundary conditions as well as related message actions.

   The optimal implementation assumes hardware able to validate each
   DATA Chunk as received prior to placement, where each DATA Chunk
   carries an offset within an identified user buffer.  Some
   implementers may include this adaptation layer within their SCTP
   implementations to maximize performance, but the behavior of SCTP
   will be unaffected.  In order to accomplish this, the new adaptation
   layer indication as defined in [STEWa] is specified.  The definition
   of Direct Data Placement vectors, associations with Streams and
   Payload Protocol Identifiers, together with multi-adapter cumulative
   TSN synchronization, are defined in [DDP-IOV].

   The offset within each message fragment alleviates a need for
   message reassembly buffering by leveraging the direct placement
   capability.  Depending on the mode of operation, the message
   boundary signal is posted to the ULP delineated or enqueued by
   either Stream and DDP_Tag or just Stream.  In the ANONYMOUS_MODE,
   for each Stream, one pending message acknowledgement constraint per
   DDP_Tag is required.  The final message fragment carries signals
   that denote a message boundary.  These signals are contained in the
   DDP Flags field and are held by the shim layer until qualified by
   the cumulative TSN.  TSN qualification and the use of DDP Tags
   remove head-of-queue blocking and restrictions on partial message
   write operations even within a single Stream.

   The Payload Protocol Identifier (PPI) acts as a reference to
   definitions for the use of the shim layer.  PPI rules for handling
   the shim layer MUST consider whether messages are to be sent
   concurrently or serialized.  For example, if messages are to be
   processed in their starting order, then the final fragment of each
   message MUST be serialized in this starting order.  If order is not
   important, then messages may be sent on any Stream and allowed to
   complete sending at any time.

   If message signaling order is required, then the set of messages
   retaining order MUST be sent on the same Stream.  If Placement order
   is required, then acknowledgement either from the ULP or the
   transport, as defined by the PPI, MUST be utilized and, as such,
   these messages may be sent on any Stream.  The DDP layer does not
   constrain how messages are ordered, but implies potential sending
   limitations depending on the ordering rules desired.

   In the ANONYMOUS_MODE, messages are placed into arbitrary buffers.
   The TSN, DDP_Flags, DDP_Tag, and DDP_Offset are saved with the
   arbitrary buffer containing the message.  Multiple fragments of a
   single message may be placed into a single arbitrary buffer, using
   the DDP_Offset to place the message into a common buffer if size
   permits.  The DDP_Tag MUST be unique for every message pending
   acknowledgement.

   Each message may have DDP_Context included.  The use of the
   DDP_Context information is not defined by this shim layer.  If this
   information is present, each message delineator may link arbitrary
   buffers containing Context information associated with each message
   or message fragment.  This context link may, for example, take the
   form of a common status queue.

1.1 Conventions

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when
   they appear in this document, are to be interpreted as described in
   [RFC2119].

   DDP stands for Direct Data Placement and ULP for Upper-Level
   Protocol.

2. Adaptation Layer Formats

2.1 Adaptation Layer Indicator

   Three separate adaptation layers are defined which MAY appear in the
   INIT or INIT-ACK with the following format as defined in [STEWa].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type = 0xC006         |       Length = Variable       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   DDP Adaptation Indication                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DDP Adaptation Indication:

      The following values are allowed and one of them MUST be present
      to enable specific behaviors defined in this document:

         DDP_ANONYMOUS_PLACE - 0x00000001
         DDP_STREAM_PLACE    - 0x00000002
         DDP_TAG_PLACE       - 0x00000003

   If DDP_ANONYMOUS_PLACE is specified, message semantics are
   delineated by Stream and the buffer is referenced anonymously
   (arbitrarily selected by the receiver).  The DDP Offset will contain
   a message byte displacement.  The DDP Tag field is passed through
   but holds no meaning for the shim except to isolate individual
   messages.

   If DDP_STREAM_PLACE is specified, message semantics are delineated
   by the Stream, which references the user buffer; length is the only
   range limit.  The DDP Offset will contain a message byte
   displacement.  The DDP Tag field is passed through but holds no
   meaning for the shim.

   If DDP_TAG_PLACE is specified, then the DDP Tag references the user
   buffer and delineates message semantics.  The DDP Offset will
   contain a message byte offset within the user buffer.

   For Upper-Level Protocols that utilize the DDP shim, the Payload
   Protocol Identifier will indicate either a null value (0) or an
   IANA-registered protocol identity.

2.2 DATA Chunk Format

   The following format MUST be used on all DATA Chunks.  Note that the
   format extends the existing DATA Chunk, but the direct placement
   fields are considered user data by the SCTP stack.  In addition, to
   allow immediate placement, all DATA Chunks are sent as Unordered,
   and the shim is required to perform all message fragmentation before
   delivery to SCTP; SCTP is placed in a mode that refuses messages
   larger than the path MTU.

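   As a rough illustration of the shim-side fragmentation described
   above, the following Python sketch splits a message into fragments
   that each fit a caller-supplied payload limit and records the DDP
   Offset each fragment would carry.  The names fragment_message,
   Fragment, max_payload, and base_offset are hypothetical; this
   document does not define an API.  In line with the sender rules in
   section 3.2.2, only the last fragment carries the message signals
   and each offset advances by the sum of the preceding fragment sizes.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    ddp_offset: int   # byte displacement of this fragment within the message
    ddp_flags: int    # message signals; non-zero only on the final fragment
    data: bytes


def fragment_message(message: bytes, max_payload: int,
                     final_flags: int, base_offset: int = 0) -> List[Fragment]:
    """Split `message` into fragments of at most `max_payload` bytes.

    `max_payload` is assumed to be the room left in one DATA chunk
    after the SCTP and DDP headers (a hypothetical parameter).
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    fragments = []
    position = 0
    while True:
        piece = message[position:position + max_payload]
        is_last = position + len(piece) >= len(message)
        fragments.append(Fragment(
            ddp_offset=base_offset + position,
            # Signals travel only with the fragment that ends the message.
            ddp_flags=final_flags if is_last else 0,
            data=piece,
        ))
        position += len(piece)
        if is_last:
            return fragments
```

   For instance, a 10-byte message with max_payload set to 4 yields
   three fragments at offsets 0, 4, and 8, with the flags value present
   only on the third.  An empty message yields a single empty fragment
   that still carries the signals; whether that is desirable is a
   choice left to the ULP.
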
   Common Header: (for Data Chunk)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 0    | Reserved|U|B|E|            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              TSN                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Stream Identifier S      |   Stream Sequence Number n    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Payload Protocol Identifier (PPI)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DDP Header: (Common Header extension)

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   DDP Mode    |   DDP Flags   |       DDP Header Length       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            DDP Tag                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                          DDP Offset                           +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                         (DDP Context)                         /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DDP Data: (follows DDP Header and included in Common Header)

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                 DDP Data (seq n of Stream S)                  /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note that the fields Type, Reserved, U, B, E, Length, TSN, Stream
   Identifier, Stream Sequence Number, and Payload Protocol Identifier
   are defined in [RFC2960], to which the reader should refer for
   details.

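   The DDP Header layout drawn above can be sketched with Python's
   struct module, using the field widths given in the definitions that
   follow (8-bit Mode, 8-bit Flags, 16-bit Header Length, 32-bit Tag,
   64-bit Offset).  This is a minimal illustration, not a reference
   implementation: the function names are hypothetical, network byte
   order is assumed by analogy with SCTP itself, and the input to the
   parser is assumed to be the DATA Chunk user data with any chunk
   padding already removed.

```python
import struct

# Fixed part of the DDP Header: Mode (8 bits), Flags (8 bits),
# Header Length (16 bits), Tag (32 bits), Offset (64 bits) = 16 bytes.
# Network byte order is an assumption carried over from SCTP.
_FIXED = struct.Struct("!BBHIQ")

ANONYMOUS_MODE, STREAM_MODE, TAG_MODE = 0x01, 0x02, 0x03


def pack_ddp_header(mode: int, flags: int, tag: int, offset: int,
                    context: bytes = b"") -> bytes:
    """Build a DDP Header; the length field excludes the zero padding."""
    header_length = _FIXED.size + len(context)
    padding = b"\x00" * (-header_length % 4)   # 0 to 3 bytes
    return (_FIXED.pack(mode, flags, header_length, tag, offset)
            + context + padding)


def parse_ddp_header(user_data: bytes):
    """Split SCTP user data into header fields, context, and DDP Data."""
    if len(user_data) < _FIXED.size:
        raise ValueError("user data too short for a DDP Header")
    mode, flags, header_length, tag, offset = _FIXED.unpack_from(user_data)
    if mode not in (ANONYMOUS_MODE, STREAM_MODE, TAG_MODE):
        raise ValueError("unknown DDP Mode")
    # DDP Data starts on the next 32-bit boundary after the context.
    padded_length = header_length + (-header_length % 4)
    if header_length < _FIXED.size or padded_length > len(user_data):
        raise ValueError("invalid DDP Header Length")
    context = user_data[_FIXED.size:header_length]   # padding not included
    data = user_data[padded_length:]
    return mode, flags, tag, offset, context, data
```

   With a 3-byte context, pack_ddp_header produces 20 bytes on the wire
   while the DDP Header Length field holds 19; the parser rounds that
   value up to find the start of the DDP Data and returns the context
   without its padding byte.
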
   In all modes, Type will always be 0; the U, B, and E flags MUST be
   set; the Length will indicate the unpadded length of the DATA Chunk;
   the TSN will represent a unique value associated with the DATA
   Chunk; the Stream Identifier will indicate the Stream on which a
   message was sent; the Stream Sequence Number is invalid; and the
   Payload Protocol Identifier will be determined by the layer above
   the shim.

   DDP Mode: 8 bits (unsigned integer)

      This field will hold one of the following values indicating a
      valid DDP extension:

      0x01 - ANONYMOUS_MODE.  In this mode, placement is into anonymous
             buffers.  The Offset will contain a message byte
             displacement.  Data in this chunk is not directly placed
             into user buffers.  The DDP Tag field MAY contain
             information used by the ULP above the shim.  The DDP Tag
             field will be repeated for all fragments within a message
             and used to isolate messages.

      0x02 - STREAM_MODE.  In this mode, Direct Data Placement uses
             Stream to reference user buffers.  This mode MUST only be
             used when the adaptation layer indication was
             DDP_STREAM_PLACE.  The DDP Tag field MAY contain
             information used by the ULP above the shim.  The DDP Tag
             field will be repeated for all fragments within a message.

      0x03 - TAG_MODE.  In this mode, Direct Data Placement uses the
             DDP Tag to reference user buffers.  This mode MUST only be
             used when the adaptation layer indication was
             DDP_TAG_PLACE.  The DDP Tag field SHALL contain an index
             used in conjunction with the Stream field to reference
             placement information.  The DDP Tag field will be repeated
             for all fragments within a message.

   DDP Flags: 8 bits (unsigned integer)

      Bit 0 - Acknowledgement Requested.  A signal provided to the ULP
              above the shim to indicate that an Acknowledgement was
              requested upon message reception.

      Bit 1 - Disclose.  A signal provided to the ULP above the shim to
              indicate message reception that MAY invoke a process
              related to the current buffer.

      Bit 2 - Release Buffer.  A signal provided to the ULP that the
              current buffer MAY be released to a process upon message
              reception.

      Bits 3-7 - Reserved.

      Message signals are held until the cumulative TSN is greater than
      or equal to the TSN of the DATA chunk carrying the DDP Flags
      field.  Comparisons and arithmetic on TSNs in this document
      SHOULD use Serial Number Arithmetic as defined in [RFC1982] where
      SERIAL_BITS = 32.

   DDP Header Length: 16 bits (unsigned integer)

      This value represents the size of the DDP Header in bytes
      including the DDP Mode, DDP Flags, DDP Header Length, DDP Tag,
      DDP Offset, and optional DDP Context field but excluding DDP
      Data.  Therefore, if the DDP Context field is zero-length, the
      DDP Header Length field will be set to 16.  The DDP Header Length
      field does not count any padding.

   DDP Tag: 32 bits (unsigned integer)

      When the DDP Mode is set to ANONYMOUS_MODE or STREAM_MODE, this
      may hold information used by the ULP; otherwise, it holds the
      reference to the user buffer.  This tag is used to look up the
      actual buffer address, limits, and restrictions in the local
      endpoint's tag lookup cache.

   DDP Offset: 64 bits (unsigned integer)

      When the DDP Mode is set to neither ANONYMOUS_MODE nor
      STREAM_MODE, this value holds the placement byte offset for the
      DDP Data.  The local endpoint MUST verify the offset is within a
      valid range for the user buffer.  If the DDP Mode is set to
      ANONYMOUS_MODE or STREAM_MODE, then this value SHALL hold the
      message byte displacement, of which only the least significant 32
      bits are valid.  The message byte displacement is always measured
      from the origin of the message.

   DDP Context: variable byte length

      The DDP Context is optional information used by the shim layer.
      The Payload Protocol Identifier defines the handling of this
      information.

      The total length of a DDP Header as transmitted (the DDP Mode,
      DDP Flags, DDP Header Length, DDP Tag, DDP Offset, and DDP
      Context fields plus any padding) MUST be a multiple of 4 bytes.
      If the value of the DDP Header Length field is not a multiple of
      4, the sender MUST pad the DDP Context field with all-zero bytes;
      this padding is not counted in the DDP Header Length field.  The
      sender SHOULD NOT pad with more than 3 bytes.  The receiver MUST
      NOT include padding bytes in the DDP Context.

   DDP Data: variable byte length

      This Data field is aligned to a 32-bit boundary immediately
      following the DDP Context.  The DDP Offset field affects the
      placement of this field.

3. Procedures

3.1 Association Initialization

   At the startup of an association, an endpoint wishing to perform DDP
   placement MUST include an adaptation layer indication in its INIT or
   INIT-ACK (as defined in 2.1).  After the exchange of the first two
   messages (INIT and INIT-ACK), an endpoint MUST verify that the peer
   supports the DDP Mode by confirming that the peer included one of
   the adaptation indications.

   If the peer did specify a DDP adaptation, then ALL DATA chunks MUST
   contain the header extensions specified in section 2.2 and the
   endpoint SHOULD enable the indicated adaptation.  The value of the
   Payload Protocol Identifier in subsequent DATA Chunks is defined by
   the ULP.

   If the peer endpoint did NOT specify a DDP placement adaptation,
   then the local endpoint MUST disable DDP adaptation and it MUST NOT
   send DATA chunks with the additional fields as specified in section
   2.2.

3.2 DDP Data Placement

3.2.1 Receiver Side Behavior

   When a DATA chunk arrives and DDP Placement adaptation has been
   enabled, the following procedures MUST be performed.

   R1 - If the DDP Mode is set to STREAM_MODE and the peer endpoint did
        not indicate DDP_STREAM_PLACE in its adaptation indication, the
        endpoint MUST abort the association.

   R2 - If the DDP Mode is set to TAG_MODE and the peer endpoint did
        not indicate DDP_TAG_PLACE in its adaptation indication, the
        endpoint MUST abort the association.

   R3 - If the DDP Mode is set to a recognized mode other than
        ANONYMOUS_MODE, the endpoint MUST use its lookup cache to
        determine the buffer to receive the payload of this DATA chunk.
        TAG_MODE uses the DDP Tag to obtain buffer information related
        to the Stream.  The buffer SHOULD be indexed by DDP Offset and
        the data SHOULD be directly placed within the buffer.

        Note: Great caution MUST be taken when referencing buffers with
        offsets.  The DDP Tag SHOULD NOT be a direct memory address but
        instead an index to be translated into a memory address, memory
        limits, and restrictions.  The DDP Offset MUST be carefully
        verified to assure that the offset is within the valid range of
        the indicated buffer.  If any data placement specification is
        incorrect, the association SHOULD be aborted.

   R4 - Otherwise, if the DDP Mode is set to ANONYMOUS_MODE, the
        endpoint MUST place the message into anonymous buffers for the
        process associated with the Stream.

   R5 - Send signals to processes associated with the buffers when the
        cumulative TSN is greater than or equal to the TSN of the DATA
        chunk carrying the message signals held in the DDP Flags field.

3.2.2 Sender Side Behavior

   The sender of a message MUST always include the DDP Header extension
   if a DDP adaptation is enabled.  The sender MUST perform the
   following when sending data:

   S1 - If DDP_ANONYMOUS_PLACE was specified by the sender in its
        adaptation indication, the DDP Mode MUST be set to the value of
        ANONYMOUS_MODE.

   S2 - If the message is not to be directly placed into a user buffer
        (such as a negotiation message or a request), the sender MUST
        specify the value of ANONYMOUS_MODE in the DDP Mode and the DDP
        Offset field will contain a message byte displacement.  The DDP
        Tag field may contain information used by the ULP.

   S3 - If the message is to be directly placed into a user buffer, the
        DDP Mode SHOULD be set to the appropriate STREAM_MODE or
        TAG_MODE.  The PPI, Stream, DDP Tag, DDP Offset, and optional
        DDP Context fields are set.

   Message signals SHOULD be placed into the DDP Flags field in the
   outgoing DATA chunk.  For messages fragmented by the shim, only the
   last DATA chunk of the message will include the message signals in
   the DDP Flags field, and each subsequent fragment will have its DDP
   Offset byte value advanced by the sum of the sizes of all previous
   fragments.  The PPI, the DDP Context, and (when not in TAG_MODE) the
   DDP Tag are passed to the ULP above the receiving shim layer.

4. IANA considerations

   This document defines three new Adaptation Layer Indications as
   specified within section 2.1.

5. Security Considerations

   Any direct placement into memory poses a significant security risk.
   Great caution MUST be taken when referencing offsets to memory
   addresses on behalf of peer endpoints.  The DDP Tag SHOULD NOT be a
   direct memory address passed to a peer but instead an index to be
   translated into a memory address.  The DDP Offset MUST be carefully
   verified to assure that the offset is within a valid range of the
   buffer.  If any data placement specification is incorrect, the
   association SHOULD be aborted.

6. Acknowledgments

   The author would like to thank the following people who have
   provided comments and input: Randall Stewart, Stephen Bailey, Allyn
   Romanow, David Black, and Caitlin Bestler.

7. Authors' Addresses

   Douglas Otis
   50 W San Fernando St. Suite 420
   San Jose, CA 95113-2429
   USA
   Email dotis@sanlight.net

8. References

   [RFC1982] Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
             August 1996.

   [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
             3", BCP 9, RFC 2026, October 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
             Schwarzbauer, H. J., Taylor, T., Rytina, I., Kalla, M.,
             Zhang, L., and V. Paxson, "Stream Control Transmission
             Protocol", RFC 2960, October 2000.

   [STEWa]   Stewart, Ramalho, Xie, Tuexen, Rytina, Conrad, "SCTP
             Extensions for Dynamic Reconfiguration of IP Addresses",
             November 2001, draft-ietf-tsvwg-addip-sctp-04.txt,
             work-in-progress.

   [DDP-IOV] Otis, "IO Vectoring to support DDP", March 2002,
             draft-otis-ddp-iov-00.txt, work-in-progress.

9. Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process MUST be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   Funding for the RFC Editor function is currently provided by the
   Internet Society.