Network Working Group                                            D. Otis
Internet Draft                                                  SANlight
Document: draft-otis-sctp-ddp-00.txt
Expires: September, 2002                                  March 22, 2002


                          SCTP DDP Adaptation


Status of this Memo

   This document is an internet-draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   In many applications, direct placement of data without the overhead
   of multiple copies or excessive context switching is a desirable
   feature. To accomplish this goal, a direct placement adaptation
   layer is defined within this document. This draft proposes a small
   shim that sits above SCTP and that may place data directly into a
   user buffer. The ultimate goal is to have placement performed by the
   network interface card, with this shim coordinating such placement
   while proper network layering is maintained. Because SCTP was not
   designed to handle offset-based fragmentation directly, the shim
   must fragment messages itself so that each fragment carries the
   proper offset. Because immediate placement requires unordered
   delivery, the shim must also determine when a complete message has
   been received.

Table of Contents

   1 Introduction
   1.1 Conventions
   2 Adaptation Layer Formats
   2.1 Adaptation Layer Indicator
   2.2 DATA chunk format
   3 Procedures
   3.1 Association Initialization
   3.2 DDP Data Placement
   3.2.1 Receiver Side Behavior
   3.2.2 Sender Side Behavior
   4 IANA considerations
   5 Security Considerations
   6 Acknowledgments
   7 Authors' Addresses
   8 References

1 Introduction

   In many applications, direct placement of data without the overhead
   of multiple copies or excessive context switching is a desirable
   feature. To accomplish this goal, a direct placement adaptation
   layer is defined within this document. A small shim sitting directly
   above SCTP enables data to be placed directly into user buffers
   without assembly buffering. This assumes hardware that is able to
   validate each DATA chunk as it is received, prior to placement, and
   that each DATA chunk carries an offset within an identified user
   buffer. Some implementations may include this adaptation layer
   within their SCTP implementations to obtain maximum performance, but
   the behavior of SCTP will be unaffected.

   In order to accomplish this, this draft specifies the use of the new
   adaptation layer indication as defined in [STEWa]. The Direct Data
   Placement vectors, their association with Streams and Payload
   Protocol Identifiers, and multi-adapter cumulative TSN
   synchronization are defined in the [DDP-IOV] draft.

   As messages are sent in unordered DATA chunks, only the final
   message fragment carries the signals that denote a message boundary.
   These signals are contained in the DDP Flags field and are held by
   the shim layer until qualified by the cumulative TSN. As a result,
   and because of the use of DDP Tags, there is no head-of-queue
   blocking and no restriction on partial message write operations,
   even within a single Stream.
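
   As an illustration only (not part of this specification), the
   hold-until-qualified behavior described above might be sketched in
   Python as follows. The names SignalHold, tsn_ge, hold, and release
   are hypothetical. The sketch assumes that TSNs are compared using
   serial number arithmetic [RFC1982] over 32 bits and treats the DDP
   Flags value as an opaque integer.

```python
from dataclasses import dataclass, field

SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS
HALF = 1 << (SERIAL_BITS - 1)


def tsn_ge(a: int, b: int) -> bool:
    """True if TSN a equals b or is later than b in serial number order."""
    return ((a - b) % MOD) < HALF


@dataclass
class SignalHold:
    """Hypothetical holder for message signals awaiting the cumulative TSN."""
    pending: dict = field(default_factory=dict)  # TSN -> DDP Flags value

    def hold(self, tsn: int, flags: int) -> None:
        # Only chunks that actually carry signals need to be remembered.
        if flags:
            self.pending[tsn] = flags

    def release(self, cumulative_tsn: int) -> list:
        """Return (tsn, flags) pairs now covered by the cumulative TSN."""
        ready = [t for t in self.pending if tsn_ge(cumulative_tsn, t)]
        # Oldest first: the TSN furthest behind the cumulative TSN leads.
        ready.sort(key=lambda t: (cumulative_tsn - t) % MOD, reverse=True)
        return [(t, self.pending.pop(t)) for t in ready]
```

   A shim following this sketch would call hold() when a DATA chunk
   carrying signals is placed and release() each time the cumulative
   TSN advances. The modular comparison keeps the ordering correct when
   the TSN wraps from 0xFFFFFFFF to 0.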

1.1 Conventions

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when
   they appear in this document, are to be interpreted as described in
   [RFC2119].

   DDP is an abbreviation for Direct Data Placement and ULP is an
   abbreviation for Upper-Level Protocol.

2 Adaptation Layer Formats

2.1 Adaptation Layer Indicator

   Three separate adaptation layers are defined, any one of which MAY
   appear in the INIT or INIT-ACK with the following format as defined
   in [STEWa].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type = 0xC006         |       Length = Variable       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   DDP Adaptation Indication                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DDP Adaptation Indication:

      The following values are allowed and one of them MUST be present
      to enable specific behaviors defined in this document:

         DDP_ANONYMOUS_PLACE - 0x00000001
         DDP_STREAM_PLACE    - 0x00000002
         DDP_TAG_PLACE       - 0x00000003

   If DDP_ANONYMOUS_PLACE is specified, message semantics are
   delineated by Stream and the buffer is referenced anonymously
   (arbitrarily selected by the receiver). The DDP Offset will contain
   either zero or a message byte displacement. The DDP Tag field is
   passed through but holds no meaning for the shim.

   If DDP_STREAM_PLACE is specified, message semantics are delineated
   by the Stream, which also references the user buffer; the buffer
   length is the only range limit. The DDP Offset will contain a
   message byte displacement. The DDP Tag field is passed through but
   holds no meaning for the shim.

   If DDP_TAG_PLACE is specified, then the DDP Tag references the user
   buffer and delineates message semantics. The DDP Offset will contain
   a message byte offset within the user buffer.
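
   As an illustration only, the parameter above might be encoded and
   recognized as in the following Python sketch. The names
   DdpIndication, encode_indication, and decode_indication are
   hypothetical. The sketch assumes a single 32-bit indication value,
   so that the Length field is 8 (the 4-byte parameter header plus the
   4-byte value), even though the format above lists the length as
   variable; fields are assumed to be in network byte order.

```python
import struct
from enum import IntEnum

ADAPTATION_PARAM_TYPE = 0xC006  # Type value from the parameter format


class DdpIndication(IntEnum):
    DDP_ANONYMOUS_PLACE = 0x00000001
    DDP_STREAM_PLACE = 0x00000002
    DDP_TAG_PLACE = 0x00000003


def encode_indication(indication: DdpIndication) -> bytes:
    """Build the 8-byte parameter (assumed Length of 8, see lead-in)."""
    return struct.pack("!HHI", ADAPTATION_PARAM_TYPE, 8, indication)


def decode_indication(param: bytes):
    """Return the DdpIndication carried by param, or None if it is not
    a DDP adaptation indication known to this sketch."""
    if len(param) < 8:
        return None
    ptype, length, value = struct.unpack("!HHI", param[:8])
    if ptype != ADAPTATION_PARAM_TYPE or length != 8:
        return None
    try:
        return DdpIndication(value)
    except ValueError:
        return None
```

   Returning None for an unknown value lets the caller treat the peer
   as not supporting DDP placement, which matches the fallback
   behavior described for association initialization.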

   For Upper-Level Protocols that utilize the DDP shim, the Payload
   Protocol Identifier will indicate either a null value (0) or an
   IANA registered protocol identity.

2.2 DATA chunk format

   The following format MUST be used on all DATA chunks. Note that the
   format extends the existing DATA chunk, but the direct placement
   fields are treated as user data by the SCTP stack. In addition, to
   allow immediate placement, all DATA chunks are sent as Unordered,
   and the shim is required to perform all message fragmentation before
   handing data to SCTP; SCTP is placed in a mode in which it refuses
   messages larger than the path MTU.

   Common Header: (for Data Chunk)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 0    | Reserved|U|B|E|            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              TSN                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Stream Identifier S      |   Stream Sequence Number n    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Payload Protocol Identifier (PPI)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DDP Header: (Common Header extension)

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   DDP Mode    |   DDP Flags   |       DDP Header Length       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            DDP Tag                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                          DDP Offset                           +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                         (DDP Context)                         /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DDP Data: (follows DDP Header and included in Common Header)

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                 DDP Data (seq n of Stream S)                  /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note, the following fields: Type, Reserved, U, B, E, Length, TSN,
   Stream Identifier, Stream Sequence Number, and Payload Protocol
   Identifier are defined in [RFC2960], to which the reader should
   refer for details of these fields. In the case of STREAM_MODE and
   TAG_MODE, Type will always be 0, the U, B, and E flags MUST be set,
   the Length will indicate the unpadded length of the DATA chunk, the
   TSN will represent a unique value associated with the DATA chunk,
   the Stream Identifier will indicate the Stream on which the message
   was sent, the Stream Sequence Number is invalid, and the Payload
   Protocol Identifier will be determined by the layer above the shim.
   An exception occurs when ANONYMOUS_MODE is active: the U, B, and E
   flags may take any value, and the Stream Sequence Number will be
   valid if the U flag is not set.

   DDP Mode: 8 bits (unsigned integer)

      This field will hold one of the following values indicating a
      valid DDP extension:

      0x01 - ANONYMOUS_MODE. In this mode, placement is into anonymous
             buffers. The Offset will contain either zero or a message
             byte displacement. Data in this chunk is not directly
             placed into user buffers. The DDP Tag field MAY contain
             information used by the ULP above the shim.

      0x02 - STREAM_MODE. In this mode, Direct Data Placement uses the
             Stream to reference user buffers. This mode MUST be used
             only when the adaptation layer indication was
             DDP_STREAM_PLACE. The DDP Tag field MAY contain
             information used by the ULP above the shim.

      0x03 - TAG_MODE. In this mode, Direct Data Placement uses the DDP
             Tag to reference user buffers. This mode MUST be used only
             when the adaptation layer indication was DDP_TAG_PLACE.

   DDP Flags: 8 bits (unsigned integer)

      Bit 0 - Acknowledgement Requested. A signal provided to the ULP
              above the shim to indicate an Acknowledgement was
              requested upon message reception.

      Bit 1 - Disclose. A signal provided to the ULP above the shim to
              indicate message reception that MAY invoke a process
              related to the current buffer.

      Bit 2 - Release Buffer.
              A signal provided to the ULP that the current buffer MAY
              be released to a process upon message reception.

      Bits 3-7 - Reserved.

      Message signals are held until the cumulative TSN is greater than
      or equal to the TSN of the DATA chunk carrying the DDP Flags
      field. Comparisons and arithmetic on TSNs in this document SHOULD
      use Serial Number Arithmetic as defined in [RFC1982] where
      SERIAL_BITS = 32.

   DDP Header Length: 16 bits (unsigned integer)

      This value represents the size of the DDP Header in bytes,
      including the DDP Mode, DDP Flags, DDP Header Length, DDP Tag,
      DDP Offset, and optional DDP Context fields but excluding the DDP
      Data. Therefore, if the DDP Context field is zero-length, the DDP
      Header Length field will be set to 16. The DDP Header Length
      field does not count any padding.

   DDP Tag: 32 bits (unsigned integer)

      When the DDP Mode is set to ANONYMOUS_MODE or STREAM_MODE, this
      field may hold information used by the ULP; otherwise, it holds
      the reference to the user buffer. This tag is used to look up the
      actual buffer address, limits, and restrictions in the local
      endpoint's tag lookup cache.

   DDP Offset: 64 bits (unsigned integer)

      When the DDP Mode is not set to ANONYMOUS_MODE, this value holds
      the placement byte offset for the DDP Data. The local endpoint
      MUST verify that the offset is within a valid range for the user
      buffer. If the DDP Mode is set to ANONYMOUS_MODE, then this value
      shall hold either zero or the message byte displacement.

   DDP Context: variable byte length

      The DDP Context is optional information used by the shim layer.
      The Payload Protocol Identifier defines the handling of this
      information.

      The total length of a DDP Header (including the DDP Mode, DDP
      Flags, DDP Header Length, DDP Tag, DDP Offset, and DDP Context
      fields) MUST be a multiple of 4 bytes. If the DDP Header Length
      is not a multiple of 4 bytes, the sender MUST pad the DDP Context
      field with all zero bytes, and this padding is not included in
      the DDP Header Length field.
      The sender should never pad with more than 3 bytes. The receiver
      MUST NOT treat padding bytes as part of the DDP Context.

   DDP Data: variable byte length

      This Data field is aligned to a 32-bit boundary immediately
      following the DDP Context. The DDP Offset field may affect where
      this field is placed as user data.

3 Procedures

3.1 Association Initialization

   At the startup of an association, an endpoint wishing to perform DDP
   placement MUST include an adaptation layer indication in its INIT or
   INIT-ACK (as defined in section 2.1).

   After the exchange of the first two messages (INIT and INIT-ACK), an
   endpoint MUST verify that the peer supports the DDP Mode by
   confirming that the peer included one of the adaptation indications.

   If the peer did specify a DDP adaptation, then ALL DATA chunks MUST
   contain the header extensions specified in section 2.2 and the
   endpoint SHOULD enable the indicated adaptation. The value of the
   Payload Protocol Identifier in subsequent DATA chunks is defined by
   the ULP.

   If the peer endpoint did NOT specify a DDP placement adaptation,
   then the local endpoint MUST disable DDP adaptation and it MUST NOT
   send DATA chunks with the additional fields specified in section
   2.2.

3.2 DDP Data Placement

3.2.1 Receiver Side Behavior

   When a DATA chunk arrives and DDP Placement adaptation has been
   enabled, the following procedures MUST be performed.

   R1 - If the DDP Mode is set to STREAM_MODE and the peer endpoint did
        not indicate DDP_STREAM_PLACE in its adaptation indication, the
        endpoint MUST abort the association.

   R2 - If the DDP Mode is set to TAG_MODE and the peer endpoint did
        not indicate DDP_TAG_PLACE in its adaptation indication, the
        endpoint MUST abort the association.

   R3 - If the DDP Mode is set to a recognized mode other than
        ANONYMOUS_MODE, the endpoint MUST use its lookup cache to
        determine the buffer to receive the payload of this DATA chunk.
        For modes using the DDP Tag, this field SHOULD be used to
        obtain buffer-related information. The buffer SHOULD be indexed
        by the DDP Offset and the data SHOULD be placed directly within
        the buffer.

        Note: Great caution must be taken when referencing buffers with
        offsets. The DDP Tag SHOULD NOT be a direct memory address but
        instead an index to be translated into a memory address, memory
        limits, and restrictions. The DDP Offset must be carefully
        verified to ensure that the offset is within the valid range of
        the indicated buffer. If any data placement specification is
        incorrect, the association SHOULD be aborted.

   R4 - Otherwise, if the DDP Mode is set to ANONYMOUS_MODE, the
        endpoint MUST pass the message into anonymous buffers for the
        process associated with the Stream.

   R5 - Send signals to processes associated with the buffers when the
        cumulative TSN is greater than or equal to the TSN of the DATA
        chunk carrying the message signals held in the DDP Flags field.

3.2.2 Sender Side Behavior

   The sender of a message MUST always include the DDP Header extension
   if a DDP adaptation is enabled. The sender MUST perform the
   following when sending data:

   S1 - If DDP_ANONYMOUS_PLACE was specified by the sender in its
        adaptation indication, the DDP Mode must be set to the value of
        ANONYMOUS_MODE.

   S2 - If the message is not to be directly placed into a user buffer
        (such as a negotiation message or a request), the sender MUST
        specify the value of ANONYMOUS_MODE in the DDP Mode, and the
        DDP Offset field will contain either zero or a message byte
        displacement. The DDP Tag field may contain information used by
        the ULP.

   S3 - If the message is to be directly placed into a user buffer, the
        DDP Mode SHOULD be set to STREAM_MODE or TAG_MODE as
        appropriate. The PPI, Stream, DDP Tag, DDP Offset, and optional
        DDP Context fields are set. Message signals SHOULD be placed
        into the DDP Flags field in the outgoing DATA chunk.
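
   As an illustration only, the DDP Header that the sender attaches in
   the steps above might be built and parsed as in the following
   Python sketch. The function names build_ddp_header and
   parse_ddp_header are hypothetical. The sketch assumes network byte
   order for all fields and treats the DDP Flags value as an opaque
   8-bit integer; the 16-byte fixed size and the padding rule follow
   the field definitions in section 2.2.

```python
import struct

ANONYMOUS_MODE, STREAM_MODE, TAG_MODE = 0x01, 0x02, 0x03
FIXED_LEN = 16  # Mode(1) + Flags(1) + Header Length(2) + Tag(4) + Offset(8)


def build_ddp_header(mode, flags, tag, offset, context=b""):
    """Encode a DDP Header, zero-padded to a multiple of 4 bytes."""
    header_len = FIXED_LEN + len(context)  # padding is not counted
    if header_len > 0xFFFF:
        raise ValueError("DDP Context too long for 16-bit header length")
    pad = (-header_len) % 4                # 0 to 3 zero bytes
    fixed = struct.pack("!BBHIQ", mode, flags, header_len, tag, offset)
    return fixed + context + b"\x00" * pad


def parse_ddp_header(payload):
    """Split DATA chunk user data into header fields and DDP Data."""
    if len(payload) < FIXED_LEN:
        raise ValueError("payload shorter than fixed DDP Header")
    mode, flags, header_len, tag, offset = struct.unpack(
        "!BBHIQ", payload[:FIXED_LEN])
    if header_len < FIXED_LEN:
        raise ValueError("DDP Header Length below minimum of 16")
    padded_len = header_len + (-header_len) % 4
    if padded_len > len(payload):
        raise ValueError("DDP Header Length exceeds payload")
    context = payload[FIXED_LEN:header_len]  # padding bytes excluded
    data = payload[padded_len:]              # starts on 32-bit boundary
    return mode, flags, tag, offset, context, data
```

   For example, a 5-byte DDP Context yields a DDP Header Length of 21
   and three bytes of padding, so the DDP Data begins 24 bytes into the
   user data. A receiver would still need to validate the mode, tag,
   and offset as required by R1 through R3 before placing any data.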

   For messages fragmented by the shim, only the last DATA chunk of the
   message will include the message signals in the DDP Flags field, and
   each subsequent fragment will have its DDP Offset advanced by the
   sum of the sizes of all previous fragments. The PPI, the DDP
   Context, and (when not in TAG_MODE) the DDP Tag are passed to the
   ULP above the receiving shim layer.

4 IANA considerations

   This document defines three new Adaptation Layer Indications as
   specified within section 2.1.

5 Security Considerations

   Any direct placement of data into memory poses a significant
   security risk. Great caution must be taken when referencing offsets
   to memory addresses on behalf of peer endpoints. The DDP Tag SHOULD
   NOT be a direct memory address passed to a peer but instead an index
   to be translated into a memory address. The DDP Offset must be
   carefully verified to ensure that the offset is within a valid range
   of the buffer. If any data placement specification is incorrect, the
   association SHOULD be aborted.

6 Acknowledgments

   The author would like to thank the following people who have
   provided comments and input: Randall Stewart, Stephen Bailey, Allyn
   Romanow, and Caitlin Bestler.

7 Authors' Addresses

   Douglas Otis
   800 E. Middlefield
   Mountain View, CA 94043
   USA

   Email: dotis@sanlight.net

8 References

   [RFC1982] Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
             August 1996.

   [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
             3", BCP 9, RFC 2026, October 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
             Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
             Zhang, L., and V. Paxson, "Stream Control Transmission
             Protocol", RFC 2960, October 2000.

   [STEWa]   Stewart, Ramalho, Xie, Tuexen, Rytina, Conrad, "SCTP
             Extensions for Dynamic Reconfiguration of IP Addresses",
             November 2001, draft-ietf-tsvwg-addip-sctp-03.txt,
             work-in-progress.

Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   Funding for the RFC Editor function is currently provided by the
   Internet Society.