INTERNET-DRAFT                                              John Lazzaro
January 16, 2003                                             CS Division
Expires: July 16, 2003                                       UC Berkeley


     Framing RTP and RTCP Packets over Connection-Oriented Transport


Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This memo defines a method for framing Real-time Transport Protocol
   (RTP) and RTP Control Protocol (RTCP) packets onto
   connection-oriented transport (such as TCP and TLS). We show how a
   session description may specify the use of the framing method. We
   also show how to use the framing method with the Real Time Streaming
   Protocol (RTSP).

Lazzaro                                                         [Page 1]

INTERNET-DRAFT                                           16 January 2003

1. Introduction

   The Audio/Video Profile (AVP, [1]) for the Real-time Transport
   Protocol (RTP, [2]) does not define a method for framing RTP and RTP
   Control Protocol (RTCP) packets onto connection-oriented transport
   protocols (such as TCP and TLS). However, earlier versions of [1]
   did define a framing method, and this method is in use in several
   implementations.

   In this memo, we document this framing method. We show how a
   session description [4] may specify the use of the framing method,
   and show how to use the framing method with the Real Time Streaming
   Protocol (RTSP, [5]).

2. The Framing Method

   Figure 1 defines the framing method.
   A 16-bit unsigned integer LENGTH field, coded in network byte order
   (big-endian), begins the frame.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   -----------------------------------------------------------------
   |             LENGTH            |  RTP or RTCP packet ...       |
   -----------------------------------------------------------------

         Figure 1 -- The bitfield definition of the framing method

   If LENGTH is non-zero, an RTP or RTCP packet follows the LENGTH
   field. The value coded in the LENGTH field MUST equal the number of
   octets in the RTP or RTCP packet. Zero is a valid value for LENGTH,
   and codes the null packet.

   This framing method does not use frame markers (i.e., an octet of
   constant value that would precede the LENGTH field). Frame markers
   are useful for detecting errors in the LENGTH field. In lieu of a
   frame marker, receivers SHOULD monitor the RTP and RTCP header
   fields whose values are predictable (for example, the RTP version
   number).

3. Undefined Properties

   The framing method does not specify properties above the level of a
   single packet. In particular, Section 2 does not specify:

   o The number of RTP or RTCP streams on the connection.

     The framing method is commonly used for sending a single RTP or
     RTCP stream over a connection. However, Section 2 does not define
     this common use as normative, so that (for example) a memo that
     defines an RTP SSRC multiplexing protocol may use the framing
     method.

   o Bi-directional issues.

     Section 2 defines a framing method for use in one direction on a
     connection. The relationship between framed packets flowing in the
     defined direction and in the reverse direction is not specified.

   o Packet loss and reordering.

     The reliable nature of a connection does not imply that a framed
     RTP stream has a contiguous sequence number ordering.
     For example, if the connection is used to tunnel a UDP stream
     through a network middlebox that only passes TCP, the sequence
     numbers in the framed stream reflect any packet loss or reordering
     on the UDP portion of the end-to-end flow.

   o Out-of-band semantics.

     Section 2 does not define the RTP or RTCP semantics for closing a
     TCP socket, or for any other "out of band" signal for the
     connection.

   Memos that normatively include the framing method MAY specify these
   properties.

4. Session Descriptions for RTP/AVP over TCP or TLS

   [3] defines how to specify connection-oriented media streams in
   session descriptions. In this section, we show how to use [3] with
   the framing method.

   Figure 2 shows the syntax of a media (m=) line [4] of a session
   description:

      "m=" media SP port ["/" integer] SP proto 1*(SP fmt) CRLF

         Figure 2 -- Syntax for an SDP media (m=) line (from [4])

   [3] defines "TCP" and "TLS" as the tokens that specify TCP or TLS
   transport. We now define how to specify that an RTP/AVP stream that
   uses the framing method appears on the TCP or TLS connection.

   At least two tokens MUST follow . The first token MUST be the
   constant "RTP/AVP". Subsequent tokens MUST be unique unsigned
   integers in the range 0 to 127, each of which specifies an RTP
   payload type associated with the stream.

   The session descriptions in Figure 3 and Figure 4 show how to apply
   this syntax definition.

      v=0
      o=first 2520644554 2838152170 IN IP4 first.example.net
      s=Example
      t=0 0
      c=IN IP4 192.0.2.105
      m=audio 9 TCP RTP/AVP 11
      a=direction:active

         Figure 3 -- TCP session description for first participant

      v=0
      o=second 2520644554 2838152170 IN IP4 second.example.net
      s=Example
      t=0 0
      c=IN IP4 192.0.2.94
      m=audio 16112 TCP RTP/AVP 10 11
      a=direction:passive

         Figure 4 -- TCP session description for second participant

   Figures 3 and 4 define two parties that participate in a
   connection-oriented RTP/AVP session.
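   The media-line token rules of this section can be checked
   mechanically. The C sketch below is illustrative only: the name
   parse_tcp_avp_mline is hypothetical, the sketch assumes that the
   "TCP" or "TLS" token occupies the proto field of Figure 2 with the
   remaining tokens in the fmt positions, and it skips (rather than
   validates) the media and port fields, including any "/integer"
   suffix.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: checks an SDP media line of the form
 *   m=<media> <port> <TCP|TLS> RTP/AVP <pt> [<pt> ...]
 * On success, stores the payload types in pts[] and returns their
 * count; otherwise returns -1. Assumes TCP/TLS is the proto field.
 * The media and port fields are skipped, not validated. */
static int parse_tcp_avp_mline(const char *line, int pts[], int max_pts)
{
    char buf[512];
    int seen[128] = {0};   /* payload types already listed */
    int field = 0;
    int n = 0;
    char *tok;

    if (strncmp(line, "m=", 2) != 0 || strlen(line + 2) >= sizeof buf)
        return -1;
    strcpy(buf, line + 2);   /* strtok modifies its argument */

    for (tok = strtok(buf, " \r\n"); tok != NULL;
         tok = strtok(NULL, " \r\n"), field++) {
        if (field < 2)                       /* media and port */
            continue;
        if (field == 2) {                    /* assumed proto token */
            if (strcmp(tok, "TCP") != 0 && strcmp(tok, "TLS") != 0)
                return -1;
        } else if (field == 3) {             /* first fmt token */
            if (strcmp(tok, "RTP/AVP") != 0)
                return -1;
        } else {                             /* payload type tokens */
            char *end;
            long pt;
            if (!isdigit((unsigned char)tok[0]))
                return -1;                   /* rejects signs, letters */
            pt = strtol(tok, &end, 10);
            if (*end != '\0' || pt > 127 || seen[pt] || n >= max_pts)
                return -1;                   /* range, uniqueness, room */
            seen[pt] = 1;
            pts[n++] = (int)pt;
        }
    }
    return n > 0 ? n : -1;   /* "RTP/AVP" plus at least one type */
}
```

   Applied to the media line of Figure 4, the sketch reports the two
   payload types 10 and 11. A line that ends after "RTP/AVP", repeats a
   payload type, or lists a value above 127 is rejected.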
   The first party (Figure 3) is capable of receiving mono L16 streams
   (static payload type 11, as defined in [1]). The second party
   (Figure 4) is capable of receiving stereo (static payload type 10,
   as defined in [1]) or mono L16 streams.

   [3] defines procedures for initiating TCP and TLS connections at the
   start of a session. Session descriptions may use the "direction"
   attribute to customize these procedures. For example, the direction
   attribute in Figure 3 specifies that the first party is an "active"
   party that initiates TCP connections. The direction attribute in
   Figure 4 specifies that the second party is a "passive" party that
   accepts TCP connections.

   In the example, the first party connects to the network address
   (192.0.2.94) and port (16112) of the second party. Once the
   connection is established, it is used bidirectionally: each party
   sends and receives on the connection.

   We now normatively define the RTP/AVP semantics for the port
   information on media lines that use the TCP or TLS tokens:

      The connection a receiver accepts on the port specified on the
      media line exclusively carries RTP packets. If a media stream
      uses RTCP, a second connection exclusively carries RTCP packets.
      The port number on which the receiver accepts a connection for
      RTCP is chosen using the algorithms defined in [4] and related
      documents. These algorithms may use the RTP port number as a
      parameter.

   In our example, the first party initiates a TCP connection to port
   16112 of the second party. Once the connection is established, the
   first party sends framed RTP packets to the second party in one
   direction of the connection, and the second party sends framed RTP
   packets to the first party in the other direction of the
   connection.

   In addition, the first party initiates an RTCP TCP connection to
   port 16113 (16112 + 1, as defined in [4]) of the second party.
   Once the connection is established, the first party sends framed
   RTCP packets to the second party in one direction of the connection,
   and the second party sends framed RTCP packets to the first party in
   the other direction of the connection.

   To conclude this section, we revisit the list of undefined framing
   properties in Section 3. For each property, we define the normative
   semantics when used with [3].

   o The number of RTP or RTCP streams on the connection.

     The connection carries RTP or RTCP packets for the streams defined
     on the media lines associated with the connection.

   o Bi-directional issues.

     Both directions of a connection carry the same type of packets
     (RTP or RTCP).

   o Packet loss and reordering.

     An RTP stream has a contiguous sequence number ordering. Packets
     in an RTCP stream are never "lost", and appear in the order
     defined in [2].

   o Out-of-band semantics.

     As defined in [3].

5. RTP/AVP Streams over TCP or TLS in RTSP

   The Real Time Streaming Protocol (RTSP, [5]) is a session management
   tool for content streaming applications. RTSP defines the SETUP
   method, which clients and servers may use to agree on the network
   transport for an RTP/AVP media stream.

   The SETUP method provides a way to specify that a TCP connection
   will be established to carry an RTP/AVP stream. To specify this
   behavior in a SETUP request or reply, the "transport-spec" parameter
   on the Transport line should be set to "RTP/AVP/TCP", and the
   "interleaved" parameter should not be present on the Transport line.
   However, [5] does not define operational details for this mode of
   operation.

   Below, we define semantics for using the framing method with RTSP.

   o The Transport line parameters "destination" and "source" indicate
     the network addresses on which the client (destination) and server
     (source) are willing to accept a TCP connection.
   o The Transport line parameter "client_port" codes the port on which
     a client wishes to accept a TCP connection that carries RTP
     packets. The Transport line parameter "server_port" codes the port
     on which a server wishes to accept a TCP connection that carries
     RTP packets. The discard port value (9) codes an unwillingness to
     accept connections.

   o The Transport line parameter "client_rtcp_port" codes the port on
     which a client wishes to accept a TCP connection that carries RTCP
     packets. The Transport line parameter "server_rtcp_port" codes the
     port on which a server wishes to accept a TCP connection that
     carries RTCP packets. The discard port value (9) codes an
     unwillingness to accept connections. If client_rtcp_port and
     server_rtcp_port are not present on the Transport line, the stream
     does not use RTCP.

   o The semantics for establishing connections follow [3]. If neither
     the client_port nor the server_port parameter value is 9, the
     semantics of the "both" direction attribute [3] apply. Otherwise,
     the party whose port is set to 9 follows the semantics of the
     "active" direction attribute, and the party whose port is not set
     to 9 follows the semantics of the "passive" direction attribute.
     The semantics for client_rtcp_port and server_rtcp_port work in
     the same way.

   Once an RTP TCP connection is established, the server uses the
   framing method to send RTP packets over the connection to the
   client. If two RTP TCP connections are inadvertently established,
   server and client follow the instructions in [3] to manage the two
   connections.

   If RTCP is in use, once the RTCP TCP connection is established, the
   server and client use the framing method to exchange RTCP packets
   over the connection. If two RTCP TCP connections are inadvertently
   established, server and client follow the instructions in [3] to
   manage the two connections.
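   The port-9 rule for client_port and server_port (and for the RTCP
   pair) reduces to a small decision table. The C sketch below is a
   hypothetical illustration, assuming the Transport line values have
   already been parsed into integers; the names rtsp_tcp_roles,
   tcp_roles, and the ROLE_* constants are not defined by [3] or [5].
   The case where both values are 9 is not covered by the rule above;
   the sketch treats it as an error, since no party would accept a
   connection.

```c
#include <stdbool.h>

#define DISCARD_PORT 9u   /* codes "unwilling to accept connections" */

/* Connection-setup behaviors named by the direction attribute of [3]. */
enum conn_role { ROLE_ACTIVE, ROLE_PASSIVE, ROLE_BOTH };

struct tcp_roles {
    enum conn_role client;
    enum conn_role server;
};

/* Hypothetical helper: applies the port-9 rule to one pair of
 * Transport line values (client_port/server_port, or
 * client_rtcp_port/server_rtcp_port). Returns false when both values
 * are 9, a case this sketch assumes is an error. */
static bool rtsp_tcp_roles(unsigned client_port, unsigned server_port,
                           struct tcp_roles *out)
{
    bool client_accepts = (client_port != DISCARD_PORT);
    bool server_accepts = (server_port != DISCARD_PORT);

    if (client_accepts && server_accepts) {
        out->client = ROLE_BOTH;      /* "both" semantics of [3] */
        out->server = ROLE_BOTH;
    } else if (server_accepts) {
        out->client = ROLE_ACTIVE;    /* client port is 9: it connects */
        out->server = ROLE_PASSIVE;   /* server accepts */
    } else if (client_accepts) {
        out->client = ROLE_PASSIVE;   /* client accepts */
        out->server = ROLE_ACTIVE;    /* server port is 9: it connects */
    } else {
        return false;
    }
    return true;
}
```

   The same function would be called twice for a stream that uses
   RTCP: once with the RTP port pair and once with the RTCP port pair,
   since the two connections are negotiated independently.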
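   The RTP and RTCP connections of this section carry packets framed as
   in Section 2. The C sketch below shows one way to write and parse
   that framing over in-memory buffers. It is a minimal sketch, not a
   reference implementation: frame_encode, frame_decode, and the
   FRAME_* status codes are hypothetical names, socket I/O is left out,
   and the version check is one possible way to monitor a predictable
   header field as Section 2 recommends, assuming every packet on the
   connection is RTP or RTCP version 2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Outcomes of frame_decode (hypothetical names). */
enum frame_status {
    FRAME_OK,          /* one complete frame was parsed */
    FRAME_NEED_MORE,   /* buffer holds only part of a frame */
    FRAME_TOO_LONG,    /* LENGTH exceeds the caller's packet buffer */
    FRAME_BAD_VERSION  /* first header octet does not carry version 2 */
};

/* Writes LENGTH (2 octets, network byte order) followed by the packet.
 * Returns the frame size, or 0 if the packet cannot be framed. */
static size_t frame_encode(const uint8_t *pkt, size_t pkt_len,
                           uint8_t *out, size_t out_cap)
{
    if (pkt_len > 0xFFFF || out_cap < pkt_len + 2)
        return 0;
    out[0] = (uint8_t)(pkt_len >> 8);    /* high-order octet first */
    out[1] = (uint8_t)(pkt_len & 0xFF);
    if (pkt_len > 0)
        memcpy(out + 2, pkt, pkt_len);
    return pkt_len + 2;
}

/* Parses the frame at the start of buf, which holds len octets read
 * from the connection. On FRAME_OK, the packet (possibly the
 * zero-length null packet) is copied to pkt and *consumed is the
 * frame size to discard from buf. */
static enum frame_status frame_decode(const uint8_t *buf, size_t len,
                                      uint8_t *pkt, size_t pkt_cap,
                                      size_t *pkt_len, size_t *consumed)
{
    size_t length;

    if (len < 2)
        return FRAME_NEED_MORE;
    length = ((size_t)buf[0] << 8) | buf[1];

    /* Never trust LENGTH as a copy bound (see Security Considerations). */
    if (length > pkt_cap)
        return FRAME_TOO_LONG;
    if (len < 2 + length)
        return FRAME_NEED_MORE;

    /* RTP and RTCP headers both start with a 2-bit version field of 2;
     * checking it stands in for the missing frame marker. */
    if (length > 0 && (buf[2] >> 6) != 2)
        return FRAME_BAD_VERSION;

    if (length > 0)
        memcpy(pkt, buf + 2, length);
    *pkt_len = length;
    *consumed = 2 + length;
    return FRAME_OK;
}
```

   A receiver using this sketch would append newly read octets to its
   buffer, call frame_decode until it stops returning FRAME_OK, and
   discard *consumed octets after each packet. Because the method has
   no frame marker, a receiver that sees FRAME_TOO_LONG or
   FRAME_BAD_VERSION cannot resynchronize on the stream; closing the
   connection is one plausible response.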
   The undefined framing properties (Section 3) for these connections
   follow the properties defined in Section 4 for use with [3].

6. Congestion Control

   Applications that send RTP/AVP streams over UDP transport have a
   responsibility to implement congestion control, in order to protect
   the network. Applications are relieved of this responsibility for
   RTP/AVP streams framed on connection-oriented transport, as the
   transport implements its own congestion control.

   However, even though the transport protects the network, senders
   SHOULD still implement RTP congestion control over
   connection-oriented transport, in order to improve the quality of
   the received media stream under adverse network conditions.

7. Security Considerations

   Attackers may send framed packets with large LENGTH values to
   exploit security holes in applications. For example, a C
   implementation may declare a 1500-byte array as a stack variable,
   and use LENGTH as the bound on the loop that reads the framed packet
   into the array. This code would work fine for friendly applications
   that use Ethernet-frame-sized RTP packets, but an attacker could
   overflow the array by coding a LENGTH value larger than 1500.

8. Acknowledgements

   This memo, in part, documents discussions on the AVT mailing list
   about TCP and RTP. Thanks to all of the participants in these
   discussions.

Appendix A. Normative References

   [1] H. Schulzrinne and S. Casner. RTP Profile for Audio and Video
       Conferences with Minimal Control. Work in progress,
       draft-ietf-avt-profile-new-12.txt.

   [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A
       Transport Protocol for Real-Time Applications. Work in progress,
       draft-ietf-avt-rtp-new-11.txt.

   [3] D. Yon. Connection-Oriented Media Transport in SDP. .

   [4] M. Handley, V. Jacobson, and C. Perkins. SDP: Session
       Description Protocol. Work in progress,
       draft-ietf-mmusic-sdp-new-10.txt.

   [5] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming
       Protocol (RTSP).
       Work in progress, draft-ietf-mmusic-rfc2326bis-00.txt.

Appendix B. Author Address

   John Lazzaro
   UC Berkeley
   CS Division
   315 Soda Hall
   Berkeley CA 94720-1776

   Email: lazzaro@cs.berkeley.edu