RTCWEB J. Lennox Internet-Draft Vidyo Intended status: Standards Track A. Romanow Expires: April 26, 2012 Cisco Systems October 24, 2011 Real-Time Transport Protocol (RTP) Usage for Telepresence Sessions draft-lennox-clue-rtp-usage-00 Abstract This document describes mechanisms and recommended practice for transmitting the media streams of telepresence sessions using the Real-Time Transport Protocol (RTP). Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 26, 2012. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Lennox & Romanow Expires April 26, 2012 [Page 1] Internet-Draft RTP Usage for Telepresence October 2011 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Source multiplexing . . . . . . . . . . . . . . . . . . . . . . 3 4. Demultiplexing sources . . . . . . . . . . . . . . . . . . . . 4 5. Transmission of presentation sources . . . . . . . . . . . . . 5 6. Other considerations . . . . . . . . . . . . . . . . . . . . . 5 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 5 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 5 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 5 9.1. Normative References . . . . . . . . . . . . . . . . . . . 5 9.2. Informative References . . . . . . . . . . . . . . . . . . 6 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 6 Lennox & Romanow Expires April 26, 2012 [Page 2] Internet-Draft RTP Usage for Telepresence October 2011 1. Introduction Telepresence systems, of the architecture described by [I-D.ietf-clue-telepresence-use-cases] and [I-D.ietf-clue-telepresence-requirements], will send and receive multiple media streams, where the number of streams in use is potentially large and asymmetric between endpoints, and streams can come and go dynamically. These characteristics lead to a number of architectural design choices which, while still in the scope of potential architectures envisioned by the Real-Time Transport Protocol [RFC3550], must be fairly different than those typically implemented by the current generation of voice or video conferencing systems. This document makes recommendations about how streams should be encoded and transmitted in RTP for this telepresence architecture. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and indicate requirement levels for compliant implementations. 3. Source multiplexing Telepresence sessions have lots of media streams: easily dozens at a time (given, e.g., a continuous presence screen in a multi-point conference), potentially out of a possible pool of hundreds. Furthermore, endpoints will have an asymmetric number of media streams. In such an environment the usual model of existing SIP endpoints -- sending zero or one source (in each direction) per RTP session -- doesn't scale, and mapping asymmetric numbers of sources to sessions is needlessly complex. Therefore, telepresence systems SHOULD use a single RTP session per media type, except where there's a need to give sessions different transport treatment. All sources of the same media type are sent over this single RTP session. This architecture (known as "source multiplexing") was defined by [RFC3550], but was used rarely until more recently by some Telepresence systems. Multiplexing multiple media streams in this way has additional advantages. It makes going through middle boxes considerably easier, as it allows Telepresence devices to work through SIP B2BUAs that do Lennox & Romanow Expires April 26, 2012 [Page 3] Internet-Draft RTP Usage for Telepresence October 2011 not support multiple media lines of the same media type. It also simplifies NAT and firewall traversal by allowing endpoint to deal with only a single address/port mapping per media type rather than multiple mappings. During call setup, a single RTP session is negotiated for each media type. In SDP, only one media line is negotiated per media and multiple media streams are sent over the same UDP channel negotiated using the SDP media line. 4. Demultiplexing sources The main issue to work out is how receivers demultiplex and interpret the multiple media streams. There are several alternatives that need to be analyzed in light of their advantages and disadvantages for different scenarios. In the CLUE Framework [I-D.ietf-clue-framework], different types of audio and video captures are distinguished. There are main and presentation video captures, and "switched" and "composite/mixed" synthetic captures [edt. names may chance in the Framework]. In RTP [RFC3550], a synchroniztion source (SSRC) value corresponds to a single stream of media, which is fed into a single decoding process. It therefore seems fairly clear that statically-transmitted captures, and probably composited/mixed captures, should be distinguished by SSRC values. There may, however, be additional meta-data needed as well, for instance to communicate layout decisions or recommendations about the sources For switched sources, however, things are less clear. Switched sources could be sent with an SSRC corresponding to the switched source, with some meta-data indicating the original source that is being sent at any given time. Alternately, they could be sent with an SSRC indicating the original source that is being transmitted, with meta-data indicating that the source is being sent because it corresponds to a receiver's requested switched source. In each case, this meta-data could either be sent in-band in the RTP session, or out-of-band using some sort of lightweight signaling protocol. If the meta-data is sent in the RTP session, it could either be sent as the RTP CSRC, or as a new RTP header tension [RFC5285]. In the semantics of RTP, the CSRC would seem to make sense for describing composited sources, and for switched sources if the SSRC corresponds to the switch rather than the original sources. RTP header extensions would probably make more sense for other cases. Lennox & Romanow Expires April 26, 2012 [Page 4] Internet-Draft RTP Usage for Telepresence October 2011 All these potential approaches need further study to identify their benefits and drawbacks. 5. Transmission of presentation sources Most existing videoconferencing systems use separate RTP sessions for main and presentation video sources, distinguished by the SDP content attribute [RFC4796]. It seems likely that telepresence systems should not maintain this transport-level separation, but should instead send both main and presentation video over the single video RTP session, in just the same manner as with multiple main video sources. However, some receivers need to be able to treat main and presentation video differently, so there will need to be a mechanism for receivers to reliably distinguish the types of sources. 6. Other considerations As currently defined, H.281 Far-End Camera Control [ITU.H281.1994][RFC4573] does not, in SIP-based videoconferences, support selecting among multiple remote sources (though it does in H.323 conferences controled by an MCU, which can assign terminal IDs to sources). When RTP sessions contain multiple sources, this limitation becomes pressing. (However, this problem does not appear to be in scope of the CLUE working group.) 7. Security Considerations The security considerations for multiplexed RTP do not seem to be different than for non-multiplexed RTP. 8. IANA Considerations This document makes no requests of IANA. Note to RFC Editor: please remove this section before publication as an RFC. 9. References 9.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. Lennox & Romanow Expires April 26, 2012 [Page 5] Internet-Draft RTP Usage for Telepresence October 2011 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003. 9.2. Informative References [I-D.ietf-clue-framework] Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino, "Framework for Telepresence Multi-Streams", draft-ietf-clue-framework-00 (work in progress), October 2011. [I-D.ietf-clue-telepresence-requirements] Romanow, A. and S. Botzko, "Requirements for Telepresence Multi-Streams", draft-ietf-clue-telepresence-requirements-00 (work in progress), August 2011. [I-D.ietf-clue-telepresence-use-cases] Romanow, A., Botzko, S., Duckworth, M., Even, R., and I. Communications, "Use Cases for Telepresence Multi- streams", draft-ietf-clue-telepresence-use-cases-01 (work in progress), July 2011. [ITU.H281.1994] International Telecommunications Union, "A far end camera control protocol for videoconferences using H.224", ITU- T Recommendation H.281, 11 1994. [RFC4573] Even, R. and A. Lochbaum, "MIME Type Registration for RTP Payload Format for H.224", RFC 4573, July 2006. [RFC4796] Hautakorpi, J. and G. Camarillo, "The Session Description Protocol (SDP) Content Attribute", RFC 4796, February 2007. [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, July 2008. Lennox & Romanow Expires April 26, 2012 [Page 6] Internet-Draft RTP Usage for Telepresence October 2011 Authors' Addresses Jonathan Lennox Vidyo, Inc. 433 Hackensack Avenue Seventh Floor Hackensack, NJ 07601 US Email: jonathan@vidyo.com Allyn Romanow Cisco Systems San Jose, CA 95134 USA Email: allyn@cisco.com Lennox & Romanow Expires April 26, 2012 [Page 7]