Network Working Group                                          J. Lennox
Internet-Draft                                                     Vidyo
Intended status: Informational                                  K. Gross
Expires: January 16, 2014                                            AVA
                                                           S. Nandakumar
                                                            G. Salgueiro
                                                           Cisco Systems
                                                               B. Burman
                                                                Ericsson
                                                           July 15, 2013


 A Taxonomy of Grouping Semantics and Mechanisms for Real-Time Transport
                         Protocol (RTP) Sources
              draft-lennox-raiarea-rtp-grouping-taxonomy-01

Abstract

   The terminology about, and associations among, Real-Time Transport
   Protocol (RTP) sources can be complex and somewhat opaque.  This
   document describes a number of existing and proposed relationships
   among RTP sources, and attempts to define common terminology for
   discussing protocol entities and their relationships.

   This document is still very rough, but is submitted in the hopes of
   making future discussion productive.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Lennox, et al.          Expires January 16, 2014                [Page 1]

Internet-Draft            RTP Grouping Taxonomy                July 2013

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Concepts
     2.1.  End Point
       2.1.1.  Alternate Usages
       2.1.2.  Characteristics
     2.2.  Capture Device
       2.2.1.  Alternate Usages
       2.2.2.  Characteristics
     2.3.  Media Source
       2.3.1.  Alternate Usages
       2.3.2.  Characteristics
     2.4.  Media Stream
       2.4.1.  Alternate Usages
       2.4.2.  Characteristics
     2.5.  Media Provider
       2.5.1.  Alternate Usages
       2.5.2.  Characteristics
     2.6.  RTP Session
       2.6.1.  Alternate Usages
       2.6.2.  Characteristics
     2.7.  Media Transport
       2.7.1.  Characteristics
     2.8.  Rendering Device
       2.8.1.  Characteristics
     2.9.  Media Renderer
       2.9.1.  Alternate Usages
       2.9.2.  Characteristics
     2.10. Participant
       2.10.1.  Characteristics
     2.11. Multimedia Session
       2.11.1.  Alternate Usages
       2.11.2.  Characteristics
     2.12. Communication Session
       2.12.1.  Alternate Usages
       2.12.2.  Characteristics
   3.  Relationships
     3.1.  Synchronization Context
       3.1.1.  RTCP CNAME
       3.1.2.  Clock Source Signaling
       3.1.3.  CLUE Scenes
       3.1.4.  Implicitly via RtcMediaStream
       3.1.5.  Explicitly via SDP Mechanisms
     3.2.  Containment Context
       3.2.1.  Media Stream Multiplexing
       3.2.2.  RTP Session Multiplexing
       3.2.3.  Multiple Media Sources in a WebRTC PeerConnection
     3.3.  Equivalence Context
       3.3.1.  Simulcast
       3.3.2.  Layered MultiStream Transmission
       3.3.3.  Robustness and Repair
       3.3.4.  SDP FID Semantics
     3.4.  Session Context
       3.4.1.  Point-to-Point Session
       3.4.2.  Full Mesh Session
       3.4.3.  Centralized Conference Session
   4.  Security Considerations
   5.  Acknowledgement
   6.  Open Issues
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Changes From Draft -00
   Authors' Addresses

1.  Introduction

   The existing taxonomy of sources in RTP is often regarded as
   confusing and inconsistent.  Consequently, a deep understanding of
   how the different terms relate to each other becomes a real
   challenge.  Frequently cited examples of this confusion are (1) how
   different protocols that make use of RTP use the same terms to
   signify different things and (2) how the complexities addressed at
   one layer are often glossed over or ignored at another.

   This document attempts to provide some clarity by reviewing the
   semantics of various aspects of sources in RTP.  As an organizing
   mechanism, it approaches this by describing various ways that RTP
   sources can be grouped and associated together.

2.  Concepts

   This section defines concepts that serve to identify various
   components in a given RTP usage.  For each concept, an attempt is
   made to list any alternate definitions and usages that co-exist
   today, along with various characteristics that further describe the
   concept.

   All references to ControLling mUltiple streams for tElepresence
   (CLUE) in this document map to [I-D.ietf-clue-framework] and all
   references to Web Real-Time Communications (WebRTC) map to
   [I-D.ietf-rtcweb-overview].

2.1.  End Point

   A single entity sending or receiving RTP packets.
   It may be decomposed into several functional blocks, but as long as
   it behaves as a single RTP stack entity it is classified as a single
   "End Point".

2.1.1.  Alternate Usages

   The CLUE Working Group (WG) uses the terms "Media Provider" and
   "Media Consumer" to describe the aspects of an End Point pertaining
   to its sending and receiving functionalities.

2.1.2.  Characteristics

   End Points can be identified in several different ways.  While RTCP
   Canonical Names (CNAMEs) [RFC3550] provide a globally unique and
   stable identification mechanism for the duration of the Communication
   Session (see Section 2.12), their validity applies exclusively within
   a synchronization context.  Therefore, mechanisms outside the scope
   of RTP, such as application-defined mechanisms, must be relied upon
   to identify End Points outside this synchronization context.

2.2.  Capture Device

   The physical source of a stream of media data of one type, such as a
   camera or microphone.

2.2.1.  Alternate Usages

   The CLUE WG uses the term "Capture Device" to identify a physical
   capture device.

   The WebRTC WG uses the term "Recording Device" to refer to the
   locally available capture devices in an end-system.

2.2.2.  Characteristics

   o  A Capture Device is identified either by hardware/manufacturer ID
      or via a session-scoped device identifier as mandated by the
      application usage.

   o  A Capture Device always corresponds to a Media Source (see
      Section 2.3 for a definition of this term), but the converse is
      not always true.  For example, a Media Source may be the output of
      a media production function (e.g., an audio mixer) or a video
      editing function, which can represent data from several Media
      Sources.

2.3.  Media Source

   A Media Source logically defines the source of a raw stream of media
   data as generated either by a single capture device or by a
   conceptual source.  A Media Source represents an Audio Source or a
   Video Source.

2.3.1.  Alternate Usages

   The CLUE WG uses the term "Media Capture" for this purpose.  A CLUE
   Media Capture is identified via indexed notation.  The terms Audio
   Capture and Video Capture are used to identify Audio Sources and
   Video Sources respectively.  Concepts such as "Capture Scene",
   "Capture Scene Entry" and "Capture" provide a flexible framework to
   represent media captured spanning spatial regions.

   The WebRTC WG defines the term "RtcMediaStreamTrack" to refer to a
   Media Source.  An "RtcMediaStreamTrack" is identified by its ID
   attribute.

   Typically a Media Source is mapped to a single m=line via the Session
   Description Protocol (SDP) [RFC4566] unless mechanisms such as
   Source-Specific attributes are in place [RFC5576].  In the latter
   cases, an m=line can represent either multiple Media Sources or
   multiple Media Streams (see Section 2.4 for a definition of this
   term).

2.3.2.  Characteristics

   o  A Media Source represents a real-time source of a raw stream of
      audio or video media data.

   o  At any point, it can represent a physical capture source or a
      conceptual source.

   o  Typically, raw media from a Media Source is compressed via the
      application of an appropriate encoding mechanism, thus creating an
      RTP payload for Media Streams (see Section 2.4 for a definition of
      this term).

   o  Multiple transformations can be applied to the data from a Media
      Source, thus creating several Media Streams.

   o  Some notable transformations are described in Section 3.3.

2.4.  Media Stream

   Media from a Media Source is encoded and packetized to produce one or
   more Media Streams, each representing a sequence of RTP packets.

2.4.1.  Alternate Usages

   The term "Stream" is used by the CLUE WG to define an encoded Media
   Source sent via RTP.  "Capture Encoding" and "Encoding Groups" are
   defined to capture specific details of the encoding scheme.

   RFC3550 [RFC3550] uses the term Source for this purpose.
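
   As an informal illustration of the Media Source and Media Stream
   relationship described in Sections 2.3.2, 2.4 and 2.4.2, the sketch
   below models one Media Source that is encoded into two Media Streams,
   each with its own SSRC and its own RTP sequence number space.  The
   class names, the encode() helper, and the encoding labels are
   hypothetical and are not defined by this document or by RFC 3550;
   starting the sequence number at zero is also a simplification made
   only for readability.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MediaStream:
    """Hypothetical model of a Media Stream: one SSRC, one sequence space."""
    ssrc: int
    encoding: str      # illustrative label only, e.g. "low" or "high"
    next_seq: int = 0  # simplification: real stacks pick a random start

    def next_packet_header(self) -> dict:
        """Return the SSRC and sequence number the next packet would carry."""
        header = {"ssrc": self.ssrc, "seq": self.next_seq}
        self.next_seq = (self.next_seq + 1) % 65536  # 16-bit wrap-around
        return header


@dataclass
class MediaSource:
    """Hypothetical model of a Media Source (for example, one camera)."""
    name: str
    streams: list = field(default_factory=list)

    def encode(self, encoding: str, used_ssrcs: set) -> MediaStream:
        """Create one more Media Stream from this source with an unused SSRC."""
        ssrc = random.getrandbits(32)
        while ssrc in used_ssrcs:  # keep SSRCs unique within the session
            ssrc = random.getrandbits(32)
        used_ssrcs.add(ssrc)
        stream = MediaStream(ssrc=ssrc, encoding=encoding)
        self.streams.append(stream)
        return stream


# One Media Source, two Media Streams sharing a single SSRC space.
used = set()
camera = MediaSource("camera")
low = camera.encode("low", used)
high = camera.encode("high", used)
```

   In this sketch the set passed to encode() stands in for the SSRC
   space of a single RTP Session; each stream advances its sequence
   number independently of the other.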

   The equivalent mapping of Media Stream in SDP [RFC4566] is defined
   per usage.  For example, each m=line can describe one Media Stream
   and hence one Media Source, or a single m=line can describe
   properties for multiple Media Streams (via [RFC5576] mechanisms, for
   example).

2.4.2.  Characteristics

   o  Each Media Stream is identified by a unique Synchronization source
      (SSRC) [RFC3550] that is carried in every RTP and Real-time
      Transport Control Protocol (RTCP) packet header.

   o  At any given point, a Media Stream can have one and only one SSRC.

   o  Each Media Stream defines a unique RTP sequence numbering and
      timing space.

   o  Several Media Streams could potentially map to a single Media
      Source via the source transformations (see Section 3.3).

   o  Several Media Streams can be carried over a single RTP Session.

2.5.  Media Provider

   A Media Provider is a logical component within the RTP Stack that is
   responsible for encoding the media data from one or more Media
   Sources to generate RTP Payload for the outbound Media Streams.

2.5.1.  Alternate Usages

   Within the SDP usage, an m=line describes the necessary configuration
   required for encoding purposes.

   CLUE's "Capture Encoding" provides specific encoding configuration
   for this purpose.

   The WebRTC WG uses the term "RtcMediaStreamTrack" to denote the
   source of the media data that is encoded via the Media Provider.

2.5.2.  Characteristics

   o  A given Media Provider can encode a Media Source multiple times on
      the fly, producing various encoded representations.

2.6.  RTP Session

   An RTP session is an association among a group of participants
   communicating with RTP.  It is a group communications channel which
   can potentially carry a number of Media Streams.  Within an RTP
   session, every participant finds out meta-data and control
   information (over RTCP) about all the Media Streams in the RTP
   session.
   The bandwidth of the RTCP control channel is shared within an RTP
   Session.

2.6.1.  Alternate Usages

   Within the context of SDP, a single m=line can map to a single RTP
   Session or multiple m=lines can map to a single RTP Session.  The
   latter is enabled via multiplexing schemes such as BUNDLE
   [I-D.ietf-mmusic-sdp-bundle-negotiation], which allows mapping of
   multiple m=lines to a single RTP Session.

2.6.2.  Characteristics

   o  Typically, an RTP Session can carry one or more Media Streams;
      carrying more than one is also termed "SSRC Multiplexing".

   o  Each RTP Session is carried by a single underlying Media Transport
      unless multiple RTP sessions are multiplexed over a single
      Transport Flow.  Such a scheme is alternatively called "Session
      Multiplexing" in the RTP context
      [I-D.westerlund-avtcore-transport-multiplexing].

   o  An RTP Session shares a single SSRC space as defined in RFC3550
      [RFC3550].  That is, the End Points participating in the RTP
      Session can see an SSRC identifier transmitted by any of the other
      End Points.  An End Point can receive an SSRC either as an SSRC or
      as a Contributing source (CSRC) in RTP and RTCP packets, as
      defined by the End Points' network interconnection topology.

   o  Multiple RTP Sessions can be related to one another via mechanisms
      defined in Section 3.

2.7.  Media Transport

   A Media Transport defines an end-to-end transport association for
   carrying one or more RTP Sessions.  The combination of a network
   address and port uniquely identifies such a transport association,
   for example an IP address and a UDP port.

2.7.1.  Characteristics

   o  A Media Transport transmits RTP Packets from a source transport
      address to a destination transport address.

   o  RTP may depend upon the lower-layer protocol to provide
      mechanisms, such as ports, to multiplex the RTP and RTCP packets
      of an RTP Session.

2.8.  Rendering Device

   A Rendering Device represents a physical rendering device, such as a
   display or speaker.

2.8.1.  Characteristics

   o  An End Point can potentially have multiple rendering devices of
      each type.

   o  Incoming Media Streams are decoded by one or more Media Renderers
      to provide a representation suitable for rendering the media data
      over one or more Rendering Devices, as defined by the application
      usage or system-wide configuration.

2.9.  Media Renderer

   A Media Renderer is a logical component within the RTP Stack that is
   responsible for decoding the RTP Payload within the incoming Media
   Streams to generate media data suitable for eventual rendering.

2.9.1.  Alternate Usages

   Within the context of SDP, an m=line describes the necessary
   configuration required to decode one or more incoming Media Streams.

   The WebRTC WG uses the term "RtcMediaStreamTrack" to qualify the
   media data decoded via the Media Renderer corresponding to the
   incoming Media Stream.

2.9.2.  Characteristics

   o  The output from the Media Renderer is usually rendered to a
      Rendering Device via appropriate mechanisms as explained in
      Section 2.8.

   o  Incoming Media Streams decoded by the Media Renderer are typically
      identified via the SSRC.

2.10.  Participant

   A participant is an entity reachable by a single signaling address,
   and is thus related more to the signaling context than to the media
   context.

2.10.1.  Characteristics

   o  A single signaling-addressable entity, using an
      application-specific signaling address space, for example a SIP
      URI.

   o  A participant can have several associated transport flows,
      including several separate local transport addresses for those
      transport flows.

   o  A participant can have several multimedia sessions.

2.11.  Multimedia Session

   A multimedia session is an association among a group of participants
   engaged in a conversation via one or more RTP Sessions.  It defines
   logical relationships among Media Sources that appear in multiple RTP
   Sessions.

2.11.1.  Alternate Usages

   RFC4566 [RFC4566] defines a multimedia session as a set of multimedia
   senders and receivers and the data streams flowing from senders to
   receivers.

   RFC3550 [RFC3550] defines it as a set of concurrent RTP sessions
   among a common group of participants.  For example, a videoconference
   (which is a multimedia session) may contain an audio RTP session and
   a video RTP session.

2.11.2.  Characteristics

   o  Participants in RTP multimedia sessions are identified via
      mechanisms such as RTCP CNAME or other application level
      identifiers as appropriate.

   o  A multimedia session can be composed of several parallel RTP
      Sessions with potentially multiple Media Streams per RTP Session.

   o  Each participant in a multimedia session can have a multitude of
      Media Captures and Media Rendering devices.

2.12.  Communication Session

   A communication session is an association among a group of
   participants communicating with each other via a set of multimedia
   sessions.

2.12.1.  Alternate Usages

   The Session Description Protocol RFC4566 [RFC4566] defines a
   multimedia session as a set of multimedia senders and receivers and
   the data streams flowing from senders to receivers.  It is, however,
   not clear from that definition whether a multimedia session includes
   both the sender's and the receiver's view of the same RTP Stream.

2.12.2.  Characteristics

   o  Each participant in a Communication Session is identified via an
      application-specific signaling address.

   o  A Communication Session is composed of at least one multimedia
      session per participant, involving one or more parallel RTP
      Sessions with potentially multiple Media Streams per RTP Session.
      For example, in a full mesh communication, the Communication
      Session consists of a set of separate Multimedia Sessions between
      each pair of Participants.
      Another example is a centralized conference, where the
      Communication Session consists of a set of Multimedia Sessions
      between each Participant and the conference handler.

3.  Relationships

   This section provides various relationships that can co-exist between
   the aforementioned concepts in a given RTP usage.  Using Unified
   Modeling Language (UML) class diagrams [UML], Figure 1 below depicts
   general relations between a Media Source, its Media Provider(s) and
   the resulting Media Stream(s).  Note: The RTCP Stream related to the
   RTP Stream is not shown in the figure.

   +--------------+     <>     +-------------------------+
   | Media Source |- - - - - ->| Synchronization Context |
   +--------------+            +-------------------------+
         < > 1..*
          |
          | 0..*
   +--------------+
   |              |<>-+ 0..*
   |    Media     |   |
   |   Provider   |   |
   |              |---+ 0..*
   +--------------+
         < > 1
          |
          | 0..*
   +----------------+ 0..*     1 +-------------+
   |  Media Stream  |----------<>| RTP Session |
   +----------------+            +-------------+

                  Figure 1: Media Source Relations

   Media Sources can have a large variety of relationships among them.
   These relationships can apply both between sources within a single
   RTP Session, and between Media Sources that occur in multiple RTP
   Sessions.

   Ways of relating them typically involve groups: a set of Media
   Sources has some relationship that applies to all those in the group,
   and no others.  (Relationships that involve arbitrary non-grouping
   associations among Media Sources, such that e.g., A relates to B and
   B to C, but A and C are unrelated, are uncommon if not nonexistent.)

   In many cases, the semantics of groups are not simply that the
   members form an undifferentiated group, but rather that members of
   the group have certain roles.

3.1.  Synchronization Context

   A synchronization context defines a requirement for a strong timing
   relationship between the related entities, typically requiring
   alignment of clock sources.  Such a relationship can be identified in
   multiple ways, as listed below.  A single Media Source can only
   belong to a single Synchronization Context, since it is assumed that
   a single Media Source can only have a single media clock, and
   requiring alignment to several Synchronization Contexts will
   effectively merge those into a single Synchronization Context.

   A single Multimedia Session can contain media from one or more
   Synchronization Contexts.  An example of that is a Multimedia Session
   containing one set of audio and video for communication purposes
   belonging to one Synchronization Context, and another set of audio
   and video for presentation purposes (like playing a video file) that
   has no strong timing relationship and need not be strictly
   synchronized with the audio and video used for communication.

3.1.1.  RTCP CNAME

   RFC3550 [RFC3550] describes inter-media synchronization between RTP
   Sessions based on RTCP CNAME, RTP and Network Time Protocol (NTP)
   [RFC5905] timestamps.

3.1.2.  Clock Source Signaling

   [I-D.ietf-avtcore-clksrc] provides a mechanism to signal the clock
   source in SDP, thus allowing a synchronized context to be defined.

3.1.3.  CLUE Scenes

   In CLUE, "Capture Scene", "Capture Scene Entry" and "Captures" define
   an implied synchronization context.

3.1.4.  Implicitly via RtcMediaStream

   The WebRTC WG defines "RtcMediaStream" with one or more
   "RtcMediaStreamTracks".  All tracks in an "RtcMediaStream" are
   intended to be synchronized when rendered.

3.1.5.  Explicitly via SDP Mechanisms

   RFC5888 [RFC5888] defines an m=line grouping mechanism called "Lip
   Synchronization (LS)" for establishing the synchronization
   requirement across m=lines when they map to individual sources.

   RFC5576 [RFC5576] extends the above mechanism when multiple media
   sources are described by a single m=line.

3.2.  Containment Context

   A containment relationship allows multiple concepts to be composed
   into a larger concept.

3.2.1.  Media Stream Multiplexing

   Multiple Media Streams can be contained within a single RTP Session
   by using a unique SSRC per Media Stream.
   [I-D.ietf-mmusic-sdp-bundle-negotiation] provides an SDP-based
   signaling mechanism to enable this across several m=lines.  RFC5576
   [RFC5576] enables the same for multiple Media Sources described in a
   single m=line.

3.2.2.  RTP Session Multiplexing

   [I-D.westerlund-avtcore-transport-multiplexing], for example,
   describes a mechanism that allows several RTP Sessions to be carried
   over a single underlying Media Transport.

3.2.3.  Multiple Media Sources in a WebRTC PeerConnection

   The WebRTC WG defines a containment object named "RTCPeerConnection"
   that can potentially contain several Media Sources mapped to a single
   RTP Session or spread across several RTP Sessions.

3.3.  Equivalence Context

   In this relationship, different instances of a concept are treated as
   equivalent for the purposes of relating them to the Media Source.

   Figure 2 below depicts in UML notation the general relation between a
   Media Provider and its Media Stream(s), including the Media Stream
   specializations Primary Stream and Repair Stream.

                 +--------------+
                 |              |<>-+ 0..*
                 |    Media     |   |
                 |   Provider   |   |
                 |              |---+ 0..*
                 +--------------+
                       < > 1
                        |
                        | 0..*
                 +--------------+ 0..*  1 +-----------------+
                 | Media Stream |<>-------| Media Transport |
                 +--------------+         +-----------------+
                    /\      /\
                   +--+    +--+
                    |        |
            +-------+        +-------+
            |                        |
   +--------------+            +--------------+ 1
   |   Primary    |<>----------|    Repair    |<>-+
   |    Stream    | 1..*  0..* |    Stream    |---+
   +--------------+            +--------------+ 0..*

                  Figure 2: Media Stream Relations

   This relation can be used in combination with Figure 1 to achieve a
   set of functionalities, described below.

3.3.1.  Simulcast

   A Media Source represented as multiple independent Encodings
   constitutes a simulcast of that Media Source.  The figure below
   represents an example of a Media Source that is encoded into three
   separate simulcast streams that are in turn sent on the same
   transport flow.

                        +----------------+
                        |  Media Source  |
                        +----------------+
                          < >  < >  < >
                           |    |    |
              +------------+    |    +--------------+
              |                 |                   |
   +----------------+   +----------------+   +----------------+
   | Media Provider |   | Media Provider |   | Media Provider |
   +----------------+   +----------------+   +----------------+
           < >                 < >                 < >
            |                   |                   |
            |                   |                   |
   +----------------+   +----------------+   +----------------+
   |  Media Stream  |   |  Media Stream  |   |  Media Stream  |
   +----------------+   +----------------+   +----------------+
           < >                 < >                 < >
            |                   |                   |
            +---------------+   |  +----------------+
                            |   |  |
                      +-------------------+
                      |  Media Transport  |
                      +-------------------+

             Figure 3: Example of Media Source Simulcast

3.3.2.  Layered MultiStream Transmission

   Multi-stream transmission (MST) is a mechanism by which different
   portions of a layered encoding of a media stream are sent using
   separate Media Streams (sometimes in separate RTP sessions).  MSTs
   are useful for receiver control of layered media.

   A Media Source represented as multiple dependent Encodings
   constitutes a Media Source that has layered dependency.
   The figure below represents an example of a Media Source that is
   encoded into three dependent layers, where two layers are sent on the
   same transport flow and the third layer is sent on a separate
   transport flow.

                        +----------------+
                        |  Media Source  |
                        +----------------+
                          < >  < >  < >
                           |    |    |
            +--------------+    |    +--------------+
            |                   |                   |
   +----------------+   +----------------+   +---------------+
   | Media Provider |<>-| Media Provider |<>-| Media Provider|
   +----------------+   +----------------+   +---------------+
           < >                 < >                 < >
            |                   |                   |
            |                   |                   |
   +----------------+   +----------------+   +----------------+
   |  Media Stream  |   |  Media Stream  |   |  Media Stream  |
   +----------------+   +----------------+   +----------------+
           < >                 < >                 < >
            |                   |                   |
            +------+     +------+                   |
                   |     |                          |
             +-----------------+           +-----------------+
             | Media Transport |           | Media Transport |
             +-----------------+           +-----------------+

         Figure 4: Example of Media Source Layered Dependency

3.3.3.  Robustness and Repair

   A Media Source may be protected by repair streams during transport.
   Several approaches, listed below, can achieve this result:

   o  duplication of the original Media Stream,

   o  duplication of the original Media Stream with a time offset,

   o  forward error correction (FEC) techniques, and

   o  retransmission of lost packets (either globally or selectively).

   The figure below represents an example where a Media Source is
   protected by a retransmission (RTX) flow.  In this example the
   primary Media Stream and the RTP RTX Stream share the same Media
   Transport.
+----------------+ | Media Source | +----------------+ < > | +----------------+ | Media Provider | +----------------+ < > | +---------------+ +-----------+ | Primary Media |<>-| RTX Media | | Stream | | Stream | +---------------+ +-----------+ < > < > | | +------+ +------+ | | +-----------------+ | Media Transport | +-----------------+ Figure 5: Example of Media Source Retransmission Flows The figure below represents an example where two Media Sources are protected by individual FEC flows as well as one additional FEC flow that protects the set of both Media Sources (a FEC group). There are several possible ways to map those Media Streams to one or more Media Transport, but that is omitted from the figure for clarity. +----------+ +----------+ | Media | | Media | | Source | | Source | +----------+ +----------+ < > < > | | +----------+ +----------+ | Media | | Media | | Provider | | Provider | +----------+ +----------+ < > +-------------------+ +-------------------+ < > Lennox, et al. Expires January 16, 2014 [Page 16] Internet-Draft RTP Grouping Taxonomy July 2013 | | | | | | | | < > < > | | +---------+ +--------+ +--------+ +--------+ +---------+ | Primary | | RTP | | RTP | | RTP | | Primary | | Media |<>-| FEC |-<>| FEC |<>-| FEC |-<>| Media | | Stream | | Stream | | Stream | | Stream | | Stream | +---------+ +--------+ +--------+ +--------+ +---------+ Figure 6: Example of Media Source FEC Flows 3.3.4. SDP FID Semantics RFC5888 [RFC5888] defines m=line grouping mechanism called "FID" for establishing the equivalence of Media Streams across the m=lines under grouping. RFC5576 [RFC5576] extends the above mechanism when multiple media sources are described by a single m=line. 3.4. Session Context There are different ways to construct a Communication Session. The general relation in UML notation between a Communication Session, Participants, Multimedia Sessions and RTP Sessions is outlined below. Lennox, et al. 
Expires January 16, 2014 [Page 17] Internet-Draft RTP Grouping Taxonomy July 2013 +---------------+ | Communication | | Session | +---------------+ 0..* < > < > 1..* | | +----------+ +--------+ 1..* | | 1..* +-------------+ 1 0..* +--------------------+ | Participant |<>----------| Multimedia Session | +-------------+ +--------------------+ < > 1 < > 1 | | 0..* | +-------------+ | | RTP Session | | +-------------+ | < > 1 | 0..* | 0..* +-----------------+ 1 0..* +--------------+ | Media Transport |--------<>| Media Stream | +-----------------+ +--------------+ Figure 7: Session Relations Several different flavors of Session can be possible. A few typical examples are listed in the below sub-sections, but many other are possible to construct. 3.4.1. Point-to-Point Session In this example, a single Multimedia Session is shared between the two Participants. That Multimedia Session contains a single RTP Session with two Media Streams from each Participant. Each Participant has only a single Media Transport, carrying those Media Streams, which is the main reason why there is only a single RTP Session. +----------------+ | Point-to-Point | | Session | +----------------+ < > < > < > | | | +------------------------+ | +------------------------+ | | | +-------------+ +--------------------+ +-------------+ | Participant |<>----------| Multimedia Session |----------<>| Participant | Lennox, et al. 
Expires January 16, 2014 [Page 18] Internet-Draft RTP Grouping Taxonomy July 2013 +-------------+ +--------------------+ +-------------+ < > < > < > | | | | +--------------+ +-------------+ +--------------+ | | | Media Stream |----<>| RTP Session |<>----| Media Stream | | | +--------------+ +-------------+ +--------------+ | | < > < > < > < > | | | | | | | +-----------------+ +--------------+ +--------------+ +-----------------+ | Media Transport |-<>| Media Stream | | Media Stream |<>-| Media Transport | +-----------------+ +--------------+ +--------------+ +-----------------+ Figure 8: Example Point-to-Point Session 3.4.2. Full Mesh Session In this example, the Full Mesh Session has three Participants, each of which has the same characteristics as the example in the previous section; a single Media Transport per peer Participant, resulting in a single RTP session between each pair of Participants. +-----------+ +-------------+ +-----------+ | Media |----------------<>| Participant |<>---------------| Media | | Transport | +-------------+ | Transport | +-----------+ | +-----------+ | | +------------+ | +------------+ | | < > < > | Multimedia | | | Multimedia | < > < > +--------++--------+ | Session | | | Session | +--------++--------+ | Media || Media | +------------+ | +------------+ | Media || Media | | Stream || Stream | < > | | | < > | Stream || Stream | +--------++--------+ | | | | | +--------++--------+ | | | | | | | | | | < > | < > < > < > | < > | | +---------+ +---------------+ +---------+ | +-------<>| RTP | | Full Mesh | | RTP |<>------+ +-------<>| Session | | Session | | Session |<>------+ | +---------+ +---------------+ +---------+ | | < > < > < > < > < > | | | | | | | | +--------++--------+ | | | +--------++--------+ | Media || Media | | | | | Media || Media | | Stream || Stream | | | | | Stream || Stream | +--------++--------+ | | | +--------++--------+ < > < > | | | < > < > | | | | | | | +-----------+ | | | +-----------+ | Media | | | | | Media | 
| Transport | | | | | Transport | +-----------+ +-----------------+ | +-----------------+ +-----------+ | | | +-------------+ +--------------------+ +-------------+ | Participant |<>-----------| Multimedia Session |----------<>| Participant | +-------------+ +--------------------+ +-------------+ < > < > < > | | | | +--------+ +---------+ +--------+ | | | Media |----------<>| RTP |<>----------| Media | | | | Stream | | Session | | Stream | | | +--------+ +---------+ +--------+ | | < > < > < > < > | | | | | | | +-----------+ +--------+ +--------+ +-----------+ | Media |---------<>| Media | | Media |<>---------| Media | | Transport | | Stream | | Stream | | Transport | +-----------+ +--------+ +--------+ +-----------+

                  Figure 9: Example Full Mesh Session

3.4.3.  Centralized Conference Session

   Text to be provided

   TBD

            Figure 10: Example Centralized Conference Session

4.  Security Considerations

   This document simply tries to clarify the confusion prevalent in RTP
   taxonomy, which is caused by inconsistent usage across the multiple
   technologies and protocols that make use of RTP.  It does not
   introduce any new security considerations beyond those already well
   documented for RTP [RFC3550] and in the respective specifications of
   the various protocols making use of it.

   Hopefully, a well-defined common terminology and a shared
   understanding of the complexities of the RTP architecture will help
   lead to better standards and so avoid security problems.

5.  Acknowledgement

   This document borrows many concepts from several documents, such as
   WebRTC [I-D.ietf-rtcweb-overview], CLUE [I-D.ietf-clue-framework],
   and Multiplexing Architecture
   [I-D.westerlund-avtcore-transport-multiplexing].  The authors would
   like to thank the authors of each of those documents.
   The authors would also like to acknowledge the insights, guidance
   and contributions of Magnus Westerlund, Roni Even, Colin Perkins,
   Keith Drage, and Harald Alvestrand.

6.  Open Issues

   Much of the terminology is still a matter of dispute.

   It might be useful to distinguish between a single endpoint's view
   of a source, or RTP session, or multimedia session, versus the full
   set of sessions and every endpoint that's communicating in them,
   with the signaling that established them.

   (Sure to be many more...)

7.  IANA Considerations

   This document makes no request of IANA.

8.  References

8.1.  Normative References

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [UML]      Object Management Group, "OMG Unified Modeling Language
              (OMG UML), Superstructure, V2.2", OMG formal/2009-02-02,
              February 2009.

8.2.  Informative References

   [I-D.ietf-avtcore-clksrc]
              Williams, A., Gross, K., Brandenburg, R., and H.
              Stokking, "RTP Clock Source Signalling",
              draft-ietf-avtcore-clksrc-05 (work in progress), July
              2013.

   [I-D.ietf-clue-framework]
              Duckworth, M., Pepperell, A., and S. Wenger, "Framework
              for Telepresence Multi-Streams",
              draft-ietf-clue-framework-11 (work in progress), July
              2013.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Multiplexing Negotiation Using Session Description
              Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-04 (work in
              progress), June 2013.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for
              Brower-based Applications",
              draft-ietf-rtcweb-overview-06 (work in progress),
              February 2013.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C.
              Perkins, "Multiple RTP Sessions on a Single Lower-Layer
              Transport",
              draft-westerlund-avtcore-transport-multiplexing-05 (work
              in progress), February 2013.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264, June
              2002.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session
              Description Protocol (SDP) Grouping Framework", RFC 5888,
              June 2010.

   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch,
              "Network Time Protocol Version 4: Protocol and Algorithms
              Specification", RFC 5905, June 2010.

   [RFC6222]  Begen, A., Perkins, C., and D. Wing, "Guidelines for
              Choosing RTP Control Protocol (RTCP) Canonical Names
              (CNAMEs)", RFC 6222, April 2011.

Appendix A.  Changes From Earlier Versions

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

A.1.  Changes From Draft -00

   o  Too many to list

   o  Added new authors

   o  Updated content organization and presentation

Authors' Addresses

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com


   Kevin Gross
   AVA Networks, LLC
   Boulder, CO
   US

   Email: kevin.gross@avanw.com


   Suhas Nandakumar
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA  95134
   US

   Email: snandaku@cisco.com


   Gonzalo Salgueiro
   Cisco Systems
   7200-12 Kit Creek Road
   Research Triangle Park, NC  27709
   US

   Email: gsalguei@cisco.com


   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com