Network Working Group                                  Magnus Westerlund
INTERNET-DRAFT                                                  Ericsson
Expires: March 2007                                       Stephan Wenger
                                                                   Nokia
                                                      September 17, 2006


                             RTP Topologies
                    draft-ietf-avt-topologies-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document discusses multi-endpoint topologies commonly used in
   RTP-based environments. In particular, centralized topologies
   commonly employed in the video conferencing industry are mapped to
   the RTP terminology.

Wenger, et al.                                                  [Page 1]

INTERNET-DRAFT               RTP Topologies           September 17, 2006

TABLE OF CONTENTS

   Status of this Memo
   Copyright Notice
   Abstract
   TABLE OF CONTENTS
   1. Introduction
   2. Definitions
      2.1. Glossary
      2.2. Terminology
      2.3. Topologies
         2.3.1. Point to Point
         2.3.2. Point to Multi-point using Multicast
         2.3.3. Point to Multipoint using the RFC 3550 translator
         2.3.4. Point to Multipoint using the RFC 3550 mixer model
         2.3.5. Point to Multipoint using video switching MCU
         2.3.6. Point to Multipoint using RTCP-terminating MCU
         2.3.7. Combining Topologies
   3. Security Considerations
   4. IANA Considerations
   5. Acknowledgements
   6. References
      6.1. Normative references
      6.2. Informative references
   7. Authors' Addresses
   8. List of Changes relative to previous drafts
   RFC Editor Considerations

1. Introduction

   When working on the Codec Control Messages [CCM], we noticed
   considerable confusion in the community with respect to terms such
   as MCU, mixer, and translator. In the process of writing, we became
   increasingly unsure of our own understanding, and therefore added
   what became the core of this draft to the CCM draft. Later, it was
   found that this information has its own value, and it was
   "outsourced" from the CCM draft into the present memo.

   It could be argued that this document clarifies and explains
   sections of the RTP specification [RFC3550], and is therefore of
   informational nature.
   In this case, the present memo may end up as an informational RFC.

   When the Audio-Visual Profile with Feedback (AVPF) [AVPF] was
   developed, the main emphasis lay on the efficient support of
   point-to-point and small multipoint scenarios without centralized
   multipoint control. However, in practice, many small multipoint
   conferences operate utilizing devices known as Multipoint Control
   Units (MCUs). MCUs comprise mixers and translators (in RTP
   [RFC3550] terminology), but also signalling support.

2. Definitions

2.1. Glossary

   ASM  - Any Source Multicast
   AVPF - The Extended RTP Profile for RTCP-based Feedback
   MCU  - Multipoint Control Unit
   PtM  - Point to Multipoint
   PtP  - Point to Point

2.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.3. Topologies

   This subsection defines several basic topologies that are relevant
   for codec control. The first four relate to the RTP system model
   utilizing multicast and/or unicast, as envisioned in RFC 3550. The
   last two topologies, in contrast, describe the widely deployed
   system model as used in most H.323 video conferences, where both the
   media streams and the RTCP control traffic terminate at the MCU.
   More topologies can be constructed by combining any of the models;
   see Section 2.3.7.

   The topologies may be referenced by a shortcut name, indicated by
   the prefix "Topo-".

2.3.1. Point to Point

   Shortcut name: Topo-Point-to-Point

   The Point to Point (PtP) topology (Figure 1) consists of two
   endpoints with unicast capabilities between them. Both RTP and RTCP
   traffic are conveyed endpoint to endpoint using unicast traffic only
   (even if this unicast traffic happens to be conveyed over an
   IP-multicast address).

   +---+         +---+
   | A |<------->| B |
   +---+         +---+

   Figure 1 - Point to Point

   The main property of this topology is that A sends to B and only B,
   while B sends to A and only A. This avoids all complexities of
   handling multiple endpoints and combining the requirements from
   them. Note that an endpoint may still use multiple RTP
   Synchronization Sources (SSRCs) in an RTP session.

2.3.2. Point to Multi-point using Multicast

   Shortcut name: Topo-Multicast

              +-----+
   +---+     /       \    +---+
   | A |----/         \---| B |
   +---+   /   Multi-  \  +---+
          +    Cast     +
   +---+   \  Network  /  +---+
   | C |----\         /---| D |
   +---+     \       /    +---+
              +-----+

   Figure 2 - Point to Multipoint using Multicast

   We define Point to Multipoint (PtM) using multicast topology as a
   transmission model in which traffic from any participant reaches all
   the other participants, except for cases such as

   o  packet loss, or

   o  a participant that does not wish to receive the traffic for a
      specific media stream, and has therefore not subscribed to the IP
      multicast group in question.

   In this sense, "traffic" encompasses both RTP and RTCP traffic. The
   number of participants can be between one and many -- as RTP and
   RTCP scale to very large multicast groups (the theoretical limit of
   RTP is approximately two billion participants). This draft is
   primarily interested in the subset of multicast sessions where the
   number of participants in the multicast group allows the
   participants to use early or immediate feedback as defined in AVPF.
   This document refers to those groups as "small multicast groups".

2.3.3. Point to Multipoint using the RFC 3550 translator

   Shortcut name: Topo-Translator

   Two main categories of Translators can be distinguished.

   Transport Translators do not modify the media stream itself, but are
   concerned with transport parameters.
   Transport parameters, in the sense of this section, comprise the
   transport addresses to bridge different domains, and the media
   packetization to allow other transport protocols to be
   interconnected to a session (gateways).

   Media Translators, in contrast, modify the media stream itself. This
   process is commonly known as transcoding. The modification of the
   media stream can be as small as removing parts of the stream, and
   can go all the way to a full transcoding utilizing a different media
   codec. Media translators are commonly used to connect entities
   without a common interoperability point.

   Stand-alone Media Translators are rare. Most commonly, a combination
   of Transport and Media Translators is used to translate both the
   media stream and the transport aspects of a stream between two
   transport domains (or clouds).

   Both Translator types share common attributes that separate them
   from mixers. For each media stream that the translator receives, it
   generates an individual stream in the other domain. However, a
   translator maintains a complete view of all existing participants
   between both domains. Therefore, the SSRC space is shared across the
   two domains.

   The RTCP translation process can be trivial, for example when
   Transport translators just need to adjust IP addresses, and can be
   quite complex in the case of media translators. See Section 7.2 of
   [RFC3550].

              +-----+
   +---+     /       \     +------------+      +---+
   | A |<---/         \    |            |<---->| B |
   +---+   /   Multi-  \   |            |      +---+
          +    Cast     +->| Translator |
   +---+   \  Network  /   |            |      +---+
   | C |<---\         /    |            |<---->| D |
   +---+     \       /     +------------+      +---+
              +-----+

   Figure 3 - Point to Multipoint using a Translator

   Figure 3 depicts an example of a Transport Translator performing at
   least IP address translation.
   It allows the (non multicast capable) participants B and D to take
   part in a multicast session by having the translator forward their
   unicast traffic to the multicast addresses in use, and vice versa.
   It must also forward B's traffic to D and vice versa, to provide
   each of B and D with a complete view of the session.

   If B were behind a limited link, the translator could perform media
   transcoding to allow the traffic received from the other
   participants to reach B without overloading the link.

   When, in the example depicted in Figure 3, the translator acts only
   as a Transport Translator, the RTCP traffic can simply be forwarded,
   similar to the media traffic. However, when media translation
   occurs, the translator's task becomes substantially more complex
   even with respect to the RTCP traffic. In this case, the translator
   needs to rewrite B's RTCP receiver reports before forwarding them to
   D and the multicast network. The rewriting is needed because the
   stream received by B is not the same stream as the other
   participants receive. For example, the number of packets transmitted
   to B may be lower than what D receives, due to the different media
   format. Therefore, if the receiver reports were forwarded without
   changes, the extended highest sequence number would indicate that B
   was substantially behind in reception -- while it most likely would
   not be. Therefore, the translator must translate that number to a
   corresponding sequence number for the stream the translator
   received. Similar arguments can be made for most other fields in the
   RTCP receiver reports.

   As specified in Section 7.1 of [RFC3550], the SSRC space is common
   for all participants in the session, regardless of which side of the
   translator they are on. Thus, it is the responsibility of the
   participants to run SSRC collision detection, and the SSRC is a
   field that the translator should not change.
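
   The sequence number translation described above can be sketched as
   follows. This is only an illustration under stated assumptions, not
   a procedure defined by RFC 3550: the class SeqMap and its methods
   record() and translate() are hypothetical names, the history is
   kept unbounded, and all values are taken to be extended (32-bit)
   sequence numbers so that 16-bit wrap-around is ignored.

```python
from bisect import bisect_right


class SeqMap:
    """Hypothetical helper: maps sequence numbers of the transcoded stream
    (sent to B) back to sequence numbers of the original stream (received
    by the translator from the media source)."""

    def __init__(self):
        self._out = []  # outgoing extended sequence numbers, ascending
        self._in = []   # incoming extended sequence number covered by each

    def record(self, out_seq, in_seq):
        """Note that outgoing packet out_seq carries media up to in_seq."""
        self._out.append(out_seq)
        self._in.append(in_seq)

    def translate(self, reported_out_seq):
        """Map the extended highest sequence number reported by the receiver
        to the original stream; return None if nothing recorded so far is
        covered by the report."""
        i = bisect_right(self._out, reported_out_seq)
        if i == 0:
            return None
        return self._in[i - 1]


# Assumed scenario: the transcoder merges every two incoming packets into
# one outgoing packet, so B sees half as many packets as D does.
seq_map = SeqMap()
for n in range(5):
    seq_map.record(out_seq=1000 + n, in_seq=5001 + 2 * n)

print(seq_map.translate(1003))
```

   In this sketch, a receiver report from B carrying 1003 is rewritten
   to 5007 before it is forwarded, so the other participants see B as
   up to date with respect to the original stream. A real translator
   would also have to adjust the loss, jitter, and timestamp fields.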

   +---+      +------------+      +---+
   | A |<---->| Multipoint |<---->| B |
   +---+      |  Control   |      +---+
              |   Unit     |
   +---+      |   (MCU)    |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

   Figure 4 - MCU with RTP Translator (relay) with only unicast links

   A common MCU scenario is the one depicted in Figure 4. Herein, the
   MCU connects multiple users of a conference through unicast. This
   can be implemented using a very simple transport translator, which
   could be called a relay. The relay forwards all traffic it receives,
   both RTP and RTCP, to all other participants. In doing so, a
   multicast network is emulated without relying on a multicast capable
   network structure.

   A translator normally does not use an SSRC of its own, and is not
   visible as an active participant in the session. However, it may act
   as a media receiver, thus have an SSRC, and use RTCP to report
   reception statistics. This behavior should only be used when it is
   really desirable to have this feedback, i.e., when the translator
   acts as a special type of quality monitor.

   It also needs to be noted that the translator, in some cases, may
   act on behalf of the "real" source and respond to codec control
   messages in its capacity as media translator. This may occur, for
   example, if a receiver requests a bandwidth reduction, and the media
   translator has not detected any congestion or other reasons for
   bandwidth reduction between the media source and itself. In that
   case, a translator should be able to react to codec control
   messages, as it is capable of fulfilling the request on behalf of
   the media sender. If it did not react to codec control messages, and
   therefore could not fulfil the request, the media quality in the
   media sender's domain would suffer.

2.3.4. Point to Multipoint using the RFC 3550 mixer model

   Shortcut name: Topo-Mixer

   A mixer is a middlebox that aggregates multiple RTP streams that are
   part of a session, by mixing the media data and generating a new RTP
   stream. One common application for a mixer is to allow a participant
   to receive a session with a reduced amount of resources.

              +-----+
   +---+     /       \     +-----------+      +---+
   | A |<---/         \    |           |<---->| B |
   +---+   /   Multi-  \   |           |      +---+
          +    Cast     +->|   Mixer   |
   +---+   \  Network  /   |           |      +---+
   | C |<---\         /    |           |<---->| D |
   +---+     \       /     +-----------+      +---+
              +-----+

   Figure 5 - Point to Multipoint using RFC 3550 mixer model

   A mixer can be viewed as a device terminating the media streams
   received from other session participants. Using the media data from
   the received media streams, a mixer generates a media stream that is
   sent to the session participant.

   The content that the mixer provides is the mixed aggregate of what
   the mixer receives from the PtP or PtM links, which are part of the
   same conference session.

   The mixer is the content source, as it mixes the content (often in
   the uncompressed domain) and then encodes it for transmission to a
   participant. The CC and CSRC fields in the RTP header are used to
   indicate the contributors to the newly generated stream. The SSRCs
   of the to-be-mixed streams on the mixer input appear as the CSRCs at
   the mixer output. That output stream uses a new SSRC that identifies
   the Mixer. The CSRCs are forwarded between the two domains to allow
   for loop detection and identification of sources that are part of
   the global session. Note that Section 7.1 of RFC 3550 requires the
   SSRC space to be shared between domains for these reasons.

   The mixer is responsible for generating RTCP packets in accordance
   with its role. It is a receiver and should therefore send reception
   reports for the media streams it receives.
   As a media sender itself, it should also generate sender reports
   (SRs) for the media streams it sends. The content of the SRs created
   by the mixer may or may not take into account the situation on its
   receiving side. Similarly, the content of the receiver reports (RRs)
   created by the mixer may or may not be based on the situation on the
   mixer's sending side. This is left open to the implementation. As
   specified in Section 7.3 of RFC 3550, a mixer must not forward RTCP
   unaltered between the two domains.

   The mixer depicted in Figure 5 has three domains that need to be
   separated: the multicast network, participant B, and participant D.
   The Mixer produces different mixed streams to B and D, as the one to
   B may contain D and vice versa. However, the mixer needs only one
   SSRC in each domain, which identifies it both as the receiving
   entity and as the transmitter of mixed content.

   In the multicast domain, the mixer does not need to provide a mixed
   view of the other domains and will commonly only forward the media
   from B and D into the multicast network using B's and D's SSRC.

   The mixer is responsible for receiving the codec control messages
   and handling them appropriately. The definition of "appropriate"
   depends on the message itself and the context. In some cases, the
   reception of a codec control message may result in the generation
   and transmission of codec control messages by the mixer to the
   participants in the other domain. In other cases, a message is
   handled by the mixer itself and therefore not forwarded to any other
   domains.

   It should be noted that this form of mixing technology is not widely
   deployed. Most multipoint video conferences in use today employ one
   of the models discussed in the next sections.

   When replacing the multicast network in Figure 5 (to the left of the
   mixer) with individual unicast links as depicted in Figure 6, the
   mixer model is very similar to the one discussed in Section 2.3.6
   below.

   +---+      +------------+      +---+
   | A |<---->| Multipoint |<---->| B |
   +---+      |  Control   |      +---+
              |   Unit     |
   +---+      |   (MCU)    |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

   Figure 6 - RTP Mixer with only unicast links

2.3.5. Point to Multipoint using video switching MCU

   Shortcut name: Topo-Video-switch-MCU

   +---+      +------------+      +---+
   | A |------| Multipoint |------| B |
   +---+      |  Control   |      +---+
              |   Unit     |
   +---+      |   (MCU)    |      +---+
   | C |------|            |------| D |
   +---+      +------------+      +---+

   Figure 7 - Point to Multipoint using relaying MCU

   This PtM topology is still deployed today, although the
   RTCP-terminating MCUs, as discussed in the next section, are perhaps
   more common. This topology, as well as the following one, reflects
   today's lack of wide availability of IP multicast technologies, as
   well as the simplicity of content switching when compared to content
   mixing. The technology is commonly implemented in what is known as
   "Video Switching MCUs".

   A video switching MCU forwards to a participant a single media
   stream, selected from the available streams. The criteria for
   selection are often based on voice activity in the audio-visual
   conference, but other conference management mechanisms (like
   presentation mode or explicit floor control) are known to exist as
   well.

   The video switching MCU may also perform media translation to modify
   the content in bit-rate, encoding, or resolution; however, it may
   still indicate the original sender of the content through the SSRC.
   In this case, the values of the CC and CSRC fields are retained.

   If not terminating RTP, the RTCP Sender Reports are forwarded for
   the currently selected sender. All RTCP receiver reports are freely
   forwarded between the participants.
   In addition, the MCU may also originate RTCP control traffic in
   order to control the session and/or report on status from its
   viewpoint.

   The video switching MCU has mostly the attributes of a translator.
   However, its stream selection is a mixing behaviour. This behaviour
   has some RTP and RTCP issues associated with it. The suppression of
   all but one media stream means that most participants see only a
   subset of the sent media streams at any given time; often a single
   stream per conference. Therefore, RTCP receiver reports only report
   on these streams. In consequence, the media senders that are not
   currently forwarded receive a view of the session that indicates
   their media streams disappearing somewhere en route. This makes the
   use of RTCP for congestion control very problematic. To avoid these
   issues, the MCU needs to modify the RTCP RRs.

2.3.6. Point to Multipoint using RTCP-terminating MCU

   Shortcut name: Topo-RTCP-terminating-MCU

   +---+      +------------+      +---+
   | A |<---->| Multipoint |<---->| B |
   +---+      |  Control   |      +---+
              |   Unit     |
   +---+      |   (MCU)    |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

   Figure 8 - Point to Multipoint using content modifying MCU

   In this PtM scenario, each participant runs an RTP point-to-point
   session between itself and the MCU. This is the most widely deployed
   topology. The content that the MCU provides to each participant is
   either:

   a) A selection of the content received from the other participants,
      or

   b) The mixed aggregate of what the MCU receives from the other PtP
      links, which are part of the same conference session.

   In case a) the MCU may modify the content in bit-rate, encoding, or
   resolution. No explicit RTP mechanism is used to establish the
   relationship between the original media sender and the version the
   MCU sends. In other words, the outgoing session typically uses a
   different SSRC, and may well use a different PT, even if this
   different PT happens to be mapped to the same media type. (This is
   the definition of this topology and distinguishes it from the
   topologies previously discussed.)

   In case b) the MCU is the content source, as it mixes the content
   and then encodes it for transmission to a participant. The
   participants' content that is included in the aggregated content is
   not indicated through any explicit RTP mechanism. For example,
   regardless of the number of streams that are aggregated, in the MCU
   generated streams CC is zero and therefore no CSRC fields are
   present (this is true for most shipping MCUs). The participants
   contributing to the mix are reported using a signalling mechanism
   such as the conference event package in SIP.

   The MCU is responsible for receiving the codec control messages and
   handling them appropriately. In some cases, the reception of a codec
   control message may result in the generation and transmission of
   codec control messages by the MCU to some or all of the other
   participants.

   An MCU may transparently relay some codec control messages and
   intercept, modify, and (when appropriate) generate codec control
   messages of its own and transmit them to the media senders.

   The main feature that sets this topology apart from what RFC 3550
   describes is the lack of an explicit RTP level indication of all
   participants. If one were using the mechanisms available in RTP and
   RTCP to signal this explicitly, the topology would follow the
   approach of an RTP mixer. The lack of explicit indication has at
   least the following potential problems:

   1) Loop detection cannot be performed on the RTP level. When
      carelessly connecting two misconfigured MCUs, a loop could be
      generated.

   2) There is no information about active media senders available in
      the RTP packet. As this information is missing, receivers cannot
      use it.
      It also deprives the participants' clients of machine-usable
      information about who is actively sending. This prevents clients
      from, for example, indicating the currently active speakers in
      user interfaces. This information is known at the signalling
      layer.

2.3.7. Combining Topologies

   Topologies can be combined and linked to each other using mixers or
   translators. Care must, however, be taken in how the SSRC space is
   handled: mixers separate the SSRC space into two parts, while
   translators maintain the space across themselves. Any hybrid, like
   the video switching MCU of Section 2.3.5, requires considerable
   thought on how RTCP is dealt with. Note, however, that SSRC
   uniqueness always needs to be global across the different domains.

3. Security Considerations

   The usage of mixers and translators has an impact on security and
   the security functions used. The primary issue is that both mixers
   and translators modify packets, thus preventing the usage of
   integrity protection and source authentication unless they are
   trusted devices that take part in the security context. If
   encryption is employed, media translators and mixers need to be able
   to decrypt the media to perform their function. A transport
   translator may be used without access to the security association in
   cases where it touches only parts that are not included in the
   integrity protection, for example IP addresses and UDP port numbers
   in a media stream using SRTP [RFC3711]. However, in general the
   translator or mixer needs to be part of the signalling context and
   get the necessary security associations established with its RTP
   session participants.

   Including the mixer or translator in the security context allows
   the entity, if subverted or misbehaving, to perform a number of very
   serious attacks, as it has full access.
   It can perform all the attacks possible (see RFC 3550 and any
   applicable profiles) as if the media session were not protected at
   all, while giving the session participants the impression that they
   are protected against them.

4. IANA Considerations

   This document specifies no actions for IANA.

5. Acknowledgements

   The authors would like to thank N.N.

6. References

6.1. Normative references

   [AVPF]    Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
             "Extended RTP Profile for Real-time Transport Control
             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
             July 2006.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
             Jacobson, "RTP: A Transport Protocol for Real-Time
             Applications", STD 64, RFC 3550, July 2003.

   [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
             Video Conferences with Minimal Control", STD 65, RFC 3551,
             July 2003.

   [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
             Norrman, "The Secure Real-time Transport Protocol (SRTP)",
             RFC 3711, March 2004.

6.2. Informative references

   Any 3GPP document can be downloaded from the 3GPP web server,
   "http://www.3gpp.org/", see specifications.

7. Authors' Addresses

   Magnus Westerlund
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 7190000
   EMail: magnus.westerlund@ericsson.com

   Stephan Wenger
   Nokia Corporation
   P.O. Box 100
   FIN-33721 Tampere
   FINLAND

   Phone: +358-50-486-0637
   EMail: stewe@stewe.org

8. List of Changes relative to previous drafts

Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

RFC Editor Considerations

   The RFC editor is requested to replace all occurrences of XXXX with
   the RFC number this document receives.