MMUSIC Working Group                                          T. Schierl
Internet Draft
Document: draft-schierl-mmusic-layered-codec-01
Expires: April 2007                                         October 2006

Signaling of layered and multi description media in Session Description Protocol (SDP)

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 23, 2007.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This memo defines semantics that allow for signaling decoding dependency of different media descriptions with the same media type in the Session Description Protocol (SDP). This is required, for example, if media data is separated and transported in different network streams as a result of the use of a layered media coding process.

A new grouping type "DDP" -- decoding dependency -- is defined, to be used in conjunction with RFC 3388 entitled "Grouping of Media Lines in the Session Description Protocol". In addition, an attribute is specified describing the relationship of the media streams in a "DDP" group.
Finally, this memo defines SDP semantics indicating SSRC multiplexing for media sessions in case RTP is used as the protocol for media transport.

[Edt. note: This is one of the key questions: should this draft address RTP specifics? Should it address a concept that may make sense in niche applications for SVC, but perhaps nowhere else? Or should we move the SSRC stuff to the SVC payload spec instead?]

Schierl                      Standards Track                    [page 2]

Table of Contents

1. Introduction
2. Terminology
3. Motivation and use cases
   3.1. Motivation for media dependency signaling
   3.2. Use cases for layered and MDC coding and transport
4. Signaling in SDP for media dependency
   4.1. Design Principles
   4.2. Definitions
   4.3. Semantics
      4.3.1. SDP grouping semantics for decoding dependency
      4.3.2. Attribute for dependency signaling per media-stream
      4.3.3. Attribute for signaling implicit SSRC multiplexing
5. Usage of new semantics in SDP
      5.1.1. Usage with the SDP Offer/Answer Model
      5.1.2. Network elements not supporting dependency signaling
   5.2. Examples
6. Security Considerations
7. IANA Consideration
8. Acknowledgements
9. References
   9.1. Normative References
   9.2. Informative References
10. Author's Addresses
11. Intellectual Property Statement
12. Disclaimer of Validity
13. Copyright Statement
14. RFC Editor Considerations
15. Open Issues
16. Changes Log

1. Introduction

An SDP session description may contain one or more media descriptions, each identifying a single media stream. A media description is identified by one "m=" line. If more than one "m=" line exists for the same media type, a receiver or network element cannot identify any relationship that exists between those "m=" lines. This is certainly the case if the receiver or network element is not aware of the media-specific information, which may be carried within the "fmtp:" attribute.

Recently, interest has been expressed in signaling relationships between media streams. Different reasons can be envisioned, for example the transport of bitstream partitions of a hierarchical media coding process (also known as a layered media coding process) or of a multi description coding (MDC) in different network streams. The trigger for this draft was the standardization process of the SVC payload format [SVCpayld]. At present, SDP does not allow for signaling such relations.

This memo also defines signaling extensions to be used specifically with SSRC multiplexing techniques when RTP is used as the transport protocol.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [RFC2119].

3. Motivation and use cases

3.1. Motivation for media dependency signaling

There may be various reasons for the concurrent transport of several media (each identified by a media description) of the same media type, among which certain dependencies may exist. In all cases, however, the basic idea is to separate a media bitstream into partitions in order to allow scalability in network elements.

Two types of dependency are discussed in more detail below, as they are conceptually well understood:

o Layered/Hierarchical decoding dependency:

In layered coding, the partitions of a media bitstream are known as media layers or simply layers. One or more layers may be transported in different network streams. A classic use case is known as receiver-driven layered multicast, in which a receiver selects a combination of media streams, each conveyed in its own session (in this case, an RTP session), in response to quality or bit-rate requirements.

Back in the mid 1990s, the then available layered media formats and codecs envisioned primarily (or even exclusively) a one-dimensional hierarchy of layers. That is, each so-called enhancement layer referred to exactly one layer "below". The single exception is the base layer, which is self-contained. Therefore, an identification of one enhancement layer fully specifies the operation point of a layered decoding scheme, including knowledge about all the other layers that need to be decoded.

[RFC4456] contains rudimentary support for exactly this use case and these media formats, in that it allows for signaling a range of transport addresses for a certain media description. By definition, a higher transport address identifies a higher layer in the one-dimensional hierarchy. A receiver needs only to decode data conveyed over this transport address and lower transport addresses to decode this operation point of the scalable bitstream.
Newer media formats depart from this simple one-dimensional hierarchy, in that highly complex (at least tree-shaped) dependency hierarchies can be implemented. Compelling use cases for these complex hierarchies have been identified by industry as well. Support for them is therefore desirable. However, SDP in its current form does not take into account that different combinations of the layers of a media bitstream result in different operation points (each represented by a layer or a combination of layers) of the media bitstream.

o Multi descriptive decoding dependency:

In the most basic form of multiple descriptive coding (MDC), each partition forms an independent representation of the media. That is, decoding any one of the partitions yields useful reproduced media data. When more than one partition is available, a decoder can process them jointly, and the resulting media quality increases. The highest reproduced quality is available if all original partitions are available for decoding.

More complex forms of multiple descriptive coding can also be envisioned, for example where, as a minimum, N out of M total partitions need to be available to allow meaningful decoding.

MDC has not yet been embraced heavily by the media standardization community, though it is the subject of much academic research. As an example, we refer to [MDC].

3.2. Use cases for layered and MDC coding and transport

o Receiver driven layered multicast

This technology is discussed in [RFC3550] and references therein. We refrain from elaborating further; the subject is well known and understood.

o Multiple end-to-end transmission with different properties

Assume a unicast (point-to-point) topology, wherein one endpoint sends media to another. Assume further that different forms of media transmission are available.
The difference may lie in the cost of the transmission (free, charged), in the available protection (unprotected/secure), in the quality of service (guaranteed quality / best effort) or in other factors.

Layered and MDC coding allow the media characteristics to be matched to the available transmission paths. For example, in layered coding it makes sense to convey the base layer over a high-QoS and/or encrypted transmission path. Enhancement layers, on the other hand, can be conveyed over best effort, as they are "optional" in their characteristic -- nice to have, but non-essential for media consumption. Similarly, while it is essential that the base layer is encrypted, there is (at least conceptually) no need to encrypt the enhancement layer, as the enhancement layer may be meaningless without the (encrypted) base layer. In a different scenario, the base layer may be offered in a non-encrypted session as a free preview, while an encrypted enhancement layer that allows optimal-quality playback is accessible only to users granted access through a conditional access mechanism.

o Differentiation on transport level within a media stream (e.g. RTP session):

An application may benefit from a more detailed differentiation on transport level. This may particularly be the case when RTP is used with SSRC multiplexing as described in section 13.5 of [SVCpayld].

4. Signaling in SDP for media dependency

4.1. Design Principles

Dependency signaling is only feasible between media descriptions that are each described by an "m=" line and have been assigned a media identification attribute ("mid") as defined in [RFC3388].

If an application requires SSRC multiplexing to be used, this memo describes a media-level attribute for signaling the use of this RTP multiplexing type.

4.2. Definitions

Media stream: As used in [RFC4456].
Media bitstream: A valid, decodable stream, containing ALL media partitions generated by the encoder. A media bitstream normally conforms to a media coding standard.

Media partition: A subset of a media bitstream intended for independent transportation. An integer number of partitions forms a media bitstream. In layered coding, a media partition represents a layer. In MDC coding, a media partition represents a description.

Decoding dependency: The class of relationship media partitions have to each other. At present, this memo defines two decoding dependencies: layering and multiple description.

Hierarchical/layered coding dependency: Each media partition is only useful (i.e. can be decoded) when ALL media partitions it depends on are available. The dependencies between the media partitions create a directed graph. Note: normally, in layered/hierarchical coding, the more media partitions are employed (following the rule above), the better the reproduced quality becomes.

Multi description coding (MDC) dependency: N of M media partitions are required to form a valid media bitstream, but there is no hierarchy between these media partitions. Most MDC schemes aim at an increase of reproduced media quality when more media partitions are decoded than are strictly required to form an operation point.

Operation point: A subset of a layered or MDC media bitstream that includes all partitions required for reconstruction at a certain point of quality or error resilience, and does not include any other media partitions.

The following terms are itemized for clarification on RTP [RFC3550] multiplexing techniques. Further discussion can be found in section 5.2 of [RFC3550].

Session multiplexing: The scalable SVC bitstream is distributed onto different RTP sessions, whereby each RTP session carries one RTP packet stream. Each RTP session requires separate signaling and has its own timestamp, sequence number, and SSRC space. Dependency between sessions MUST be signaled according to this memo.

SSRC multiplexing: The scalable SVC bitstream is distributed in a single RTP session, but that session comprises more than one RTP packet stream, each identified by its SSRC. The use of SSRC multiplexing MUST be signaled according to this memo.

4.3. Semantics

4.3.1. SDP grouping semantics for decoding dependency

This specification defines the new grouping semantics Decoding Dependency "DDP":

DDP associates a media stream, identified by its mid attribute, with a DDP group. Each media stream MUST be composed of an integer number of media partitions. All media streams of a DDP group MUST have the same type of coding dependency (as signaled by the attribute defined in Section 4.3.2) and MUST belong to one media bitstream. All media streams MUST be part of at least one operation point. The DDP group type informs a receiver about the requirement for treating the media streams of the group according to the new media-level attribute "depend", as defined in Section 4.3.2.

4.3.2. Attribute for dependency signaling per media-stream

This memo defines a new media-level value attribute, "depend", with the following BNF [RFC2234]. The "identification-tag" (if used) is defined in [RFC3388]:

   depend-attribute    = "a=depend:" dependency-type-tag
                         *(space identification-tag)
   dependency-type-tag = dependency
   dependency          = "lay" / "mdc" / "lay-ssrc"

The "depend" attribute describes the decoding dependency. The "depend" attribute may be followed by a sequence of identification-tag(s) which identify the directly related media streams. The attribute MAY be used with multicast as well as with unicast transport addresses.
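As an illustration only, the following sketch shows how a receiver might parse a single "a=depend:" line according to the BNF above. It is written in Python; the names parse_depend, Depend and DEPENDENCY_TYPES are hypothetical and are not defined by this memo. The sketch assumes that identification-tags contain no whitespace, and it accepts any run of whitespace between fields, which is more lenient than the BNF. The two tag checks reflect the per-type rules given in this section.

```python
from typing import List, NamedTuple

# Dependency types permitted by the "dependency" rule of the BNF.
DEPENDENCY_TYPES = ("lay", "mdc", "lay-ssrc")


class Depend(NamedTuple):
    dependency: str   # "lay", "mdc" or "lay-ssrc"
    tags: List[str]   # identification-tags (mid values) of related streams


def parse_depend(line: str) -> Depend:
    """Parse one "a=depend:" line (hypothetical helper, not part of this memo)."""
    prefix = "a=depend:"
    if not line.startswith(prefix):
        raise ValueError("not a depend attribute: %r" % line)
    # Assumption: fields are separated by whitespace and tags contain none.
    fields = line[len(prefix):].split()
    if not fields:
        raise ValueError("missing dependency-type-tag")
    dependency, tags = fields[0], fields[1:]
    if dependency not in DEPENDENCY_TYPES:
        raise ValueError("unknown dependency type: %r" % dependency)
    # Per-type rules from this section: "lay" needs tags, "lay-ssrc" forbids them.
    if dependency == "lay" and not tags:
        raise ValueError('"lay" requires at least one identification-tag')
    if dependency == "lay-ssrc" and tags:
        raise ValueError('"lay-ssrc" must not carry identification-tags')
    return Depend(dependency, tags)
```

For example, parse_depend("a=depend:lay 1 3") yields the dependency type "lay" with the identification-tags "1" and "3", while a "lay" line without tags is rejected with a ValueError.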
The following types of dependencies are defined:

o lay: Layered decoding dependency -- identifies the described media stream as one or more partitions of a layered media bitstream. When lay is used, all media streams that are required for successful use of the described media stream MUST be identified by the identification-tag(s) that follow. The identification-tag(s) MUST be present when lay is in use. Further, the described media stream represents one operation point of the layered media bitstream. As a result, all other media streams belonging to the same dependency group, but not identified by an identification-tag in the media description, are not required for a successful reproduction of the operation point. Hence, a media sender MAY omit sending them when that is advantageous from a scalability or transport viewpoint.

o lay-ssrc: Layered decoding dependency in media stream. This value indicates the presence of a hierarchical relationship within the media stream. For more details refer to Section 4.3.3. This value MUST NOT be used with an identification-tag.

o mdc: Multi descriptive decoding dependency -- signals that the described media stream is one or more partitions of a multi description coding (MDC) media bitstream. By definition, at least N out of M streams of the group MUST be received to allow decoding of the media, whereby N and M are media stream dependent and not signaled. Receiving more than one media stream of the group may enhance the decodable quality of the media bitstream. This type of dependency does not require signaling of the media streams depended upon.

4.3.3. Attribute for signaling implicit SSRC multiplexing

This specification defines a new media-level value attribute, "ssrcmux". Its format in SDP is described by the following BNF [RFC2234]:
   ssrcmux-attribute = "a=ssrcmux:" 1*DIGIT

The "ssrcmux" attribute indicates that implicit SSRC multiplexing is used. Therefore the transport protocol of the media MUST be RTP [RFC3550] and the RTP profile MUST be one of RTP/AVP [RFC3551], RTP/SAVP [RFC3711], RTP/AVPF [RFC4585], or RTP/SAVPF [SAVPF].

Implicit SSRC multiplexing implies that layers or combinations of layers are each conveyed in their own RTP packet stream within the same RTP session. The dependency order, from more important to less important layers, is indicated by the SSRC values: the more important a layer is, the higher its SSRC value. The number following the "ssrcmux" attribute indicates the number of SSRC values used, and therefore the number of different RTP packet streams within a media description.

This attribute SHALL only be used in combination with the "a=depend:lay-ssrc" attribute.

This signaling SHALL NOT be used with multicast transport addresses.

5. Usage of new semantics in SDP

5.1.1. Usage with the SDP Offer/Answer Model

If an Answerer does not understand the decoding dependency signaling, it SHOULD still be able to detect the 'base' media of a layered media session, or one partition of an MDC media session. That is, the session description MUST offer a backward compatible partition of the media stream with a separate media description. This media description may point to the same transport address as used for an extended media session description using the features defined in this memo. Thus, in both cases, an Answerer may not understand the full media description, but may be able to request a valid subset of the offered media.

If an Offerer is not able to interpret the decoding dependency signaling, the Offerer SHALL NOT offer the features defined in this memo.

5.1.2. Network elements not supporting dependency signaling

Network elements that do not understand the new grouping type, but understand grouping in general, MAY detect a general requirement of treating the media streams of the group in a certain way.

Network elements that do not understand the decoding dependency signaling MAY treat all media streams of a session in the same way or MAY use their knowledge about the media format description for treatment of media streams, if such knowledge exists.

Receivers that do not understand the signaling defined in this memo may detect only a subset of the separated media. Such a receiver may not understand the full media description, but may be able to understand and/or request a subset of the media.

5.2. Examples

a.) Example for signaling transport of operation points of a layered video bitstream in different network streams:

   v=0
   o=svcsrv 289083124 289083124 IN IP4 host.example.com
   s=LAYERED VIDEO SIGNALING Seminar
   t=0 0
   c=IN IP4 224.2.17.12/127
   a=group:DDP 1 2 3 4
   m=video 40000 RTP/AVP 94
   b=AS:96
   a=framerate:15
   a=rtpmap:94 h264/90000
   a=mid:1
   m=video 40002 RTP/AVP 95
   b=AS:64
   a=framerate:15
   a=rtpmap:95 svc1/90000
   a=mid:2
   a=depend:lay 1
   m=video 40004 RTP/AVP 96
   b=AS:128
   a=framerate:30
   a=rtpmap:96 svc1/90000
   a=mid:3
   a=depend:lay 1
   m=video 40004 RTP/SAVP 100
   c=IN IP4 224.2.17.13/127
   b=AS:512
   k=uri:conditional-access-server.example.com
   a=framerate:30
   a=rtpmap:100 svc1/90000
   a=mid:4
   a=depend:lay 1 3

b.) Example for signaling transport of streams of a multi description (MDC) video bitstream in different network streams:

   v=0
   o=mdcsrv 289083124 289083124 IN IP4 host.example.com
   s=MULTI DESCRIPTION VIDEO SIGNALING Seminar
   t=0 0
   c=IN IP4 224.2.17.12/127
   a=group:DDP 1 2 3
   m=video 40000 RTP/AVP 94
   a=mid:1
   a=depend:mdc
   m=video 40002 RTP/AVP 95
   a=mid:2
   a=depend:mdc
   m=video 40004 RTP/AVP 96
   c=IN IP4 224.2.17.13/127
   a=mid:3
   a=depend:mdc

c.) Example for signaling implicit SSRC multiplexing for an RTP session containing three RTP packet streams:

   v=0
   o=svcsrv 289083124 289083124 IN IP4 host.example.com
   s=LAYERED SSRC MUX VIDEO SIGNALING Seminar
   t=0 0
   c=IN IP4 131.160.1.112
   m=video 40000 RTP/AVP 96
   b=AS:512
   a=framerate:30
   a=rtpmap:96 svc1/90000
   a=ssrcmux:3
   a=depend:lay-ssrc

6. Security Considerations

7. IANA Consideration

8. Acknowledgements

Funding for the RFC Editor function is currently provided by the Internet Society. Further, the author Thomas Schierl of Fraunhofer HHI is sponsored by the European Commission under the contract number FP6-IST-0028097, project ASTRALS.

9. References

9.1. Normative References

[RFC4456] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC3388] Camarillo, G., Holler, J., and H. Schulzrinne, "Grouping of Media Lines in the Session Description Protocol (SDP)", RFC 3388, December 2002.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC2234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.

[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 2006.

[SAVPF] Ott, J., and E. Carrara, "draft-ietf-avt-profile-savpf-08.txt", October 2006.

9.2. Informative References

[SVCpayld] Wenger, S., Wang, Y.-K., and T. Schierl, "RTP Payload Format for SVC Video", "draft-wenger-avt-rtp-svc-03.txt", October 2006.

[RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund, M., and D. Singer, "RTP Payload Format for H.264 Video", RFC 3984, February 2005.

[MDC] Vitali, A., Borneo, A., Fumagalli, M., and R. Rinaldo, "Video over IP using Standard-Compatible Multiple Description Coding: an IETF proposal", Packet Video Workshop, April 2006, Hangzhou, China.

10. Author's Addresses

Thomas Schierl
Fraunhofer HHI
Einsteinufer 37
D-10587 Berlin
Germany

Phone: +49-30-31002-227
Email: schierl@hhi.fhg.de

11. Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

12. Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

13. Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

14. RFC Editor Considerations

none