CLUE                                                          A. Romanow
Internet-Draft                                              F. Andreasen
Intended status: Standards Track                              A. Krishna
Expires: September 6, 2012                                 Cisco Systems
                                                           March 5, 2012


     Investigation of Session Description Protocol (SDP) Usage for
         ControLling mUltiple streams for tElepresence (CLUE)
                    draft-romanow-clue-sdp-usage-00

Abstract

   This draft investigates how SDP offer/answer can be used to
   communicate CLUE encoding and encoding group parameters.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Romanow, et al.         Expires September 6, 2012               [Page 1]

Internet-Draft             SDP Usage for CLUE                 March 2012

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
   4.  Assumptions
   5.  Encoding
   6.  Encoding group
   7.  Solution overview
   8.  Proposal for advertising Encoding and Encoding groups in SDP
   9.  Representation of Encoding and Encoding group in SDP
   10. Concerns about conveying CLUE parameters via SDP
   11. Security Considerations
   12. Acknowledgements
   13. IANA Considerations
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Authors' Addresses

1.  Introduction

   This draft discusses how SDP can be used to carry encoding and
   encoding group descriptions for CLUE (ControLling mUltiple streams
   for tElepresence).  CLUE specifies how multiple streams can be
   communicated in a telepresence conference in a standardized manner
   that allows interoperability.  This draft assumes familiarity with
   the CLUE framework document and the concepts defined therein
   [I-D.ietf-clue-framework].

   The media provider describes to the media consumer both the media
   captures and the encodings that it is capable of sending, and the
   consumer "configures" the provider to send the captures using the
   potential streams it wants to receive.  In order for an interactive
   telepresence session to be established between A and B, each party
   needs to configure its role as both a media provider and a media
   consumer.

   The encoding parameters are maxBandwidth, maxMbps, maxFrameRate,
   maxWidth, and maxHeight.  The encoding group parameters are
   maxGroupBandwidth and maxGroupVideoMbps.

   This draft investigates how SDP can be used to carry the encodings
   and encoding groups of a CLUE provider advertisement.  Two reasons
   have been suggested for using SDP to carry these parameters:

   1.  By carrying the encoding and encoding group parameters in SDP,
       the call-control agents, proxies, and other intermediaries can
       monitor and potentially modify these values, thereby allowing
       administrators to easily configure and enforce either static
       limitations or more dynamic call-shaping approaches.

   2.  SDP is the usual protocol for communicating the encoding and
       encoding group parameters in SIP-based systems.

   As specified in the CLUE framework, parties in a telepresence
   conference do not negotiate a single value between them; therefore
   offer/answer is not used for the full negotiation of encoding
   values.  Rather, we propose that the consumer use another mechanism,
   such as a new CLUE protocol, to select which captures it wants and
   how it wants them encoded.  Capture attributes such as spatial
   relations would also be communicated in a CLUE protocol, rather than
   in SDP, since they are not media descriptions of the streams.

   In this approach, the CLUE protocol would carry the information
   related to media captures, as well as implement the control
   functions described in the CLUE framework, including configuring the
   encodings.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.

3.  Background

   The following abstractions defined in the CLUE framework document
   need to be understood in considering this investigation.

   o  Media capture - the fundamental representation of the content;
      audio and video are defined so far.  Each media capture is
      associated with an encoding group.

   o  Capture set - A capture set is most usefully thought of as being a
      set of table rows, with each row consisting of a set of media
      captures.  Media captures within a row can always be used
      simultaneously, whereas media captures in different sets may or
      may not be used simultaneously.  By including multiple media
      captures in a row within a capture set, the provider is signalling
      that those captures together form a representation of that capture
      set's scene.

   o  Simultaneous transmission set - there are often physical
      constraints that determine whether captures can be sent at the
      same time or not.  For example, a camera may be able to create a
      video capture which is close up and another which is zoomed out.
      These 2 views cannot be captured simultaneously.  The simultaneous
      transmission set expresses the physical constraints on
      simultaneity.

   o  Encoding - Represents a way to encode a media capture in terms of
      its associated media characteristics, specifically maxBandwidth,
      maxFrameRate, maxMbps, maxHeight and maxWidth.  Each encoding is
      uniquely identified within an encoding group.  [Practically it's
      desirable to have a unique ID among all encodings in the
      conference.]

   o  Encoding group - An encoding group includes a set of one or more
      individual encodings, plus parameters that apply to the group as a
      whole, specifically maxGroupBandwidth and maxGroupVideoMbps.  Each
      encoding group has a unique identifier in the SDP.

4.  Assumptions

   o  For each CLUE type ("purpose") of media capture (audio,
      video-main, or video-presentation), there will be at most one
      corresponding SDP media stream ("m=" line).  Note: video-main and
      video-presentation both use the "video" media type.

   o  Multiple media captures of the same type, or different encodings
      of a given media capture, will all use the same SDP media stream
      (e.g. via RTP multiplexing with a different SSRC for each).

   o  It is acceptable to use more than one offer/answer exchange to
      set up the whole Telepresence session.

   o  The Telepresence session is established by setting up sets of
      unidirectional streams (not bidirectional streams).

5.  Encoding

   The CLUE framework calls an RTP stream that the provider is capable
   of sending an Encoding.  The parameters of an encoding are
   encodingID, maxBandwidth, maxMbps, maxWidth, maxHeight, and
   maxFrameRate, as described in the following tables: Figure 1 shows
   the video parameters and Figure 2 shows the audio parameters.

   +--------------+---------------------------------------------------+
   | Name         | Description                                       |
   +--------------+---------------------------------------------------+
   | encodingID   | A unique identifier for the individual encoding   |
   | maxBandwidth | Maximum number of bits per second                 |
   | maxMbps      | Maximum number of macroblocks per second:         |
   |              | ((width + 15) / 16) * ((height + 15) / 16) *      |
   |              | framesPerSecond                                   |
   | maxWidth     | Video resolution's max supported width, in pixels |
   | maxHeight    | Video resolution's max supported height, in       |
   |              | pixels                                            |
   | maxFrameRate | Maximum supported frame rate                      |
   +--------------+---------------------------------------------------+

                  Figure 1: Video encoding parameters

   +--------------+----------------------------------------+
   | Name         | Description                            |
   +--------------+----------------------------------------+
   | maxBandwidth | Max # of bits per second               |
   +--------------+----------------------------------------+

                  Figure 2: Audio encoding parameters

6.  Encoding group

   In addition, the Framework defines an Encoding Group as a set of
   individual encodings plus parameters that apply to the whole set, as
   shown in Figure 3.  The two parameters for the group of encodings are
   maxGroupBandwidth and maxGroupVideoMbps.  The max group bandwidth
   refers to both audio and video, and is the maximum value of the sum
   of all of the configured bitrates.  The max group video macroblocks
   per second refers only to video and is the maximum value for the sum
   of all the macroblocks per second in the encoding group.

   +-------------------+----------------------------------------------+
   | Name              | Description                                  |
   +-------------------+----------------------------------------------+
   | maxGroupBandwidth | Max # bits per sec for all encodings         |
   |                   | combined                                     |
   | maxGroupVideoMbps | Max # macroblks/sec for all video encodings  |
   |                   | combined                                     |
   | videoEncodings[]  | Set of potential encodings (list of          |
   |                   | encodingIDs)                                 |
   | audioEncodings[]  | Set of potential encodings (list of          |
   |                   | encodingIDs)                                 |
   +-------------------+----------------------------------------------+

                  Figure 3: Encoding group parameters

   Additional characteristics of encoding groups include:

   1.  A media provider advertises one or more encoding groups.

   2.  Each encoding group includes one or more individual encodings.

   3.  Each individual encoding can represent a different way of
       encoding media.  For example one individual encoding may be
       1080p60 video, another could be 720p30, with a third being CIF.

   4.  While a typical 3 codec/display system might have one encoding
       group per "codec box", there are many possibilities for the
       number of encoding groups a provider may be able to offer and for
       the encoding values in each encoding group.  The consumer uses
       the media capture attribute EncGroupID to know which set of
       encodings to consider using for that capture and what the
       constraints are.

   5.  In a provider advertisement, the sum of the maxBandwidths of the
       encodings advertised in an encoding group may total more than the
       maxGroupBandwidth, as these are potential encodings.  The same
       holds for maxGroupVideoMbps.

   6.  In a consumer request, the sum of the maxBandwidths of the
       encodings selected in an encoding group may total more than the
       maxGroupBandwidth.  However, the provider will send each stream
       within the maximum of its individual encoding, and the streams
       together will sum to a total below or equal to the group
       maximum.

   The following diagram, Figure 4 (Association of captures and
   encodings), depicts the association between the encodings (expressed
   in SDP) and the media captures (expressed elsewhere, perhaps in a
   CLUE protocol).  In this diagram the consumer can match capture 1
   with any of the encodings in encoding group 1, and capture 2 with any
   of the encodings in encoding group 2.  Although this illustrates only
   capture 1 in capture set 1, there would likely be more than one
   capture.

   [Editor's note: change this illustration to add more captures so it
   doesn't look like capture 1 goes to encgrp 1 and capture 2 goes to
   encgrp 2.]

   +--------------------+
   |  Capture Set 1     |
   |                    |       Encodings for capture 1
   | +----------------+--------------- Encoding 1 }
   | |  capture 1     | |------------- Encoding 2 } Encoding group 1
   | |  video         | |------------- Encoding 3 }
   | |   ...          | |
   | |  encgrpID 1    | |
   | +----------------+ |
   |                    |
   |                    |
   | +----------------+ |       Encodings for capture 2
   | |  capture 2     | |
   | |  audio         | |--------------- Encoding 4 }
   | |   ....         | |--------------- Encoding 5 } Encoding group 2
   | |                | |
   | |  encgrpID 2    | |
   | +----------------+ |
   |                    |
   +--------------------+

            Figure 4: Association of captures and encodings

   In this example, which shows a capture set containing 2 captures, the
   first capture (captureID 1) is video and is mapped to encoding group
   1 (encgrpID 1).
   The consumer can map capture 1 to any or all of the potential RTP
   streams within encoding group 1.  The use case for mapping a capture
   to more than one encoding is simulcast.  The information contained
   in the encoding group parameters helps the consumer choose how to do
   this mapping by communicating the resource limitations.

7.  Solution overview

   In the CLUE framework, each party in the conference is usually both
   provider and consumer.  Each party needs to advertise its provider
   encoding parameter values to the other side.  This is not the typical
   communication model for SDP offer/answer, which is more often a
   bidirectional agreement on parameter values.

   Today telepresence endpoints actively negotiate audio and video SDP
   media streams using the SDP offer/answer model during call
   establishment and during midcall features, such as hold/resume.

   For example, if A and B are endpoints or middle boxes in a
   telepresence conference, A is a provider for B.  We propose that A
   send capture information to B through a CLUE protocol and stream
   information through SDP.  As a consumer, B responds to A, also
   through a CLUE protocol.  B chooses which captures it wants
   "configured" on which offered streams.  This process is carried out
   at call setup and also throughout the conference whenever there is a
   change in the captures or streams that the provider is capable of
   sending, or if there is a change in which captures and streams the
   receiver wants to receive.

   One possibility for handling provider advertisements of encodings in
   SDP is for one party to send its provider advertisement as the SDP
   offer and the other party to answer with its own provider
   advertisement.  A sends the offer with encoding advertisements.  B
   responds to A's offer in SDP as a provider, sending B's media stream
   capabilities as an SDP answer.  B then sends its capture information
   to A through the CLUE protocol.
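
   As an illustration of how the encoding group limits of Section 6
   constrain what a provider may actually send, the following is a
   minimal sketch only.  The class and function names
   (Encoding, EncodingGroup, SentStream, check_sent_streams) and all
   numeric values are hypothetical and are not defined by this draft or
   by the CLUE framework; the macroblock formula is the one given in
   Figure 1, read with integer division.

```python
from dataclasses import dataclass
from typing import List, Tuple


def macroblocks_per_second(width: int, height: int, fps: int) -> int:
    """Macroblock rate from Figure 1, using integer division per axis."""
    return ((width + 15) // 16) * ((height + 15) // 16) * fps


@dataclass(frozen=True)
class Encoding:
    """Hypothetical model of one advertised video encoding."""
    encoding_id: int
    max_bandwidth: int  # bits per second
    max_mbps: int       # macroblocks per second


@dataclass(frozen=True)
class EncodingGroup:
    """Hypothetical model of an advertised encoding group."""
    group_id: int
    max_group_bandwidth: int
    max_group_video_mbps: int
    encodings: Tuple[Encoding, ...]


@dataclass(frozen=True)
class SentStream:
    """What the provider intends to send on one encoding."""
    encoding_id: int
    bandwidth: int
    width: int
    height: int
    fps: int


def check_sent_streams(group: EncodingGroup,
                       streams: List[SentStream]) -> List[str]:
    """Return a list of limit violations; an empty list means none."""
    problems: List[str] = []
    by_id = {e.encoding_id: e for e in group.encodings}
    total_bw = 0
    total_mbps = 0
    for s in streams:
        enc = by_id.get(s.encoding_id)
        if enc is None:
            problems.append(f"encoding {s.encoding_id} not in group")
            continue
        mbps = macroblocks_per_second(s.width, s.height, s.fps)
        # Per-encoding maxima (Figure 1).
        if s.bandwidth > enc.max_bandwidth:
            problems.append(f"encoding {s.encoding_id} exceeds maxBandwidth")
        if mbps > enc.max_mbps:
            problems.append(f"encoding {s.encoding_id} exceeds maxMbps")
        total_bw += s.bandwidth
        total_mbps += mbps
    # Group-wide maxima (Figure 3) apply to the sums.
    if total_bw > group.max_group_bandwidth:
        problems.append("group exceeds maxGroupBandwidth")
    if total_mbps > group.max_group_video_mbps:
        problems.append("group exceeds maxGroupVideoMbps")
    return problems


# Illustrative values only: the per-encoding maxima sum to more than the
# group maxima, which Section 6 allows in an advertisement.
group = EncodingGroup(1, 6_000_000, 300_000, (
    Encoding(1, 4_000_000, 244_800),
    Encoding(2, 4_000_000, 61_200),
))
within_limits = check_sent_streams(group, [
    SentStream(1, 4_000_000, 1920, 1088, 30),
    SentStream(2, 1_500_000, 960, 544, 15),
])
over_limits = check_sent_streams(group, [
    SentStream(1, 4_000_000, 1920, 1088, 30),
    SentStream(2, 4_000_000, 960, 544, 30),
])
```

   In the second case each stream stays within its own encoding's
   maxima, yet the pair still exceeds both group limits, which is the
   situation items 5 and 6 of Section 6 describe.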

   [Editor's note: There are other approaches than the one described
   here, which we will explore in the next version.]

   The following diagram, Figure 5 (Call message sequence), depicts an
   end-to-end call with its message sequence, where Alice is the
   provider and Bob is the consumer.  Normally, Bob is also a provider
   and Alice is also a consumer (whether it is a point-to-point or
   multipoint call).

          Alice                               Bob
            |                                  |
    ________|__________________________________|_______________________
   |        |(1) INVITE                        |                       |
   |        |  Offer (Offered media,           |                       |
   |        |   CLUE extensions -              |                       |
   |        |     enc-grp 1                    |                       |
   |        |       enc 1                      |                       |
   |        |       enc 2                      |                       |
   |        |       enc 3                      | SIP                   |
   |        |     enc-grp 2                    | CALL SETUP            |
   |        |       enc 4                      |                       |
   |        |       enc 5                      | Alice sends Provider  |
   |        |   TRANSPORT TBD/CLUE connection  | advertisement for     |
   |        |   params )                       | encodings & encoding  |
   |        |                                  | groups in SDP         |
   |        |                                  |                       |
   |        |--------------------------------->|                       |
   |        |(2) 200OK                         |                       |
   |        |  Answer (Offered media,          | Bob in                |
   |        |   CLUE extensions                | Provider's role       |
   |        |     enc-grp 1                    | sends his             |
   |        |       enc 1                      | advertisement for     |
   |        |     enc-grp 2                    | encodings & encoding  |
   |        |       enc 2                      | groups                |
   |        |   TRANSPORT TBD/CLUE negotd      |                       |
   |        |   connection )                   |                       |
   |        |<---------------------------------|                       |
   |________|__________________________________|_______________________|
            |                                  |
    ________|__________________________________|_______________________
   |        |     (3) CLUE PROTOCOL            |                       |
   |        |--------------------------------->|                       |
   |        |  CLUE Provider MESSAGEs          | Alice and Bob in      |
   |        |  Describe media captures         | Provider's role       |
   |        |  Includes association of MC      |                       |
   |        |  to Encoding Group               |                       |
   |        |<---------------------------------|                       |
   |        |                                  |                       |
   |________|__________________________________|_______________________|
            |                                  |
            .                                  .
            .                                  .
            |                                  |
    ________|__________________________________|_______________________
   |        |     (4) CLUE PROTOCOL            |                       |
   |        |                                  |                       |
   |        |  Bob configures the MCs it wants | Bob in consumer role  |
   |        |  onto the encodings it wants     | sends CLUE consumer   |
   |        |  within the encoding group       | CONFIGURE message     |
   |        |  constraints.                    |                       |
   |        |<---------------------------------|                       |
   |        |                                  |                       |
   |________|__________________________________|_______________________|
            |                                  |
    ________|__________________________________|_______________________
   |        |     (5) SIP Re-INVITE            |                       |
   |        |  Re-negotiate TIAS bw etc        | SIP                   |
   |        |  based on consumer choices       | Media Re-negotiation  |
   |________|__________________________________|_______________________|
            |                                  |
            .                                  .
            .                                  .
    ________|__________________________________|_______________________
   |        |     (6) CLUE PROTOCOL            |                       |
   |        |                                  |                       |
   |        |  Choose the MCs to be sent,      | CLUE MESSAGE          |
   |        |  Request media in various        |                       |
   |        |  encoding formats                |                       |
   |        |--------------------------------->| Alice in              |
   |        |                                  | Consumer role         |
   |________|__________________________________|_______________________|
            |                                  |
    ________|__________________________________|_______________________
   |        |     (7) SIP Re-INVITE            |                       |
   |        |  Re-negotiate TIAS bw etc        | SIP                   |
   |        |  based on consumer choices       | Media Re-negotiation  |
   |________|__________________________________|_______________________|
            |                                  |
            .                                  .
            .                                  .

                    Figure 5: Call message sequence

8.  Proposal for advertising Encoding and Encoding groups in SDP

   Figure 6 depicts the exchange of CLUE capabilities between two
   endpoints or MCUs in the initial offer/answer exchange of SDP.  A and
   B are each both provider and consumer.  First A sends its provider
   information.  B responds with its own encoding provider information.

         Provider A
   +--------------------+
   |  SDP offer 1       |
   |                    |
   |  Offered media     |
   |  (actively negotd  |
   |   as today)        |
   |                    |
   |  Encoding Group 1  |
   |  +-------------+   |
   |  | Encoding 1  |   |
   |  | Encoding 2  |   |
   |  |    ...      |   |
   |  +-------------+   |  Offer
   |                    |-------->
   |  Encoding Group 2  |
   |  +-------------+   |
   |  | Encoding 3  |   |
   |  | Encoding 4  |   |
   |  |    ...      |   |
   |  +-------------+   |
   |      .......       |
   |                    |
   +--------------------+

                                             Provider B
                                       +--------------------+
                                       |  SDP answer 1      |
                                       |                    |
                                       |  Offered media     |
                                       |  (actively negotd) |
                               Answer  |                    |
                              <--------|                    |
                                       |  Encoding-Group 1  |
                                       |  +-------------+   |
                                       |  | Encoding 1  |   |
                                       |  | Encoding 2  |   |
                                       |  |    ...      |   |
                                       |  +-------------+   |
                                       |                    |
                                       |  Encoding-Group 2  |
                                       |  +-------------+   |
                                       |  | Encoding 3  |   |
                                       |  | Encoding 4  |   |
                                       |  |    ...      |   |
                                       |  +-------------+   |
                                       |      .......       |
                                       +--------------------+

                 Figure 6: Initial offer answer exchange

   The challenge with a bidirectional approach is that the answerer must
   not respond with a type that was not in the offer.  Suppose A offers
   two SDP media streams, video main and audio.  B, however, wants to
   offer 3 SDP media streams: video main, audio, and video presentation.
   B cannot add a new stream type in the answer.

   One way around this is to agree that each party offers all media
   types whether it will send them or not.  This allows both sides to
   offer any media type.  An adjustment to reflect the real situation is
   made in a later offer/answer exchange.

   Using the example above, A will only send video main and audio;
   however, following the specification, A offers all 3: audio, video
   main, and video presentation.  B is going to send presentation, so B
   also offers all 3: audio, video main, and video presentation.  The
   CLUE protocol negotiations take place.  Then there is an updated
   offer in which A does not offer video presentation, thus making the
   SDP media streams accurate.

9.  Representation of Encoding and Encoding group in SDP

   We want a mechanism that will allow the provider to advertise
   encodings and encoding groups.  We can define three new SDP
   attributes: clue, enc, encgrp.

   Encgrp is the encoding group.  It consists of an encoding group ID, a
   list of encodings, and a list of encoding group parameters
   (maxGroupBandwidth, maxGroupVideoMbps).

      a=encgrp:

   Enc is the encoding.  It consists of an encoding ID and a list of
   clue parameters and their values for the encoding.

      a=enc:

   Clue consists of CLUE encoding parameters: max-fps, max-mbps, max-br,
   imageattr.  The parameter list can be extended in the future.

      a=clue:

   where clue attributes in our context would be "imageattr[x= ,y= ]"
   [draft-ietf-mmusic-image-attributes], "max-mbps", "max-br", and
   "max-fps".

   Example of 2 video encodings:

      a=clue:1 imageattr[x=1920, y=1088]
      a=clue:2 max-fps=60
      a=clue:3 max-mbps=244800
      a=clue:4 max-br=4000
      a=clue:5 imageattr[x=960, y=554]
      a=clue:6 max-fps=30
      a=clue:7 max-mbps=61200
      a=enc:1 clue=1,2,3,4     Encoding 1 and its clue attributes
      a=enc:2 clue=5,6,7,4     Encoding 2 and its clue attributes

   Example of 2 video encoding groups:

      a=encgrp:1 enc=1,2,3 grpparm=3,4,5
      a=encgrp:2 enc=4,5,6,7 grpparm=2,4,5

                  Figure 7: Example 2 video encodings

10.  Concerns about conveying CLUE parameters via SDP

   There are several concerns in using SDP for CLUE provider capability
   announcements.  First, passing information through two separate
   protocols and parsing it from them complicates the protocol and
   implementation.  Secondly, if the two SIP peers do not communicate
   directly (e.g. due to SIP Record-Route), then reINVITEs/UPDATEs may
   be delay-prone compared to an independent and direct end-to-end
   protocol; this is relatively unimportant so long as any parameters
   expected to change repeatedly, or that must be sent when a stream
   switches, are kept out of the SDP.
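
   To give a rough sense of the parsing involved on the SDP side, the
   following sketch reads attribute lines shaped like those in Figure 7.
   It is illustrative only: the attribute grammar is not fully specified
   in this draft, so the layout assumed here (a numeric ID followed by
   space-separated name=list fields) is inferred from the Figure 7
   examples, and the function name parse_clue_sdp is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed line shape, inferred from Figure 7: "a=<name>:<id> <rest>".
# Optional whitespace after the colon is tolerated.
_ATTR = re.compile(r"^a=(clue|encgrp|enc):\s*(\d+)\s+(.+)$")

IdFields = Dict[str, List[int]]


def _id_list(text: str) -> List[int]:
    """Turn a comma-separated ID list such as "1,2,3" into integers."""
    return [int(item) for item in text.split(",")]


def parse_clue_sdp(
    lines: List[str],
) -> Tuple[Dict[int, str], Dict[int, IdFields], Dict[int, IdFields]]:
    """Collect clue, enc, and encgrp attributes keyed by their IDs."""
    clue: Dict[int, str] = {}
    enc: Dict[int, IdFields] = {}
    encgrp: Dict[int, IdFields] = {}
    for raw in lines:
        match = _ATTR.match(raw.strip())
        if match is None:
            continue  # not one of the three proposed attributes
        kind = match.group(1)
        ident = int(match.group(2))
        rest = match.group(3).strip()
        if kind == "clue":
            # Keep the parameter text as-is, e.g. "max-fps=60".
            clue[ident] = rest
            continue
        fields: IdFields = {}
        for token in rest.split():
            if "=" not in token:
                continue  # skip any free-text annotation on the line
            name, _, value = token.partition("=")
            fields[name] = _id_list(value)
        if kind == "enc":
            enc[ident] = fields
        else:
            encgrp[ident] = fields
    return clue, enc, encgrp


# Lines modeled on Figure 7 (a subset, for illustration).
EXAMPLE = """\
a=clue:1 imageattr[x=1920, y=1088]
a=clue:2 max-fps=60
a=clue:3 max-mbps=244800
a=clue:4 max-br=4000
a=enc:1 clue=1,2,3,4
a=encgrp: 1 enc=1,2,3 grpparm=3,4,5
""".splitlines()

clue, enc, encgrp = parse_clue_sdp(EXAMPLE)
# Resolve encoding 1 into the parameter strings it references.
encoding_1 = [clue[i] for i in enc[1]["clue"]]
```

   Even in this small form, a receiver has to resolve two levels of
   indirection (group to encoding, encoding to parameter) before it can
   combine the result with capture information arriving over the CLUE
   protocol.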

   The media information will potentially be large and significantly
   expand the SDP size.  Further work and testing will be required to
   see if it is a problem in practice, and whether it is inherent to
   CLUE to begin with.  The presence of the CLUE protocol negotiation
   provides a way to ameliorate this problem for systems operating in a
   mixed CLUE/non-CLUE environment that don't wish to send the
   information to non-CLUE endpoints: the initial offer contains the
   CLUE protocol m-line, but not the CLUE media descriptions.  The
   answerer sees that the caller is CLUE-capable and includes the media
   description in their answer, and the offerer can then reINVITE once
   negotiation is complete with a media description.

   The value to the intermediaries of the encoding parameters needs to
   be considered carefully.  The intermediaries will have the TIAS value
   which is updated after CLUE protocol activity.  What about the
   encoding parameters?  We feel they are unlikely to be of much use to
   intermediaries - they are the potential maximum values of potential
   RTP streams.  Of what value would they be?  It seems the encoding
   group parameter values, especially the maxGroupBandwidth could be of
   interest to intermediaries augmenting the TIAS information.

   An alternative to including provider advertisements in SDP might be
   to include an attribute in SDP for the sum of all the Encoding group
   maxGroupBandwidths, which seems more valuable than all the individual
   encoding group maxGroupBandwidths.

   Another factor to consider is that suppose an intermediary changes
   the maxGroupBandwidth value - that change would be seen by the
   Consumer, not by the provider, and it is the provider who is the one
   who needs to see a change in encoding group available resources.

   Note on real time requirements: The provider advertisements and
   consumer selection should be as quick as possible, but do not have
   hard real time requirements.  They can be accomplished within the
   time scale of signaling.  There is however one scenario that has
   timing requirements more stringent than signaling typically provides.
   Consider a switch=true audio capture: the information on the spatial
   location of the audio capture needs to be delivered in real time.
   The speaker changes frequently and the location of the speaker must
   be known quickly (in real time) in order to assign the incoming
   stream to a decoder.

   Editor's Note: add reference to draft-clue-hansen

11.  Security Considerations

   TBD

12.  Acknowledgements

   Thanks to Rob Hansen for discussions on the advantages and
   disadvantages of using SDP.

13.  IANA Considerations

   TBD

14.  References

14.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

14.2.  Informative References

   [I-D.ietf-clue-framework]
              Romanow, A., Pepperell, A., Baldino, B., and M. Duckworth,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-03 (work in progress),
              February 2012.

Authors' Addresses

   Allyn Romanow
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: allyn@cisco.com


   Flemming Andreasen
   Cisco Systems
   Iselin, NJ
   USA

   Email: fandreas@cisco.com


   Arun Krishna
   Cisco Systems
   San Jose, CA
   USA

   Email: arukrish@cisco.com