CLUE C. Groves, Ed. Internet-Draft W. Yang Intended status: Informational R. Even Expires: March 15, 2013 P. Kyzivat Huawei September 11, 2012 Discussion of scene and capture scene entry concepts draft-groves-clue-scene-clarifications-00 Abstract The CLUE framework document defines the concept of media captures which identify and describe the characteristics of media sent by a provider. A capture scene is a grouping concept that provides ties a number of captures together. Within the capture scene the media captures may further be grouped into capture scene entries. The current text in the CLUE framework document describing these concepts could be improved and clarified so that both providers and consumers use the same logic to encode/decode CLUE messages with capture scenes/ capture scene entries and media captures. This draft discusses these concepts and provides suggested updates. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on March 15, 2013. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents Groves, et al. Expires March 15, 2013 [Page 1] Internet-Draft Abbreviated Title September 2012 (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. 1. Introduction Recent discussion on the IETF mailing list has indicated that there are different interpretations of exactly what a "Capture Scene Entry" is. In the framework draft it is defined as: /*Capture Scene Entry: a list of media captures of the same media type that together form one way to represent the capture scene./ it also says: /A media provider arranges media captures in a capture scene to help the media consumer choose which captures it wants. The capture scene entries in a capture scene are different alternatives the provider is suggesting for representing the capture scene./ It could be interpreted as meaning that each "Capture Scene Entry" shows an entire scene, or it could be interpreted as meaning that each "Capture Scene Entry" could show part of the overall scene. It appears from reading the framework both interpretations are allowed. For example: if VC1,VC2,VC3 are three main cameras, VC4 is an camera that automatically zooms in on the active speaker. One could construct a capture scene with one or two capture scene entries. There are other possibilities but to illustrate the issue the following alternatives are shown: Groves, et al. Expires March 15, 2013 [Page 2] Internet-Draft Abbreviated Title September 2012 +------------------+ | Capture Scene #1 | +------------------+ | VC1, VC2, VC3 | | VC4 | +------------------+ or +--------------------+ | Capture Scene #1 | +--------------------+ | VC1, VC2, VC3, VC4 | +--------------------+ Figure 1 Depending on how the consumer is designed it may not see the two options as equivalent. In the first example a consumer may only chose the first capture scene entry as it has been designed with the premise that a capture scene entry represents the whole scene and therefore it is sufficient to choose only one line. If it choses one line it missing out on part of the scene. Another consumer may choose both as it recognises multiple capture scene entries. Clearly there is an interoperability problem if the provider and consumer are using different logic to encode / decode captures from scenes. It is also noted that there may be additional complexity for consumers if the method of using capture scene entries to represent partial scenes is used. A consumer would have to parse the characteristcis of each capture to see how they are related. A further issue is if the entire scene is depicted how does the consumer determine which captures are closely related within a capture scene entry. For example: which of the capture must be processed together to form a continuous image. The area of capture attribute does provide positioning information however determining which captures make up a continuous image may be non-trivial and would become more complicated as the number of captures in a scene entry rose. It would be compounded in situations where different providers have different gaps and scales as there would be different thresholds as to what captures would appear to be continuous. It is believed that having a way to indicated closely bound captures would be advantageous to simplifying this determination. To add to the complication the framework draft is vague in making recommendations as to what the consumer should do at the reception of scene / capture scene entry / capture information. It suggests that the consumer could pick an element from each of the option presented. It's not clear how this would work under the assumption that a provider provides this information with the understanding to it must Groves, et al. Expires March 15, 2013 [Page 3] Internet-Draft Abbreviated Title September 2012 be able to simultaneously provide this media. Chosing different captures to what the provider has sent may mean that the media streams related to captures may not be able to sent together. Limiting this flexibility may increase interoperability. 2. Proposal This section provides a strawman proposal of principles that should be documented in the framework document Whilst there appears to be different interpretations, to the Contributors there appears to be one fundamental assumption that everyone agrees on, that is that all captures within a scene must be spatially related. It is also assumed that there would be a desire to increase the chances of interoperability and to minimise the complexity needed for encoding and parsing scene information. Based on this assumption and the above discussion it is proposed that : a) A scene represents an area where the capture devices are spatially related, i.e. a presentation that shares no spatial relation with a video is a separate scene. b) A capture scene entry thus represents alternate representations of a complete scene. The provider is the one that determines what a complete scene is. A consumer choses a capture scene entry from the scene in the knowledge that it represents the entire scene. c) A new concept of "Capture group" be incorporated into a capture scene entry which indicates which captures are closely bound. Closely bound captures are those that have a closer relationship than just being spatially related and MUST be able to be encoded and sent simultaneously. E.g. multiple captures for the purposes of capturing a continuous image or sound stage are closely bound. d) A consumer shall chose a captre scene entry rather than choosing individual captures from multiple entries. This does not mean that an consuming endpoint must render all the captures. What is locally rendered and how is a local decision. Groves, et al. Expires March 15, 2013 [Page 4] Internet-Draft Abbreviated Title September 2012 3. Acknowledgements This template was derived from an initial version written by Pekka Savola and contributed by him to the xml2rfc project. 4. IANA Considerations This memo includes no request to IANA. 5. Security Considerations TBD 6. References 6.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 6.2. Informative References [I-D.ietf-clue-framework] Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino, "Framework for Telepresence Multi-Streams", draft-ietf-clue-framework-06 (work in progress). [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, June 1999. [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003. [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. Groves, et al. Expires March 15, 2013 [Page 5] Internet-Draft Abbreviated Title September 2012 Authors' Addresses Christian Groves (editor) Huawei Melbourne, Australia Email: Christian.Groves@nteczone.com Weiwei Yang Huawei P.R.China Email: tommy@huawei.com Roni Even Huawei Tel Aviv, Isreal Email: roni.even@mail01.huawei.com Paul Kyzivat Huawei Greater Boston, Massachusetts USA Email: pkyzivat@alum.mit.edu Groves, et al. Expires March 15, 2013 [Page 6]