CLUE WG A. Romanow Internet-Draft Cisco Systems Intended status: Informational S. Wenger Expires: October 24, 2011 Vidyo, Inc. April 22, 2011 Requirements for Telepresence Multi-Streams draft-romanow-clue-telepresence-requirements-02.txt Abstract This memo discusses the requirements for a specification that enables telepresence interoperability, by describing the relationship between multiple RTP streams. In addition, the problem statement and definitions are also covered herein. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on October 24, 2011. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Romanow & Wenger Expires October 24, 2011 [Page 1] Internet-Draft CLUE Telepresence Requirements April 2011 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 4. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 5 5. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 6 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 8 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 8. Security Considerations . . . . . . . . . . . . . . . . . . . 8 9. Informative References . . . . . . . . . . . . . . . . . . . . 8 Appendix A. Draft History . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 Romanow & Wenger Expires October 24, 2011 [Page 2] Internet-Draft CLUE Telepresence Requirements April 2011 1. Introduction Telepresence systems greatly improve collaboration. In a telepresence conference (as used herein), the goal is to create an environment that gives the users a feeling of (co-located) presence - the feeling that a local user is in the same room with other local users and the remote parties. Currently, systems from different vendors often do not interoperate because they do the same tasks differently, as discussed in the Problem Statement section below. The approach taken in this memo is to set requirements for a future specification that, when fullfilled by an implementation of the specification, provide for interoperability between IETF protocol based telepresence systems. It is anticipated that a solution for the requirements set out in this memo likely involves the exchange of adequate information about participating sites; information that is currently not standardized by the IETF. The purpose of this document is to describe the requirements for a specification that enables interworking between different SIP-based [RFC3261] telepresence systems, by exchanging and negotiating appropriate information. Non IETF protocol based systems, such as those based on ITU-T Rec. H.323, are out of scope. These requirements are for the specification, they are not requirements on the telepresence systems implementing the solution/protocol that will be specified. Telepresence systems of different vendors, today, can follow radically different architectural approaches while offering a similar user experience. It is not the intention of CLUE to dictate telepresence architectural and implementation choices. CLUE enables interoperability between telepresence systems by exchanging information about the systems' characteristics. Systems can use this information to control their behavior to allow for interoperability between those systems. In a telepresence session, required are at least one sending and one receiving endpoint. Most telepresence endpoints are full-duplex in that they are both sending and receiving. Some, especially multiparty telepresence sessions include more than two endpoints, and centralized infrastructure such as Multipoint Control Units (MCUs) or equivalent. CLUE specifies the syntax, semantics, and control flow of information to enable the best possible user experience at those endpoints. Sending endpoints, or MCUs, are not mandated to use any of the CLUE specifications that describe their capabilities, attributes, or behavior. Similarly, it is not envisioned that endpoints or MCUs Romanow & Wenger Expires October 24, 2011 [Page 3] Internet-Draft CLUE Telepresence Requirements April 2011 must ever take into account information received. However, by making available as much information as possible, and by taking into account as much information as has been received or exchanged, MCUs and endpoints are expected to select operation modes that enable the best possible user experience under their constraints. The document structure is as follows: Definitions are set out, followed by a description of the problem of telepresence interoperability that led to this work. Then the requirements to a specification addressing the current shortcomings are enumerated and discussed. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 3. Definitions Audio mixing - The accumulation of scaled audio signals to produce a single audio stream. See RTP Topologies, [RFC5117]. Capture device - A device that converts audio and video input into an electrical signal, for example a camera or a microphone. Conference - A Framework for Conferencing within the Session Initiation Protocol (SIP). Defined in [RFC4353] Endpoint - The logical point of final termination through receiving, decoding and rendering, and/or initiation through capturing, encoding, and sending of media streams. An endpoint consists of one or more physical devices which source and sink media streams. In contrast to an endpoint, a middle box, such as an MCU, may also send and receive media streams, but it is not the initiator nor the final terminator. Endpoints can be anything from multiscreen/multicamera rooms to handheld devices. Layout - How media streams are spatially arranged with respect to each other on a single screen/mono audio telepresence endpoint, and how media streams are arranged with respect to each other on a multiple screen/speaker telepresence endpoint. Note that audio as well as video is encompassed by the term layout--in other words, included is the placement of audio streams on speakers as well as video streams on video screens. Romanow & Wenger Expires October 24, 2011 [Page 4] Internet-Draft CLUE Telepresence Requirements April 2011 Media - Any data that, after suitable encoding, can be conveyed over RTP, including audio, video or timed text. MCU - Multipoint Control Unit (MCU) - A device that connects two or more endpoints together into one single conference [RFC5117]. Remote - Sender or receiver on the other side of the communication channel; i. e., not local. A remote can be an endpoint or an MCU. Render - The process of generating a representation from a media, such as displayed motion video or sound emitted from loudspeakers. Session - RTP session. An association among a set of participants communicating with RTP. The distinguishing feature of an RTP session is that each maintains a full, separate space of SSRC identifiers. [RFC3550] Source selection policies - Rules for determining which media source(s) to play or show. Spatial relations - The arrangement in space of objects, in contrast to relation in time or other relationships. Video composite - A single image that is formed from combining visual elements from separate sources. 4. Problem Statement In order to create the "being there" or telepresence experience, media inputs need to be captured, sent, transported, received, rendered, and coordinated. Different telepresence systems take diverse approaches in crafting a solution, or implement similar solutions differently. Telespresence systems from different vendors can use disparate techniques, and they describe, control and negotiate media in dissimilar fashions. Such diversity creates an interoperability problem. The same issues are solved in different ways by different systems, so that they are not directly interoperable. This makes interworking difficult at best and sometimes impossible. Worse, many telepresence use proprietry protocol extensions to solve telepresence-related problems, even if those extensions are based on common standards such as SIP. Romanow & Wenger Expires October 24, 2011 [Page 5] Internet-Draft CLUE Telepresence Requirements April 2011 Some degree of interworking between systems from different vendors is possible through transcoding and translation. This requires additional devices, which are expensive, often not entirely automatic, and sometimes introduce unwelcome side effects such as additional delay or degrading performance. Specialized knowledge is currently required to operate a telepresence conference with endpoints from different vendors, for example to configure transcoding and translating devices. Often such conferences do not commence as planned, or are interrupted by difficulties that arise. The general problem addressed by CLUE that needs to be solved can be described as follows. Today, the transmitting side sends audio and video streams based upon an implicitely assumed model for rendering a realistic depiction from this information. If the receiving side belongs to the same vendor, it works with the same model and renders the information according to the model implicitely assumed by the vendor. However, if the receiver and the sender are from different vendors, the models they each have for rendering presence can and usually do differ. The result can be that the telepresence systems actually connect, but the user experience suffers, for example because one system assumes that the first video stream stems from the right camera, whereas the other assumes the first video stream stems from the left camera. It is as if Alice and Bob are at different sites. Alice needs to tell Bob information about what her camera and sound equipment see at her site so that Bob's receiver can create a display that will capture the important characteristics of her site. Alice and Bob need to agree on what the salient characteristics are as well as how to represent and communicate them. Characteristics include number, placement, capture/render angle, resolution of cameras and screens, spatial location and mixing parameters of microphones. The telepresence multi-stream work seeks to describe the sender situation in a way that allows the receiver to render it realistically, though it may have a different rendering model than the sender; and for the receiver to provide information to the sender in order to help the sender create adequate content for interworking. 5. Requirements These requirements are for the CLUE protocol specification (henceforth also called CLUE solution); they are not requirements for the telepresence systems that implement CLUE. Romanow & Wenger Expires October 24, 2011 [Page 6] Internet-Draft CLUE Telepresence Requirements April 2011 REQMT-1: The solution MUST support a means of describing the spatial arrangement of source video images sent in video streams. For example this image/stream is left of that (indicated) one, or above it. REQMT-2: The solution MUST support a means of describing the spatial arrangement of source audio sounds sent in audio streams. REQMT-3: The solution MUST support a mechanism to allow spatial matching of audio and video streams from the same endpoints. REQMT-4: The solution SHOULD support a mechanism by which a sending endpoint or MCU can describe aspects of its characteristics and attributes. REQMT-5: The solution MUST support a mechanism by which a receiver, (endpoint or MCU), can send specified information about itself to the sender (endpoint or MCU). REQMT-6: The solution MUST include a mechanism to support a means of describing dynamic changes in the streams offered by the sender; and dynamic changes in the streams requested by the receiver. REQMT-7: The solution MUST include a mechanism by which the receiver can learn the description of the streams the sender can send it. REQMT-8: The solution MUST include a mechanism by which the receiver (endpoint or MCU) can specify to the sender (endpoint or MCU) what streams it wants to receive. REQMT-9: The solution MUST support interoperability between "asymmetric" telepresence endpoints, meaning between telepresence endpoints that have a different number of input and output devices, such as screens, speakers, microphones, cameras. For example in a multi-point conference, one endpoint may have one screen, another may have 2 screens and a third may have 3 screens. REQMT-10: The solution MUST include a mechanism that makes it possible for endpoints with different capabilities to interoperate in a telepresence conference. Romanow & Wenger Expires October 24, 2011 [Page 7] Internet-Draft CLUE Telepresence Requirements April 2011 REQMT-11: The solution MUST support a layout model in which the local telepresence system does the layout * Remote tells local properties of available captured/ coded media (such as: max video resolutions, spatial placements of capture devices, capabilities of codecs, inter-codec dependencies,...) * Local, based on its knowledge of rendering stystems, tells the remote which streams it wants REQMT-12: The solution SHOULD support a model in which all endpoints can inform each other about their capture/encode and render/decode capabilities and constraints, and locally decide how to capture/encode/send and receive/decode/ render based on the information provided by other endpoints. REQMT-13: The solution MUST support a mechanism for specifying audio and video source selection policies. REQMT-14: The solution MUST not cause additional call setup latency. 6. Acknowledgements This draft has benefited substantially from thoughtful review by Espen Berger, Peter Musgrave, Cullen Jennings, Mark Duckworth, Jim Cole, Paul Kyzivat, and Roni Even. It also benefits from Brian Rosen's helpful thoughts on the topic. 7. IANA Considerations TBD 8. Security Considerations TBD 9. Informative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, Romanow & Wenger Expires October 24, 2011 [Page 8] Internet-Draft CLUE Telepresence Requirements April 2011 A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003. [RFC4353] Rosenberg, J., "A Framework for Conferencing with the Session Initiation Protocol (SIP)", RFC 4353, February 2006. [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008. Appendix A. Draft History Changes from version 1 NEEDS UPDATING. All the changes are based on comments from IETF 80 WG meeting 1. Put comments into the introduction 2. Definitions - removed conceptual stream, region, participant, render device, source selection. Changed endpoint, layout. Added media, MCU. 3. Pulled out assumptions, after responding to comments during WG meeting only the following 3 were left, and it seemed like they were not offering much to the draft. ASMP-3: Different telepresence systems may do layout differently - any of locally, remotely, or a combination of the two. ASMP-4: Layout decisions can be made by various mechanisms, such as an algorithm, administratively determined, or local user- based. ASMP-5: Layout can be static, fixed at call setup, or dynamic, changing during runtime, or all. 4. Requirements- they were all re-written or removed based on comments. #11 is still under discussion as to whether it is necessary/desirable or not. Romanow & Wenger Expires October 24, 2011 [Page 9] Internet-Draft CLUE Telepresence Requirements April 2011 5. Added appendix to track changes Authors' Addresses Allyn Romanow Cisco Systems San Jose, CA 95134 USA Email: allyn@cisco.com Stephan Wenger Vidyo, Inc. 433 Hackensack Ave., 7th Floor Hackensack, NJ 07601 USA Email: stewe@stewe.org Romanow & Wenger Expires October 24, 2011 [Page 10]