DISPATCH WG A. Romanow Internet-Draft Cisco Intended status: Informational S. Botzko Expires: January 6, 2011 Polycom July 5, 2010 Problem Statement for Telepresence Multi-streams draft-romanow-dispatch-telepresence-prob-statement-00.txt Abstract Telepresence systems create a "being there" conferencing experience. A number of issues need to be solved largely by manipulating multiple audio and video streams. Different systems take different approaches, employ different techniques, and convey information by using different vocabularies, making interoperability extremely challenging. This problem statement describes the typical issues that must be solved and uses examples to illustrate the kind of diversity that makes interworking problematic. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on January 6, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect Romanow & Botzko Expires January 6, 2011 [Page 1] Internet-Draft Telepresence Problem Statement July 2010 to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Fundamental Issues for Telepresence . . . . . . . . . . . . . 4 4. Manipulating Media Streams . . . . . . . . . . . . . . . . . . 5 5. Examples of Interworking Issues . . . . . . . . . . . . . . . 6 5.1. Designating Roles and Positions for transmitted streams . 6 5.2. Multipoint . . . . . . . . . . . . . . . . . . . . . . . . 7 5.3. Capability Negotiation . . . . . . . . . . . . . . . . . . 9 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 7. Security Considerations . . . . . . . . . . . . . . . . . . . 9 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 9. Informative References . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 Romanow & Botzko Expires January 6, 2011 [Page 2] Internet-Draft Telepresence Problem Statement July 2010 1. Introduction In a Telepresence conference, the idea is to create a feeling of presence - that you are in the same room with the remote parties. In order to create the "being there" or telepresence experience, a number of technical issues need to be solved. These issues are addressed by manipulating multiple media streams, video and audio - by describing them, controlling them, and signaling about them. The fundamental features of telepresence require handling multiple streams of media, and considering additional characteristics of those streams beyond those normally specified in existing videoconferencing standards. Different telepresence systems approach solving the basic issues differently. They use disparate techniques, and they describe, control and signal media in dissimilar fashions. Such diversity creates an interoperability problem. The same issues are solved in different ways by different systems, so that they are not directly interoperable. This makes interworking difficult at best and sometimes impossible. Some degree of interworking is possible through transcoding and translation. This requires additional devices, which are expensive and not entirely automatic. Specialized knowledge is required to operate a telepresence conference where the endpoints use different equipment and a transcoding and translating device is employed for interoperability. Often such conferences are interrupted by difficulties that arise. The general problem that needs to be solved is this. The transmitting side sends audio and video streams based upon a model for rendering a realistic depiction from this information. If the receiving side belongs to the same vendor, it works with the same model and renders the information according to that shared model. However, if the receiver and the sender are from different vendors, the models they each have for rendering presence differ. It is as if Alice and Bob are at different sites. Alice needs to tell Bob information about what her camera and sound equipment see at her site so that Bob's receiver can create a display that will capture the important characteristics of her site. Alice and Bob need to agree on what the salient characteristics are as well as how to represent and communicate them. The telepresence multi-steam work seeks to describe the sender situation in a way that allows the receiver to render it realistically though it may have a different rendering model than the sender. This problem statement identifies the fundamental issues that need to Romanow & Botzko Expires January 6, 2011 [Page 3] Internet-Draft Telepresence Problem Statement July 2010 be addressed to provide telepresence in typical use case scenarios. We show how different approaches to solving the problems and different techniques for handling multiple media create a challenge for interoperability. This document describes some of the problems that arise, it is not an complete list, but rather it is more illustrative than exhaustive. Requirements, use cases and solutions are discussed in other documents. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 3. Fundamental Issues for Telepresence The fundamental issues that must be handled to produce a typical telepresence conference, either point to point or multipoint include: 1. Participant display A. Placement of video B. Size C. Angle D. Overlap E. Display technology 2. Audio A. Placement, emanating from right place B. Type of audio 3. Different number of screens on sender and receiver sides 4. Participant display for multipoint A. Placement of video Romanow & Botzko Expires January 6, 2011 [Page 4] Internet-Draft Telepresence Problem Statement July 2010 B. Continuous presence C. Control of display, how does it change? - automatic, user 5. Maintaining eye contact and gaze connection 6. Panoramic view for site switching 7. Mismatches between media characteristics between sender and receiver, such as: A. aspect ratio B. format C. frame rate D. resolution 8. Presentation A. What methodology? 9. Security A. SRTP? B. Key methodology 4. Manipulating Media Streams In addressing the fundamental issues, multiple media streams are handled in the following ways: 1. Sender and receiver understand each others capabilities A. Number of video, audio and presentation streams that can be sent/received simultaneously B. What media signaling protocol being used (SDP, proprietary, etc.) 2. Streaming control 3. Feedback mechanisms Romanow & Botzko Expires January 6, 2011 [Page 5] Internet-Draft Telepresence Problem Statement July 2010 4. Signaling about RTP payload 5. Media control signaling A. Video refresh B. Flow control 6. Signaling media formats and media capabilities 7. Signaling content type 8. Signaling device type 9. Signaling network characteristics per stream 10. Floor control signaling 5. Examples of Interworking Issues This section describes several examples that illustrate the kinds of incompatibilities that arise when different systems take different approaches to an issue. 5.1. Designating Roles and Positions for transmitted streams Senders and receivers need to have the same vocabulary and understanding of stream roles and positions in order to place them appropriately. For example one system may define roles as: center, left, right, legacy center, legacy right, legacy left, auxiliary 1/5 fps and auxiliary 30 fps positions. These roles as defined are a combination of "input devices" + "codec type/format" for transmission positions, and a combination of "stream decoders/output devices" + "codec type/format" for receive positions. Another system will not have the exact same vocabulary and meaning, though it still has to accomplish the same placement task. How the cameras and encoders are wired determines how the local scene is displayed on the remote screen. In many systems right and left need to be exchanged to be seen properly, but this depends on the way the equipment is wired. In describing how to display the local scene, the language can be misleading if there is no agreed upon reference for right and left. [for example, more] Although often the video is displayed on separate monitors, it is Romanow & Botzko Expires January 6, 2011 [Page 6] Internet-Draft Telepresence Problem Statement July 2010 also possible to use projectors to create a video wall. In this case, there may be an overlap region between cameras which allows for projector blending. Also, although cameras are generally arranged to create a seamless panoramic view of the participants, it is also possible for there to be gaps between cameras (and corresponding gaps between displays). There is also no reference for image size. Some rooms use proportionally larger displays, and set the camera field of view to show participants either standing or sitting at life size. Others use smaller displays, and set the field of view for sitting participants (cropping off heads when people stand). In order to preserve full size display when these systems interoperate, both systems must rescale their video. 5.2. Multipoint Multipoint conferences, where there are more than two endpoints, create a wealth of technical issues to be solved. The primary one is which participants to display on each screen at each site. If the number of sites is greater than can be shown on the number of displays at a site, this adds to the complexity. There are, of course, almost unlimited ways this can be handled. We discuss the common approaches and how they differ. The local screens can show all the camera image from the a particular remote site (site switching); or each local screen can show a participant or two from each of the remote sites (segment switching); or local displays can show a composite of remote camera shots (continuous presence). The choice of who to display on a screen can be determined by users, or, more often, automated according to voice activity level. [Add user-controlled personal telepresence scenario.] Policies are created and implemented in many ways. They tend to be based on some combination of what H.323 defines as centralized and decentralized. One of the challenges is that the endpoints in the conference may have different number of cameras and displays from each other so a common mode on the number of streams and their priority is required. Also, the various endpoints might have different bandwidth constraints and support different codec profiles. A centralized multipoint conference is one in which all participating endpoints communicate in a point-to-point fashion with an MCU. The endpoints transmit their control, audio, video, and/or data streams to the MCU. The MCUA centrally manages the conference, processes the audio, video and/or data streams, and returns the processed streams Romanow & Botzko Expires January 6, 2011 [Page 7] Internet-Draft Telepresence Problem Statement July 2010 to each endpoint. In this mode, the MCU will mix the audio streams; and if using centralized video, will either use voice activated video switch, where everyone will see the active speaker and the speaker will see the previous speaker, or will use continuous presence mode, where the MCU will create a video stream with sub windows for each of the participants. MCUs can support multiple video layouts and they can be created automatically based on the number of participants or by a conference management application. There are three methods commonly used for video stream distribution in centralized multipoint conferences. The three conference policies above can be implemented using any of these technologies. Simple video switching (forwarding) has the advantage of low latency and low complexity. It can be used if all systems are capable of receiving the encodings used by the sending endpoints (including both the video codec and the image resolution/aspect ratio). In some situations it can be wasteful of bandwidth. Full video transcoding usually has higher latency than switching It does not require system to be capable of receiving identical encodings, and different sites can connect with different bandwidths. Layered video encoding combines some of the benefits of video switching and video transcoding. It is more complex than video switching, but less complex than video transcoding. Bandwidth and resolution can be reduced for each site. Since this is done by filtering out layers of the original encoding, the available bandwidths and resolutions are not as fine-grained as full video transcoding. In decentralized mode or full mesh mode each endpoint creates its display mode. This requires each endpoint to receive multiple streams and send its video and audio to all participants, using multicast of unicast. In practice, multicast is not now being used in commercial systems, so the size of a strictly decentralized multipoint conference is limited. There are analogous issues for audio. Like video, the audio is rotated, so there is no clarity on the meaning of left and right. Since the number of streams, microphones, and speakers are not matched, the systems need to re-process the received audio in order to create the correct sound field for their respective rooms. There are two ways in which the audio might be handled in this use case: Romanow & Botzko Expires January 6, 2011 [Page 8] Internet-Draft Telepresence Problem Statement July 2010 o A single stereo audio stream is sent to the remote site, just as in standard videoconferencing. o Three monaural audio streams are sent to the remote site, with proprietary signaling to associate each audio stream with a video stream. Microphones and speakers positions vary; and there is no agreed upon way to describe their placement. There is no agreed upon reference for audio level. In addition, audio may be sent as an independent stream from each microphone or as a multi-channel channel stream. 5.3. Capability Negotiation Call setup for the telepresence conference will start with a single call establishing one video media stream. After the connection is established, a proprietary capability negotiation takes place that will enable both sides to identify that they are telepresence applications and capable of having two more video sessions and provide the connectivity information. The result is that two or more video sessions are established. The system may use two new SIP call legs or just add the two new video streams to the existing dialog. [more to be added] 6. IANA Considerations This document contains no IANA considerations. 7. Security Considerations While there are likely to be security considerations for any solution for telepresence interoperability, this document has no security considerations. 8. Acknowledgements The draft has benefitted from input from a number of people including Roni Even, Jim Cole, Nermeen Ismail, Nathan Buckles. 9. Informative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. Romanow & Botzko Expires January 6, 2011 [Page 9] Internet-Draft Telepresence Problem Statement July 2010 Authors' Addresses Allyn Romanow Cisco San Jose, CA 95134 US Email: allyn@cisco.com Stephen Botzko Polycom Andover, MA 01810 US Email: stephen.botzko@polycom.com Romanow & Botzko Expires January 6, 2011 [Page 10]