XCON WG                                                      C. Jennings
Internet-Draft                                             Cisco Systems
Expires: August 9, 2004                                         B. Rosen
                                                                 Marconi
                                                        February 9, 2004


                      Media Mixer Control for XCON
                  draft-jennings-xcon-media-control-00

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 9, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

   Conference mixers have many controls that change how the media is
   combined for each participant in the conference. There is a need to
   describe these controls to the clients connected to a centralized
   conference so that the clients can render a user interface and allow
   the user to manipulate them.

   This work is very early and far from complete. This draft sketches
   the outline of a solution for consideration. It is being discussed on
   the xcon@ietf.org mailing list.

Jennings & Rosen        Expires August 9, 2004                  [Page 1]
Internet-Draft             Media Mixer Control             February 2004

Table of Contents

   1.       Conventions
   2.       Introduction to the Problem
   2.1      Non Problems
   3.       Overview
   3.1      Semantic information in a Conference
   3.2      The Protocol
   3.3      Templates
   3.4      Parameters
   3.5      Controls
   3.6      Roles
   4.       Introductory Example
   4.1      Simple Audio
   5.       Names and terminology
   5.1      Templates
   5.2      Participants
   5.3      Streams
   5.3.1    Stream Types
   5.3.2    Stream URLs
   5.3.3    Stream Priority
   5.4      Roles
   5.5      Controls
   5.6      Parameters
   6.       Solution
   6.1      Templates
   6.1.1    Parameters
   6.1.2    Roles
   6.1.3    Streams
   6.1.4    Controls
   6.1.5    Conference State
   6.1.6    Transport Protocol
   6.2      Controls
   6.2.1    Requirements
   6.2.2    Strings
   6.2.3    Integer
   6.2.4    Boolean
   6.2.5    Selection
   6.2.6    Multiple Selection
   6.2.7    Frame
   7.       Examples
   7.1      Audio Video Presentation
   8.       Template Registry
   9.       Comparison to other solutions
   10.      CPCP vs. MPCP vs. CCP vs. MCP
   11.      IANA
   12.      Security
   13.      Acknowledgments
            Normative References
            Informative References
            Authors' Addresses
            Intellectual Property and Copyright Statements

1. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [1].

2. Introduction to the Problem

   This work tries to solve the problem of allowing a conference
   participant to manipulate the media flow in a mixer. It defines a
   protocol between the end user's software manipulating the conference
   and the centralized conference mixer. This protocol needs to be rich
   enough for a mixer to express what information it wants from a
   client, yet simple enough to allow the client to render a useful user
   interface to the user.
   This work takes into account that real mixers have constraints on
   what media flows are possible and that UIs have buttons, knobs, etc.
   that users manipulate. The goal is for a conferencing end point made
   by one vendor to work with mixers or conference systems made by
   another vendor.

2.1 Non Problems

   There are several topics that are completely internal to the
   conference systems and are out of scope for this work. These include:

      How the focus manipulates the mixer.

      How one describes what a mixer is capable of doing.

3. Overview

   When a conference is created, it is instantiated from a template. The
   template describes what controls are available for the client to
   manipulate the media. The conference also describes roles that the
   client can take on, such as Moderator. The template can have
   parameters that are set when it is instantiated to allow one template
   to describe variations of similar flow models.

   This document describes the templates and ways for the client to
   understand and manipulate the media in the conference. It allows for
   the following:

      A conference consists of several participants and multiple streams
      of media flowing between the participants and the mixer.

      Sidebars are mini conferences that are just like conferences
      except that a sidebar cannot itself contain sidebars.

      Clients can discover the template chosen for use in a conference,
      and the values of the parameters set for the conference.

      Clients can discover the available streams in a conference.

      Clients can send media on a participant stream and receive media
      on a mixer stream.

      Clients can discover the Participants in a conference and their
      roles (this is more conference policy than media policy).

      Clients can join a conference as a participant and assume a
      particular role.

      Conferences, Streams, and Participants can have controls that
      manipulate the media sent and received.
      The role of the participant will control what view of the
      conference they have and which media streams they can manipulate.

3.1 Semantic information in a Conference

   The conference has a list of Participants. Each Participant has a set
   of Controls that the Participant can manipulate. Each conference has
   a list of sidebars. Each conference has a list of Streams. Each
   Stream has attributes such as name, type, priority and list of
   contributing participants.

3.2 The Protocol

   The protocol between the client and the conference server allows the
   client to get the semantic information in the conference, find out
   when it changes, and make changes to it. It's probably something like
   XCAP. [TODO add ref]

3.3 Templates

   Templates define a model for the reception, manipulation and
   transmission of streams. A template provides enough information that
   the client can intelligently render a useful GUI to the end user to
   manipulate the model. There is a registry of well-known templates,
   but a conference server can define new ones. A convener can find all
   the templates a conference server supports and select one to use when
   creating the conference.

   A template for a very basic audio conference, for example, may
   indicate that there is one audio stream for each participant, and one
   output mixer stream named "primary". Each participant in the stream
   has a single binary control called "Mute". There is only one Role
   that can be used, called "participant".

3.4 Parameters

   Parameters are variables in the template that are set when the
   conference is created. For example, in the audio conference, the
   maximum number of participants might be a parameter. If the value is
   set to 10 when the conference is instantiated, then up to 10
   participant streams can be accepted into the mixer. The template can
   indicate the valid range for the maximum number of participants,
   perhaps from 2 to 128.
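   The range check described for parameters can be sketched in a few
   lines of Python. This is only an illustration under stated
   assumptions: the class ParameterSpec, the function can_accept, and
   the behaviour of rejecting an out-of-range value with an error are
   hypothetical and are not defined by this draft. Only the parameter
   name "max-participants", the value 10, and the range 2 to 128 come
   from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """Hypothetical description of one integer template parameter."""
    name: str
    min_value: int
    max_value: int

    def instantiate(self, value: int) -> int:
        """Return value if it lies within the template's valid range."""
        if not self.min_value <= value <= self.max_value:
            # Assumption: an out-of-range value is rejected outright.
            raise ValueError(
                f"{self.name}={value} is outside "
                f"[{self.min_value}, {self.max_value}]"
            )
        return value


def can_accept(current_participants: int, max_participants: int) -> bool:
    """True while the mixer has room for another participant stream."""
    return current_participants < max_participants


# The template allows 2 to 128; the convener picks 10 at creation time.
spec = ParameterSpec("max-participants", min_value=2, max_value=128)
limit = spec.instantiate(10)
```

   With limit set to 10, can_accept(9, limit) is true and
   can_accept(10, limit) is false, matching the statement that up to 10
   participant streams can be accepted into the mixer.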
3.5 Controls

   Controls are variables participants may manipulate to control the
   media streams of the conference. Conferences can have controls,
   participants in a conference can have controls, and streams in a
   conference can have controls. Controls can also be implicitly created
   by stream action, for example a selector control based on the loudest
   speaker. Controls have a name and a value. Controls are defined in
   the template.

3.6 Roles

   Participants in a conference can take on different Roles that change
   what controls they may manipulate. The template defines what Roles
   are available for the client. The moderator (which itself is a role)
   can change the role of a particular participant.

4. Introductory Example

4.1 Simple Audio

   The client selects the basic audio template that looks like:

      [XML template not preserved]

   The client retrieves this template and uses it to create a conference
   where it sets the max-participants to 10. Alice and Bob join this
   conference and the conference server tells Bob about the state of the
   conference media.

   There is only one role, "participant". Each participant contributes
   one input stream. There is also an output stream per participant.
   There is a single control, called mute, for each participant.

   After Alice and Bob have joined, the conference server informs Bob
   that the current state of the conference is as shown in the XML
   below.

      [XML conference state not preserved; surviving values: 10 0 0]

   There are two participants, Alice and Bob, who both contribute input
   streams and receive mix streams, and neither is muted.

   Bob's client decides to change the Mute state for its audio stream
   and sends the following to the conference server to change the state
   of the conference.

      [XML request not preserved; surviving value: 1]

   A key part of this is that Bob's client may have known about this
   basic audio template and the semantics of the "mute" control. The
   client may have connected this up with a button of the client's that
   was labeled mute.
   On the other hand, Bob's client may not have known anything about
   this template and simply rendered a button on the screen and labeled
   it "mute" with no idea what this would do. A third client may not
   have been able to deal with the control at all and may have just
   ignored it. Clearly the user interface can be better if the client
   understands the semantics of what the template means, but the user
   interface is still functional when the client does not.

5. Names and terminology

5.1 Templates

   Templates contain a list of streams, roles for participants,
   parameters that need to be set, and controls for the conference.

5.2 Participants

   Participants are the logical user entities participating in a
   conference.

5.3 Streams

   A stream is a named stream of media. An example is a simple audio
   conference with 6 participants and a mixer that mixes the loudest
   three. Each participant contributes an input stream. There is a
   single logical output stream, but every participant gets a "custom"
   version of this stream, because, in normal mixers, each participant
   can hear all inputs except his own. This is commonly referred to as
   "mix-minus". If the output stream also has a control (mute), the
   output streams for each participant may also vary depending on the
   state of the control.

   Streams all have a type, a name, a direction (in or out), one or more
   URLs, and a priority. The URL is the source or sink of the stream.
   The priority indicates how important this particular stream is to the
   conference and the type indicates the type of media carried in this
   stream.

   Streams have types. These correspond to the major MIME types of the
   media they send.

5.3.1 Stream Types

5.3.1.1 Audio

   Streams originate as participant contributions (dir="in") that are
   mixed using some kind of algorithm.
   Intermediate streams may be created, which are subsequently mixed
   with other streams yielding streams which are sent to participants
   (dir="out"). Controls commonly available on audio streams include
   input or output faders (volume controls), stereo balance, and mute.

5.3.1.2 Video

   Streams originate as participant contributions (dir="in") that are
   combined with some kind of algorithm. Intermediate streams may be
   created, which are subsequently combined with other streams yielding
   streams which are sent to participants (dir="out"). Controls commonly
   available on video streams might include selectors for choosing a
   tiling format, selectors for which input streams appear on output
   tiles, and video mutes.

5.3.1.3 Text

   Streams originate as participant contributions (dir="in") in the form
   of Instant Messages. Messages from all participants are combined
   using some algorithm. Intermediate streams may be created, which are
   subsequently combined with other text streams yielding streams which
   are sent to participants (dir="out").

5.3.1.4 Application

   At a minimal level, this consists of a URL that defines the
   application. Many systems will simply update an HTTP URL that fetches
   an HTML page that shows the current presentation.

5.3.2 Stream URLs

   Streams have URLs that specify the source or sink of the stream.
   These would typically be a SIP, H.323 or XMPP URL.

5.3.3 Stream Priority

   Streams have a priority from 0 to 1. Zero indicates that a client, by
   default, should not play/display this stream unless the user
   specifically requests it. A priority of 1 indicates that, by default,
   the client should render this stream and should warn the user if it
   cannot. Other values only define an ordering, and clients should
   attempt to use their resources to display the higher priority streams
   before the lower.
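   The priority rules above can be sketched as a small client-side
   selection routine in Python. This is a sketch under stated
   assumptions, not part of the protocol: the Stream class, the
   plan_rendering function, the notion of a fixed rendering "capacity",
   the stream names, and the choice to keep the original order among
   streams of equal priority are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """Hypothetical client-side view of a stream."""
    name: str
    priority: float  # 0.0 to 1.0 inclusive


def plan_rendering(streams, capacity, requested=frozenset()):
    """Return (rendered names, warnings) for a client that can render
    at most `capacity` streams at once (capacity is an assumption)."""
    # Priority 0 streams are skipped unless the user asked for them.
    candidates = [s for s in streams
                  if s.priority > 0 or s.name in requested]
    # Higher priority first; the stable sort keeps ties in input order.
    candidates.sort(key=lambda s: -s.priority)
    rendered = [s.name for s in candidates[:capacity]]
    # A priority 1 stream that cannot be rendered produces a warning.
    warnings = [f"cannot render required stream {s.name!r}"
                for s in streams
                if s.priority == 1 and s.name not in rendered]
    return rendered, warnings


streams = [
    Stream("slides", 0.5),
    Stream("audio", 1.0),
    Stream("captions", 0.0),
    Stream("video", 0.8),
]
rendered, warnings = plan_rendering(streams, capacity=2)
```

   In this example a client with room for two streams renders "audio"
   and "video" and produces no warning; the priority 0 "captions" stream
   is considered only when its name is passed in requested.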
5.4 Roles

   Roles are defined as part of Conference Policy but are used here so
   that the Media Policy can define separate streams and controls
   depending on role. Roles are defined in the template. Some templates
   may allow a participant to take on more than one role at a time. Each
   template must define a role named "participant", which is the default
   role. "Moderator" is a typical role, as is "Floor-Holder", but
   templates do not intrinsically define or require such roles.

5.5 Controls

   Controls manipulate the state of the conference while it is
   instantiated. All controls have a name, a type, a current value and
   permissions that indicate whether or not the current client can
   modify them. They may also have, optionally, a min and max value.

   A control can be defined as being part of a role. In that case, all
   participants who assume that role have an instance of the control. A
   control may also be defined as part of a stream, in which case all
   contributors of that stream (dir="in") have an instance of the
   control, or all sinks of the stream (dir="out") have an instance of
   the control. There can be global controls, which are available to all
   participants. Implicit controls extract values from streams (or other
   controls), such as choosing video inputs based on loudest speakers.

5.6 Parameters

   Parameters are variables that modify the function of the template.
   They are fixed when the conference is instantiated. Parameters allow
   a single template definition to describe a range of possible mixer
   capabilities. Parameters have a name, a type, a value and,
   optionally, a min and max value.

6. Solution

6.1 Templates

   A template is an XML document. The template definition includes a
   name, which is a string, for example: