CLUE                                                        A. Pepperell
Internet-Draft                                               Silverflare
Intended status: Standards Track                              A. Romanow
Expires: December 2, 2012                                      R. Hansen
                                                              B. Baldino
                                                           Cisco Systems
                                                            May 31, 2012


  Use of switched capture attribute & spatial co-ordinates in advanced
                                 cases
               draft-pepperell-clue-switched-attribute-00

Abstract

   This draft examines the issues with advertising "switched" captures
   in CLUE, and makes some proposals for how to solve the issues
   involved.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 2, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of



Pepperell, et al.       Expires December 2, 2012                [Page 1]

Internet-Draft  Switched attribute & spatial co-ordinates       May 2012


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction
   2.  Terminology
   3.  The need for switched captures in CLUE
   4.  Issues with switched captures
   5.  Proposed approach
   6.  A less minimalist solution
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses


1.  Introduction

   This draft states some of the issues involved in using switched
   captures in CLUE [I-D.ietf-clue-framework] and explores the need for
   a "switched" attribute and what this attribute means in different
   contexts.


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.


3.  The need for switched captures in CLUE

   The media capture "switched" attribute refers to captures whose
   content can change between different provider-chosen possibilities.
   A typical case might be a 3-camera system choosing to offer a capture
   scene entry comprising a single switched video capture which, at any
   given time, would show one of the 3 available camera feeds (perhaps
   chosen based on audio activity within its local scene, or room).  The
   presence of the "switched" attribute would distinguish such a media
   capture from another that, say, provided a fixed, zoomed-out view of
   all the available seats, even if both captures used identical point
   of origin and capture co-ordinates.
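
   As an illustration only, the distinction can be sketched with a
   hypothetical, much simplified data model in Python.  The class and
   field names below are assumptions made for this sketch, not CLUE
   syntax: the two captures share the same point of origin and capture
   area, and only the "switched" attribute tells them apart.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float, float]  # X, Y, Z capture co-ordinates


@dataclass(frozen=True)
class MediaCapture:
    """Hypothetical, simplified media capture; field names are illustrative."""
    capture_id: str
    point_of_origin: Optional[Point] = None
    capture_area: Optional[Tuple[Point, ...]] = None
    switched: bool = False


def same_geometry(a: MediaCapture, b: MediaCapture) -> bool:
    """True if both captures advertise identical spatial information."""
    return a.point_of_origin == b.point_of_origin and a.capture_area == b.capture_area


ORIGIN: Point = (0.0, 0.0, 1.0)
ROOM: Tuple[Point, ...] = ((-2.0, 3.0, 0.0), (2.0, 3.0, 0.0),
                           (-2.0, 3.0, 1.5), (2.0, 3.0, 1.5))

# One capture follows the chosen camera; the other is a fixed, zoomed-out view.
switched_view = MediaCapture("VC3", ORIGIN, ROOM, switched=True)
fixed_view = MediaCapture("VC4", ORIGIN, ROOM, switched=False)

print(same_geometry(switched_view, fixed_view))       # True
print(switched_view.switched != fixed_view.switched)  # True
```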

   As with other capture types, a consumer would only need to set up
   decoder state for a switched capture once (when first selecting to be
   sent an instantiation of that capture) and would not need to modify
   that state when the provider changes the source of the switched
   capture.  Note that "switched" here carries no implication as to
   whether the audio or video in question has been transcoded or
   forwarded unmodified.
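
   A minimal sketch of this consumer behavior, assuming a hypothetical
   Consumer class that keys decoder state by capture identifier; real
   decoder set-up is reduced here to creating a list that received
   payloads are appended to.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Consumer:
    """Hypothetical consumer holding one decoder state per selected capture."""
    decoders: Dict[str, List[bytes]] = field(default_factory=dict)
    decoder_setups: int = 0

    def select_capture(self, capture_id: str) -> None:
        # Decoder state is created once, when the capture is first selected.
        if capture_id not in self.decoders:
            self.decoders[capture_id] = []
            self.decoder_setups += 1

    def on_media(self, capture_id: str, payload: bytes) -> None:
        # Media for a switched capture keeps arriving for the same capture
        # whichever camera the provider has chosen, so the existing decoder
        # state is reused rather than rebuilt.
        self.decoders[capture_id].append(payload)


consumer = Consumer()
consumer.select_capture("VC3")
for camera in ("left", "centre", "right"):  # provider switches between cameras
    consumer.on_media("VC3", f"frame from {camera} camera".encode())

print(consumer.decoder_setups)        # 1
print(len(consumer.decoders["VC3"]))  # 3
```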

   For an MCU or endpoint providing any number (1, 2, 3, 4 ... n) of
   video captures with adjacency characteristics (for instance, camera
   feeds intended to be shown "in a line", or a transcoded conference
   view created for display across multiple monitors), the capture
   co-ordinates supplied by the provider give the consumer sufficient
   information to render those captures correctly.  Specifically, the
   consumer knows not only that a group of video captures forms a
   complete representation of the capture scene (because together those
   captures form a capture scene entry) but also how the captures in
   that group should be displayed relative to each other in order to
   preserve the integrity of the rendered scene.
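
   As a sketch of how a consumer might use those co-ordinates, the
   Python below orders the captures of one capture scene entry by the
   left edge of their capture areas and assigns them to screens.  It
   assumes, for illustration, that X increases from the viewer's left
   to right and that the consumer has one screen per capture; the names
   are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VideoCapture:
    capture_id: str
    x_min: float  # left edge of the capture area (assumed axis direction)
    x_max: float  # right edge of the capture area


def assign_to_screens(entry: List[VideoCapture], screens: List[str]) -> Dict[str, str]:
    """Map each capture in a capture scene entry to a screen, left to right."""
    if len(entry) != len(screens):
        raise ValueError("this sketch expects one screen per capture")
    ordered = sorted(entry, key=lambda c: c.x_min)
    return {screen: cap.capture_id for screen, cap in zip(screens, ordered)}


# Captures may be advertised in any order; the co-ordinates fix the layout.
entry = [
    VideoCapture("VC1", -1.0, 1.0),
    VideoCapture("VC0", -3.0, -1.0),
    VideoCapture("VC2", 1.0, 3.0),
]
print(assign_to_screens(entry, ["left", "centre", "right"]))
# {'left': 'VC0', 'centre': 'VC1', 'right': 'VC2'}
```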


4.  Issues with switched captures

   CLUE is, however, intended to cover more advanced switching cases,
   typically (though not necessarily exclusively) involving an MCU.  For
   instance, an MCU may choose to forward a selection of significant
   participants' audio and video captures to the devices participating
   in a conference so that those devices can form their own multi-pane
   layouts.  This might be a required feature of the system (if the MCU
   in question had no transcoding or composition capabilities) or simply
   a desirable one (perhaps to reduce the latency and media degradation
   caused by potentially multiple stages of transcoding).  In these
   cases an MCU, in its role as a CLUE provider, will want to advertise
   to participating consumer devices the availability of potentially
   many switched captures.  For example, an MCU might advertise the
   availability of up to 20 such switched streams; possible consumer
   behaviors in such a case would include:
   o  a 4 screen endpoint choosing to receive the "top 16" switched
      video streams to display a 2 x 2 grid on each of its screens
   o  a 3 screen endpoint choosing to receive the "top 12" switched
      video streams to display the most significant 3 at full screen
      size and the next 9 as 3 small PiPs on each screen
   o  a 1 screen endpoint choosing to receive the "top 10" switched
      video streams and forming a "1 big + 9 small" display
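
   The stream counts in these examples follow from simple per-screen
   arithmetic, sketched below in Python.  Each consumer requests a
   prefix ("top n") of the provider's ordered list, so the only
   provider-side constraint checked here is that n does not exceed the
   20 advertised streams; the Layout class is hypothetical.

```python
from dataclasses import dataclass

ADVERTISED_SWITCHED_STREAMS = 20  # figure used in the example above


@dataclass(frozen=True)
class Layout:
    description: str
    screens: int
    panes_per_screen: int

    def streams_needed(self) -> int:
        return self.screens * self.panes_per_screen


layouts = [
    Layout("2 x 2 grid on each of 4 screens", 4, 2 * 2),
    Layout("1 full-screen pane + 3 PiPs on each of 3 screens", 3, 1 + 3),
    Layout("1 big + 9 small on 1 screen", 1, 1 + 9),
]

for layout in layouts:
    n = layout.streams_needed()
    # Every consumer asks for a prefix of the provider's ordered list.
    assert n <= ADVERTISED_SWITCHED_STREAMS
    print(f"{layout.description}: top {n}")
```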

   To support such cases, several factors need to be considered beyond
   those previously discussed:
   o  knowledge that the 20 advertised switched streams do not all need
      to be sent to the consumer for it to be able to represent the
      complete scene to the user (this is not the case for a normal
      multi-camera endpoint scenario, for instance, where a consumer
      would typically need all captures in a capture scene entry in
      order to render that scene)
   o  ensuring that the spatial characteristics of the systems
      contributing to the ordered set are respected when sending the
      requested instantiated captures to consumers (for instance, in the
      3 screen "top 12" example above, the provider should be able to
      take into account the undesirability of splitting the 4
      constituent captures of a 4 camera system that was the active
      speaker across 3 full screen panes and a single small PiP)
   o  ensuring that sufficient stream synchronization information is
      available at the consumer for it to perform correct lip sync on
      the dynamically changing set of received audio and video streams


5.  Proposed approach

   A minimalist solution is proposed here, addressing the points above
   as follows:
   o  Use of the existing "switched" media capture attribute to cover
      two subtly different cases:
      *  endpoints providing a subset of their available camera feeds /
         microphones as one or more switched captures
      *  an MCU providing a subset of all current participants as a set
         to be laid out by a consumer device in a layout of the
         consumer's choosing
   o  Indicating to the consumer that a valid representation of the
      scene can be constructed from a subset of the captures that form
      a capture scene entry would be accomplished by ensuring that the
      captures in that capture scene entry do not have any associated
      co-ordinate or point of origin attributes.  For instance, if an
      MCU were able to send 100 such streams but a receiving consumer
      device could only form a 2 x 2 layout of the 4 most significant,
      the consumer would need to be able to determine that the
      100-capture capture scene entry was still of use to it, rather
      than it being, say, a strip of 100 video thumbnails that was only
      a valid representation of the scene when displayed in a certain
      order.  In practice, it would not be possible for the provider to
      supply any capture co-ordinates in this case, because no fixed,
      pre-determined set of co-ordinates would be valid.
   o  For the provider device to make correct choices about which of
      the ordered set of participants' captures to send to the consumer
      device, some information about the consumer-side render groupings
      needs to be conveyed from the consumer to the provider.  The
      proposal here is, consistent with the provider-side X / Y / Z
      capture co-ordinates, for the consumer to signal, when making its
      stream choice from the provider, the render co-ordinates of each
      instantiated video capture.  For example, if the active speaker
      was a 3 camera system, all 3 corresponding video captures might be
      sent to a consumer that had signalled that the first 3 or more
      video captures would be rendered adjacently.  A consumer device
      with a different render layout might only be sent the "second
      loudest" participant's video (if, for instance, the corresponding
      source system was supplying just a single camera-sourced video
      capture).
   o  For the consumer to be able to ascertain the dynamic mapping
      between audio and video captures, the proposal is for the RTCP
      CNAME to be the preferred mechanism, with consumer devices
      monitoring which streams share the same clock source and so can
      be usefully synchronized.
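
   The provider-side choice described in the third point above can be
   sketched as a greedy allocation.  This is an illustration under
   stated assumptions, not a proposed algorithm: the consumer's signalled
   render co-ordinates are assumed to have already been reduced to an
   ordered list of render group sizes (runs of adjacently rendered
   panes, most significant first), and each participant's captures are
   kept together in one group.  The Participant class and allocate()
   function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Participant:
    name: str
    video_captures: Tuple[str, ...]  # in left-to-right order at the source


def allocate(ranked: List[Participant], render_groups: List[int]) -> List[List[str]]:
    """Fill each consumer render group with whole participants.

    ranked: participants in the provider's order (e.g. loudest first).
    render_groups: sizes of the consumer's groups of adjacently rendered
        panes, most significant group first.
    Returns, per render group, the capture IDs the provider would send.
    A participant is never split across groups; one that fits nowhere
    is skipped.
    """
    unassigned = list(ranked)
    result: List[List[str]] = []
    for size in render_groups:
        group: List[str] = []
        for participant in list(unassigned):  # iterate over a copy
            free = size - len(group)
            if free == 0:
                break
            if len(participant.video_captures) <= free:
                group.extend(participant.video_captures)
                unassigned.remove(participant)
        result.append(group)
    return result


ranked = [
    Participant("A", ("A-left", "A-centre", "A-right")),  # 3 camera active speaker
    Participant("B", ("B",)),
    Participant("C", ("C",)),
    Participant("D", ("D",)),
]

# Consumer that renders its first 3 panes adjacently, then 2 separate panes.
print(allocate(ranked, [3, 1, 1]))
# Consumer whose 4 panes are all rendered separately.
print(allocate(ranked, [1, 1, 1, 1]))
```

   With the first consumer, the 3 camera active speaker fills the
   adjacent group of 3 panes; with the second, that participant fits no
   group and the single-camera participants are sent instead, leaving
   the last pane empty.  Other policies (for example, sending just one
   capture of a multi-camera participant) would be equally possible.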

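   The CNAME-based mapping in the last point above might be tracked by
   a consumer roughly as follows.  This Python sketch assumes that
   received streams are identified by SSRC and that the consumer is
   notified of each RTCP SDES CNAME it receives; the SyncGroups class is
   hypothetical.  Streams sharing a CNAME are treated as candidates for
   lip sync; the actual alignment would still rely on RTCP sender report
   timestamps, which are not modelled here.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SyncGroups:
    """Hypothetical consumer-side record of which received streams share a CNAME."""

    def __init__(self) -> None:
        self._cname_of: Dict[int, str] = {}  # SSRC -> most recent CNAME
        self._media_of: Dict[int, str] = {}  # SSRC -> "audio" or "video"

    def on_sdes_cname(self, ssrc: int, media: str, cname: str) -> None:
        # Called for every received SDES CNAME; with switched captures the
        # source behind a stream can change, so the latest value wins.
        self._cname_of[ssrc] = cname
        self._media_of[ssrc] = media

    def lip_sync_groups(self) -> List[List[int]]:
        """SSRC groups sharing a CNAME that contain both audio and video."""
        by_cname: Dict[str, Set[int]] = defaultdict(set)
        for ssrc, cname in self._cname_of.items():
            by_cname[cname].add(ssrc)
        return [
            sorted(ssrcs)
            for ssrcs in by_cname.values()
            if {self._media_of[s] for s in ssrcs} >= {"audio", "video"}
        ]


groups = SyncGroups()
groups.on_sdes_cname(1001, "audio", "alice@example.com")
groups.on_sdes_cname(2001, "video", "alice@example.com")
groups.on_sdes_cname(2002, "video", "bob@example.com")
print(groups.lip_sync_groups())  # [[1001, 2001]]

# The provider switches the audio stream to Bob's microphone.
groups.on_sdes_cname(1001, "audio", "bob@example.com")
print(groups.lip_sync_groups())  # [[1001, 2002]]
```
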

6.  A less minimalist solution

   An alternative to the above minimalist solution would be to replace
   some of the implicitly signalled elements with explicit signalling;
   specifically:
   o  a new attribute could be defined at the capture scene entry level
      explicitly signalling that a subset of the constituent captures
      can be used to produce a valid representation of that scene (this
      removes the significance of, and thus the need to observe, the
      absence of provider-side capture co-ordinates)
   o  rather than the "switched" capture attribute being reused both for
      a single system switching between its available captures covering
      a single scene and for an MCU-style device providing a set of
      "active speaker" captures, a new attribute could be introduced for
      captures that represent a provider choice of captures, potentially
      cut down from a larger list (e.g. the superset of all captures from
      all conference participants) and ordered by some provider-specific
      method (e.g. loudest participants first)
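
   To contrast the two approaches, a consumer's check of whether a
   subset of a capture scene entry is usable might look like the Python
   sketch below.  The subset_valid field stands in for the hypothetical
   new entry-level attribute described above (this draft does not name
   it); when it is absent, the sketch falls back to the Section 5 rule
   of inferring validity from the absence of co-ordinate and point of
   origin attributes.  All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Capture:
    capture_id: str
    point_of_origin: Optional[Point] = None
    capture_area: Optional[Tuple[Point, ...]] = None


@dataclass(frozen=True)
class CaptureSceneEntry:
    captures: Tuple[Capture, ...]
    # Hypothetical explicit attribute; None means "not signalled".
    subset_valid: Optional[bool] = None


def subset_is_valid(entry: CaptureSceneEntry) -> bool:
    """Can a consumer render a valid scene from a subset of these captures?"""
    if entry.subset_valid is not None:
        # Explicit signalling (this section): nothing to infer.
        return entry.subset_valid
    # Implicit signalling (Section 5): no spatial attributes on any capture.
    return all(c.point_of_origin is None and c.capture_area is None
               for c in entry.captures)


mcu_entry = CaptureSceneEntry(tuple(Capture(f"VC{i}") for i in range(100)))
room_entry = CaptureSceneEntry((
    Capture("VC0", (0.0, 0.0, 1.0), ((-3.0, 3.0, 0.0), (-1.0, 3.0, 0.0))),
    Capture("VC1", (0.0, 0.0, 1.0), ((-1.0, 3.0, 0.0), (1.0, 3.0, 0.0))),
))
explicit_entry = CaptureSceneEntry(room_entry.captures, subset_valid=True)

print(subset_is_valid(mcu_entry))       # True  (inferred)
print(subset_is_valid(room_entry))      # False (inferred)
print(subset_is_valid(explicit_entry))  # True  (explicit)
```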


7.  Security Considerations

   This draft involves only the internal nomenclature of the CLUE
   framework and data model, and hence has no security considerations.


8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [I-D.ietf-clue-framework]
              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-05 (work in progress), May 2012.


Authors' Addresses

   Andy Pepperell
   Silverflare

   Email: andy.pepperell@silverflare.com


   Allyn Romanow
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: allyn@cisco.com


   Robert Hansen
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: rohanse2@cisco.com


   Brian Baldino
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: bbaldino@cisco.com