Network Working Group                                      Lingli Deng 
    Internet Draft                                                Jin Peng 
    Intended status: Informational                            China Mobile 
    Expires: January 2013                                     July 3, 2012 
                                       
     
                                          
                Sender Media Control based on Local Status Detection 
                        draft-deng-rtcweb-svccontrol-00.txt 


    Status of this Memo 

       This Internet-Draft is submitted in full conformance with the 
       provisions of BCP 78 and BCP 79.  

       Internet-Drafts are working documents of the Internet Engineering 
       Task Force (IETF), its areas, and its working groups.  Note that 
       other groups may also distribute working documents as Internet-
       Drafts. 

       Internet-Drafts are draft documents valid for a maximum of six 
       months and may be updated, replaced, or obsoleted by other documents 
       at any time.  It is inappropriate to use Internet-Drafts as 
       reference material or to cite them other than as "work in progress." 

       The list of current Internet-Drafts can be accessed at 
       http://www.ietf.org/ietf/1id-abstracts.txt 

       The list of Internet-Draft Shadow Directories can be accessed at 
       http://www.ietf.org/shadow.html 

       This Internet-Draft will expire on January 3, 2009. 

    Copyright Notice 

       Copyright (c) 2012 IETF Trust and the persons identified as the 
       document authors. All rights reserved. 

       This document is subject to BCP 78 and the IETF Trust's Legal 
       Provisions Relating to IETF Documents 
       (http://trustee.ietf.org/license-info) in effect on the date of 
       publication of this document. Please review these documents 
       carefully, as they describe your rights and restrictions with 
       respect to this document.  

    Abstract 

     
     
     
    Peng, et al.           Expires January 3, 2013                [Page 1] 
     








    Internet-Draft               svc-control                      July 2012 
        

       This document proposes to add sender media control based on local 
       status detection by the browser, to further reduce multiparty video 
       conferencing's media bandwidth consumption with SVC cdoec for RTCWEB 
       use-cases. 

    Table of Contents 

        
       1. Introduction ................................................. 2 
       2. Problem Statement ............................................ 3 
          2.1. Existing Proposals in [use-case-draft] .................. 3 
             2.1.1. Receiver-harvested SVC-stream (SVC scheme).......... 3 
             2.1.2. Receiver-selected multi-stream (Multi-stream scheme) 4 
             2.1.3. Receiver-transcoded HD stream (transcoding scheme).. 4 
          2.2. Analysis of existing proposals .......................... 4 
       3. Enhanced SVC scheme with sender control ...................... 5 
          3.1. Overview of the core idea................................ 5 
          3.2. A simple example......................................... 5 
          3.3. Extensions to the simple example ....................... 11 
          3.4. Applicable Scenarios ................................... 12 
       4. Derived Requirements ........................................ 12 
       5. Security Considerations ..................................... 13 
       6. IANA Considerations ......................................... 13 
       7. References .................................................. 13 
          7.1. Normative References ................................... 13 
          7.2. Informative References ................................. 13 
        
    1. Introduction 

       As an important use-case of RTCWEB, multiparty video conferencing 
       has been discussed in depth in [use-case-draft], which states the 
       requirement for the receiving party (either a participant in the p2p 
       use-case or the centralized mixing server use-case) tend to render 
       the remote video input based on the audio input status (i.e. whether 
       the user is the spokesman of the conference at this moment) of the 
       sending party.  

       Assume an organization needs to establish a multiparty video 
       conference through a video communication system and hopes to be 
       connected point to point directly among the peers involved in the 
       process of the video conference session. Each conference participant 
       can send audio and video media stream to the other conference 
       participants. When one session participant receives the video and 
       audio stream from the others, he wants the conference spokesman to 
       be presented in a high-definition large window in the middle while 
       the other conference participants (including himself as local 
       playback) to a group of small windows by the side. The spokesman 
     
     
    Peng, et al.           Expires January 3, 2013                [Page 2] 
        








    Internet-Draft               svc-control                      July 2012 
        

       changes frequently from one to another with conducting of the 
       conference, while the participants always want to present the video 
       stream of the current conference spokesman in a high-definition 
       large window in the local browser and would like to present the 
       video stream of ordinary participants in small windows. 

       Taking into account the peer-to-peer video conferencing traffic 
       overhead and terminal traffic restrictions, how can we minimize the 
       cost of the video conferencing traffic overhead without affecting 
       the effects of the participants?presentation in video conference? 
       To satisfy the requirement that for each participant one high 
       resolution video is displayed in a large window, while a number of 
       low resolution videos are displayed in smaller windows, [use-case-
       draft] provides three solutions:  

       1. The sender sends the high resolution SVC stream and the 
          receiver/server selects part or full presentation. 

       2. The sender sends both high resolution and low resolution stream, 
          and the receiver selects one to present. 

       3. The sender sends a high resolution stream, the server/receiver 
          transcodes into low/high resolution streams as required. 

       This proposal proposes to add sender media control based on the 
       determination on whether the local participant is speaking. For the 
       sender, high resolution media stream is only sent by the current 
       spokesman, and the ordinary media stream is sent by the other 
       participants. For the receiver, only to receive and present the HD 
       media stream from the spokesman while the ordinary media stream from 
       the others in the conference. The consumption of both the sender and 
       receiver thus is further reduced in this way. 

    2. Problem Statement 

    2.1. Existing Proposals in [use-case-draft] 

       [use-case-draft] provides three solutions to the window resizing 
       requirement for multiparty conferencing scenarios. Despite of their 
       differences elaborated in the following, they share a common nature 
       that they rely on the receiver to do the trick, while the sender 
       blindly offers HD media all the time. 

    2.1.1. Receiver-harvested SVC-stream (SVC scheme)  

       During a video conference, the sender uses the SVC coding, sends SVC 
       HD media streams (base layer plus extended layer) and the receiver / 
     
     
    Peng, et al.           Expires January 3, 2013                [Page 3] 
        








    Internet-Draft               svc-control                      July 2012 
        

       server selects the presentation based on the requirements. For 
       example, present the media stream from the current conference 
       spokesman in a high resolution full window and show part media 
       stream (only base layer) from ordinary participants in small windows. 

    2.1.2. Receiver-selected multi-stream (Multi-stream scheme) 

       During a video conference, the sender sends both high resolution and 
       low resolution media stream, the receiver choose one to present in 
       the local browser according to requirements. For example, the 
       receiver presents the high resolution media stream from the current 
       conference spokesman and shows low resolution media stream (only 
       base layer) from ordinary participants. 

    2.1.3. Receiver-transcoded HD stream (transcoding scheme) 

       During a video conference, the sender sends high resolution media 
       stream, and the server /receiver transcodes as requirement. For 
       example, the receiver transcodes the received media stream from 
       ordinary participants into low media stream and then present the 
       stream in local browser. 

    2.2. Analysis of existing proposals 

       We can analyze the existing three schemes mentioned above from the 
       two dimensions of the flow transmission overhead and local 
       processing consume. 

       Firstly, from the perspective of high resolution traffic 
       transmission consume, multi-stream seems the last thing to do. In 
       terms of the SVC and transcoding schemes, whether the user is 
       speaking, his local browser will send the high resolution media 
       stream collected by local devices. If the number of participants in 
       the video conference based on the style of peer-to-peer interaction 
       is N, there would be 2*(N-1) high resolution media stream resource 
       consume needed in the transmission (including: the number of high-
       definition video media stream sent by the local participants is N-1; 
       the number of high-definition video media stream received by the 
       remote peers is N-1, too). While in the multi-stream scheme, 
       transmission overhead is raised for additional low resolution media 
       streams. 

       Secondly, we analyze their computation overheads from the 
       perspectives of the sender and receiver, respectively. 

       For the sender, the transcoding scheme incurs the lowest cost to 
       encoding the outgoing media stream. On the contrary, the multi-
     
     
    Peng, et al.           Expires January 3, 2013                [Page 4] 
        








    Internet-Draft               svc-control                      July 2012 
        

       stream scheme, which dictates that two versions of the local media 
       stream be encoded and sent separately, incurs the highest cost. The 
       encoding cost for a local sender to perform SVC scheme is between 
       the above two. 

       For the receiver, the transcoding scheme needs to do real-time 
       transcoding and therefore takes the highest computation overheads to 
       the receiver side (mixing server or a participating peer). In order 
       to switch from and to HD timely, a receiver in multi-stream scheme 
       needs to synchronize both high and low resolution media stream with 
       considerable consumption. While SVC scheme adjusts the processing 
       consume by overlaying / removing the extension layer sub-stream data 
       with a minimum extra cost.  

    3. Enhanced SVC scheme with sender control 

    3.1. Overview of the core idea 

       The core idea of this proposal is: 

       o Both ends use SVC, negotiating with the other side to establish 
          peer-to-peer media stream. 

       o The sender's browser detects local user call status by calling 
          some local devices, such as to detect the sustained voice input. 

       o The browser adjusts sending policy of different coding levels 
          according to the results of the detected local status: 

            o If the local user is the spokesman, send base layer and 
              extended layer media stream. 

            o If the local user is in a quiescent state, just send the base 
              layer media stream. 

       o The receiver's browser has the ability to call the local devices, 
          and presents the decoded input media stream according to SVC. 

    3.2. A simple example 

       In the multi-party video conferencing session (For example, in the 
       illustrative example, three users participate the video 
       conferencing), the workflow as shown in the figure goes as follows:  




     
     
    Peng, et al.           Expires January 3, 2013                [Page 5] 
        








    Internet-Draft               svc-control                      July 2012 
        

                       A               B               C 

                       |               |               |  

                       |               |               | 

       A/B/C peer-to-  |<=============>|<=============>| 

       peer connection |<=============================>| 

                       |               |               | 

                       |               |               |  

         Detect the +--| Detect the +--| Detect the +--| 

         local user |  | local user |  | local user |  | 

           status   +->|   status   +->|   status   +->| 

                       |               |               | 

                +------------+  +------------+  +------------+ 

                | Continuous |  |   Slient   |  |   Slient   | 

                |speech input|  |   status   |  |   status   | 

                +------------+  +------------+  +------------+ 

                       |               |               | 

                       |               |               | 

                       | Base layer +  |               | 

                       |extended layer |               | 

                       |-------------->|               | 

                       |               |               | 

                       |  Base layer + extended layer  | 

                       |------------------------------>| 

                       |               |  Base layer   | 
     
     
    Peng, et al.           Expires January 3, 2013                [Page 6] 
        








    Internet-Draft               svc-control                      July 2012 
        

                       |               |-------------->|  

                       |               |               | 

                       |  Base layer   |               | 

                       |<--------------|               | 

                       |               |               | 

                       |               |               | 

                       |               |  Base layer   | 

                       |               |<--------------| 

                       |               |               | 

                       |          Base layer           | 

                       |<------------------------------| 

                       |               |               | 

                       |               |               |  

         Detect the +--| Detect the +--| Detect the +--| 

         local user |  | local user |  | local user |  | 

           status   +->|   status   +->|   status   +->| 

                       |               |               | 

                +------------+  +------------+  +------------+ 

                | Continuous |  |   Slient   |  |   Slient   | 

                |speech input|  |   status   |  |   status   | 

                +------------+  +------------+  +------------+ 

                       |               |               | 

                       |------------+  |------------+  |------------+ 

                       | Present A  |  | Present A  |  | Present A  | 
     
     
    Peng, et al.           Expires January 3, 2013                [Page 7] 
        








    Internet-Draft               svc-control                      July 2012 
        

                       |(Large Size)|  |(Large Size)|  |(Large Size)| 

                       |------------+  |------------+  |------------+ 

                       |------------+  |------------+  |------------+ 

                       | Present B  |  | Present B  |  | Present B  | 

                       |(Small Size)|  |(Small Size)|  |(Small Size)| 

                       |------------+  |------------+  |------------+ 

                       |------------+  |------------+  |------------+ 

                       | Present C  |  | Present C  |  | Present C  | 

                       |(Small Size)|  |(Small Size)|  |(Small Size)| 

                       |------------+  |------------+  |------------+ 

                       |               |               | 

                       |               |               |  

         Detect the +--| Detect the +--| Detect the +--| 

         local user |  | local user |  | local user |  | 

           status   +->|   status   +->|   status   +->| 

                       |               |               | 

                +------------+  +------------+  +------------+ 

                |   Slient   |  | Continuous |  |   Slient   | 

                |   status   |  |speech input|  |   status   | 

                +------------+  +------------+  +------------+ 

                       |               |               | 

                       |               |               | 

                       |  Base layer   |               | 

                       |-------------->|               | 
     
     
    Peng, et al.           Expires January 3, 2013                [Page 8] 
        








    Internet-Draft               svc-control                      July 2012 
        

                       |               |               | 

                       |          Base layer           | 

                       |------------------------------>| 

                       |               |               | 

                       |               |               | 

                       |               | Base layer +  | 

                       |               |extended layer | 

                       |               |-------------->|  

                       |               |               | 

                       | Base layer +  |               | 

                       |extended layer |               | 

                       |<--------------|               | 

                       |               |               | 

                       |               |               | 

                       |               |  Base layer   | 

                       |               |<--------------| 

                       |               |               | 

                       |          Base layer           | 

                       |<------------------------------| 

                       |               |               | 

                       |               |               |  

         Detect the +--| Detect the +--| Detect the +--| 

         local user |  | local user |  | local user |  | 

           status   +->|   status   +->|   status   +->| 
     
     
    Peng, et al.           Expires January 3, 2013                [Page 9] 
        








    Internet-Draft               svc-control                      July 2012 
        

                       |               |               | 

                +------------+  +------------+  +------------+ 

                |   Slient   |  | Continuous |  |   Slient   | 

                |   status   |  |speech input|  |   status   | 

                +------------+  +------------+  +------------+ 

                       |               |               | 

                       |------------+  |------------+  |------------+ 

                       | Present A  |  | Present A  |  | Present A  | 

                       |(Small Size)|  |(Small Size)|  |(Small Size)| 

                       |------------+  |------------+  |------------+ 

                       |------------+  |------------+  |------------+ 

                       | Present B  |  | Present B  |  | Present B  | 

                       |(Large Size)|  |(Large Size)|  |(Large Size)| 

                       |------------+  |------------+  |------------+ 

                       |------------+  |------------+  |------------+ 

                       | Present C  |  | Present C  |  | Present C  | 

                       |(Small Size)|  |(Small Size)|  |(Small Size)| 

                       |------------+  |------------+  |------------+ 

                       |               |               | 

                       |               |               | 

                    Figure 1 A B C Tripartite session flowchart 

       First of all, the session participant's browser of A, establishes a 
       peer-to-peer media stream connection with the other participants?
       browsers with SVC codec. Subsequently, the browsers utilize local 
       device's capability to monitor the local user audio input status. 
       Since A is the spokesman at beginning, A's browser detects a 
     
     
    Peng, et al.           Expires January 3, 2013               [Page 10] 
        








    Internet-Draft               svc-control                      July 2012 
        

       continuous audio input from the local user, A's browser sends both 
       the base and extended layers of SVC sub-streams to the others(B and 
       C). At the meantime, B and C, as the listeners to A, remain silent. 
       Therefore, B's local browser sends only the SVC base layer media 
       stream to the others (A and C) according to the result of silent 
       status detected by calling the local devices. Similarly C sends only 
       the base layer to A and B. 

       The spokesman changes from one to another as the conference 
       continues. For example, when A detects himself in a silent status 
       from  the local devices, A switches to send only the base layer to 
       the others (B and C). Conversely, when B detects a continuous audio 
       input from the local user, B would change to send both the base and 
       extended layers to the others (A and C). While C detects no changes 
       locally, C would continue sending base layer to A and B. 

    3.3. Extensions to the simple example 

       The above description of a sender controlled SVC transmission modes 
       based on local audio input status detection, is a simple example of 
       a more generalized sender media control scheme, where the types of 
       status transition as control triggers, the mechanism to enforce 
       media adjustment afterwards, and whether to give JS API the ability 
       to influence the media behavior, may vary accordingly. In particular, 
       a few extensions would be in terms of : 

       1. Options of implementation on send-side state detection, include: 

            a.to make use of the DTX module at voice codec level, and 
              decide the strategy of video mode switching in accordance 
              with a given strategy (Note that the strategy of video mode 
              switching (whether need for local high resolution 
              presentation) can be different from the strategy of voice 
              codec (silence / voice signals), 

            b.to defer user presence status indirectly, by monitoring 
              whether the local audio input is muted from the web page, or 
              whether the conferencing page is currently active, etc. 

       2. Options of implementation on send-side media control, include:  

            a.to adjust the definition of the local camera, or 

            b.to adjust the bitrate of a given codec in use (the layer 
              composition in case of SVC codecs, for instance), or 


     
     
    Peng, et al.           Expires January 3, 2013               [Page 11] 
        








    Internet-Draft               svc-control                      July 2012 
        

            c.to trigger a re-negotiation for a new codec instead of the 
              one in use. 

       3. Options of implementation on clients include: 

            a.to enforce the transition detection and conduct media control 
              singly by the local browser, or 

            b.the browser offers related callback APIs for both defined 
              state transition events and media controlling to JS, who may 
              provide SP's entailed control behavior.  

    3.4. Applicable Scenarios 

       For simplicity, we used the fully distributed scenario as an example 
       to elaborate our proposal. It should be noticed that the proposed 
       scheme is also applicable and would bring benefit to a centralized 
       mixer-based conferencing setting.  

       In a fully distributed P2P conference, without any centralized 
       mixing server at the media plane, each participating node processes 
       the modulation of video stream locally. As mentioned above, the 
       application of our proposal would reduce the transmission overhead 
       of both send-side nodes and receive-side nodes, also reduces the 
       processing and modulation overhead of the receive-side nodes. 

       While in a mixing server-based conference, the dedicated mixer 
       server receives the participants?local captured video stream, and 
       renders a uniform rendered collective screen for display. The 
       application of our proposal will significantly reduce the 
       transmission overhead for uploading local video to the server, thus 
       saving its bandwidth as a centralized sink. 

    4. Derived Requirements 

       In our proposal, video conferencing client detects the local user 
       session state (speaking/silence), achieving the purpose of saving 
       media plane transmission overhead by adjusting the sending rate of 
       video media stream. In order to realize it in a RTCWEB setting, 
       additional requirements are derived as follows: 

       1. Function Requirement for Browser 

       Fxx: The browser SHOULD be able to detect the audio input status 
       (speaking/ silent) of the local user. 

       2. API Requirements for Browser 
     
     
    Peng, et al.           Expires January 3, 2013               [Page 12] 
        








    Internet-Draft               svc-control                      July 2012 
        

       Axx: It SHOULD be possible for the JS to be notified about the audio 
       input status (speaking/silent) of the local user, and to entail the 
       media control behavior in response. 

        

    5. Security Considerations 

       TBA 

    6. IANA Considerations 

       None. 

    7. References 

    7.1. Normative References 

       [1]  Bradner, S., "Key words for use in RFCs to Indicate 
             Requirement Levels", BCP 14, RFC 2119, March 1997. 

       [2]  Crocker, D. and Overell, P.(Editors), "Augmented BNF for 
             Syntax Specifications: ABNF", RFC 2234, Internet Mail 
             Consortium and Demon Internet Ltd., November 1997. 

       [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 
                 Requirement Levels", BCP 14, RFC 2119, March 1997. 

       [RFC2234] Crocker, D. and Overell, P.(Editors), "Augmented BNF for 
                 Syntax Specifications: ABNF", RFC 2234, Internet Mail 
                 Consortium and Demon Internet Ltd., November 1997. 

    7.2. Informative References 

       [use-case-draft] Holmberg, C., Hakansson, S. and Eriksson, G., "Web 
                        Real-Time Communication Use-cases and Requirements", 
                        draft-ietf-rtcweb-use-cases-and-requirements-09 
                        (work in progress), June 27, 2012 








     
     
    Peng, et al.           Expires January 3, 2013               [Page 13] 
        








    Internet-Draft               svc-control                      July 2012 
        

    Authors' Addresses 

       Lingli Deng 

       China Mobile 

       Email: denglingli@chinamobile.com 

        

       Jin Peng 

       China Mobile 

       Email: pengjin@chinamobile.com 

        





























     
     
    Peng, et al.           Expires January 3, 2013               [Page 14]