INTERNET-DRAFT                                                  R. Huang
Intended Status: Standards Track                                  Huawei
                                                        October 19, 2015


             Multi-view streams in SDP and RTP Sessions   
                    draft-huang-mmusic-multiview-00


Abstract

   This document analyses the streaming options of multi-view
   applications, and describes the required SDP signaling for them.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
 


<R. Huang>               Expires April 21, 2016                 [Page 1]

INTERNET DRAFT            <Multi-view in SDP>           October 19, 2015


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.



Table of Contents

   1  Introduction
   2  Terminology
   3.  Use Cases
     3.1. 3D Channel of IPTV
     3.2 Multi-view video conference
       3.2.3 Tele-education with Free Viewpoint Video
       3.2.4 Immersive Telepresence with Multi-view Video
   4.  Multi-view Video Transmission
   5. SDP Signaling Requirements for Multi-view Video
   6.  Gap Analysis
     6.1. RFC5583
     6.2. Simulcast
     6.3. RID
     6.4. CLUE
     6.5. 3D Signaling
   7.  Possible Solutions
   8.  Security Considerations
   9. IANA Considerations
   10.  Acknowledgments
   11.  References
     11.1  Normative References
     11.2  Informative References
   Authors' Addresses

1  Introduction

   Multi-view video consists of multiple views that are captured by
   multiple cameras from different positions and angles. 3D video and
   free viewpoint video are two typical use cases. The first offers a 3D
   depth impression of the observed scenery, while the second allows for
   interactive selection of viewpoint and direction within a certain
   operating range, as known from computer graphics. Streaming of such
   multi-view applications on the Internet is usually offered at varying
   speeds and costs over a variety of physical infrastructures. However,
   since multi-view video consists of multiple video sequences, its
   traffic is several times larger than that of traditional multimedia,
   which dramatically increases the bandwidth requirement. Two operating
   options are usually used for streaming multi-view representations:
   streaming all views in a highly compressed MVC bitstream with little
   possibility of random access, or encoding all views independently and
   streaming only the required views.
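
   As a rough illustration of the bandwidth difference between sending
   every independently encoded view and sending only the required
   views, the arithmetic can be sketched as below. The per-view bit
   rate and the view counts are assumed values chosen for the example,
   not measurements, and the saving from joint (MVC) coding is not
   modelled.

```python
def total_bitrate_mbps(views_sent: int, per_view_mbps: float) -> float:
    """Aggregate bit rate when each transmitted view costs per_view_mbps."""
    return views_sent * per_view_mbps


# All three values below are assumptions made for this illustration.
PER_VIEW_MBPS = 4.0    # bit rate of one independently coded view
CAPTURED_VIEWS = 8     # number of cameras at the sender
DISPLAYED_VIEWS = 2    # views a stereoscopic client actually renders

all_views = total_bitrate_mbps(CAPTURED_VIEWS, PER_VIEW_MBPS)
required_only = total_bitrate_mbps(DISPLAYED_VIEWS, PER_VIEW_MBPS)
print(all_views, required_only)  # 32.0 8.0
```

   Under these assumptions, streaming only the required views needs a
   quarter of the bit rate of streaming all views, at the price of
   requiring a way for the receiver to select views.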

   This document analyses the streaming options of multi-view
   applications, and describes the required SDP signaling for them.

2  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   This document uses the following terms:

   Multi-view Video Compression: The use of efficient compression
   techniques to transmit multi-view video. Usually, the inter-view
   similarity between adjacent views and the temporal similarity between
   successive images of each video are exploited.

   MVC (Multiview Video Coding): An extension of the AVC standard,
   standardised in 2009, that covers a wide range of 3D video
   applications, including 3D video streaming and free-viewpoint video.
   It is inherently backward compatible with AVC: the base view is an
   AVC stream that can be decoded independently in the absence of an
   MVC decoder. Any additional views are referred to as enhancement
   views and are typically coded using inter-view prediction within the
   same bitstream.

3.  Use Cases

   Multi-view video communications can be applied to a wide variety of
   use cases. This section introduces several of the most likely usage
   scenarios.

3.1. 3D Channel of IPTV

   The multi-view video content is sent to a group of end users by
   multicast, which is widely applied in 3D channels of IPTV. Different
   views can be transmitted separately in different multicast groups,
   or transmitted together in one multicast group using MVC.

3.2 Multi-view video conference

   Two or more users engage in a remote video conversation using mobile
   or desktop terminals that have stereoscopic capture and display
   capabilities. The multi-view video stream requires the transmission
   of two or more views or, alternatively, of one view and a depth map.
   The views or depth maps may be transmitted as separate transport
   streams or together, depending on the chosen multi-view video
   transmission method. In the case of a multiparty session, an
   intermediate media server should be used for mixing. Video mixing for
   the multi-view video stream needs to be aware of the additional views
   or depth maps.

3.2.3 Tele-education with Free Viewpoint Video

   A teacher intends to give a real-time lecture to students in one or
   more remote sites. The teacher's site is equipped with a multi-view
   capture setup, and has a 2D display to show feedback from the
   students, who are equipped with regular 2D cameras and displays.
   Multi-view video content is transmitted so that the student sites are
   capable of selecting perspectives of the captured scene within a
   certain operating range, and rendering them in real time. Some
   signaling interaction between the students and the teacher may
   therefore be required.

3.2.4 Immersive Telepresence with Multi-view Video

   A group of users wants to conduct a meeting through a telepresence
   system, connecting to the session from several sites, with each
   terminal supporting multiple users. Participants are arranged around
   a shared virtual table, and each remote participant is shown on a
   separate screen, which is autostereoscopic and displays two different
   views for each user in the room. At each site, several
   autostereoscopic displays and a multi-view capture setup are
   deployed. The viewpoint of these views is adjusted during the session
   so as to match the position of the observing users. The use of
   multi-view video lets telepresence systems achieve accurate eye
   contact and spatial faithfulness.

4.  Multi-view Video Transmission

   Multi-view video, which has two or more views, increases the encoding
   complexity and the bandwidth required for transmission. These views
   can be transmitted in several ways:

      * Multi-view simulcast: each view and/or depth map is encoded
      independently using a monocular video codec, which enables
      streaming each view over a separate channel. Clients can request
      as many views as their displays require without worrying about
      inter-view dependencies. While this method has its benefits, it
      does not exploit the redundancies that are present between the
      views.

      * Multi-view video compression: views are encoded using a specific
      technique that decreases the overall bit rate by exploiting the
      inter-view redundancies. However, although it exploits the
      similarities between the views, it increases the effect of
      transmission errors, and inter-view compression makes the views
      depend on each other. To decode a frame correctly, the frames it
      depends on must be decoded first, which causes unnecessary
      transmission and delay if the view to be displayed is far away
      from the reference view. MVC is a typical and often used video
      compression format applied to multi-view video.

      * Adaptive multi-view video transmission: adaptive streaming of
      multi-view video is also considered when functionalities such as
      rate scalability, resolution scalability, view scalability, and
      packet-loss resilience should be offered. In such a case,
      simulcast or multi-view video compression is used together with
      adaptive streaming techniques, e.g., SVC.

      * Combination transmission: sending multi-view video by using a
      combination of simulcast and multi-view video compression
      techniques may also be a good option for some specific scenarios.
      For example, if a multi-view video compression technique works
      well for closely related views but not for widely differing views,
      sending several multi-view video compression streams
      simultaneously can solve the problem.
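
   The dependency cost described for multi-view video compression can
   be illustrated with a minimal sketch. The view names and both
   dependency maps below are hypothetical; in particular, the chain in
   which each view is predicted from its neighbour is only one
   possible prediction structure and is not mandated by MVC.

```python
from typing import Dict, List, Set


def views_to_send(target: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return the target view plus every view it transitively depends on."""
    needed: Set[str] = set()
    stack = [target]
    while stack:
        view = stack.pop()
        if view in needed:
            continue
        needed.add(view)
        stack.extend(deps.get(view, []))
    return needed


# Multi-view simulcast: every view is coded independently.
simulcast = {"v0": [], "v1": [], "v2": [], "v3": []}

# Inter-view prediction (hypothetical chain): v0 is the base view and
# each further view is predicted from its left neighbour.
predicted = {"v0": [], "v1": ["v0"], "v2": ["v1"], "v3": ["v2"]}

print(sorted(views_to_send("v3", simulcast)))  # ['v3']
print(sorted(views_to_send("v3", predicted)))  # ['v0', 'v1', 'v2', 'v3']
```

   With simulcast a receiver that only displays v3 needs one stream,
   whereas with the assumed prediction chain it needs all four, which
   is the extra transmission and delay mentioned above.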

5. SDP Signaling Requirements for Multi-view Video

   The following requirements need to be met to support the streaming of
   multi-view video as described in the previous sections:

   REQ-1: It must be possible to signal whether multi-view simulcast or
   multi-view video compression is used.

   REQ-2: It must be possible to signal adaptive multi-view video
   transmission, e.g., multi-view video simulcast is used together with
   adaptive simulcast.

   REQ-3: It must be possible to signal combination transmission where
   multi-view video compression is used together with simulcast.

   REQ-4: Bundled [I-D.ietf-mmusic-sdp-bundle-negotiation] usage must be
   considered.

   REQ-5: It must be possible to signal multi-view video related decoder
   constraints, e.g., the maximum number of view streams that can be
   provided by the sender and the maximum number of view streams that
   can be received by the receiver.

   REQ-6: It must be possible to support both declarative SDP and SDP
   offer/answer.

   REQ-7: When multi-view simulcast is used, it must be possible for
   receivers to request the view streams that they wish to receive.

   REQ-8: The solution must be compatible with other existing
   mechanisms, e.g., RTP retransmission [RFC4588] and Forward Error
   Correction [RFC5109].


6.  Gap Analysis

6.1. RFC5583

   This specification defines an SDP mechanism for signaling the
   decoding dependency of different media descriptions with the same
   media type [RFC5583]. It can be used to signal the use of SVC or MVC.
   However, it cannot differentiate between multi-view simulcast and
   MVC, not to mention the cases where multi-view transmission is used
   together with simulcast or SVC.
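
   To make the gap concrete, the following minimal sketch parses the
   "a=depend" attribute of [RFC5583] from an SDP description. The
   session itself (addresses, ports, payload types, and the "V0"/"V1"
   identification tags) is a hypothetical example written for this
   illustration, and the parser handles only the simple attribute form
   shown here.

```python
from typing import List, Tuple

# Hypothetical two-description session using the DDP grouping and the
# "a=depend" attribute of RFC 5583.
SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
a=group:DDP V0 V1
m=video 20000 RTP/AVP 96
a=rtpmap:96 H264/90000
a=mid:V0
m=video 20002 RTP/AVP 97
a=rtpmap:97 H264-SVC/90000
a=mid:V1
a=depend:97 lay V0:96
"""


def parse_depend(sdp: str) -> List[Tuple[str, str, str, List[str]]]:
    """Return (dependent fmt, dependency type, mid, [fmt, ...]) tuples."""
    result = []
    for line in sdp.splitlines():
        if not line.startswith("a=depend:"):
            continue
        # Entries for different dependent payload types are ";"-separated.
        for entry in line[len("a=depend:"):].split(";"):
            fields = entry.split()
            fmt, dep_type = fields[0], fields[1]
            # Each remaining field is "<mid>:<fmt>[,<fmt>...]".
            for tag in fields[2:]:
                mid, fmts = tag.split(":", 1)
                result.append((fmt, dep_type, mid, fmts.split(",")))
    return result


print(parse_depend(SDP))  # [('97', 'lay', 'V0', ['96'])]
```

   The parsed result carries only a dependency type and the media
   descriptions depended upon. Nothing in it tells a receiver whether
   the dependent description is an additional view or an enhancement
   layer of the same view.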

6.2. Simulcast

   [I.d-ietf-mmusic-sdp-simulcat] describes simulcast as the sending of
   multiple differently encoded versions of the same media source in
   different RTP streams. It is mainly used for the same video source
   encoded with different video encoder types or image resolutions. It
   still cannot handle the case where multi-view transmission is used
   together with simulcast or SVC.

6.3. RID

   [I.d-pthatcher-mmusic-rid] defines a framework to identify source RTP
   streams with constraints on their payload format in SDP. It can
   effectively identify a source RTP stream within an RTP session,
   which is quite useful for simulcast. Thus, it may also be helpful for
   multi-view video signaling. Potential problems and issues with this
   usage need to be considered further.

6.4. CLUE

   CLUE is dedicated to telepresence systems, which provide high
   definition, high quality audio/video enabling a "being-there"
   experience. It involves multiple devices such as cameras, displays,
   microphones, and loudspeakers. It specifies the spatial relationship
   of these devices, their viewpoint and field of view/capture, and
   related information [I.d-ietf-clue-signal]. However, the use of 3D
   or free viewpoint video is not within the current CLUE scope. Thus,
   support for multi-view video in CLUE needs to be considered further.

6.5. 3D Signaling

   There has been some work in MMUSIC proposing 3D signaling solutions.
   [I.d-greevenbosch-mmusic-signal-3d-format] and
   [I.d-greevenbosch-mmusic-sdp-parallax] introduce new SDP attributes
   to provide format description and depth position signaling in 3D
   applications. [I.d-capelastegui-mmusic-3dv-sdp] introduces a
   mechanism to describe 3D video streams composed of multiple video
   views, or of a combination of views and depth maps. These proposals
   do not directly address multi-view video, but should be evaluated
   when proposing possible solutions.

7.  Possible Solutions

   TBD.

8.  Security Considerations

   TBD.


9. IANA Considerations

   TBD.

10.  Acknowledgments


11.  References

11.1  Normative References


   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, December 2007.

   [I-D.ietf-mmusic-sdp-bundle-negotiation] Holmberg, C., Alvestrand,
              H., and C. Jennings, "Negotiating Media Multiplexing Using
              the Session Description Protocol (SDP)", draft-ietf-
              mmusic-sdp-bundle-negotiation-23 (work in progress), July
              2015.

   [I.d-ietf-mmusic-sdp-simulcat] Burman, B., Westerlund, M.,
              Nandakumar, S., and M. Zanaty, "Using Simulcast in SDP and
              RTP Sessions", draft-ietf-mmusic-sdp-simulcast-02 (work in
              progress), October 6, 2015.

   [I.d-pthatcher-mmusic-rid] Thatcher, P., Zanaty, M., Nandakumar, S.,
              Roach, A., Burman, B., and B. Campen, "RTP Payload Format
              Constraints", draft-pthatcher-mmusic-rid-01 (work in
              progress), October 2015.

   [I.d-ietf-clue-signal] Kyzivat, P., Xiao, L., Groves, C., and R.
              Hansen, "CLUE Signaling", draft-ietf-clue-signaling-06,
              August 2015.

   [I.d-greevenbosch-mmusic-signal-3d-format] Greevenbosch, B. and Y.
              Hui, "Signal 3D format", draft-greevenbosch-mmusic-sdp-3d-
              format-01, October 2012.

   [I.d-greevenbosch-mmusic-sdp-parallax] Greevenbosch, B. and Y. Hui,
              "SDP attribute to signal parallax", draft-greevenbosch-
              mmusic-sdp-parallax-01, October 2012.

   [I.d-capelastegui-mmusic-3dv-sdp] Capelastegui, P., "3D Video in the
              Session Description Protocol (SDP)", draft-capelastegui-
              mmusic-3dv-sdp-00, April 2012.

11.2  Informative References

Authors' Addresses


   Rachel Huang
   Huawei
   101 Software Avenue
   Nanjing, China

   EMail: rachel.huang@huawei.com