Internet Engineering Task Force                          Harri Honko
INTERNET-DRAFT                                     Petri Koskelainen
                                                       Jouni Salonen
Expires: Dec 29, 1998                                          Nokia
                                                       June 29, 1998


                      SDP syntax for H.263 options


STATUS OF THIS MEMO

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

ABSTRACT

   This document defines the SDP syntax for the codec options and
   parameters of the 1998 version of H.263, for use in the IETF
   multimedia conferencing architecture. It is often useful to know
   beforehand (in the call setup phase) which features of H.263 the
   other end supports. This document specifies the format-specific
   parameters for H.263, which are carried in the "a=fmtp:" attribute
   as defined in the SDP document. Moreover, the usage of this feature
   with SIP and SAP is specified.

1. INTRODUCTION

   Internet multimedia conferencing is becoming a reality, and new
   protocols have been defined to provide co-operation between
   different applications. The current IETF protocols SIP [3] and
   SAP [4] are quite widely used for simple MBone conferencing, and
   Internet unicast conferencing is also taking its first steps.
   Today, the most widely used video codec in this environment is the
   ITU-T H.261 video standard, but a better and more efficient low bit
   rate video standard, ITU-T H.263 [1] (the 1998 version is also known
   as H.263+ or H.263v2), is becoming more and more popular. However,
   using H.263 within the current IETF multimedia conferencing
   architecture is difficult, since the improved efficiency of H.263
   depends on the use of coding options and parameters, which must
   somehow be signaled to the other end. The SIP and SAP protocols can
   deliver this information, but the current SDP [2] format does not
   specify a standardised way of representing these options and
   parameters.

   This document specifies the SDP syntax which describes H.263 options
   and parameters to be delivered by a signaling protocol (e.g. SIP or
   SAP). The syntax presented in this document uses the SDP "a=fmtp"
   attribute, which is meant for carrying codec-specific parameters.

   These options are especially useful in SIP; in SAP, it is not clear
   whether they have any use. In SIP, these format-specific parameters
   are decoder properties or wishes, whereas in SAP they are an
   informative announcement of the video options and parameters that
   will be used. The SAP rules also apply if a multicast session is
   advertised on a web page in SDP format.

   The syntax in this document is text-based and uses the ISO 10646
   character set in UTF-8 encoding (RFC 2279 [5]). The syntax consists
   of one line, terminated by CRLF. It is described in the augmented
   Backus-Naur form (BNF) defined in RFC 2234 [6].

2. H.263 CODEC PARAMETERS

   For a further description of these H.263 parameters, please refer to
   the ITU-T H.263 recommendation [1].

   The SDP syntax for H.263 is carried in the SDP "a" attribute:

      a=fmtp:34 SDP_263_syntax

   Here "34" is the RTP payload type number for H.263.
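   As a minimal illustration of where SDP_263_syntax sits, the sketch
   below separates an attribute line into its payload type and its
   format-specific parameter string. It assumes the RFC 2327 form
   "a=fmtp:<format> <format specific parameters>"; the function name
   split_fmtp is hypothetical and not part of this specification.

```python
def split_fmtp(line: str) -> tuple[int, str]:
    """Split 'a=fmtp:<format> <params>' into (payload type, params).

    Hypothetical helper. Assumes the RFC 2327 form with a colon after
    'fmtp'; a trailing CRLF is stripped and the params may be empty.
    """
    line = line.rstrip("\r\n")
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp attribute: %r" % line)
    # The format (payload type) is separated from the params by a space.
    fmt, _, params = line[len(prefix):].partition(" ")
    return int(fmt), params.strip()
```

   A caller would typically accept the line only when the returned
   payload type is 34 and then interpret the parameter string with the
   rules of chapters 2 to 4.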
   The syntax itself is defined as follows:

      SDP_263_syntax = *capability CRLF | request CRLF | CRLF
      capability     = 1*Size SP | 1*Params SP | 1*Options SP

   Here, as usual, SP means a space character, and *term means that
   zero or more instances of term may be present; 1*term means that one
   or more instances of term may appear. SDP_263_syntax may be either a
   capability announcement, an intra request during the video session,
   or an empty line consisting only of CRLF (end of line). Requests are
   explained in chapter 4.

2.1 Picture information and MPI

   An H.263 bitstream supports many picture sizes and different frame
   rates. Supported picture sizes and their corresponding minimum
   picture interval (MPI) values are combined with the "=" symbol. MPI
   is an integer value (1..32), meaning that the maximum picture
   (frame) rate is (29.97/MPI) frames/sec. A larger MPI value therefore
   means a lower picture rate.

   The augmented BNF syntax for size-related H.263 parameters is given
   below.

      Size  = "SQCIF" "=" mpi | "QCIF" "=" mpi | "CIF" "=" mpi |
              "CIF4" "=" mpi | "CIF16" "=" mpi |
              "XMAX" "=" xmax SP "YMAX" "=" ymax SP "MPI" "=" mpi
      mpi   = 1*2DIGIT
      xmax  = 1*3DIGIT
      ymax  = 1*3DIGIT

   Explanations:

   SQCIF: Sub-QCIF picture size and its MPI value. MPI is an integer
   value between 1 and 32.

   QCIF: QCIF picture size and its MPI value (1..32).

   CIF: CIF picture size and its MPI value (1..32).

   CIF4: CIF4 picture size and its MPI value (1..32).

   CIF16: CIF16 picture size and its MPI value (1..32).

   Arbitrary picture sizes (XMAX, YMAX, MPI): These parameters mean that
   the terminal is capable of and/or willing to support arbitrary
   picture sizes. Both XMAX and YMAX must be divisible by 4. These X
   and Y values are the maximum allowed picture width and height for
   the corresponding MPI value. All three words must appear together and
   in this order, or none of them may appear.

   More than one mode can exist in the same line (see the examples in
   chapter 5). At least one picture mode should be present.
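   The Size rule and its constraints can be checked mechanically. The
   following is a sketch only, assuming the tokens have already been
   split on spaces; the names PictureMode and parse_sizes, and the
   "CUSTOM" label for the XMAX/YMAX/MPI triple, are hypothetical. It
   enforces the MPI range 1..32, the divisibility of XMAX and YMAX by
   4, and the fixed XMAX, YMAX, MPI order, and it derives the maximum
   picture rate as 29.97/MPI.

```python
from dataclasses import dataclass

# Fixed picture-size keywords from the Size rule.
FIXED_SIZES = ("SQCIF", "QCIF", "CIF", "CIF4", "CIF16")


@dataclass
class PictureMode:
    size: str      # e.g. "CIF"; hypothetical "CUSTOM" for XMAX/YMAX/MPI
    mpi: int       # minimum picture interval, 1..32
    xmax: int = 0  # only meaningful for "CUSTOM"
    ymax: int = 0

    def max_frame_rate(self) -> float:
        """Maximum picture rate in frames/sec: 29.97 / MPI."""
        return 29.97 / self.mpi


def _mpi(text: str) -> int:
    value = int(text)
    if not 1 <= value <= 32:
        raise ValueError("MPI out of range 1..32: %d" % value)
    return value


def parse_sizes(tokens: list[str]) -> list[PictureMode]:
    """Parse Size tokens, keeping their order (it encodes preference)."""
    modes: list[PictureMode] = []
    i = 0
    while i < len(tokens):
        name, _, value = tokens[i].partition("=")
        if name in FIXED_SIZES:
            modes.append(PictureMode(name, _mpi(value)))
            i += 1
        elif name == "XMAX":
            # XMAX, YMAX and MPI must appear together and in this order.
            triple = [t.partition("=") for t in tokens[i:i + 3]]
            if [t[0] for t in triple] != ["XMAX", "YMAX", "MPI"]:
                raise ValueError("XMAX must be followed by YMAX and MPI")
            xmax, ymax = int(triple[0][2]), int(triple[1][2])
            if xmax % 4 or ymax % 4:
                raise ValueError("XMAX and YMAX must be divisible by 4")
            modes.append(PictureMode("CUSTOM", _mpi(triple[2][2]), xmax, ymax))
            i += 3
        else:
            raise ValueError("not a Size token: %r" % tokens[i])
    return modes
```

   For example, QCIF=2 yields a maximum of 29.97/2, or about 15,
   pictures per second.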
2.2 Other parameters

      Params = "PAR" "=" par_a ":" par_b | "CPCF" "=" cpcf |
               "MaxBR" "=" maxbr | "BPP" "=" bpp | "HRD" "=" hrd
      par_a  = 1*DIGIT
      par_b  = 1*DIGIT
      cpcf   = 1*DIGIT "." 1*DIGIT
      maxbr  = 1*DIGIT
      bpp    = 1*DIGIT

   Explanations:

   Arbitrary Pixel Aspect Ratio (PAR): par_a and par_b are integers
   between 0 and 255. The default ratio is 12:11 if not otherwise
   specified.

   Arbitrary (Custom) Picture Clock Frequency (CPCF): cpcf is a floating
   point value. The default value is 29.97.

   MaxBitRate (MaxBR): The maximum video stream bitrate, expressed in
   units of 100 bits/s. The MaxBR value is an integer between 1 and
   19200.

   BitsPerPictureMaxKb (BPP): The maximum number of kilobits allowed to
   represent a single picture frame; the value is specified by the
   largest supported picture resolution, see [1]. If this parameter is
   not present, a default value based on the maximum supported
   resolution is used. BPP is an integer value between 0 and 65536.

   Hypothetical Reference Decoder (HRD): See annex B of the H.263
   specification.

   These parameters are separated by spaces.

3. H.263 CODEC OPTIONS

   These options are represented by the letter of the annex which
   describes the corresponding feature. The syntax for options is given
   below:

      Options = "D" "=" #annex_d | "E" | "F" | "G" | "I" | "J" |
                "K" "=" #annex_k | "L" "=" #annex_l | "M" |
                "N" "=" annex_n | "O" "=" #annex_o |
                "P" "=" #annex_p | "Q" | "R" | "S" | "T"
      annex_d = "1" | "2"
      annex_k = "1" | "2" | "3" | "4"
      annex_l = "1" | "2" | "3" | "4" | "5" | "6" | "7"
      annex_n = "1" | "2" | "3" | "4"
      annex_o = "1" | "2" | "3"
      annex_p = "1" | "2" | "3" | "4"

   Here #term means a comma-separated list of term.

3.1 H.263v1 Options

   Annex D: Unrestricted Motion Vector option. D is 1 and/or 2; see
   chapter 3.2.

   Annex E: Syntax-based Arithmetic Coding. It is expected that this
   annex will not be widely used because of its complexity and IPR
   issues.

   Annex F: Advanced Prediction.

   Annex G: PB Frames.

3.2 H.263v2 Options

   Below are the new codec options which are defined in the 1998
   version of H.263.

   Annex D: Unrestricted Motion Vector.
   This mode is applied differently depending on whether H.263v1 or
   H.263v2 is used. 1 means that H.263v1 is used and annex D is applied
   correspondingly; 2 means that H.263v2 is used. A terminal can support
   both modes.

      Example: D=1,2

   Annex I: Advanced Intra Coding mode.

   Annex J: Deblocking Filter mode.

   Annex K: Slice Structured mode.
   This mode has four possible submodes, of which a terminal can support
   none, some or all. The submodes are as follows:

      1: slicesInOrder-NonRect
      2: slicesInOrder-Rect
      3: slicesNoOrder-NonRect
      4: slicesNoOrder-Rect

      Example: K=1,2,4

   Annex L: Supplementary Enhancement Information Specification.
   This mode has several features. The features that may be supported
   are:

      1: full picture freeze
      2: partial picture freeze and release
      3: resizing partial picture freeze
      4: full picture snapshot
      5: partial picture snapshot
      6: video segment tagging
      7: progressive refinement

      Example: L=1,6

   Annex M: Improved PB-frames mode.

   Annex N: Reference Picture Selection mode.
   Four choices (modes) are available:

      1: NEITHER: no back-channel data is returned from the decoder to
         the encoder,
      2: ACK: the decoder returns only acknowledgment messages,
      3: NACK: the decoder returns only non-acknowledgment messages, and
      4: ACK+NACK: the decoder returns both acknowledgment and
         non-acknowledgment messages.

      Example: N=2

   For further details, see [1].

   Annex O: Temporal, SNR, and Spatial Scalability mode.
   This annex has three possible choices (temporal, SNR, spatial). The
   mode is represented by the integer 1, 2 or 3, which refers to
   temporal, SNR or spatial scalability, respectively. These three
   submodes can be available at the same time.

      Example: O=2,3

   Annex P: Reference Picture Resampling.
   The following submodes are available:

      1: dynamicPictureResizingByFour
      2: dynamicPictureResizingBySixteenthPel
      3: dynamicWarpingHalfPel
      4: dynamicWarpingSixteenthPel

      Example: P=1,3

   See [1] for further details.
   Annex Q: Reduced-Resolution Update mode.

   Annex R: Independent Segment Decoding mode.

   Annex S: Alternative INTER VLC mode.

   Annex T: Modified Quantization mode.

   The presence of an annex letter indicates the availability of that
   feature. The actual interpretation of these letters is defined
   differently for two-way and one-way usage (see chapters 5 and 6,
   respectively). Note that some annexes cannot be used together; the
   H.263 specification gives detailed rules on this issue, and they
   should be followed.

4. H.263 REQUEST FEATURES IN SDP

   It is sometimes useful to request an intra update from the other end
   during a video session. This update may be an intra picture update or
   a GOB (Group of Blocks) update.

      request = intra | gob
      intra   = "I-UPDATE"

   Explanation: An intra picture should be sent as soon as possible.
   This must NOT be present in the capability exchange phase. It must
   be alone in the a=fmtp line when it is used (during the video
   session).

      gob    = "GOB-UPDATE" "=" First "," Amount
      First  = 1*DIGIT
      Amount = 1*DIGIT

   Explanation: Groups of Blocks are requested, and they should be sent
   as soon as possible. "First" is the first GOB to be requested and
   "Amount" is the number of requested GOBs. This must NOT be present in
   the capability exchange phase. It must be alone in the a=fmtp line
   when it is used (during the video session).

      Example: a=fmtp:34 GOB-UPDATE=1,3

   Here three GOBs, starting from GOB number 1, are requested. When
   receiving this, the video encoder should send an intra update of the
   requested GOBs as soon as possible. This feature is signaled only
   during the video session, not before it.

5. USAGE OF SDP H.263 OPTIONS AND PARAMETERS WITH SIP

   This document does not specify the actual SIP signaling. No
   negotiation is needed: it is enough for each party to announce its
   preferred decoder options and parameters and to let the other end
   decide the actual options and parameters, as long as the other party
   sends only options and parameters which have been advertised. This
   scheme keeps the IETF multimedia terminal simple.
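   To honour this rule, a terminal receiving an a=fmtp line has to
   separate size, parameter and option tokens, recognise a lone request,
   and later confirm that its encoder uses only advertised options. The
   sketch below is illustrative only: classify and encoder_choice_ok are
   hypothetical names, tokens are assumed to be space-separated as in
   the grammar, and annex P is treated as taking submodes, following
   the example P=1,3 in chapter 3.2.

```python
import re

# Option letters mapped to the submode values listed in chapter 3.2;
# None marks annexes that are signaled by the bare letter only.
OPTION_VALUES = {
    "D": {"1", "2"}, "E": None, "F": None, "G": None, "I": None,
    "J": None, "K": {"1", "2", "3", "4"},
    "L": {"1", "2", "3", "4", "5", "6", "7"}, "M": None,
    "N": {"1", "2", "3", "4"}, "O": {"1", "2", "3"},
    "P": {"1", "2", "3", "4"}, "Q": None, "R": None, "S": None, "T": None,
}
SIZE_NAMES = {"SQCIF", "QCIF", "CIF", "CIF4", "CIF16", "XMAX", "YMAX", "MPI"}
PARAM_NAMES = {"PAR", "CPCF", "MaxBR", "BPP", "HRD"}
GOB_REQUEST = re.compile(r"GOB-UPDATE=\d+,\d+")


def classify(params: str) -> dict:
    """Sort the tokens of an a=fmtp parameter string by kind."""
    result = {"sizes": [], "params": {}, "options": {}, "request": None}
    tokens = params.split()
    # A request (chapter 4) must be the only token on the line.
    if len(tokens) == 1 and (tokens[0] == "I-UPDATE"
                             or GOB_REQUEST.fullmatch(tokens[0])):
        result["request"] = tokens[0]
        return result
    for token in tokens:
        name, _, value = token.partition("=")
        if name in ("I-UPDATE", "GOB-UPDATE"):
            raise ValueError("malformed request, or request not alone: %r" % token)
        if name in SIZE_NAMES:
            result["sizes"].append(token)  # order kept: it encodes preference
        elif name in PARAM_NAMES:
            result["params"][name] = value
        elif name in OPTION_VALUES:
            allowed = OPTION_VALUES[name]
            subs = set(value.split(",")) if value else set()
            if allowed is None and subs:
                raise ValueError("option %s takes no value" % name)
            if allowed is not None and not (subs and subs <= allowed):
                raise ValueError("bad submodes for option %s: %r" % (name, value))
            if name == "N" and len(subs) != 1:
                raise ValueError("option N takes exactly one mode")
            result["options"][name] = subs
        else:
            raise ValueError("unknown token: %r" % token)
    return result


def encoder_choice_ok(advertised: dict, chosen: dict) -> bool:
    """True if every option and submode the encoder chose was advertised."""
    return all(name in advertised and subs <= advertised[name]
               for name, subs in chosen.items())
```

   Keeping the submodes as sets makes the "send only what was
   advertised" check a simple subset test per annex letter.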
   This syntax should be sent with the SIP INVITE and the corresponding
   status response (200 OK).

   Codec options (D, E, F, G, I, J, K, L, M, N, O, P, Q, R, S, T):
   These letters are present only if the sender of this SDP message is
   able or willing to decode the corresponding options. E.g. if a
   terminal is capable of decoding the SAC and AP options, it can
   append "E F" to its a=fmtp line. The other party then knows this and
   can use those options in its encoder.

   Picture sizes and MPI: Supported picture sizes and their
   corresponding minimum picture interval (MPI) information can be
   combined. All picture sizes can be advertised to the other party, or
   only a subset of them. A terminal announces only those picture sizes
   (with their MPIs) which it is willing to receive. For example, MPI=2
   means that the maximum decodable picture rate is about 15 pictures
   per second. The picture size listed first is the most preferred
   picture mode to be received, and the last one is the least preferred
   (but still supported) one. These words are present in the SDP line
   only if the terminal is willing to decode the picture size with the
   corresponding MPI; if a terminal is not willing to receive some mode,
   that mode is not present in the list. An example of the usage of
   these words:

      CIF=4 QCIF=3 SQCIF=2 XMAX=360 YMAX=240 MPI=2

   This means that the sender hopes to receive the CIF picture size,
   which it can decode at MPI=4. If that is not possible, then QCIF
   with MPI value 3 is preferred; if that is not possible either, then
   SQCIF with MPI value 2. It is also allowed (but least preferred) to
   send arbitrary picture sizes (at most 360x240) with MPI=2. Note that
   most encoders support at least the QCIF and CIF fixed resolutions,
   and these are expected to be available in almost every H.263-based
   video application.

   MaxBR and BPP parameters: Both of these parameters are useful in
   SIP. MaxBitRate is a video decoder property, hence it differs from
   the SDP "b=" attribute, which refers more to the application's total
   bandwidth (an application often consists of both audio and video).
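   The preference ordering of picture sizes and the MaxBR unit can be
   sketched from the sender's side as follows. This is an assumption-
   laden illustration, not part of this specification: pick_mode and
   max_bitrate_bps are hypothetical names, and only the fixed picture
   sizes are considered (the XMAX/YMAX/MPI triple is left out for
   brevity).

```python
from typing import List, Optional, Set, Tuple


def pick_mode(advertised: List[Tuple[str, int]],
              encoder_sizes: Set[str]) -> Optional[Tuple[str, int]]:
    """Return the first (most preferred) advertised (size, MPI) that the
    local encoder can produce, or None if there is no common size."""
    for size, mpi in advertised:
        if size in encoder_sizes:
            return size, mpi
    return None


def max_bitrate_bps(maxbr: int) -> int:
    """Convert a MaxBR value (units of 100 bits/s, 1..19200) to bits/s."""
    if not 1 <= maxbr <= 19200:
        raise ValueError("MaxBR out of range 1..19200: %d" % maxbr)
    return maxbr * 100
```

   With the list CIF=4 QCIF=3 SQCIF=2 and an encoder limited to QCIF and
   SQCIF, pick_mode returns ("QCIF", 3), so that encoder should send
   QCIF at no more than 29.97/3 (about 10) pictures per second; a MaxBR
   value of 1000 corresponds to 100 kbit/s.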
   BitsPerPictureMaxKb is needed especially for estimating the decoder
   buffer size, in order to reduce the probability of video buffer
   overflow. The PAR, CPCF and HRD parameters can also be applied in
   SIP.

   Below is an example of the H.263 SDP syntax in a SIP message:

      a=fmtp:34 CIF=4 QCIF=2 MaxBR=1000 E F

   This means that the sender of this message can decode an H.263 (RTP
   payload type 34) bitstream with the following options and
   parameters: the preferred resolution is CIF (with MPI 4), but if that
   is not possible, then the QCIF size (with MPI 2) is acceptable. The
   maximum receivable bitrate is 100 kbit/s (1000 * 100 bit/s), and the
   SAC (E) and AP (F) options can be used.

   An intra picture update can be requested during the session with a
   new SIP INVITE message carrying the same Call-ID, like this:

      ...
      m=video 9999 RTP/AVP 34
      a=fmtp:34 I-UPDATE
      ...

6. USE OF SDP H.263 SYNTAX AND OPTIONS WITH SAP

   SAP announcements are one-way only. All H.263 options can be used to
   indicate that the sending terminal is going to use these options in
   its transmitted H.263 stream; the announcement is purely
   informative. Usually only one picture size (with its MPI) is present.
   However, since it is possible for a video source (terminal) to change
   its picture size during the session, several picture sizes can exist
   in the parameter list. The first one is the original picture size to
   be used at the beginning of the session. Intra requests MUST NOT be
   used with SAP.

   Example with SAP:

      a=fmtp:34 CIF=2 D=1 F

   The video source is sending an H.263 bitstream with the CIF picture
   size and MPI=2, using the Unrestricted Motion Vector (annex D, in its
   H.263v1 form) and Advanced Prediction (annex F) options. This kind
   of announcement can be used e.g. in the MBone.

7. SECURITY CONSIDERATIONS

   Security is believed to be irrelevant to this document.

8. CHANGES FROM DRAFT-01

   - Syntax is represented with augmented BNF
   - H.263 option acronyms are changed to annex letters (e.g.
     "SAC" -> "E")
   - some parameter names are now shorter (e.g.
     BitsPerPictureMaxKb -> BPP) in order to shorten and simplify the
     SDP description
   - new parameters as suggested by Gary Sullivan: HRD, PAR, CPCF
   - intra updates are now possible during the video session (with a
     new SIP INVITE message)
   - MaxBR is no longer mandatory, plus many similar minor changes
   - H.263v2 options are added. Some of these have parameters (e.g.
     annex O has three possible values (temporal/SNR/spatial
     scalability), so the syntax is e.g. O=2,3)

9. OPEN QUESTIONS

   The use of profiles (as in MPEG-2 etc.) to simplify the most common
   groups of options. This seems unnecessary.

AUTHORS' ADDRESSES

   Harri Honko, Petri Koskelainen
   Nokia Research Center
   P.O.Box 100
   FIN-33721 Tampere
   Finland

   e-mail: harri.honko@research.nokia.com,
           petri.koskelainen@research.nokia.com,
           jouni.salonen@ntc.nokia.com

REFERENCES

   [1] International Telecommunication Union, "Video Coding for Low
       Bitrate Communication", ITU-T Recommendation H.263, ITU, 1998.

   [2] Handley et al., "SDP - Session Description Protocol", RFC 2327,
       Internet Engineering Task Force, 1998.

   [3] Handley et al., "SIP - Session Initiation Protocol",
       INTERNET-DRAFT, Internet Engineering Task Force, 1998.

   [4] "SAP - Session Announcement Protocol", INTERNET-DRAFT, Internet
       Engineering Task Force, 1997.

   [5] F. Yergeau, "UTF-8, a transformation format of ISO 10646",
       RFC 2279, Internet Engineering Task Force, Jan. 1998.

   [6] D. Crocker and P. Overell, "Augmented BNF for syntax
       specifications: ABNF", RFC 2234, Internet Engineering Task
       Force, Nov. 1997.