

Internet Engineering Task Force                                B. Foster  
Internet Draft                                                  R. Kumar  
Document: <draft-foster-mmusic-vbdformat-01.txt>            F. Andreasen  
Category: Informational                                    Cisco Systems  
Expires: September 1, 2002                                  March 1 2002  
                      Voice-Band Data Media Format  
Status of this Document  
This document is an Internet-Draft and is in full conformance with  
all provisions of Section 10 of RFC2026.  
Internet-Drafts are working documents of the Internet Engineering  
Task Force (IETF), its areas, and its working groups. Note that other  
groups may also distribute working documents as Internet-Drafts.  
Internet-Drafts are draft documents valid for a maximum of six months  
and may be updated, replaced, or obsoleted by other documents at any  
time. It is inappropriate to use Internet-Drafts as reference  
material or to cite them other than as "work in progress."  
The list of current Internet-Drafts can be accessed at  
The list of Internet-Draft Shadow Directories can be accessed at  
1. Abstract  
Voice-band data (fax and modem) traffic often requires different processing 
from voice, so the ability to specify a distinct payload type when passing this 
type of traffic is important. This document defines a MIME type, audio/vbd, for 
voiceband data media, and a specific "fmtp" parameter for specifying the 
underlying encoding.  
2. Conventions used in this document  
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
"SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this document are to be 
interpreted as described in RFC-2119.  
3. Introduction  
There are a number of ways of passing modem and fax traffic over an  
IP network. One approach is to simply pass it in-band. Other  
approaches involve terminating the fax/modem at each end and relaying  
the data in some fashion. Either approach may be valid depending on  
the processing capability of the gateway, the characteristics of the  
network, etc.  
This document is specifically concerned with the approach of passing  
modem and fax traffic in-band. Because voice-band data has  
distinctly different characteristics from voice, it is often  
important to be able to distinguish this difference by indicating an  
associated media format. This allows the receiver of the media to  
process the packets differently.  
4. Rationale for distinct Voiceband Data payload types 
The rationale for distinguishing between a payload type associated with voice 
and a payload type associated with voiceband data is twofold: 
* At the receiver, voiceband data traffic is found to work best with fixed-size 
jitter buffers, while adaptive jitter buffers are optimal for voice. 
* Packet loss concealment algorithms at the receiver are suitable for voice, 
but not for voiceband data. 
For discrimination between voice and voiceband data, and to allow different 
processing at a receiver, separate payload types must be used even if the 
underlying encoding is the same (e.g., PCMU for both voice and voiceband data). 
To this end, a new RTP audio encoding name, "vbd", to be registered as the MIME 
type audio/vbd, is defined. For a session, this encoding name can be dynamically 
mapped to one or more payload types, as is true for any encoding. Each payload 
type associated with the encoding "vbd" can have a separate format, specified 
through an 'fmtp' attribute, indicating a different underlying base encoding 
(e.g., PCMU, PCMA, G726-32, G726-40).  
This document proposes the use of dynamic payload types for voiceband data that 
are distinct from the payload types, static or dynamic, for voice even if the 
underlying encoding algorithms are the same. This is to enable different, 
voiceband data-specific receiver processing. For a given encoding algorithm, a 
receiver may include both the voice and the voiceband data payload types in the 
media (m=) line in SDP. If it intends to support the encoding algorithm for 
voiceband data but not for voice, it should not include the applicable voice 
payload type in the 'm=' line. 
5. Proposed representation in SDP 
The encoding name, "vbd",  may be dynamically associated with one or more RTP 
payload types. Using the "fmtp" SDP attribute, each "vbd" payload type is 
associated with an underlying encoding. Thus, 
     a=rtpmap:<vbd dynamic payload type> vbd/<clock rate> 
     a=fmtp:<vbd dynamic payload type> <non-vbd audio payload type> 
indicates a dynamic payload type to be associated with the codec "vbd". The fmtp 
attribute indicates the underlying audio encoding associated with the "vbd" 
codec. The audio encoding used by the "vbd" codec may be represented by either a 
static or dynamic payload type. Note that it is possible to specify multiple 
"vbd" payload types, each with a different "fmtp" value and, therefore, a 
different audio encoding. 
An example media description in SDP might be: 
     m=audio 3456 RTP/AVP 15 98 99 
     a=rtpmap:98 vbd/8000 
     a=fmtp:98 0 
     a=rtpmap:99 vbd/8000 
     a=fmtp:99 8 
This specifies dynamic RTP payload types 98 and 99 as being "vbd" codecs.  
Further, it specifies that the vbd codec associated with payload type 98 uses an 
underlying PCMU codec format (indicated by the static payload type 0). It also 
specifies that payload type 99 has an underlying format of PCMA (indicated by 
the static payload type 8).  
Note that the payload types 0 (PCMU) and 8 (PCMA) do not appear in the media 
line in this case. The only permitted voice encoding is G728 (payload type 15).  
The audio encoding underlying the voiceband data might also be represented by a 
dynamic payload type, as in the following segment: 
     m=audio 3456 RTP/AVP 15 98 
     a=rtpmap:96 G726-40/8000 
     a=rtpmap:98 vbd/8000 
     a=fmtp:98 96 
Again, the dynamic payload type of 96 does not appear in the media line in this 
case. However, it is used to bind G726-40 as the underlying encoding algorithm 
for the payload type of 98, used in voiceband data packets. 
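The binding described above can be resolved mechanically. The following Python 
sketch is illustrative only and is not part of the proposed format: the function 
name resolve_vbd and the small static payload type table are assumptions made 
for this example. It maps each "vbd" payload type to the encoding name of the 
payload type given in its "fmtp" line, assuming that the "fmtp" value is a 
single payload type number as in the examples above. 

```python
# Static payload types that appear in this document's examples, as assigned
# by the RTP audio/video profile. The table is deliberately incomplete.
STATIC_PT = {0: "PCMU", 8: "PCMA", 15: "G728"}


def resolve_vbd(sdp_text):
    """Return {vbd payload type: underlying encoding name}."""
    rtpmap = {}  # payload type -> encoding name taken from a=rtpmap
    fmtp = {}    # payload type -> raw parameter string taken from a=fmtp
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("a=rtpmap:"):
            pt, _, enc = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = enc.split("/")[0]
        elif line.startswith("a=fmtp:"):
            pt, _, params = line[len("a=fmtp:"):].partition(" ")
            fmtp[int(pt)] = params.strip()
    resolved = {}
    for pt, name in rtpmap.items():
        if name.lower() != "vbd" or pt not in fmtp:
            continue
        base_pt = int(fmtp[pt])
        # A dynamic binding (a=rtpmap) takes precedence over the static table.
        resolved[pt] = rtpmap.get(base_pt, STATIC_PT.get(base_pt))
    return resolved
```

Applied to the first example above, this sketch yields PCMU for payload type 98 
and PCMA for payload type 99; applied to the second, it yields G726-40 for 
payload type 98. 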
When both voice and voiceband data payload types are distinctly earmarked for a 
session at session establishment, a transmitter may switch from a voice payload 
type (15 in the example above) to a voiceband data payload type (98 in the 
example above) when it detects an appropriate event such as an ANS or ANSAM as 
defined in V.25 [1] and V.8 [2] respectively. When the receiving gateway or 
endpoint sees a voiceband data payload type (98 in the example above), it 
recognizes this as a voiceband data codec (with G726-40 encoding) and adjusts 
the jitter buffer accordingly. 
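As a rough illustration of this receiver behavior, the Python sketch below 
selects a processing profile from the payload type of an arriving packet. The 
names ReceiverProfile and profile_for are hypothetical; the two profiles simply 
restate the rationale of Section 4 (fixed-size jitter buffer and no packet loss 
concealment for voiceband data) and are not a complete receiver design. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceiverProfile:
    adaptive_jitter_buffer: bool
    packet_loss_concealment: bool


# Voice: adaptive jitter buffer, concealment enabled.
VOICE = ReceiverProfile(adaptive_jitter_buffer=True,
                        packet_loss_concealment=True)
# Voiceband data: fixed-size jitter buffer, concealment disabled.
VBD = ReceiverProfile(adaptive_jitter_buffer=False,
                      packet_loss_concealment=False)


def profile_for(payload_type, vbd_payload_types):
    """Pick receiver processing from the payload type of an arriving packet."""
    return VBD if payload_type in vbd_payload_types else VOICE
```

With the example above, a packet carrying payload type 98 would select the 
voiceband data profile, while a packet carrying payload type 15 would select the 
voice profile. 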
The packet format defined in RFC 2198 [6] can be used with a voiceband data codec 
for greater reliability by virtue of redundant transmission. A dynamic payload 
type is defined for the encoding name "red". The encapsulated voiceband data 
packets are, in this case, staggered in time (earlier and later packets combined 
in an RFC 2198 composite packet). In the following example media description: 
     m=audio 3456 RTP/AVP 15 98 100 
     a=rtpmap:98 vbd/8000 
     a=fmtp:98 0 
     a=rtpmap:100 red/8000 
     a=fmtp:100 98/98 
a dynamic payload type of 100 is associated with RFC 2198 packets. The 'fmtp' line 
indicates that these RFC 2198 packets encapsulate two voiceband data payloads, 
each with payload type 98. 
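To make the composite packet concrete, the following Python sketch builds an 
RFC 2198 payload holding one redundant (earlier) block and one primary (current) 
block of the same payload type, as in the "98/98" example. The function name 
pack_red_payload and its arguments are hypothetical, and the header layout is 
written from the authors' reading of RFC 2198; it should be checked against 
that specification before use. 

```python
import struct


def pack_red_payload(redundant, primary, pt, ts_offset):
    """Build an RFC 2198 payload with one redundant and one primary block.

    redundant: payload bytes of the earlier voiceband data packet
    primary:   payload bytes of the current voiceband data packet
    pt:        payload type of both blocks (98 in the example)
    ts_offset: RTP timestamp distance from the primary block back to the
               redundant block
    """
    if not 0 <= pt < 128:
        raise ValueError("payload type must fit in 7 bits")
    if not 0 <= ts_offset < (1 << 14):
        raise ValueError("timestamp offset must fit in 14 bits")
    if len(redundant) >= (1 << 10):
        raise ValueError("redundant block length must fit in 10 bits")
    # Redundant block header: F=1 | PT (7) | offset (14) | length (10).
    word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(redundant)
    # Final (primary) block header: F=0 | PT (7), a single octet.
    return struct.pack("!IB", word, pt) + redundant + primary
```

For two 160-octet PCMU blocks (20 ms each at 8000 Hz) with payload type 98 and a 
timestamp offset of 160, the result is 5 header octets followed by 320 octets of 
data. 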
A "vbd" payload type is negotiated like any other codec type. For symmetric 
connections that can be transitioned to a specific voiceband data payload type, 
both ends must declare support for that payload type. For backward 
compatibility, if this codec type ("vbd") is not bound to a connection, then 
suitable voice payload types may be used for voiceband data. 
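The selection rule in the preceding paragraph can be sketched as follows. This 
Python fragment is an assumption-laden illustration, not a negotiation 
procedure defined by this document: the function name choose_vbd_encoding is 
hypothetical, the two ends are compared by underlying encoding name rather than 
by payload type number, and the list of voice encodings treated as usable for 
voiceband data is taken from the examples in Section 6. 

```python
# Encodings this sketch treats as usable for voiceband data when no "vbd"
# payload type is bound (an assumption based on the examples in Section 6).
VBD_CAPABLE_VOICE = ("PCMU", "PCMA", "G726-32", "G726-40")


def choose_vbd_encoding(local_vbd, remote_vbd, common_voice):
    """Return ("vbd", encoding), ("voice", encoding) or None.

    local_vbd / remote_vbd: underlying encodings each end declared for "vbd"
    common_voice:           voice encodings that both ends declared
    """
    for enc in local_vbd:             # local preference order
        if enc in remote_vbd:
            return ("vbd", enc)
    for enc in VBD_CAPABLE_VOICE:     # backward-compatible fallback
        if enc in common_voice:
            return ("voice", enc)
    return None
```

If only one end declares "vbd", the sketch falls back to a voice payload type 
that both ends support, and it returns None when no suitable encoding is shared. 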
6. Other Characteristics of Voiceband Data Sessions 
This section is informational and is intended to elaborate on other differences 
between voice and voiceband data traffic. 
*    Silence suppression can be used with voice, but not with voiceband data 
     which requires a continuous carrier signal. 
*    Since voiceband data has a much lower distortion tolerance, it requires an 
     audio encoding algorithm in which DC removal filters are absent. Examples 
     of suitable schemes are PCM (ITU G.711) and 32 kbps/40 kbps ADPCM (ITU 
     G.726). By contrast, many more encoding algorithms are available for voice 
     traffic. Note: this document does not intend to list all encoding 
     algorithms suitable for voiceband data. 
7. Proposed Registration of MIME media type audio/vbd 
MIME media type name: audio 
MIME subtype name: vbd 
Required parameters:  
rate: The RTP timestamp clock rate, which is equal to the sampling rate.  The 
typical rate is 8000, but other rates may be specified. 
baseAlgorithm: The encoding scheme used, such as PCMU, PCMA, G726-32, or 
G726-40. No MIME parameters are inherited.  
Optional parameters: channels, ptime, maxptime (Refer to Ref. 7). 
Encoding considerations: 
This type is only defined for transfer via RTP. 
Security considerations: See Section 5 of Ref. 7. 
Interoperability considerations: none 
Published specification: The RFC that will evolve out of this document. 
Applications which use this media type: 
Audio and video streaming and conferencing tools. 
Additional information: none 
Intended usage: Modulated facsimile and modem signals that benefit from special 
handling, e.g., jitter buffer adjustment at a receiver. 
Additional information: 
1. Magic number(s): N/A 
2. File extension(s): N/A 
3. Macintosh file type code: N/A 
Author/Change controller: 
Bill Foster, Rajesh Kumar and Flemming Andreasen 
Cisco Systems 
170 W. Tasman Drive 
San Jose, CA 95134-1706  
8. References  
  [1]     ITU-T, V.25 specification.  
  [2]     ITU-T, V.8 Specification.  
  [3]     M. Handley, V. Jacobson, SDP: Session Description Protocol, RFC  
  [4]     H. Schulzrinne, RTP Profile for Audio and Video Conferences with  
          Minimal Control, RFC 1890. 
  [6]     C. Perkins et al, RTP payload for redundant audio data, RFC 2198. 
  [7]     The RFC that will come out of draft-ietf-avt-rtp-mime-06.txt, Casner,  
          S. and Hoschka, P. 
9. Authors' Addresses  
  Flemming Andreasen  
  Cisco Systems  
  499 Thornall Street, 8th Floor  
  Edison, NJ 08837  
  Phone: +1 732 452 1667  
  Bill Foster  
  Cisco Systems  
  Phone: +1 250 758-9418  
  Rajesh Kumar  
  Cisco Systems  
  170 West Tasman Dr  
  San Jose, CA  
  Phone: +1 408 527 0811  
10. Full Copyright Statement  
  Copyright (C) The Internet Society (2001).  All Rights Reserved.  
  This document and translations of it may be copied and furnished to  
  others, and derivative works that comment on or otherwise explain it  
  or assist in its implementation may be prepared, copied, published  
  and distributed, in whole or in part, without restriction of any  
  kind, provided that the above copyright notice and this paragraph are  
  included on all such copies and derivative works.  However, this  
  document itself may not be modified in any way, such as by removing  
  the copyright notice or references to the Internet Society or other  
  Internet organizations, except as needed for the purpose of  
  developing Internet standards in which case the procedures for  
  copyrights defined in the Internet Standards process must be  
  followed, or as required to translate it into languages other than  
  The limited permissions granted above are perpetual and will not be  
  revoked by the Internet Society or its successors or assigns.  
  This document and the information contained herein is provided on an  
  Funding for the RFC Editor function is currently provided by the  
  Internet Society.  