Internet Engineering Task Force                                   AVT WG
Internet Draft                                               Schulzrinne
ietf-avt-dtmf-00.txt                                         Columbia U.
July 8, 1997
Expires: December 1, 1997


                      RTP Payload for DTMF Digits

STATUS OF THIS MEMO

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress''.

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

                                 ABSTRACT


         This memo describes how to carry dual-tone multifrequency
         (DTMF) signaling in RTP packets.

1 Introduction

   This memo defines a payload type for carrying dual-tone
   multifrequency (DTMF) digits in RTP packets. A separate payload type
   is desirable since low-rate voice codecs cannot be guaranteed to
   accurately reproduce DTMF. Defining a separate payload type also
   permits higher redundancy while maintaining a low bit rate.

   The DTMF payload type must be suitable for both gateway and
   end-to-end scenarios. In the gateway scenario, a gateway connecting a


Schulzrinne                                                   [Page 1]

Internet Draft                  Profile                     July 8, 1997


   packet voice network with the PSTN recreates the DTMF tones and
   injects them into the PSTN. Since DTMF digit recognition may take
   several tens of milliseconds, careful time and power (volume)
   alignment is needed to avoid generating spurious digits. For
   interactive voice response (IVR) systems directly connected to the
   packet voice network, time alignment and volume levels are not
   important, since the unit will not perform any signal analysis to
   detect DTMF tones from the audio stream.

   DTMF digits are carried as part of the audio stream, and SHOULD use
   the same sequence number and time-stamp base as the regular audio
   channel to simplify recreation of analog audio at a gateway.

   The default clock frequency is 8000 Hz, but the clock frequency can
   be redefined when assigning the dynamic payload type.

   This format achieves a higher redundancy even in the case of
   sustained packet loss than the method proposed for the Voice over
   Frame Relay Implementation Agreement [1].

   In circumstances where exact timing alignment between the audio
   stream and the DTMF digits is not important and data is sent unicast,
   such as the IVR example mentioned earlier, it may be preferable to
   use a reliable control stream such as H.245.

   A source MAY send coded DTMF and coded audio packets for the same
   time instants, using DTMF as the redundant encoding for the audio
   stream, or it MAY block outgoing audio while DTMF tones are active
   and only send DTMF digits as both the primary and redundant
   encodings.

   A source SHOULD send an update with the same packet frequency as the
   current audio codec while the DTMF digit is active.

2 Payload Format


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R R R|  digit  |R R| volume    |          duration             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   digit: The DTMF digits are encoded as follows:

                        DTMF digit    encoding (decimal)
                        ________________________________
                        0             0
                        1             1
                        2             2
                        3             3
                        4             4
                        5             5
                        6             6
                        7             7
                        8             8
                        9             9
                        *             10
                        #             11
                        A             12
                        B             13
                        C             14
                        D             15
                        Flash         16


   volume: The power level of the digit, expressed in dBm0 after
        dropping the sign, with range from 0 to -63 dBm0. Thus, larger
        values denote lower volume. The range of valid DTMF is from 0 to
        -36 dBm0 (must accept); levels lower than -55 dBm0 must be
        rejected (TR-TSY-000181, ITU-T Q.24A).

   Note: since the acceptable dip is 10 dB and the minimum detectable
   loudness variation is 3 dB, this field could be compressed by at
   least a bit by reducing resolution to 2 dB, if needed.

   duration: Duration of this digit, in timestamp units. (For a sampling
        rate of 8000 Hz, this field is sufficient to express digit
        durations of up to approximately 8 seconds; the minimum
        permissible digit length is 40 ms.)

   R: This field is reserved for future use. The sender MUST set it to
        zero; the receiver MUST ignore it.
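
   The field definitions above can be illustrated with a short Python
   sketch. It is a minimal example rather than a reference
   implementation: the names DTMF_CODES, volume_field, pack_dtmf and
   unpack_dtmf are hypothetical and not defined by this memo, and the
   sketch assumes network byte order as shown in the figure.

```python
import struct

# Illustrative names only; none of them are defined by this memo.
DTMF_CODES = {str(d): d for d in range(10)}        # "0".."9" -> 0..9
DTMF_CODES.update({"*": 10, "#": 11, "A": 12, "B": 13,
                   "C": 14, "D": 15, "Flash": 16})

def volume_field(level_dbm0):
    """Drop the sign of a power level in dBm0 (0 down to -63)."""
    if not -63 <= level_dbm0 <= 0:
        raise ValueError("level must be between -63 and 0 dBm0")
    return int(round(-level_dbm0))

def pack_dtmf(digit, level_dbm0, duration):
    """Build the 32-bit payload; all R bits are sent as zero."""
    code = DTMF_CODES[digit]
    volume = volume_field(level_dbm0)
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Byte 1: R R R + 5-bit digit; byte 2: R R + 6-bit volume.
    return struct.pack("!BBH", code, volume, duration)

def unpack_dtmf(payload):
    """Return (digit code, volume, duration), ignoring the R bits."""
    first, second, duration = struct.unpack("!BBH", payload)
    return first & 0x1F, second & 0x3F, duration
```

   For instance, pack_dtmf("9", -7, 1600) yields the four octets 09 07
   06 40 (hexadecimal), and unpack_dtmf returns the same three values
   even if a sender has set the reserved bits.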

   An audio source SHOULD start transmitting DTMF digit packets as soon
   as it recognizes the first DTMF digit and every multiple of a frame
   period or, for sample-based codecs, every 50 ms thereafter. If a
   digit continues for more than one period, it should send a new DTMF
   packet with the RTP timestamp value corresponding to the beginning of
   the digit and the duration of the digit increased correspondingly.
   (The RTP sequence number is incremented by one for each packet.) If
   there has been no new digit in the last interval, the digit SHOULD be
   retransmitted three times to ensure some measure of reliability for
   the last digit.
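
   One possible reading of the transmission rule above is sketched below
   in Python. It assumes that an update is sent at the end of each
   period, carrying the duration so far, and that the final update is
   then repeated three times; the memo does not spell out these details.
   The function name digit_updates and its parameters are illustrative
   only.

```python
def digit_updates(start_ts, start_seq, digit_ms,
                  period_ms=50, clock_hz=8000, repeats=3):
    """List (sequence number, timestamp, duration) for one digit.

    Assumption: one packet per period while the digit is active, then
    `repeats` retransmissions of the final state.
    """
    if digit_ms <= 0:
        raise ValueError("digit_ms must be positive")
    packets = []
    seq = start_seq
    elapsed = 0
    while elapsed < digit_ms:
        elapsed = min(elapsed + period_ms, digit_ms)
        duration = elapsed * clock_hz // 1000     # timestamp units
        # The timestamp stays at the beginning of the digit; only the
        # sequence number and the duration change between updates.
        packets.append((seq & 0xFFFF, start_ts, duration))
        seq += 1
    for _ in range(repeats):
        packets.append((seq & 0xFFFF, start_ts, duration))
        seq += 1
    return packets
```

   Under these assumptions, a 100 ms digit with a 50 ms period produces
   two updates with durations 400 and 800, followed by three
   retransmissions with duration 800.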


        DTMF digits are sent incrementally to avoid having the
        receiver wait for the completion of the digit. Since some
        tones are two seconds long, this would incur a substantial
        delay.

3 Reliability

   To achieve reliability even when the network loses packets, the audio
   redundancy mechanism described in [2] is used. The effective data
   rate is r times 64 bits (32 bits for the redundancy header and 32
   bits for the DTMF payload) every 50 ms or r times 1280 bits/second,
   where r is the number of redundant DTMF digits carried in each
   packet. The value of r is an implementation trade-off, with a value
   of 5 suggested.
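
   The arithmetic behind this rate can be checked with a few lines of
   Python. The function name dtmf_bit_rate is illustrative and not part
   of this memo; the 50 ms period and the 64 bits per digit are the
   values quoted above.

```python
def dtmf_bit_rate(r, period_ms=50, bits_per_digit=64):
    """Effective DTMF data rate in bits/second for r digits per packet."""
    # 32 bits of redundancy header plus 32 bits of DTMF payload per digit.
    return r * bits_per_digit * 1000 // period_ms
```

   With these values, r = 1 gives 1280 bits/second and the suggested
   r = 5 gives 6400 bits/second.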


        The timestamp offset in this redundancy scheme has 14 bits,
        so that it allows a single packet to "cover" 2.048 seconds
        of DTMF digits at a sampling rate of 8000 Hz. Including the
        starting time of previous digits allows precise
        reconstruction of the tone sequence at a gateway. The
        scheme is resilient to consecutive packet losses spanning
        this interval of 2.048 seconds or r digits, whichever is
        less. Note that for previous digits, only an average
        loudness can be represented.
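
   The 2.048 second figure follows from the width of the offset field.
   The Python sketch below shows the calculation and a check of whether
   an earlier digit can still be referenced; the names OFFSET_BITS,
   offset_span_seconds and fits_offset are illustrative, and the
   wrap-around handling assumes 32-bit RTP timestamps.

```python
OFFSET_BITS = 14   # width of the timestamp offset field

def offset_span_seconds(clock_hz=8000):
    """Longest interval the offset field can express, in seconds."""
    return (1 << OFFSET_BITS) / clock_hz

def fits_offset(current_ts, earlier_ts):
    """True if earlier_ts is close enough to current_ts to be encoded."""
    # Timestamps are taken modulo 2**32 to tolerate wrap-around.
    delta = (current_ts - earlier_ts) & 0xFFFFFFFF
    return delta < (1 << OFFSET_BITS)
```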

   An encoder MAY treat the DTMF payload as a highly-compressed version
   of the current audio frame. In that mode, each RTP packet during a
   DTMF tone would contain the current audio codec rendition (say,
   G.723.1 or G.729) of this digit as well as the representation
   described in Section 2, plus any previous digits as before.


        This approach allows dumb gateways that do not understand
        this format to function. Other reasons?

3.1 Example

   The figure below shows a typical RTP packet sent while the user is
   dialing the last digit of the DTMF sequence "911". The first digit
   was 200 ms long and started at time 0, the second digit lasted 250 ms
   and started at time 800 ms, and the third digit started at time 1.5 s
   and has been pressed for 100 ms so far. The frame duration is 50 ms.
   To make the parts recognizable, the figure ignores byte alignment.
   Timestamp and sequence number are assumed to have been zero at the
   beginning of the first digit.


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   | 2 |0|0|   0   |0|     96      |              31               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   |                             12000                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   |                            0x5234a8                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     96      |            12000          |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     96      |             5600          |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |
   |0|     96      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R R R|  digit  |R R| volume    |          duration             |
   |0 0 0|    9    |0 0|     7     |             1600              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R R R|  digit  |R R| volume    |          duration             |
   |0 0 0|    1    |0 0|    10     |             2000              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R R R|  digit  |R R| volume    |          duration             |
   |0 0 0|    1    |0 0|    20     |              800              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


4 Compact Reliability Scheme

   A more compact representation could be achieved by measuring DTMF
   tones at a sampling rate different from that of the surrounding audio
   codec, e.g., in multiples of 1, 10, 40 or 50 ms. Each RTP payload
   type should have a fixed sampling rate, so choosing a value that
   depends on the frame interval of the surrounding codec is not
   recommended. For a sampling interval of 50 ms, the following payload
   would "cover" 8 seconds of duration and offset:


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    offset     |R R R|  digit  |R R| volume    |   duration    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
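
   A minimal Python sketch of this compact layout follows. It assumes
   that both offset and duration are counted in units of 50 ms and
   rounded down, which the memo does not state explicitly; the names
   UNIT_MS, pack_compact and unpack_compact are illustrative only.

```python
import struct

UNIT_MS = 50   # assumed sampling interval for offset and duration

def pack_compact(offset_ms, digit, volume, duration_ms):
    """Pack offset, digit, volume and duration into one 32-bit word."""
    offset = offset_ms // UNIT_MS
    duration = duration_ms // UNIT_MS
    if not (0 <= offset <= 0xFF and 0 <= duration <= 0xFF):
        raise ValueError("offset and duration must each fit in 8 bits")
    if not (0 <= digit <= 0x1F and 0 <= volume <= 0x3F):
        raise ValueError("digit or volume out of range")
    # Reserved bits share the digit and volume octets and stay zero.
    return struct.pack("!BBBB", offset, digit, volume, duration)

def unpack_compact(payload):
    """Return (offset_ms, digit, volume, duration_ms), ignoring R bits."""
    offset, digit, volume, duration = struct.unpack("!BBBB", payload)
    return offset * UNIT_MS, digit & 0x1F, volume & 0x3F, duration * UNIT_MS
```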


5 Acknowledgements

   The suggestions of the VoIP working group are gratefully
   acknowledged.

6 Bibliography

   [1] R. Kocen and T. Hatala, "Voice over frame relay implementation
   agreement," Implementation Agreement FRF.11, Frame Relay Forum,
   Foster City, California, Jan. 1997.

   [2] C. Perkins, I. Kouvelas, V. Hardman, M. Handley, J.-C. Bolot, A.
   Vega-Garcia, and S. Fosse-Parisis, "RTP payload for redundant audio
   data," Internet Draft, Internet Engineering Task Force, Mar. 1997.
   Work in progress.

