                                                              R. Mahy
Internet Draft                                          Cisco Systems
Document: draft-mahy-sip-signaled-digits-00.txt              Feb 2001
Expires: Aug, 2001

                        Signaled Digits in SIP

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [RFC2026].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

This document demonstrates a way for interested SIP User Agents that are not a party to the media of a call or session to receive SIP event notifications when signaled digits or other specific telephony-related events are detected. This is useful for a variety of applications that monitor calls for a specific event (e.g. a long pound, a special sequence of digits, or a fax signal) and--only then--take an active role in the monitored calls.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119].

Throughout this document, the author refers to "DTMF" and other "tones" as audio media. Similar types of information conveyed as signaling are called "digits" or "signals". This convention is consistent with RFC2833 [AVT], which provides a more detailed discussion of the issue.

Mahy                        Expires: Aug 2001                        1

                          SIP Signaled Digits

3. Motivational text

RFC2833 "AVT Tones" is widely acknowledged within the IP telephony community as the best way to transport telephony-related tones between end systems which already terminate media using [RTP]. This approach maintains synchronization of speech audio with tone audio, tolerates loss, provides event duration and volume information, and avoids detection delay.

Some applications are interested in the telephony signals represented by these tones, but would not otherwise be a party to the audio media. This proposal addresses the transport requirements of these signals in this context. Synchronizing speech is a non-issue in these topologies, as there is no audio media with which to synchronize; SIP provides its own reliability mechanism to prevent loss; and since this proposal reuses the encoding specified in [AVT], volume and duration are preserved, and detection delay is minimized.

For example, in some application scenarios, a user contacts an application, places a new call in the context of the application, and returns to the application after the new call is finished. Some examples of such scenarios are: calling card systems, voicemail or messaging systems which allow outgoing calls, and voice browsers or voice portals which allow outgoing calls.

All of these applications require a way for the user to get back to the application if something has gone wrong with the outgoing call (e.g. a wrong number), or if the user changes his or her mind. If the originating user is using a TDM telephone or a simple IP endpoint, the application will typically expect a sequence of signaled digits (e.g. a pound or hash (#) of long duration, three stars (*) in a row, etc.).
        +-------------+
        |             |
        | Originating |
        |    User     |
        |             |
        +-------------+
          |     ^    ^
          |     |    |
   NOTIFY | SIP |    | RTP
          |     |    |
          v     v    v
   +-------------+  +-------------+
   |             |  |             |
   | Waiting for |  | Target User |
   |   trigger   |  | or Service  |
   |             |  |             |
   +-------------+  +-------------+

Below are several possible [SIP] topologies that would enable this type of behavior. Most of these approaches fall into two categories: the application could receive DTMF media corresponding to the signaled digits, or it could receive the signaled digits using SIP.

Below are three approaches to encoding this information as media. None of these approaches is very attractive.

- The application could relay all the media itself. This wastes network resources and is inefficient for the application.

- The application could set up a conference and INVITE itself to the conference. This method requires setting up a complex set of call legs and wastes network and conferencing resources.

- The application could request "forked-media" [Forked-Media] of just the RFC2833 media. While this is the best media-related proposal, it requires rather complex functionality in the "forking" UAs, requires [3pcc], and is problematic for firewalls because of the complexity of the [SDP]. Also, experience at interoperability tests showed that most current SDP implementations are much less robust than their SIP counterparts.

This draft will summarize a few non-media approaches as well:

- The application could expect to receive [INFO] messages containing a representation of the signaled digits. There are a number of disadvantages to this method as well: a) it requires 3pcc, b) there is no way to turn the INFO messages on or off, c) simultaneous use of AVT tones and INFO may cause double detection of events. For these reasons, using INFO to carry signaled digits is now deprecated.
- The application could ask the originating UA to execute a script (e.g. in [Java]) or render a markup language (e.g. [VoiceXML]) to watch for an event and transfer the call back to the application. This is a very elegant solution, but requires significant resources and implementation on each UA.

- This proposal: the application SUBSCRIBEs [SUB/NTFY] to signaled digits on the originating UA, and then receives the signaled digits in corresponding NOTIFYs. Although this requires wide deployment on UAs, it is fairly easy to implement and works with both 3pcc and fully distributed call control models.

While this proposal only provides examples using signaled digits, it could be used to detect other telephony-related signals (for example FAX signals, or call progress signals).

This proposal would also allow for a clean decomposition of some services into media and signaling components. For example, below is a diagram of a VoiceXML browser split into media and non-media handling parts.

              +-------------+
              |             |
              |  VoiceXML   |
              | Interpreter |
              | (signaling) |
              +-------------+
                ^          ^
                |          |
            SIP |          | [RTSP]
                |          |
                |          |
                v          v
   +-------------+        +-------------+
   |             |        |             |
   |   SIP UA    |  RTP   | RTSP Server |
   |             |<------>|   (media)   |
   |             |        |             |
   +-------------+        +-------------+

4. Formal Grammar

The following syntax specification uses the augmented Backus-Naur Form (BNF) as described in RFC-2234 [BNF]. This proposal adds a new Event type to the SUBSCRIBE/NOTIFY Event header.

   "Event" ":" SP "telephone-event" *[";" tparams ]

   tparams        = event-param | rate-param | duration-param
   event-param    = "events" "=" nums *["," nums ]
   rate-param     = "rate" "=" num
   duration-param = "duration" "=" num
   nums           = range | num
   range          = num "-" num

5. Example of usage

The example below shows a typical scenario used for calling cards. The Application acts as both an ordinary UA and as a 3pcc controller.
Original         App         Target
  UA                           UA
   |              |             |
   |--INVITE----->|             |
   |<---200-------|             |
   |----ACK------>|             |
   |              |             |
   |<===RTP======>|             |
   |              |             |
   |   ..time..   |             |
   |              |             |
   |<--SUBSCRIBE--|             |
   |----200------>|             |
   |              |             |
   |              |--INVITE---->|
   |              |<---180------|
   |<--reINVITE---|<---200------|
   |----200------>|             |
   |<---ACK-------|----ACK----->|
   |              |             |
   |<=========RTP==============>|
   |              |             |
   |   ..time..   |             |
   |              |             |
   |---NOTIFY---->|             |
   |<---200-------|             |
   |              |             |
   |<--reINVITE---|----BYE----->|
   |-----200----->|<---200------|
   |<----ACK------|             |
   |              |             |
   |<====RTP=====>|             |
   |              |             |

   SUBSCRIBE sip:gateway.itsp.net SIP/2.0
   Call-Id: 100@gateway.itsp.net
   To:
   From: ;tag=abcd
   CSeq: 1 SUBSCRIBE
   Event: telephone-event;duration=2000
   Expires: 3600
   Content-Length: 0

   NOTIFY sip:
   Call-Id: 100@gateway.itsp.net
   To: ;tag=abcd
   From: ;tag=efgh
   CSeq: 5 NOTIFY
   Event: telephone-event;rate=1000
   Content-Type: audio/telephone-event
   Content-Length: 4

   0x0B0F0800

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     event     |E|R| volume    |          duration             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   event=11, end=0, reserved=0, volume=15, duration=2048

A "#" tone at -15 dBm volume has been in progress for 2.048 seconds (2048 units of duration at a sampling rate of 1000Hz).

6. Detailed Behavior

6.1 Behavior of application (Subscriber)

First the application SUBSCRIBEs to the signaled digits. It SHOULD only SUBSCRIBE to events if it is no longer a party to the media. Because it is impossible to perfectly synchronize the SUBSCRIBE and the reINVITE or transfer necessary for the application to divest itself of the media, it is sufficient for the application to ignore AVT media which it receives while SUBSCRIBEd to signaled digits.

The application MAY specify an event mask. The specified event mask MUST at least include the default events: 0-15.

The application MUST NOT specify a rate.
The subscriber MUST accept any rate specified by the notifier.

The subscriber MAY specify a "minimum" duration in milliseconds. The subscriber can be assured that it will not receive any notifications for events which have been in progress for less than this duration. The subscriber MUST still accept "final" events (the end bit is set) with shorter durations.

If the application receives a NOTIFY for an event in the range 0-15, it SHOULD verify that the volume parameter is less than or equal to 36 (at least -36 dBm). It MAY accept a DTMF event with a volume as large as 55 (-55 dBm). For all other events, the volume MUST be zero.

Most applications will only act either on interim events (end bit is zero), or on final events (end bit is one), but not both. For example, an application which watches for "***" would look for the "*" event (10), the end bit equal to zero, and a duration greater than or equal to 40ms based on the current rate. It could ignore all final events.

6.2 Behavior of the originating UA (Notifier)

The notifier MUST send a NOTIFY when it detects either the end of a subscribed event, or the continuation of a subscribed event for a sufficient duration. The notifier SHOULD NOT send events outside the subscribed event mask.

The default event mask is 0-15. The default rate for this application is 1000Hz. The duration parameter in a SUBSCRIBE indicates the number of milliseconds a signal must exist before it should be reported with NOTIFY. The default duration is 40ms regardless of the sample rate.

If the notifier detects that an event has begun and continued for at least the subscribed duration, it MUST send a NOTIFY for that event. The notifier SHOULD NOT wait for the end of the event.

If the notifier detects that an event has ended, it MUST send a NOTIFY for that event, even if that event previously generated a NOTIFY, and even if the event was shorter than the minimum duration requested.
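The notifier rules above can be read as a small decision procedure. The Python sketch below is one possible reading and is not a normative part of this proposal. The Subscription class, its fields, and should_notify() are hypothetical names introduced for illustration. Reporting an event in progress only once is an assumption, since the text above does not say how often an interim NOTIFY may be repeated.

```python
from dataclasses import dataclass, field

DEFAULT_EVENT_MASK = frozenset(range(16))  # default event mask 0-15
DEFAULT_MIN_DURATION_MS = 40               # default duration from section 6.2


@dataclass
class Subscription:
    """Hypothetical per-subscriber state kept by the notifier."""
    events: frozenset = DEFAULT_EVENT_MASK
    min_duration_ms: int = DEFAULT_MIN_DURATION_MS
    # Events already reported while still in progress (assumption: report once).
    interim_sent: set = field(default_factory=set)


def should_notify(sub: Subscription, event: int, elapsed_ms: int, ended: bool) -> bool:
    """Return True if a NOTIFY is due for this subscription."""
    if event not in sub.events:
        return False  # outside the subscribed event mask
    if ended:
        # The end of an event is always reported, even if the event was
        # short or has already generated an interim NOTIFY.
        sub.interim_sent.discard(event)
        return True
    if elapsed_ms >= sub.min_duration_ms and event not in sub.interim_sent:
        # The event has lasted long enough: report now, do not wait for the end.
        sub.interim_sent.add(event)
        return True
    return False


# A subscription for a long pound, as in the SUBSCRIBE of section 5.
long_pound = Subscription(min_duration_ms=2000)
print(should_notify(long_pound, 11, 500, False))   # False: too short so far
print(should_notify(long_pound, 11, 2048, False))  # True: interim NOTIFY
print(should_notify(long_pound, 11, 3000, True))   # True: final NOTIFY
```

Keeping one Subscription object per subscriber also covers the case of several applications subscribing to the same call with different parameters.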
In a NOTIFY, the notifier MAY specify any rate. If so, the duration returned in the body of the NOTIFY MUST be in units of the specified rate.

The notifier MUST NOT send the same event three times, as is required for AVT events conveyed in RTP. SIP provides its own redundancy mechanism, and without the timestamp header of the RTP packet available in SIP, there would be no way to determine if these were duplicate events.

Note that multiple applications may subscribe to signaled digits (possibly with different parameters) for the same call simultaneously. A practical example is a calling card call to a voicemail application during an outcall. The calling card application may wait for a long pound, while the messaging system waits for a different sequence.

6.3 Simple Implementation on IP phones

IP phones only generate DTMF for compatibility with the PSTN. The concepts of volume and minimum duration in this context are irrelevant. Therefore, a simple IP phone MAY a) only support events zero through eleven (most phones do not have keys for ABCD), b) always set the volume to zero, c) only use the default rate, and d) never send an event shorter than 40ms. Long key presses (e.g. 2 seconds) MUST still be correctly detected and reported.

Accept or refuse SUBSCRIBE messages according to local authorization policy. For example, always accept messages from your SIP peer for an active call. Ignore any parameters in the subscriptions.

When key activity occurs, check if there are any subscriptions which correspond to the active "line". If so, send a NOTIFY (to each subscriber for this call-id) once when a "DTMF keypad" key is depressed (set the duration to 40ms). Also, send a NOTIFY with the end bit set and the approximate duration of the keypress when the key is released. In both cases, always use the default rate of 1000Hz, and set the volume to zero.
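As an illustration of the simple behavior described above, the Python sketch below packs the 4-byte audio/telephone-event body for a key press and a key release. It assumes the [AVT] payload layout shown in section 5. The names encode_event(), key_pressed(), and key_released() are hypothetical, and the rate is left as a parameter that defaults to the 1000Hz default rate given in section 6.2.

```python
import struct

# Events for the twelve keys of a basic keypad, numbered as in [AVT].
KEY_EVENTS = {**{str(d): d for d in range(10)}, "*": 10, "#": 11}


def encode_event(event: int, end: bool, volume: int, duration: int) -> bytes:
    """Pack event (8 bits), E, R, volume (6 bits) and duration (16 bits)."""
    flags = (0x80 if end else 0x00) | (volume & 0x3F)  # reserved bit stays zero
    return struct.pack("!BBH", event, flags, min(duration, 0xFFFF))


def key_pressed(key: str, rate_hz: int = 1000) -> bytes:
    """Interim event sent once when the key goes down; duration is 40ms."""
    return encode_event(KEY_EVENTS[key], False, 0, 40 * rate_hz // 1000)


def key_released(key: str, held_ms: int, rate_hz: int = 1000) -> bytes:
    """Final event (end bit set) carrying the approximate key-press length."""
    return encode_event(KEY_EVENTS[key], True, 0, held_ms * rate_hz // 1000)


print(key_pressed("#").hex())         # 0b000028
print(key_released("#", 2048).hex())  # 0b800800
```

Each returned value is the complete body of one NOTIFY, so Content-Length is always 4.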
This description does not intend to limit implementation to physical telephones with a "DTMF keypad".

6.4 Behavior of Proxy servers

Proxy servers MUST be able to forward SUBSCRIBE and NOTIFY methods.

7. Security Considerations

Signaled Digits may convey private information such as PINs, credit card numbers, or account numbers. UAs SHOULD authenticate these subscriptions. In addition, UAs are encouraged to encrypt this information using a suitable mechanism as available in SIP (e.g. [PGP]).

8. References

[SIP] M. Handley, E. Schooler, and H. Schulzrinne, "SIP: Session Initiation Protocol", RFC 2543, Internet Engineering Task Force, Nov. 1998.

[SDP] M. Handley and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, Internet Engineering Task Force, Apr. 1998.

[AVT] H. Schulzrinne and S. Petrack, "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC 2833, Internet Engineering Task Force, May 2000.

[RTP] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, Jan. 1996.

[SUB/NTFY] A. Roach, "Event Notification in SIP", Internet Draft , IETF, Jan. 2001. Work in progress.

[RTSP] H. Schulzrinne, A. Rao, and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, Internet Engineering Task Force, Apr. 1998.

[3pcc] J. Rosenberg, J. Peterson, and H. Schulzrinne, "Third Party Call Control in SIP", Internet Draft , IETF, Nov. 2000. Work in progress.

[INFO] S. Donovan, "The SIP INFO Method", RFC 2976, Internet Engineering Task Force, Oct. 2000.

[Java] J. Gosling, B. Joy, and G. Steele, "The Java Language Specification", Addison Wesley, 1996.

[VoiceXML] VoiceXML Forum, "Voice extensible markup language (VoiceXML) version 1.00", VoiceXML Forum specification, Mar. 2000.

[PGP] D. Atkins, W. Stallings, and P. Zimmermann, "PGP Message Exchange Formats", RFC 1991, Internet Engineering Task Force, Aug. 1996.

[RFC2026] S. Bradner, "The Internet Standards Process -- Revision 3", RFC 2026 (BCP), IETF, Oct. 1996.

[RFC2119] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119 (Best Current Practice), Internet Engineering Task Force, Mar. 1997.

[BNF] D. Crocker and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, IETF, Nov. 1997.

10. Acknowledgments

Funding for the RFC Editor is currently provided by the Internet Society.

11. Author's Addresses

Rohan Mahy
Cisco Systems
170 West Tasman Dr, MS: SJC-21/3
Phone: +1 408 526 8570
Email: rohan@cisco.com

Full Copyright Statement

"Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.