Internet Engineering Task Force                               SIP WG
Internet Draft                                          G. Camarillo
                                                            Ericsson
                                                      H. Schulzrinne
                                                 Columbia University
draft-camarillo-sipping-early-media-00.txt
November 29, 2002
Expires: May, 2003


          Early Media and Ringback Tone Generation in the
                    Session Initiation Protocol


STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   To view the list of Internet-Draft Shadow Directories, see
   http://www.ietf.org/shadow.html.


Abstract

   This document describes how to manage early media in the Session
   Initiation Protocol (SIP). It also describes which inputs need to be
   taken into consideration when defining local policies for ringback
   tone generation.



G. Camarillo et. al.                                          [Page 1]

Internet Draft                    SIP                November 29, 2002


   Table of Contents

   1     Introduction
   2     Early Media in SIP
   2.1   Status Codes
   2.2   Direction Attributes and Media Clipping
   2.3   Intention to Send Media
   2.4   The SDP Intention Parameter
   2.5   Applicability of the SDP Intention Parameter
   3     Forking
   4     Ringback Tone Generation
   5     Interactions with Preconditions
   6     Examples
   6.1   Remotely Generated Ringback Tone
   6.2   Locally Generated Ringback Tone
   7     Acknowledgments
   8     Authors' Addresses
   9     Bibliography


1 Introduction

   Early media refers to media (e.g., audio and/or video) that is
   exchanged before a particular session is accepted by the called
   user. Within a particular SIP dialog, early media can be exchanged
   from the moment the initial INVITE is sent until the UAS generates a
   final response. Early media can be unidirectional or bidirectional,
   and can be generated by the caller, the callee, or both. Typical
   examples of early media generated by the callee are ringback tones
   and announcements (e.g., queuing status). Early media generated by
   the caller typically consists of voice commands or DTMF tones to
   drive IVRs.

   The basic SIP specification [1] supports only very simple early
   media; UAs that implement fully featured early media need to support
   the PRACK [2] and UPDATE [3] methods.

   The remainder of this document is organized as follows. Section 2
   describes early media establishment in SIP, Section 3 discusses
   forking, and Section 4 describes ringback tone generation. Section 5
   analyzes the interactions between early media, ringback tone
   generation, and preconditions, and Section 6 provides examples of
   common scenarios that use the mechanisms described in Sections 2
   and 4.

2 Early Media in SIP

   SIP [1] uses the offer/answer model [4] to negotiate session
   parameters. One of the user agents (the offerer) prepares a session
   description called the offer. The other user agent (the answerer)
   responds with another session description called the answer. This
   two-way handshake allows both user agents to agree upon the session
   parameters to be used to exchange media.
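   As a rough illustration of the handshake, the following Python
   sketch encodes the direction rules that the offer/answer model [4]
   places on a single stream: given the direction in the offer and what
   the answerer is willing to do, it returns a compatible direction for
   the answer. The function name and its boolean inputs are
   hypothetical and are not part of any SIP or SDP API; the sketch
   ignores codecs, ports, and everything else an answer carries.

```python
"""Sketch: choosing the answer direction for one SDP media stream."""

# Directions the answerer may use for each offered direction
# (offer/answer model [4]).
ALLOWED_ANSWERS = {
    "sendrecv": {"sendrecv", "sendonly", "recvonly", "inactive"},
    "sendonly": {"recvonly", "inactive"},
    "recvonly": {"sendonly", "inactive"},
    "inactive": {"inactive"},
}


def answer_direction(offered: str, can_send: bool, can_receive: bool) -> str:
    """Return the answer direction for a stream offered as `offered`.

    can_send / can_receive (hypothetical inputs) say whether the
    answerer's own media tools are ready to send / to receive.
    """
    if offered not in ALLOWED_ANSWERS:
        raise ValueError(f"unknown direction: {offered}")
    offerer_sends = offered in ("sendrecv", "sendonly")
    offerer_receives = offered in ("sendrecv", "recvonly")
    # The answerer may only send if the offerer is willing to receive,
    # and may only receive if the offerer is willing to send.
    send = can_send and offerer_receives
    receive = can_receive and offerer_sends
    if send and receive:
        return "sendrecv"
    if send:
        return "sendonly"
    if receive:
        return "recvonly"
    return "inactive"


if __name__ == "__main__":
    # A UAS like the one in Section 6: it plays an announcement but does
    # not yet accept media from the caller.
    print(answer_direction("sendrecv", can_send=True, can_receive=False))
```

   Computing "send" and "receive" separately, instead of listing the
   allowed answers by hand, guarantees that the result always falls
   inside the set the offer permits.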
   The idea behind the offer/answer model is to decouple the
   offer/answer exchange from the mechanism used to transport the
   session descriptions. For example, the offer can be sent in an
   INVITE request and the answer can arrive in the 200 (OK) response to
   that INVITE. Alternatively, the offer can be sent in the 200 (OK)
   response to an INVITE that carried no session description, and the
   answer in the ACK. When reliable provisional responses [2] and/or
   UPDATE requests [3] are used, there are many more ways to exchange
   offers and answers. The offer/answer model is not even coupled to
   SIP: other transport mechanisms, such as email attachments or
   instant messages, can be used to perform an offer/answer exchange.

   This decoupling between the offer/answer model and the messages used
   for a particular offer/answer exchange implies that the negotiation
   of media parameters is not affected by the status of the session. If
   an INVITE contains an offer, it does not matter whether the answer
   is received in a 183 (Session Progress), a 180 (Ringing), or a 200
   (OK) response: the resulting media session is the same in all three
   cases.

   Note that in the past, some people wrongly believed that a UAC
   receiving a particular answer had to set up different early media
   sessions depending on whether the answer was received in a 180
   response (all the media streams were "magically" considered
   inactive) or in a 183 response (the media streams were established
   following the normal offer/answer model).

2.1 Status Codes

   As a consequence of this decoupling, the status code of a particular
   1xx or 2xx SIP response is independent of the offer/answer model.
   For example, if a UAS is alerting the user, it sends a 180 (Ringing)
   response regardless of the presence (or absence) of early media.
   Early media is driven by the offer/answer model, NOT by the status
   codes.
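   The following Python sketch illustrates this rule from the UAC's
   side: the early media state is computed from the SDP answer alone,
   and the status code of the response that carried it is recorded but
   never consulted. The "Response" type and its fields are hypothetical
   stand-ins for whatever a real SIP stack provides, and only the
   direction attribute of a single stream is modelled.

```python
"""Sketch: a UAC derives early media state from the answer, not the status code."""

from dataclasses import dataclass
from typing import Optional

# A direction written by the UAS describes the UAS's side of the stream;
# the UAC sees the mirror image.
MIRROR = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",
    "recvonly": "sendonly",
    "inactive": "inactive",
}


@dataclass
class Response:
    status: int                      # e.g. 180, 183 or 200
    answer_direction: Optional[str]  # direction attribute in the SDP answer, if any


def uac_stream_direction(resp: Response, previous: Optional[str]) -> Optional[str]:
    """Return the stream direction from the UAC's point of view after resp.

    resp.status is intentionally ignored: the same answer produces the
    same media session whether it arrives in a 180, a 183 or a 200.
    """
    if resp.answer_direction is None:
        # No session description in this response: nothing changes.
        return previous
    return MIRROR[resp.answer_direction]


if __name__ == "__main__":
    for code in (180, 183, 200):
        print(code, uac_stream_direction(Response(code, "sendonly"), None))
```

   Keeping "status" out of the media decision is the point of the
   example; a separate component can still use it to drive the user
   interface, as Section 4 describes.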
2.2 Direction Attributes and Media Clipping

   The direction attribute (i.e., sendrecv, sendonly, recvonly, or
   inactive) of a particular stream reflects the status of the media
   tools handling that stream at the endpoints. That is, the direction
   attribute indicates whether or not the media tools at an endpoint
   are ready to send and/or receive media over a particular stream.

   The problem is that the offer/answer model does not distinguish
   between a sender that does not intend to send media and a receiver
   that does not accept incoming media. This distinction is useful to
   avoid media clipping in certain situations. There are four
   possibilities for a particular direction of a stream:

   1. Sender intends to send; receiver accepts media
   2. Sender intends to send; receiver does NOT accept media
   3. Sender does NOT intend to send; receiver accepts media
   4. Sender does NOT intend to send; receiver does NOT accept media

   We need to analyze in which of these four scenarios media clipping
   can occur when media starts being sent over the stream again.

   In scenario 1, media is already flowing from the sender to the
   receiver, so it does not need to be analyzed.

   In scenario 2, if the receiver decides to start accepting media, it
   configures its media tool so that it is ready to receive media, and
   then sends an offer to the sender indicating so. Since the receiver
   configures its media tool before sending the offer, there is no
   media clipping.

   In scenario 3, if the sender decides to start sending media, it has
   to send an offer to the receiver indicating so. Since SIP signalling
   typically traverses a different path than the media packets, the
   first media packets may arrive at the receiver before the offer.
   This is not a problem, because the receiver was accepting media
   anyway. There is no media clipping.
   In scenario 4, if the sender decides to start sending media, it has
   to send an offer to the receiver indicating so. However, the sender
   cannot start sending media until it gets the answer back; otherwise,
   the receiver, which was not accepting any media at that point, would
   discard it. This leads to media clipping: typically, the first
   "hello" spoken by the user is lost.

   The problem with the offer/answer model is that it can establish
   scenario 4, but it cannot establish scenario 3. Therefore, when a
   sender that was quiet resumes sending media, media clipping can
   occur.

   The solution to this problem consists of using the SDP direction
   attribute to indicate media acceptance by the receiver and a new SDP
   parameter to indicate the sender's intention to send media. Such a
   parameter is defined in Section 2.4.

2.3 Intention to Send Media

   To resolve the problem above, some have proposed that the sender
   never signal that it does not intend to send media. That would
   transform scenario 3 into scenario 1, eliminating media clipping.
   However, knowing whether or not the sender intends to send media may
   be important to drive GUIs in certain situations, as shown in the
   following example.

   Two users, A and B, are involved in a videoconference using a
   sendrecv video stream. B wants a moment of privacy, so he switches
   off his camera for a minute. B issues an offer indicating that he
   does not intend to send video. However, the offer keeps the stream
   in sendrecv mode, so that A and B keep their video tools configured
   as sendrecv. When B switches his camera on again, they can perform a
   "soft" media resume (i.e., without media clipping).

   B's intention of not sending video is used to drive A's GUI (e.g.,
   minimizing the window where A was watching B's face).
   If B's intention had not been signalled, A's GUI would probably have
   kept showing the last video frame received over the stream. A would
   not have been able to distinguish this situation from massive packet
   loss in the network (RTCP timers are usually too long for this
   purpose). Therefore, signalling the intention of sending or not
   sending media is important to drive GUIs.

2.4 The SDP Intention Parameter

   OPEN ISSUE: SHOULD THIS ATTRIBUTE BE DEFINED IN AN MMUSIC DRAFT OR IS
   IT OK TO DEFINE IT IN THIS SECTION? IT PROBABLY BELONGS TO MMUSIC,
   BECAUSE IT IS NOT EARLY MEDIA SPECIFIC.

   This section defines a new media-level SDP attribute, "intention".
   It indicates whether or not the entity generating the session
   description intends to send media over a particular stream at a
   particular point in time. Its format in SDP is described by the
   following BNF:

      intention-attribute = "a=intention:" intention-value
      intention-value     = "send" | "nosend"

2.5 Applicability of the SDP Intention Parameter

   The SDP intention parameter should be used by systems that want to
   provide information to drive GUIs and that want to avoid media
   clipping. Systems whose requirements regarding media clipping are
   not strict can signal scenario 4 instead. Systems that do not wish
   to provide information to drive GUIs can signal scenario 1 instead.

3 Forking

   If an INVITE forks, the UAC can receive multiple provisional
   responses that establish different early media streams. How the
   media received over those streams is rendered is a matter of the
   UAC's local policy.

   When a UAC has to deal with several video streams, it seems natural,
   if the GUI supports it, to use a different window to show each
   individual stream. However, a UAC receiving several audio streams
   will probably have to choose one of them to be played, because
   mixing them all may not be useful.

   Note that if the INVITE that forked contained an offer, all the UASs
   will send their early media to the same transport address at the
   UAC. The UAC should be ready to temporarily demultiplex these
   streams based on their RTP SSRCs, and to send a new offer within the
   early dialog as soon as the offer/answer rules allow it.

4 Ringback Tone Generation

   In the PSTN, telephone switches typically play ringback tones to the
   caller to indicate that the called user is being alerted. When,
   where, and how these ringback tones are generated has been
   standardized (i.e., the local exchange of the callee generates a
   standardized ringback tone while the callee is being alerted).

   A standardized approach to providing this type of feedback to the
   user makes sense in a homogeneous environment such as the PSTN,
   where all the terminals have a similar user interface. This
   homogeneity is not found among SIP user agents. SIP user agents have
   different capabilities and different user interfaces, and may be
   used to establish sessions that do not involve audio at all. Because
   of this, the way a SIP UA informs the user about the progress of
   session establishment is a matter of local policy.

   This local policy has two main inputs: the status of the INVITE
   transaction and the availability of incoming early media. The status
   of the INVITE transaction is given by the status code of the latest
   response (e.g., 180 Ringing). The availability of incoming early
   media is given by the offer/answer exchange, that is, by the
   direction attributes and the intention attributes of the streams.

   For example, a POTS-like SIP UA could implement the following local
   policy:

   1. If there is at least one audio stream in sendrecv or recvonly
      mode, play out the audio received over that stream.

   2. If the callee is being alerted and there are no audio streams in
      sendrecv or recvonly mode, play a locally generated ringback tone
      to the user.

   A SIP UA with a graphical user interface could instead follow the
   local policy below:

   1. If there are audio and/or video streams in sendrecv or recvonly
      mode, play out whatever is received over those streams.

   2. If the callee is being alerted, display the message "The callee
      is being alerted" to the user.

   3. If a provisional response other than 180 (Ringing) is received,
      display its reason phrase to the user (e.g., Trying, Call Is
      Being Forwarded, Queued).

   Note that while it is not desirable to standardize a common local
   policy to be followed by every SIP UA, a particular subset of more
   or less homogeneous SIP UAs could use the same local policy by
   convention. Examples of such subsets of SIP UAs may be "all the
   PSTN/SIP gateways" or "every 3G IMS terminal". However, defining the
   particular common policy that such groups of SIP devices may use is
   outside the scope of this document.

5 Interactions with Preconditions

   RFC 3312 [5] defines a framework for preconditions in SIP. The
   negotiation of preconditions does not interact with the negotiation
   of early media. Every precondition has its own direction (e.g., QoS
   in the sendonly direction), which may differ from the direction
   attribute of the media stream. Since the presence of early media is
   signalled with the latter, there are no interactions between
   preconditions and early media. For example, a UA can request
   sendrecv QoS for a media stream that is in recvonly mode during
   early media and is set to sendrecv when the session is accepted.

6 Examples

   The following examples assume SIP UAs that follow the local policy
   below:

   1. If there is at least one audio stream in sendrecv or recvonly
      mode, play out the audio received over that stream.

   2. If the callee is being alerted and there are no audio streams in
      sendrecv or recvonly mode, play a locally generated ringback tone
      to the user.
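   A UA could implement this policy roughly as in the Python sketch
   below. It is only one possible reading, under stated assumptions: it
   treats "the callee is being alerted" as "the latest response to the
   INVITE was a 180 (Ringing)", takes the direction attributes as
   written by the UAS and mirrors them to the UAC's point of view, and,
   like the policy, ignores the intention attribute. The function name
   and its string results are hypothetical.

```python
"""Sketch of the POTS-like ringback policy of Sections 4 and 6."""

from typing import Iterable

# Audio directions as written in the UAS's session description, mapped to
# the UAC's own point of view (a UAS "sendonly" stream is "recvonly" here).
MIRROR = {
    "sendrecv": "sendrecv",
    "sendonly": "recvonly",
    "recvonly": "sendonly",
    "inactive": "inactive",
}


def ringback_action(last_invite_status: int, uas_audio_directions: Iterable[str]) -> str:
    """Return "play-remote", "local-ringback" or "silence".

    last_invite_status: status code of the latest response to the INVITE.
    uas_audio_directions: direction attribute of each audio stream in the
    current session description from the UAS.
    """
    local = [MIRROR[d] for d in uas_audio_directions]
    # Rule 1: some audio stream lets the UAC receive media -> play it out.
    if any(d in ("sendrecv", "recvonly") for d in local):
        return "play-remote"
    # Rule 2: the callee is being alerted and no audio can be received.
    if last_invite_status == 180:
        return "local-ringback"
    # Neither rule applies (assumption: stay silent).
    return "silence"


if __name__ == "__main__":
    # Figure 1 after (5): the stream is still sendonly at the UAS.
    print(ringback_action(180, ["sendonly"]))
    # Figure 2 after (6): the UAS has set the stream to inactive.
    print(ringback_action(180, ["inactive"]))
```

   Checking rule 1 before rule 2 mirrors the order of the policy: a
   remotely generated tone or announcement always wins over a locally
   generated one.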
6.1 Remotely Generated Ringback Tone

   The UAS of Figure 1 receives an initial INVITE (1) with an offer
   that contains an audio stream in sendrecv mode. The UAS will play an
   announcement, but it will not accept incoming (early) media until
   user B accepts the session. The UAS sends a 183 (Session Progress)
   response with an answer that sets the audio stream to sendonly (2).

   After playing the announcement, the UAS starts alerting user B (5).
   The UAS plays its own ringback tone over the media stream; since the
   audio stream is already in sendonly mode, no new offer/answer
   exchange is needed.

   When user B accepts the session, the UAS sends, in parallel, a 200
   (OK) response (8) for the INVITE and an UPDATE (9) that sets the
   audio stream to sendrecv. The UAC sends the ACK (10) and the 200
   (OK) response (11) for the UPDATE, also in parallel.

   Since the audio stream is in sendrecv or recvonly mode (from the
   UAC's perspective) the whole time, the UAC applies the first rule of
   its local policy: it plays out whatever is received over the audio
   stream (i.e., first the announcement and then the remotely generated
   ringback tone).

6.2 Locally Generated Ringback Tone

   The UAS of Figure 2 receives an initial INVITE (1) with an offer
   that contains an audio stream in sendrecv mode. The UAS will play an
   announcement, but it will not accept incoming (early) media until
   user B accepts the session. The UAS sends a 183 (Session Progress)
   response with an answer that sets the audio stream to sendonly (2).

   After playing the announcement, the UAS starts alerting user B, but
   it does not generate any ringback tone on the media stream.
   Therefore, it sends a 180 (Ringing) response (5) and sets the audio
   stream to inactive with an UPDATE (6). At this point, the UAC applies
   the second rule of its local policy and generates a ringback tone
   locally.
   When user B accepts the session, the UAS sends, in parallel, a 200
   (OK) response (10) for the INVITE and an UPDATE (11) that sets the
   audio stream to sendrecv. The UAC sends the ACK (12) and the 200
   (OK) response (13) for the UPDATE, also in parallel.


      A                                            B
      |                                            |
      |----------------(1) INVITE----------------->|
      |                 a=sendrecv                 |
      |<---------(2) 183 Session Progress----------|
      |                 a=sendonly                 |
      |-----------------(3) PRACK----------------->|
      |                                            |
      |<-------------(4) 200 OK (PRACK)------------|
      |  *                                         |
      | ****************************************** |
      |*      User B will be with you shortly     *|
      | ****************************************** |
      |  *                                         |
      |<--------------(5) 180 Ringing--------------|
      |                                            |
      |-----------------(6) PRACK----------------->|
      |                                            |
      |<-------------(7) 200 OK (PRACK)------------|
      |  *                                         |
      | ****************************************** |
      |*              Ringback tone               *|
      | ****************************************** |
      |  *                                         |
      |<------------(8) 200 OK (INVITE)------------|
      |                                            |
      |<----------------(9) UPDATE-----------------|
      |                 a=sendrecv                 |
      |                                            |
      |-----------------(10) ACK------------------>|
      |                                            |
      |-----------(11) 200 OK (UPDATE)------------>|
      |                 a=sendrecv                 |
      |  *                                      *  |
      | ****************************************** |
      |*       Bi-directional conversation        *|
      | ****************************************** |
      |  *                                      *  |
      |                                            |

             Figure 1: Remotely generated ringback tone


      A                                            B
      |                                            |
      |----------------(1) INVITE----------------->|
      |                 a=sendrecv                 |
      |<---------(2) 183 Session Progress----------|
      |                 a=sendonly                 |
      |-----------------(3) PRACK----------------->|
      |                                            |
      |<-------------(4) 200 OK (PRACK)------------|
      |  *                                         |
      | ****************************************** |
      |*      User B will be with you shortly     *|
      | ****************************************** |
      |  *                                         |
      |<--------------(5) 180 Ringing--------------|
      |                 a=inactive                 |
      |<----------------(6) UPDATE-----------------|
      |                 a=inactive                 |
      |                                            |
      |-----------------(7) PRACK----------------->|
      |------------(8) 200 OK (UPDATE)------------>|
      |                 a=inactive                 |
      |<-------------(9) 200 OK (PRACK)------------|
      |                                            |
      |                                            |
      |                                            |
      |<-----------(10) 200 OK (INVITE)------------|
      |                                            |
      |<----------------(11) UPDATE----------------|
      |                 a=sendrecv                 |
      |                                            |
      |-----------------(12) ACK------------------>|
      |                                            |
      |-----------(13) 200 OK (UPDATE)------------>|
      |                 a=sendrecv                 |
      |  *                                      *  |
      | ****************************************** |
      |*       Bi-directional conversation        *|
      | ****************************************** |
      |  *                                      *  |
      |                                            |

              Figure 2: Locally generated ringback tone


7 Acknowledgments

   Paul Kyzivat, Christer Holmberg, Jon Peterson and William Marshall
   provided useful comments and suggestions.

8 Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Advanced Signalling Research Lab.
   FIN-02420 Jorvas
   Finland
   electronic mail: Gonzalo.Camarillo@ericsson.com

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue, MC 0401
   New York, NY 10027
   USA
   electronic mail: schulzrinne@cs.columbia.edu

9 Bibliography

   [1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston,
       J. Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP:
       session initiation protocol," RFC 3261, Internet Engineering
       Task Force, June 2002.

   [2] J. Rosenberg and H. Schulzrinne, "Reliability of provisional
       responses in session initiation protocol (SIP)," RFC 3262,
       Internet Engineering Task Force, June 2002.

   [3] J. Rosenberg, "The session initiation protocol (SIP) UPDATE
       method," RFC 3311, Internet Engineering Task Force, Oct. 2002.

   [4] J. Rosenberg and H. Schulzrinne, "An offer/answer model with
       session description protocol (SDP)," RFC 3264, Internet
       Engineering Task Force, June 2002.

   [5] G. Camarillo, W. Marshall, and J. Rosenberg, "Integration of
       resource management and session initiation protocol (SIP),"
       RFC 3312, Internet Engineering Task Force, Oct. 2002.


Full Copyright Statement

   Copyright (c) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.