Network Working Group                                          H. Kaplan
Internet Draft                                               Acme Packet
Intended status: Informational                          October 24, 2011
Expires: April 21, 2012

   Requirements for Interworking RTCWeb with Current SIP Deployments
         draft-kaplan-rtcweb-sip-interworking-requirements-00

Abstract

The IETF RTCWEB WG has been discussing how to interwork with deployed SIP equipment and domains. Doing so may require an Interworking Function middlebox in the media-plane. This document lists some RTCWeb-to-SIP use-cases, the RTCWeb requirements to support such, and the complexity involved in interworking if the requirements cannot be met.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 24, 2012.

Copyright and License Notice

Kaplan, et al            Expires April 24, 2012                 [Page 1]
Internet-Draft   RTCWeb-SIP Interworking Requirements       October 2011

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

Table of Contents

   1. Terminology
   2. Introduction
   3. Existing SIP/RTP Devices
      3.1. SIP/RTP Devices in Enterprises
      3.2. SIP/RTP Devices in Service Providers
      3.3. The Need for an Interworking Function
   4. RTCWeb-SIP Interworking Architecture
      4.1. Interworking Function Goal: Lower Cost
      4.2. Potential Interworking Functions and Complexity
         4.2.1 ICE Termination
         4.2.2 SRTP Termination
         4.2.3 RTP/RTCP Stream Multiplexing
         4.2.4 Multi-media Stream Multiplexing
         4.2.5 RFC-4733 DTMF Generation
         4.2.6 RTCP Generation
         4.2.7 Transcoding and Transrating
   5. RTCWeb-SIP Interworking Use-cases
      5.1. Basic Audio-Telephony Call
      5.2. Secure Basic Calls
      5.3. Conference Call in SIP Domain
      5.4. Call Hold and Mute in RTCWeb and SIP Domains
         5.4.1 Legacy Call-Hold Devices Impacting RTCP
         5.4.2 RTP Generation when on Hold or Mute
         5.4.3 Clipping with Off-hold/off-mute
      5.5. Call Transfer in SIP Domain
      5.6. Audio/Video Call Transfer
      5.7. Find-Me-Follow-Me in SIP Domain
      5.8. Video in SIP Domain
         5.8.1 Video and SIP/SDP
         5.8.2 Video Codec Compatibility
         5.8.3 Separate Video RTP Stream
         5.8.4 Video RTP Packet Size
   6. Signaling-plane Interworking Requirements
   7. Media-plane Interworking Requirements
   8. Security Considerations
   9. IANA Considerations
   10. Acknowledgments
   11. References
      11.1. Informative References
   Author's Address

1. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. The terminology in this document conforms to RFC 2828, "Internet Security Glossary".

Browser: an Internet World-Wide-Web/HTTP browser capable of executing JavaScript/ECMAScript, with an RTCWeb RTP Library and associated WebRTC API.

Web-Server: an HTTP/S server capable of serving JavaScript to Browsers, as well as executing local code (e.g., PHP).

RTCWeb Client: the combination of Browser and JavaScript on the user's host system.

RTP-Peer: another device communicating RTP/RTCP directly with the local Client.

2. Introduction

One of the desired use-cases for the RTCWeb architecture is to be able to communicate from RTCWeb applications to existing deployed SIP/RTP-based Voice/Video-over-IP devices in the signaling and media-planes.
This document assumes such deployed devices communicate using SIP at a signaling layer, but other protocols such as XMPP or H.323 may be possible.

For the signaling layer, it is assumed the Web-Server will have to play a role in interworking with the SIP world, using either an integrated Web Server module or a separate signaling gateway. In either case it should be possible to communicate with deployed SIP devices at a SIP and SDP layer.

For the media-plane, however, the preference expressed thus far in this WG is that direct communication at an IP layer between the Browser and existing SIP devices be possible, without requiring a media-plane gateway. Doing so with most deployed SIP devices might be impossible, depending on what requirements are imposed on RTCWeb Browsers. An Interworking Function in the media-plane might be required, deployed by either the RTCWeb domain or the SIP domain.

The goal of this document is to summarize the use-cases for communicating with deployed SIP devices and domains, and capture the requirements necessary to do so without using an Interworking Function, or to minimize its cost/complexity. The impacts or difficulties with various Interworking Function needs are also discussed, in order to try to minimize the cost and complexity of using them.

For those readers wishing to skip the background, the requirements can be found in sections 6 and 7. Note that some of the requirements are already documented and achieved in current IETF RTCWEB and W3C WEBRTC Working Group drafts; some are likely unachievable. This document simply lists what must be done, so that the Working Groups can discuss and decide if and how they can be done.

3. Existing SIP/RTP Devices

This document covers two large groups of existing SIP and RTP devices that the WG should focus on communicating with: those in Enterprises, and those in Service Providers.

It is extremely difficult, and undoubtedly contentious, to generalize existing SIP devices as having a common set of capabilities - they do not. Some SIP devices implement ICE and iLBC, for example, while others do not even generate RTCP and only support G.711. For example, there are several software-based SIP User Agents (i.e., softphones) which implement ICE, but virtually no PSTN/TDM Gateways do, very few PBXs do, very few media servers do, etc.

3.1. SIP/RTP Devices in Enterprises

The Enterprise market includes PBXs, desk-phones, conference bridges, conference phones, soft-phones, PRI gateways, voicemail servers, IVR systems, and recording systems. There are millions of RTP devices already deployed in Enterprises today; some are upgradeable, some are not.

Even for those devices that are upgradeable, it is difficult to require upgrading them all at once, or to require upgrading devices that are already working today, simply in order to communicate with RTCWeb-enabled Browsers. An Enterprise that uses RTCWeb-based Web Applications itself would have more incentive to do so, or be willing to deploy an Interworking Function to do so, but not an Enterprise that just happens to be the far-end peer for a voice/video call of an RTCWeb Application provided by someone else.

If an Interworking Function is required to communicate with deployed Enterprise SIP devices, it is likely that the Enterprises that deploy RTCWeb-enabled applications, or RTCWeb Application providers wishing to communicate with SIP Enterprises, will be the ones to deploy the Interworking Functions - not the SIP Enterprises with deployed SIP devices. Therefore, it is beneficial for the RTCWEB WG to minimize the cost of such Interworking Functions, or not need any to begin with.

3.2. SIP/RTP Devices in Service Providers

The SIP Service Provider market represents an enormous population of users and applications reachable through SIP and RTP. There are over 100 Million deployed RTP devices in Service Providers, but more importantly approximately 5 Billion mobile phones, 1.5 Billion landlines, and an untold number of PRI PBX trunks, all reachable through SIP/RTP gateways or hosts in SIP Service Providers. When compared to only about 2 Billion IP hosts on the public Internet, it becomes clear why connecting to existing RTP devices through SIP Service Providers is desirable.

Unfortunately, many of the deployed RTP devices are not upgradeable to change behavior to match RTCWeb: some of them are from manufacturers that no longer exist or have stopped providing enhancements for them; some are incapable of supporting new codecs, ICE, or RTCP due to hardware limitations; and in many cases a SIP call will transit through the Service Provider to another Provider or to an Enterprise, and the final RTP endpoint is not under the local Service Provider's control to upgrade.

If an Interworking Function is required to communicate with deployed Service Provider SIP devices, it is likely that the Service Providers that deploy RTCWeb-enabled applications, or RTCWeb Application providers wishing to communicate with SIP Service Providers, will be the ones to deploy the Interworking Functions - not the SIP Service Providers with deployed SIP devices. Therefore, it is beneficial for the RTCWEB WG to minimize the cost of such Interworking Functions, or not need any to begin with.

3.3. The Need for an Interworking Function

While the best-case scenario is one in which no Interworking Function is needed, it is likely one will be needed for many SIP deployments based on the current requirements and limitations in both RTCWeb and SIP-based devices.

For example, because the JavaScript in Browsers cannot be fully trusted, a means of peer-consent must be used in the media-plane before the Browser can be allowed to send RTP packets. The currently proposed means of establishing such peer-consent is ICE using the STUN connectivity checks, whereby the STUN responses implicitly prove peer consent. An RTCWeb Browser cannot allow session media to be used unless the peer uses ICE. Since many SIP-based devices do not support ICE, and will not be upgraded to do so for the reasons described previously, an ICE-interworking device is needed.

4. RTCWeb-SIP Interworking Architecture

Due to the issues described in section 3, there will likely be a need for an interworking function in the signaling or media-plane. Therefore, this document assumes an RTCWeb-SIP interworking architecture similar to Figure 1 below:

           RTCWeb domain           |           SIP domain

           +-----------+     +-----------+     +-----------+
           |           |     |           |     |           |
           |    Web    | SIP |    SIP    | SIP |    SIP    |
           |           |-----|  Inter-   |-----|           |
           |  Server   |     |  working  |     | User-Agent|
           |           |     |  Function |     |           |
           +-----------+     +-----------+     +-----------+
                /                  |                  \
               /                   |                   \
              /                    |                    \
             /                     |                     \
            / Proprietary over     | Logical or           \Logical or
           / HTTP/Websockets       | Physical API          \Physical API
          /                        |                        \
   +-----------+                   |                         \
   |JS/HTML/CSS|                   |                          \
   +-----------+                   |                           \
   +-----------+             +-----------+                +-----------+
   |           |             |  Media-   |                |           |
   |           |             |  plane    |                |   Media   |
   |  Browser  |-------------|  Inter-   |----------------|   Agent   |
   |           |             |  working  |                |           |
   |           |             |  Function |                |           |
   +-----------+             +-----------+                +-----------+

           Figure 1: RTCWeb-SIP Interworking Architecture

Note that the "SIP Interworking Function" is a logical function; it may be a separate physical device, or it may be built into the Web Server or the SIP User Agent (UA).
Likewise, "Media-plane Interworking Function" is a logical function which may be a physical device or built into the Media Agent, and the vertical lines may be logical internal APIs or external physical protocols. The SIP and Media-plane Interworking Functions may be deployed by the RTCWeb domain administrator or the SIP domain administrator.

4.1. Interworking Function Goal: Lower Cost

One of the main goals of this document is to provide requirements for interworking based on the desire for the least cost and complexity. Determining cost is difficult because it depends to a large extent on device implementation specifics and other cost factors that are not universally applicable. Even seemingly unrelated costs, such as cost of space or power, have an impact on the costs of interworking RTCWeb and SIP.

This document, however, makes assumptions regarding cost that the author believes to be generally accurate, based on the assumption that the more complicated a function is, the more it costs. Even if one uses free software to perform all of the interworking functions, there is a cost burden tied to CPU, memory, and potentially bandwidth use. If a function takes more CPU instructions to perform, for example, then it will take more CPUs to perform it for the same number of sessions. Thus it is more expensive.

4.2. Potential Interworking Functions and Complexity

It is impossible to document the relative monetary costs of the different interworking functions that may need to occur, because they differ by manufacturer and system architecture. This section highlights some of the complexities involved with the different interworking functions that may need to be used, because complexity usually translates to cost (though not always).

4.2.1 ICE Termination

If the Interworking Function has to terminate ICE (i.e., be an ICE agent on behalf of the real SIP endpoint), this involves following the procedures in [ICE], including calculating SHA-1 for each STUN message, checking every UDP packet received during the lifetime of the session to see if it is a STUN request or indication rather than RTP, RTCP, or other message, and responding to STUN requests during ICE restarts. Being an ICE-Lite agent is often simpler than being an ICE-Full agent, however, because of the simpler logic and lack of timers.

4.2.2 SRTP Termination

If the Interworking Function has to terminate SRTP (i.e., encrypt/decrypt SRTP on behalf of the real SIP endpoint), this involves performing encryption/decryption and authentication algorithms on every RTP/RTCP packet in both send/receive directions.

It should be noted that if SRTP is required to be used for every call by RTCWeb but the [SDES] key exchange model cannot be used on the RTCWeb side, then the Interworking Function likely has to terminate SRTP from RTCWeb even if the SIP-domain supports SRTP, because [SDES] is the most commonly used form of key exchange in SIP today.

4.2.3 RTP/RTCP Stream Multiplexing

If the Interworking Function has to multiplex/de-multiplex RTP and RTCP on the same 5-tuple, this involves checking every received packet for the RTP vs. RTCP header format and de-multiplexing them onto separate 5-tuple flows, and in the other direction taking packets from two 5-tuple flows and sending them on the same 5-tuple set.

In some interworking system architectures, such a mux/demux function would be trivial, or even simpler to do than not do due to the reduction in number of ICE flows to terminate. Therefore this document recommends it be possible to perform such muxing separately from the media-type muxing described in the next sub-section 4.2.4.
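
The per-packet inspection described in sections 4.2.1 and 4.2.3 can be illustrated with a short sketch. This is not taken from any specification text in this document: the function name classify_packet is hypothetical, the STUN test assumes the magic-cookie header format of RFC 5389, and the RTP/RTCP test assumes the second-byte range of 192 to 223 given in RFC 5761. Legacy STUN without the magic cookie and other protocols sharing the port are not handled.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value from the STUN header (RFC 5389)


def classify_packet(data: bytes) -> str:
    """Classify one UDP payload as 'stun', 'rtcp', 'rtp', or 'unknown'."""
    if len(data) < 8:
        return "unknown"
    first = data[0]
    if first >> 6 == 0:
        # STUN: top two bits are zero, 20-byte header, cookie in bytes 4..7.
        if len(data) >= 20 and struct.unpack_from("!I", data, 4)[0] == STUN_MAGIC_COOKIE:
            return "stun"
        return "unknown"
    if first >> 6 == 2:
        # RTP and RTCP both carry version 2 in the top two bits.
        # Assumption (RFC 5761): second-byte values 192..223 are RTCP.
        if 192 <= data[1] <= 223:
            return "rtcp"
        # A fixed RTP header is 12 bytes long.
        return "rtp" if len(data) >= 12 else "unknown"
    return "unknown"
```

An Interworking Function would run a test of this kind on every received packet for the lifetime of the session, then forward RTP and RTCP onto their separate 5-tuple flows and hand STUN to its ICE agent; that per-packet work is the cost being discussed.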

4.2.4 Multi-media Stream Multiplexing

If the Interworking Function has to multiplex/de-multiplex RTP/RTCP for audio and video streams on the same 5-tuple, the behavior depends on how such multiplexing is defined. If the 5-tuple multiplexing means they're all part of the same RTP session, then de-multiplexing them is very complicated; if multiplexing means they're all separate RTP/RTCP sessions and use some fixed header-field mode of separation, then mux/demux is likely far simpler.

4.2.5 RFC-4733 DTMF Generation

If the Interworking Function has to generate [RFC4733] DTMF event RTP packets to the SIP-domain side, this involves keeping track of RTP timestamps and sequence numbers, and inserting the appropriate sequence of [RFC4733] packets, etc. If SRTP is also used, then the Interworking Function has to terminate SRTP to be able to insert [RFC4733] events.

4.2.6 RTCP Generation

Because some SIP audio-only RTP endpoints do not generate RTCP, if RTCWeb requires receiving RTCP for calls to continue, then the Interworking Function has to generate RTCP on behalf of them. This is only a known issue for audio calls.

Unfortunately, generating fake RTCP is more complicated than most people realize. The SDP in SIP does not indicate whether an endpoint will generate RTCP - it is implicitly assumed in the AVP profile. Therefore, the Interworking Function will have to check every packet from the SIP-domain side to detect an RTCP message; if it does not see one for a certain period of time, it will need to generate one. The RTCP messages it generates will need to appear to be true RTCP messages, and thus contain information for both sender and receiver reports, DLSR, SSRCs, etc.
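
A minimal sketch of the detection and generation steps just described follows, under several assumptions. The class RtcpWatchdog, the function build_receiver_report, and the timeout value are hypothetical; the document only says "a certain period of time". The packet layout is the Receiver Report with one report block from RFC 3550. The sketch builds only a Receiver Report, omits the SDES/CNAME item that a compound RTCP packet also carries, and leaves the caller to track the statistics from the real RTP stream.

```python
import struct


class RtcpWatchdog:
    """Tracks when RTCP was last seen from the SIP-domain side."""

    def __init__(self, timeout_s: float, start: float):
        self.timeout_s = timeout_s  # hypothetical value chosen by the deployer
        self.last_rtcp = start

    def on_packet(self, is_rtcp: bool, now: float) -> None:
        # Called for every packet received from the SIP-domain side.
        if is_rtcp:
            self.last_rtcp = now

    def needs_synthetic_rtcp(self, now: float) -> bool:
        return now - self.last_rtcp >= self.timeout_s


def build_receiver_report(sender_ssrc: int, source_ssrc: int, fraction_lost: int,
                          cumulative_lost: int, highest_seq: int, jitter: int,
                          lsr: int, dlsr: int) -> bytes:
    """Build one RTCP Receiver Report with a single report block (32 bytes)."""
    # V=2, P=0, RC=1; packet type 201 (RR); length in 32-bit words minus one.
    header = struct.pack("!BBH", (2 << 6) | 1, 201, 7)
    lost = ((fraction_lost & 0xFF) << 24) | (cumulative_lost & 0xFFFFFF)
    body = struct.pack("!IIIIIII", sender_ssrc, source_ssrc, lost,
                       highest_seq, jitter, lsr, dlsr)
    return header + body
```

Even this reduced form requires per-stream state (highest sequence number, jitter, last sender report time) that the Interworking Function would otherwise not need to keep, which is part of why the function is costly.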

It will need to continue to check every packet throughout the call and use expiration timers, because the call could be silently transferred as described in section 5.5, resulting in a new RTP endpoint that does generate RTCP on its own.

Furthermore, it will have to terminate SRTP as well even if the SIP-domain side supports SRTP, in order to be able to generate the fake RTCP messages. Even though it may appear unlikely that an RTP endpoint that would support SRTP does not support RTCP, as far as the Interworking Function knows that could be the case. In fact, it's not unlikely to be the case, because middleboxes perform SRTP on behalf of endpoints today, without generating RTCP on their behalf. For example, the call may be from an RTCWeb Browser to the Interworking Function deployed by the RTCWeb domain owner, to a Service Provider with an SBC performing SRTP termination, and then on to a PSTN gateway that does not generate RTCP (and some don't).

It is also possible that generating RTCP might actually require transcoding in some system architectures, which would not only be prohibitively expensive but also increase delay for RTP.

4.2.7 Transcoding and Transrating

If the Interworking Function has to perform transcoding, it is likely the most expensive function described in this document. Transcoding is typically performed in DSPs, which are expensive and, at large scale, consume significant power and generate significant heat. DSP technology has improved over the years in terms of cost and density, but it is still one of the most expensive components of interworking. It also impacts call quality at an audio level, as well as introduces delay at an RTP level. For video, video transcoding DSPs exist, of course, but scale far worse than audio transcoding.

Transrating (converting from one packetization rate to another) is typically simpler and cheaper than transcoding, but still requires terminating SRTP, RTP, and typically RTCP. It can sometimes be done without using DSP technology, however, reducing the cost.

5. RTCWeb-SIP Interworking Use-cases

Although [draft-use-cases] covers general use-cases, there are no specific use-cases which drive requirements for interworking with already-deployed SIP domains and their RTP endpoints. This section provides such use-cases.

5.1. Basic Audio-Telephony Call

An RTCWeb domain user should be able to generate and receive audio-based sessions with currently deployed SIP Enterprise and Service Provider domains. The author assumes the SIP aspects for a basic call will "just work" or be easily inter-workable, but the media-plane issues are as follows:

   1) Most RTP endpoints do not support ICE.
   2) Many RTP endpoints do not generate RTCP.
   3) Most RTCP-capable endpoints only support RTCP on a separate UDP port (i.e., the +1 odd number).
   4) Most RTP endpoints do not support SRTP.
   5) Most SRTP-capable endpoints only support [SDES] key exchange.

The above limitations drive some of the requirements in section 7, although it may not be possible to meet all of the requirements due to RTCWeb security issues.

5.2. Secure Basic Calls

An RTCWeb domain user should be able to generate and receive calls with protection from eavesdropping and impersonation, to/from currently deployed SIP Enterprise and Service Provider domains. For example an RTCWeb user should not be concerned about eavesdropping or impersonation when using their laptop in public WiFi networks, or at an IETF meeting, if their call goes to/from a SIP domain; likewise a SIP-based user should not be concerned about it if their call goes to/from an RTCWeb domain.

Although most deployed RTP endpoints do not support SRTP (issue (4) in section 5.1), the majority of those that do support it are SIP devices that are used from outside of the Enterprise or Service Provider's physical network, such as software-clients. Within the physical network (or VPN) most Enterprises and Service Providers feel there is sufficient difficulty in eavesdropping and impersonation that the benefits of not using SIP/TLS and SRTP outweigh the risks; beyond their own or their trusted partners' physical networks or VPNs, they do not.

Therefore, SIP Enterprises and Service Providers may well *require* SRTP be used in basic call scenarios with other RTCWeb-application domains. The way they handle such calls today, however, is by using middleboxes to terminate SRTP and [SDES]-based keying through secure signaling (either SIP/TLS or SIP over IPsec).

If [DTLS-SRTP] is required to be used, then the RTCWeb's Interworking Function will have to interwork that to SRTP using [SDES], which will then likely be terminated somewhere on the SIP Service Provider or Enterprise side. This would be expensive for the RTCWeb provider, and provide dubious additional security beyond simply doing [SDES] in RTCWeb. In order to provide [SDES] in the Browser in a useful manner, however, it needs to be secured with HTTPS to the Web Server.

5.3. Conference Call in SIP Domain

An RTCWeb domain user should be able to call a SIP Enterprise or Service Provider-reachable conference bridge, use IVR services, make credit-card-based toll calls, and access such things as their voicemail, when the media server is in an Enterprise or Service Provider's SIP domain. Typically such services are based on DTMF event indications.
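
For reference, the media-plane form of such a DTMF event indication is the [RFC4733] telephone-event payload mentioned in section 4.2.5. The sketch below shows what generating one key press involves. The function names and the 160-unit update step (20 ms at an 8000 Hz clock) are assumptions made for illustration; the payload layout and the three-fold transmission of the final packet follow RFC 4733. RTP header handling is left out: all packets for one event share a single RTP timestamp while sequence numbers keep incrementing.

```python
import struct

# Event codes for the DTMF digits, as registered by RFC 4733.
DTMF_EVENTS = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def dtmf_event_payload(digit: str, duration: int, end: bool = False,
                       volume: int = 10) -> bytes:
    """Build the 4-byte telephone-event payload for one RTP packet."""
    event = DTMF_EVENTS[digit.upper()]
    # E (end) bit, reserved bit left as zero, 6-bit volume.
    flags = (0x80 if end else 0) | (volume & 0x3F)
    return struct.pack("!BBH", event, flags, duration & 0xFFFF)


def dtmf_event_sequence(digit: str, total_duration: int, step: int = 160) -> list:
    """Payloads for one key press: periodic updates, then three end packets."""
    payloads = []
    elapsed = step
    while elapsed < total_duration:
        payloads.append(dtmf_event_payload(digit, elapsed))
        elapsed += step
    # The final packet carries the E bit and is sent three times in total.
    payloads.extend([dtmf_event_payload(digit, total_duration, end=True)] * 3)
    return payloads
```

If SRTP is in use, each of these packets also has to be encrypted and authenticated by whichever element inserts them, which is why section 4.2.5 ties DTMF generation to SRTP termination.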

One means of generating DTMF events is using SIP messages, such as KPML [RFC4730] or SIP INFO messages, and it is assumed that such mechanisms would be possible in an RTCWeb context without new requirements. Many deployed SIP/RTP systems, however, rely on DTMF events to be indicated in RTP using [RFC4733] event packets.

The ability to interwork SIP-based DTMF indications, including KPML, to [RFC4733] DTMF events is already supported by some interworking manufacturers, but it adds complexity. For example if SRTP is used, handling DTMF interworking will require the Interworking Function to also perform SRTP termination. An alternative solution is to provide the means for both a JavaScript-driven signaling-plane indication (which likely already exists), as well as a JavaScript-driven media-plane [RFC4733] method in the Browser.

It should be noted that some deployed systems only use DTMF in-band as tones in G.711 audio. Such in-band-only behavior is far less common among deployed media servers than among clients, however, and thus the author believes it may not be an issue for RTCWeb. In other words, most servers that need to process received DTMF events also support [RFC4733], whereas some endpoints can only generate DTMF in-band; since the use-case involves RTCWeb Browsers generating DTMF to deployed SIP media servers, rather than deployed SIP endpoints generating DTMF to RTCWeb Browsers, this is likely a non-issue.

5.4. Call Hold and Mute in RTCWeb and SIP Domains

An RTCWeb domain user should be able to call to, or be called by, a SIP Enterprise or Service Provider and put their call on hold or mute, and un-hold/un-mute it at any time; and have their call put on hold/mute by the SIP side.

This use-case may seem obvious and non-problematic, since SDP has direction attributes to indicate inactive/sendonly/recvonly for such things.
A call-hold case, for example, is often performed by sending an SDP offer with a sendonly direction attribute and muting the local inputs. There are subtle issues, however, depending on whether RTCP is required, as well as depending on the RTCWeb/WebRTC API design and architecture.

5.4.1 Legacy Call-Hold Devices Impacting RTCP

From a legacy deployment perspective, there are still SIP devices which generate SDP with a connection address of 0.0.0.0 to indicate call hold, and expect to receive such to be put on-hold. SIP B2BUA middleboxes already interwork such cases to/from an SDP sendonly or inactive direction mode, but the device receiving the SDP connection address of 0.0.0.0 will not generate RTCP until the call is taken off hold.

Therefore, if RTCWeb requires Browsers to receive RTCP as a consent-refresh to continue the call, the call will fail if it is put on hold for too long. To avoid the call failure, the Interworking Function may have to generate RTCP, which is complicated and thus expensive.

5.4.2 RTP Generation when on Hold or Mute

Another potential issue depends on what the Browser does when JavaScript tells it to put the session on mute (i.e., disable the microphone/camera inputs), or full hold (i.e., also stop rendering received media). If the Browser stops generating RTP, but does not send SDP to the SIP domain indicating such, the call may fail.

The reason for this is that many SIP Enterprises and Service Providers have middleboxes in various locations, which detect an absence of RTP packets for a sendrecv-mode call as a call failure, and will tear the call down by issuing BYEs. Therefore, if an RTCWeb user puts a call on mute or hold by no longer generating RTP but does not send SDP to the SIP domain indicating the appropriate direction attribute, the call will be terminated eventually by the SIP domain.
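
The middlebox behavior described above can be sketched as follows, purely as an illustration of why the signaled direction and the actual RTP flow have to agree. Everything named here is hypothetical: the function names, the 30-second timeout, and the choice to treat the legacy 0.0.0.0 connection address from section 5.4.1 like an inactive direction. The parser handles only a simple single-media SDP and does not distinguish session-level from media-level attributes.

```python
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def sdp_direction(sdp: str) -> str:
    """Return the direction signaled by the sender of a simple SDP body."""
    direction = "sendrecv"  # default when no direction attribute is present
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c=") and line.split()[-1] == "0.0.0.0":
            # Legacy hold indication; treated like inactive (assumption).
            return "inactive"
        if line.startswith("a=") and line[2:] in DIRECTIONS:
            direction = line[2:]
    return direction


def should_tear_down(direction: str, last_rtp_time: float, now: float,
                     timeout_s: float = 30.0) -> bool:
    """True if RTP was expected from this sender but has been absent too long."""
    expects_rtp = direction in ("sendrecv", "sendonly")
    return expects_rtp and (now - last_rtp_time) >= timeout_s
```

With this logic a Browser that goes silent while its last SDP still says sendrecv eventually triggers the teardown, whereas the same silence after signaling recvonly or inactive does not.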

One way to avoid such a teardown is to offer the ability for the JavaScript to tell the Browser to turn off the microphone/camera inputs, while still generating RTP packets.

5.4.3 Clipping with Off-hold/off-mute

Another issue is the clipping that can occur when taking a call off hold or off mute. If the SIP user puts a call on hold, and a new SDP Offer is sent with a direction attribute of sendonly, and some time later the user takes the call off hold, it will take some time to get a new SDP Offer to the RTCWeb side Browser; the extra time it takes may cause clipping: the RTCWeb user will be able to hear/see but not speak/be-seen for a bit. The same applies in the reverse direction, when the RTCWeb user puts the call on or off hold.

In SIP, this generally doesn't take too long because the signaling is over UDP, on managed networks, going through tightly managed servers. In RTCWeb, it will likely be over lossy access mediums, over TCP, across the public Internet, and through Web Servers performing a lot of other functions.

A clever Web-Application developer, therefore, might realize that clipping can be avoided by not notifying the Browser of any direction change when the call is put on hold from SIP; such a developer could have the JavaScript change the SDP Offer to sendrecv before giving it to the Browser. What's needed, then, is the ability to tell the Browser not to render media received from the on-hold Browser and not to send it to the peer, so that the peer never stops sending RTP to the on-hold Browser; or an overly clever developer could send the direction information separately, in a direct data channel for example.

5.5. Call Transfer in SIP Domain

An RTCWeb domain user should be able to call to, or be called by, an Enterprise or Service Provider and have their call transferred to another user in the same or different Enterprise or Service Provider.

In the SIP signaling architecture model, this should either require the SIP domain to issue a REFER request to the RTCWeb domain's logical SIP UA, to tell the logical UA to generate an INVITE to the new party; or it should require the SIP domain to issue an INVITE with Replaces header to the RTCWeb domain's logical SIP UA, to replace the original dialog. In the former case, it requires the RTCWeb application to issue a new SDP Offer for a new session; in the latter case it causes the RTCWeb application to receive an SDP Offer for a new session. In both cases, however, the general expectation of users is that the media impacts are minimal or non-existent: they may hear a short-duration click or nothing at all when the audio party changes. Likewise they would probably expect to see the new transferred-to party in the same video window.

In practice for audio-only calls, it is quite common for the SIP transfer to occur without the transferred UA being aware of it, by having the REFER and INVITE signaling from the transferor/transferred-to be locally processed by B2BUAs, such as a PBX, Application Server or SBC. It is not very common, for example, to send REFER or INVITE with Replaces-header SIP Requests across SIP Enterprise-to-Service-Provider trunks or between Service Providers. In practice, therefore, SIP and SDP signaling may not be sent to the RTCWeb domain for this call transfer use-case.

The RTP media source will change inside the Enterprise or Service Provider, of course, but the change is hidden by the transfer-processing B2BUA, at least at an IP:port transport layer. At an audio codec and RTP layer, however, the change is frequently not hidden, and the result is the transferred party suddenly starts receiving RTP/RTCP packets from a new SSRC, sequence number space, timestamp, CNAME, etc. The same Payload Type and codec are used, of course.
Naturally, this assumes SRTP is not used or not used end-to-end (i.e., it may be terminated at the transfer-processing B2BUA).

From an RTCWeb interworking perspective, what this means is that the Browser has to be able to receive a new SSRC and timestamp/sequence number space from the Interworking Function, without receiving a new SDP Offer, without changing SRTP keys, and without ICE re-negotiation.

Note that this use-case describes Call Transfer cases, but similar media-plane behavior sometimes occurs in Call Park and Pickup, Find-Me-Follow-Me, Call Hunting, Rich-Ringtone, and Voicemail fallback cases.

5.6. Audio/Video Call Transfer

An RTCWeb domain user should be able to call to, or be called by, an Enterprise or Service Provider and transfer their RTCWeb call to another user in the same or different RTCWeb domain, SIP Enterprise or Service Provider. This is similar to the previous use-case but the RTCWeb user is now the transferor.

In the SIP signaling architecture model, this should either require the RTCWeb domain to issue a REFER request to the SIP domain, to tell the logical UA to generate an INVITE to the new party; or it should require the SIP domain to issue an INVITE with Replaces header to the RTCWeb domain's logical SIP UA, to replace the original dialog. In the former case, it requires the RTCWeb application to issue a new SDP Offer for a new session; in the latter case it causes the RTCWeb application to receive an SDP Offer for a new session. In both cases, however, the general expectation of users is that the media impacts are minimal or non-existent: they may hear a short-duration click or nothing at all when the audio party changes, and they likely expect the video rendering to replace the previous video in the same window, even though the incoming SDP Offer is for a new logical session.

5.7. Find-Me-Follow-Me in SIP Domain

An RTCWeb domain user should be able to call to a SIP Enterprise or Service Provider and have their call find the target user in the same or different Enterprise or Service Provider, with a SIP Find-Me-Follow-Me service (FMFM). FMFM service is similar to Call Hunting and Call Forwarding services, but with the caller hearing a "Please wait while we try to locate your party" type announcement message. (Note that Call Hunting and Call Forwarding services sometimes do this as well, in which case they're the same as FMFM.)

A common method of providing FMFM is for the SIP INVITE to be logically or physically forked to a media server that generates the announcement; the media server sends back an 18x response with an initial SDP Answer, and then when the final UAS is reached the UAS sends a 200 response with a final SDP Answer. To the SIP UAC (i.e., the Web Server), it often appears as a parallel-forked call case.

Therefore the RTCWeb model must support forked SIP calls, with two or more SDP Answers for a given Offer. It is likely that Web-Application developers will want this type of behavior as well, even for RTCWeb uses that do not go to SIP.

From an SDP offer/answer perspective, this means RTCWeb needs to support multiple, provisional SDP Answers. How it does so is beyond the scope of this document.

From a media perspective, this means the RTCWeb Browser needs to be able to receive and render media from different IP/RTP peers on the same local listen IP:port at different times, without having generated or received a new SDP Offer in-between.

Note that this use-case describes FMFM cases, but similar media-plane behavior sometimes occurs in Call Park and Pickup, Call Hunting, Rich-Ringtone, and Voicemail fallback cases.

It should also be noted that some media servers generate the announcement message without sending a provisional 18x response with SDP Answer.
Such servers won't function correctly with UAs behind NATs anyway, since an SDP Answer has to be sent to perform either ICE or SBC-type Latching; and many PSTN Gateways won't accept media until they get an SDP Answer either. Therefore, such media servers have issues even in SIP, and can be effectively ignored for the purposes of this document.

5.8. Video in SIP Domain

An RTCWeb domain user should be able to make a video call to, or be called by, a SIP Enterprise or Service Provider. While video is not nearly as ubiquitous in SIP as audio-only calls, it does exist and is a growing market, particularly now that most video-conferencing vendors (both terminals and MCUs) have shifted from H.323 to SIP.

5.8.1 Video and SIP/SDP

From a SIP perspective there is nothing unique about this use-case; but from an SDP perspective some video MCUs use the [SDP-CAP-NEG] SDP capability negotiation mechanism. The author believes this should not pose a problem for RTCWeb, as [SDP-CAP-NEG] is backwards-compatible with basic [RFC4566] SDP and reverts to using it.

[Note: what are the impacts for video-conf calls if SDP-CAP-NEG is not used? Video MCU vendors need to be consulted]

5.8.2 Video Codec Compatibility

Codec compatibility is a concern because transcoding video codecs in the Interworking Function would be prohibitively expensive: DSPs don't scale well for video, and are very expensive. If the currently used video codecs in SIP are all encumbered by royalties, then the author recognizes this may not be a solvable problem for Browsers.

5.8.3 Separate Video RTP Stream

SIP-based video terminals/MCUs use separate RTP sessions, on separate UDP ports, for video vs. audio media. Furthermore, some use separate video RTP sessions for separate cameras/screens, while some use the same one and de-multiplex using SSRC.
[Note: this latter use is believed but not known by the author]

5.8.4 Video RTP Packet Size

Video-codec RTP packet size is a concern if IP-layer fragmentation occurs, because many NATs and middleboxes discard IP fragments; otherwise they would have to re-assemble them to correctly process the whole UDP packet, and such re-assembly is processing intensive. Carrier Grade NATs (CGNs), consumer NATs, and Firewalls have similar behavior, and thus this is an issue for RTCWeb video usage in general on the public Internet.

In particular, although video codecs can "fragment" themselves at the codec layer, in deployed SIP and H.323 uses it has been found that some devices don't do so, resulting in IP-fragmented packets that get dropped along the way. Other devices constrain themselves to an IP MTU of 1500 bytes, without leaving overhead space for packet growth on the path, as can be caused by IPv4-to-IPv6 conversion, IPsec tunneling/VPNs, SSL-VPNs, etc. Unfortunately, path MTU discovery is not supported or used in practice.

Therefore, the Browser's maximum codec packet size needs to be carefully thought out.

6. Signaling-plane Interworking Requirements

REQ-ID  DESCRIPTION
----------------------------------------------------------------
A1-1    RTCWeb MUST provide a means for a sent SIP SDP Offer to
        be forked and receive multiple SDP Answers; how RTCWeb
        accomplishes this internally is up to the RTCWEB WG, and
        need not require SDP be used in RTCWeb.
----------------------------------------------------------------
A1-2    RTCWeb MUST provide a means for a received SIP SDP Offer
        to be Answered to a completion state; i.e., that the
        SIP-side can know to send a final SDP Answer back to the
        SIP domain, either in a 200 OK or reliable provisional
        response.
----------------------------------------------------------------
A1-3    RTCWeb MUST provide a means for a session request to be
        received without an SDP Offer, and to send an SDP Offer
        from RTCWeb back to the SIP side; i.e., that the
        SIP-side can receive a SIP INVITE without SDP, and be
        able to send back an SDP Offer in a response.
----------------------------------------------------------------
A1-4    RTCWeb MUST provide a means for the Browser to indicate
        SRTP [SDES], [DTLS-SRTP], or RTP optionally in SDP. In
        other words, either [SDP-CAP-NEG] or some similar
        mechanism, such as [draft-best-effort-srtp], is needed
        in order to make an SDP Offer that offers plaintext RTP
        as well as both types of SRTP key exchange.

7. Media-plane Interworking Requirements

REQ-ID  DESCRIPTION
----------------------------------------------------------------
A2-1    RTCWeb MUST provide a means for the Browser to generate
        and receive RTP and RTCP using UDP transport.
----------------------------------------------------------------
A2-2    RTCWeb Browsers MUST support the ability to use
        separate, distinct RTP sessions on separate UDP ports
        for separate media streams, such as audio vs. video.
----------------------------------------------------------------
A2-3    RTCWeb Browsers SHOULD support the ability to use the
        same UDP port for RTP and RTCP of the same media type,
        without needing to also multiplex media types on the
        same UDP port.
----------------------------------------------------------------
A2-4    RTCWeb SHOULD provide a means for the Browser to
        generate and receive RTP without having to perform ICE.
----------------------------------------------------------------
A2-5    RTCWeb MUST provide a means for the Browser to generate
        and receive RTP with an ICE-Lite peer.
----------------------------------------------------------------
A2-6    RTCWeb Browsers MUST support the G.711 PCMU and PCMA
        codecs for 10, 20, and 30ms packetization times.
----------------------------------------------------------------
A2-7    RTCWeb Browsers MUST support the G.729, G.722, G.722.1,
        AMR, and AMR-WB codecs.
----------------------------------------------------------------
A2-8    RTCWeb Browsers MUST support the H.263 and H.263+
        codecs.
----------------------------------------------------------------
A2-9    RTCWeb Browsers MUST support the H.264-AVC and SVC
        codecs for Baseline profile.
----------------------------------------------------------------
A2-10   RTCWeb Browsers MUST support a minimum of QCIF, QSIF,
        CIF, and SIF resolutions, and optionally higher.
----------------------------------------------------------------
A2-11   RTCWeb Browsers MUST NOT generate RTP or RTCP packets
        larger than 1460 bytes at an IP layer using UDP
        transport.
----------------------------------------------------------------
A2-12   RTCWeb MUST provide a means for the Browser to generate
        and receive RTP without receiving RTCP, for at least the
        G.711 PCMU and PCMA codecs.
----------------------------------------------------------------
A2-13   RTCWeb MUST provide a means for the Browser to generate
        [RFC4733] DTMF RTP Events for at least the events 0-15,
        in an audio-type RTP packet stream.
----------------------------------------------------------------
A2-14   RTCWeb MAY provide a means for the Browser to receive
        [RFC4733] DTMF RTP Events for the events 0-15.
----------------------------------------------------------------
A2-15   RTCWeb MUST provide a means for the Javascript
        application to invoke [RFC4733] DTMF events to be
        generated, and their duration, with a default duration
        of 50ms.
        In other words, the Javascript should be able to tell
        the Browser to generate event "0" for 50ms based on a
        button click, for example.
----------------------------------------------------------------
A2-16   RTCWeb MUST provide a means for the Javascript
        application to enable or disable [RFC4733] use, per
        session.
----------------------------------------------------------------
A2-17   RTCWeb MUST provide a means for the Browser to generate
        and receive RTP and RTCP over UDP without using SRTP.
----------------------------------------------------------------
A2-18   RTCWeb MUST provide a means for the Browser to generate
        and receive SRTP using [SDES]; at least if the
        Web-Server connection is HTTPS.
----------------------------------------------------------------
A2-19   RTCWeb MUST provide a means for the Browser to receive
        RTP/RTCP from a different peer RTP stack instance, over
        the same IP and port 5-tuple, at any time. In other
        words, the SSRC, timestamp, sequence number space, etc.,
        may change during the lifetime of receiving a remote
        stream, without the remote IP:port or SRTP key
        changing, and without ICE restarting.
----------------------------------------------------------------

8. Security Considerations

From a SIP-signaling perspective, this document makes no requirements which impact SIP-signaling security. SIP over TLS may be used, or not, depending on what the RTCWeb domain and SIP Enterprise or Service Provider supports, with the usual security issues and implications. If [RFC4474] is used, the Interworking Function would likely need to change SDP and thus break the signature, and would have to verify and re-sign the request using a certificate it owns. Or the Interworking Function could also be the trusted signer and verifier for a domain to begin with, in which case it signs and verifies only once.
In practice, [RFC4474] is not used by most SIP Service Providers and Enterprises, so it does not matter.

From a media-plane perspective, the difficulty of communicating with deployed SIP devices using SRTP is discussed in section 5.2. The idea of not requiring SRTP be used for all sessions is controversial, but the author believes that if the RTCWeb Web-Server and Browser are not using HTTPS but only plaintext HTTP, then a user should not expect the session to be secure; thus, at least in this case, SRTP should be optional.

When HTTPS is being used, the idea of not using SRTP becomes less appealing as the user likely expects the session to be secure; but in such a case optionally using [SDES] would also seem more reasonable than only allowing [DTLS-SRTP].

Technically, [SDES] is less secure than [DTLS-SRTP] in the sense that the RTCWeb Web-Server and Javascript can view the keys; and with [DTLS-SRTP] the user could verify the session is secure end-to-end by manually checking the fingerprint and asking the far-end user if they sent it. Unless the user actually performs the manual inspection and verification, however, [DTLS-SRTP] proves no more than [SDES] does, since the Javascript could have maliciously sent the call through a Man-in-the-Middle that terminated the DTLS-key-based SRTP. In fact, in order to interwork with deployed SIP devices it would have to use a middleman: the Interworking Function itself.

Therefore, there is little to gain by refusing to support [SDES] alongside [DTLS-SRTP]; those users who wish to verify the security can still do so, in exactly the same way they would verify [DTLS-SRTP] fingerprints, and see there is no fingerprint to verify, with appropriate text explaining why.

9. IANA Considerations

This document makes no request of IANA.

10. Acknowledgments

Thanks to Xavier Marjou and Victor Pascual for input and feedback.
Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).

11. References

11.1. Informative References

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC4566] Handley, M., Jacobson, V., Perkins, C., "SDP: Session Description Protocol", RFC 4566, July 2006.

[ICE] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols", RFC 5245, April 2010.

[SDES] Andreasen, F., Baugher, M., and D. Wing, "Session Description Protocol (SDP) Security Descriptions for Media Streams", RFC 4568, July 2006.

[RFC4733] Schulzrinne, H., Taylor, T., "RTP Payload for DTMF Digits, Telephony Tones, and Telephony Signals", RFC 4733, December 2006.

[KPML] Burger, E., Dolly, M., "A Session Initiation Protocol (SIP) Event Package for Key Press Stimulus (KPML)", RFC 4730, November 2006.

[DTLS-SRTP] McGrew, D., Rescorla, E., "Datagram Transport Layer Security (DTLS) Extension to Establish Keys for the Secure Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.

[RFC4474] Peterson, J., Jennings, C., "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)", RFC 4474, August 2006.

[SDP-CAP-NEG] Andreasen, F., "Session Description Protocol (SDP) Capability Negotiation", RFC 5939, September 2010.

[draft-best-effort-srtp] Kaplan, H., Audet, F., "Session Description Protocol (SDP) Offer/Answer Negotiation For Best-Effort Secure Real-Time Transport Protocol", draft-kaplan-mmusic-best-effort-srtp-01, October 2006.

[draft-use-cases] Holmberg, C., Hakansson, S., Eriksson, G., "Web Real-Time Communication Use-cases and Requirements", draft-ietf-rtcweb-use-cases-and-requirements-06, October 4, 2011.
Author's Address

Hadriel Kaplan
Acme Packet

Email: hkaplan@acmepacket.com
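Illustrative Sketch: [RFC4733] DTMF Event Payload

To make requirements A2-13 and A2-15 more concrete, the following sketch builds the 4-byte telephone-event payload defined by [RFC4733]: an 8-bit event code, an End bit, a reserved bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. The function name, the default volume of 10, and the 8000 Hz clock rate are assumptions made for this illustration, not anything specified by this document. A real sender would also wrap the payload in RTP packets, update the duration as the event progresses, and repeat the final packet; none of that is shown here.

```python
import struct


def telephone_event_payload(event: int, duration_ms: int = 50,
                            volume: int = 10, end: bool = False,
                            clock_rate: int = 8000) -> bytes:
    """Build a 4-byte RFC 4733 telephone-event payload (hypothetical helper)."""
    if not 0 <= event <= 15:
        raise ValueError("only DTMF events 0-15 are handled in this sketch")
    if not 0 <= volume <= 63:
        raise ValueError("volume is a 6-bit field")
    # Duration is expressed in RTP timestamp units, not milliseconds.
    duration = duration_ms * clock_rate // 1000
    if duration > 0xFFFF:
        raise ValueError("duration does not fit in the 16-bit field")
    # Second byte: End bit (1 bit), reserved bit (0), volume (6 bits).
    flags_and_volume = (int(end) << 7) | volume
    # Network byte order: event, flags/volume, duration.
    return struct.pack("!BBH", event, flags_and_volume, duration)
```

With these assumed defaults, a button click for digit "0" maps to telephone_event_payload(0), which carries a duration of 400 timestamp units (50ms at 8000 Hz).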