Internet Engineering Task Force                              SIPPING WG
Internet Draft                                              J.Rosenberg
                                                            dynamicsoft
draft-rosenberg-sipping-session-policy-00.txt
May 2, 2002
Expires: November 2002

            Supporting Intermediary Session Policies in SIP

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

Abstract

The Session Initiation Protocol (SIP) was designed to support establishment and maintenance of end-to-end sessions. Proxy servers provide call routing, authentication and authorization, mobility, and other signaling services that are independent of the session. Effectively, proxies provide signaling policy enforcement. However, numerous scenarios have arisen that require the involvement of proxies in some aspect of the session policy. SIP has no support for such capabilities, as the community has generally considered involvement of proxies in session details "evil". Practical implementations have therefore resorted to non-standard manipulation of SDP messages in order to enforce session policy. These implementations are fragile and fraught with problems. In this document, we discuss a middle-ground approach which permits proxies limited involvement in session policy, but retains the robustness that derives from the current prohibition on SDP manipulation.

J.Rosenberg                                                    [Page 1]

Internet Draft              Session Policy                  May 2, 2002

Table of Contents

1 Introduction ........................................ 3
2 Problems with Existing Situation .................... 4
3 Requirements ........................................ 5
4 Solution Framework .................................. 6
5 Supporting Media Intermediaries ..................... 10
5.1 Media-Stream Header ................................. 10
5.2 Media-Middlebox Header .............................. 11
5.3 Reverse-MM-Policy ................................... 12
5.4 UAC Behavior ........................................ 12
5.4.1 Generating the Request .............................. 12
5.4.2 Processing the Response ............................. 13
5.5 UAS Behavior ........................................ 14
5.5.1 Receiving the INVITE or UPDATE ...................... 14
5.5.2 Receiving the ACK ................................... 14
5.6 Proxy Behavior ...................................... 15
5.6.1 Receiving a Request ................................. 15
5.6.2 Receiving a Response ................................ 15
6 Example Call Flows .................................. 15
6.1 Example I: IP-in-IP NAT ............................. 15
6.2 Example II: Traditional MIDCOM ...................... 20
6.3 Example III: SIP Message Sessions ................... 24
7 Author's Addresses .................................. 27
8 Normative References ................................ 27
9 Informative References .............................. 28

1 Introduction

The Session Initiation Protocol (SIP) [1] was designed to support establishment and maintenance of end-to-end sessions. Proxy servers provide call routing, authentication and authorization, mobility, and other signaling services that are independent of the session. Effectively, proxies provide signaling policy enforcement.

However, numerous scenarios have arisen that require the involvement of proxies in some aspect of the session policy. One scenario is the traversal of a firewall or NAT. The midcom group has defined a framework for control of firewalls and NATs (generically, middleboxes) [4]. In this model, a midcom agent, typically a proxy server, interacts with the middlebox to open and close media pinholes, obtain NAT bindings, and so on. In its role as a midcom agent, the proxy will need to examine and possibly modify the session description in the body of the SIP message. This modification serves a specific policy objective: to force the media to route through an intermediary.

In another application, SIP is used in a wireless network. The network provider has limited resources for media traffic. During periods of high activity, the provider would like to restrict codec usage on the network to lower rate codecs. In existing approaches used in 3GPP, this is accomplished by having the proxies edit the SDP in the body, removing the higher rate codecs.

In yet a third application, SIP is used in a network that has gateways which support a single codec type (say, G.729). When communicating with a partner network that uses gateways with a different codec (say, G.723), the network modifies the SDP to route the session through a converter that changes the G.729 to G.723.

All three applications require the proxies to examine and/or manipulate the content of the session description in the body of the SIP message. However, SIP forbids proxies from performing such manipulation. It does not work when end-to-end encryption is applied. It introduces additional failure modes and fate sharing. It creates potential performance bottlenecks. There are other problems as well. Our solution is to introduce into SIP a framework that allows proxy servers to request media-level policy operations from user agents.

In Section 2, we discuss the problems associated with the manipulation of bodies by proxies, which have resulted in the prohibition from doing so in bis. In Section 3 we introduce requirements for a solution. In Section 4 we present our proposed framework. In Section 5 we present a SIP extension based on this framework, which allows for the insertion of intermediaries on the media path.

2 Problems with Existing Situation

The bis specification explicitly disallows proxy servers from manipulating the content of bodies. This is at odds with the common industry practice of extensive manipulation of bodies by proxies. Body manipulation is common practice, but the SIP specification prohibits it for many reasons:

End-to-End Encryption: SIP uses S/MIME to support end-to-end security features. Authentication, message integrity, and encryption are provided. The encryption capabilities are important for end-to-end privacy services, for example. The end-to-end message integrity and authentication are important for preventing numerous attacks, including theft of calls, eavesdropping attacks, and so on. If end-to-end authentication is used, any manipulation of the body will cause the message integrity check to fail. If end-to-end encryption is used, the proxy won't even be able to look at the SDP to modify it. In this case, media may not function, and the call will fail.

Require Processing: A UA may require that an extension be applied to the SDP body. This is accomplished by including a Require header in the SIP message. Proxies do not look at such headers. If the proxy processes the SDP without understanding the extension, it may improperly modify the SDP, resulting in a call failure.

Consent: Ultimately, end users need to be in control of the media they send. If a user makes a call through a SIP network, they have the expectation that their media is delivered to the recipient.
Proxies that modify the SDP in some way act outside the expected behavior of the system.

Future Proofing: One of the benefits of the SIP architecture is that only the endpoints need to understand sessions, session descriptions, bodies, and so on. This facilitates the use of proxy networks to provide communications services for future session types, such as games and messaging. However, if proxies require an understanding of session types and session descriptions, the SIP network becomes locked in to providing features for a particular set of session types. If a new session description protocol, such as SDPng [5], were introduced, calls would not function even though the endpoints support SDPng. Furthermore, it would be hard to determine why they did not function, since the failure would occur transparently in some proxy in the middle of the network.

Robustness: Having a proxy manipulate the body introduces a host of new failure modes into the network. Firstly, the proxy itself will need to have state in some form in order to properly manipulate the SDP. This means that, should the proxy fail, the call may not be able to continue. Secondly, proxies typically won't enforce the media policy. Rather, they leave that to some media middlebox somewhere on the media path. This media middlebox may fail as well. Since the user does not know of its existence, they may not be able to detect this failure or retry the media path around it.

Scalability: One of the reasons SIP scales so well is that proxies don't have to be aware of the details of the sessions being established through them. If a proxy needs to examine and/or manipulate session descriptions, this could require many additional processing steps. The proxy may need to traverse a multi-part body to find the SDP, in the case of SIP-T [6]. The proxy will need to parse, modify, and possibly re-serialize the session description.
All of this requires additional processing that worsens the performance of the proxies.

We note that many of these problems are similar to those pointed out by the IAB regarding Open Pluggable Exchange Services (OPES) [7]. Both have to do with the involvement of intermediaries in manipulation of end-to-end content. Here, the content is not in the body itself, but is a session described by the body.

We believe a better solution is needed.

3 Requirements

In this section, we provide a set of requirements for solving this problem.

1. The solution should allow proxies to request specific media policies. At the least, these policies include insertion of intermediaries for firewall and NAT traversal, and modification of the codec set.

2. The solution should work even with end-to-end encryption and end-to-end authentication enabled.

3. The solution should not force a proxy to violate the SIP specification.

4. The solution should not require substantial processing burden on the proxies.

5. The solution should support an explicit consent model, so that end users are aware of, and explicitly authorize, the media policies requested by proxies.

6. The solution should not require proxies to understand a specific type of session description (i.e., SDP or SDPng).

7. The solution should allow end systems to detect, and route around, failures of media enforcement points.

8. The solution must not require that the SIP elements be in the same administrative domain as the media processing elements.

9. The solution should support the addition of new media policy functions in the future.

4 Solution Framework

Our solution is based on extending the existing Record-Route/Route methodology to media processing. Effectively, record routing is an expression of a proxy's desire for signaling policy - namely, the inclusion of a signaling intermediary. Each proxy makes an independent policy request.
These are added to the message, and passed to the end system. The end system is explicitly aware of the set of intermediate proxies on the call path. The proxy elements need not store this route as state. It is stored in the end systems, and pushed back into the network in Route headers. This is exactly what we want to happen for session attributes.

The basic model for the framework is shown in Figure 1. In this model, the caller (UA 1) sends an INVITE request. This request contains a set of Media Interface Objects (MIO). Each MIO is a description of a media aspect of the session being set up by the caller. For example, there might be an MIO for the IP address and port of each media stream, and an MIO for the set of codecs in each stream. The caller only inserts MIOs for those aspects of the session it wishes to permit the network to modify. For example, if the caller only wants the network to modify the codecs in the streams, it would only insert MIOs representing the codecs.

+------+ INVITE + MIO1   +--------+ INVITE + MIO1 + MFO1  +------+
|      |---------------->|        |---------------------->|      |
|      |                 |proxy   |                       |      |
|      |200 + MIO2 + MFO2|        | 200 + MIO2            |  UA  |
|  UA  |<----------------|        |<----------------------|      |
|      |                 +--------+                       |      |
|  1   |                                                  |  2   |
|      |                                                  |      |
|      |                                                  |      |
|      |      RTP        +--------+                       |      |
|      |---------------->| media  |---------------------->|      |
|      |                 |enforce |                       |      |
|      |                 |point   |          RTP          |      |
|      |<----------------|        |<----------------------|      |
+------+                 +--------+                       +------+

Figure 1: Session policy framework

Since the MIOs are meant to be examined by proxies, and since they are provided to enable a SIP feature (proxy insertion of session policy), the MIOs are carried as SIP headers in the INVITE request. The caller would also insert a SIP Supported header, indicating its ability to understand session policies.

As the request traverses proxies, the proxies insert Media Filter Objects (MFO).
The MFOs represent "diffs" that the proxy wants to apply to each MIO. MFOs in the request ask for session policy on media streams flowing from the callee to the caller. For example, if an MIO contains an IP address and port for receiving an audio stream, a proxy can insert an MFO which changes that address and port to that of a media intermediary. The proxy does not modify the MIO - that is fundamental. Indeed, the MIO could, and should, be protected by end-to-end security measures. By specifying diffs to the MIO rather than directly modifying it, we enable an explicit consent and knowledge model. The UA can know exactly which policies were requested against the session.

If a proxy inserts an MFO, it can also insert a Require header into the request. This would make sure the request fails if the UAS does not understand session policies. Not all session policies will require a Require header. Policies could be optional, in which case the Require header would not be needed. If the request should fail, the proxy would retry the request using mechanisms that would be backwards compatible with older endpoints (such as modification of the SDP).

Like the MIO, the MFO will be represented in a SIP header. Each proxy can insert its own MFO. In that case, it "pushes" its MFO on top of the set of existing MFOs, much like Record-Route headers are pushed into a request. Each MFO also contains the identity of the domain which requested the policy. The MFO could also contain a signature, generated by the domain which inserted the MFO. This would allow the UA to verify the identities of the domains which have requested session policy, and to verify the integrity of those policies.

Perhaps most interestingly, the MFO can specify loose routing mechanisms that should be used to deliver the media to the media intermediary.
Just as Route headers allow the UA to specify the set of hops for signaling, tunneling protocols such as IP-in-IP, or IP loose source routing, would allow the UA to specify the set of hops for media delivery. This would have the important benefit of relieving the network from maintenance of any state.

It is also very important that the MFO not be an actual diff, in the Unix sense. This is because it is important that the UA understand the semantics of the requested policy, not just the syntactic change that is needed to effect that policy.

When the request reaches the UAS, the UAS examines the MIOs and MFOs in the request. It will know exactly what the UAC indicated, and know exactly which policies have been requested by intermediate domains. If those policies are unacceptable, it can generate an error response with an indication of which policies were not acceptable. Proxies receiving this error response could attempt to retry with a different policy, or just pass the error response upstream. The error response would arrive at the UAC, with a full list of the set of requested policies. This would allow the UAC to know what happened to its request, and why it failed.

If, however, the policies are acceptable to the UAS, and it accepts the call, it generates a 200 OK. That 200 OK contains two things. First, it contains its own set of MIOs for its side of the session. It also contains the set of MFOs from the request, copied into the response. These are purely informational, for the benefit of the UAC. They are end-to-end, and not meant for modification by proxies. In fact, they could (and should) be protected by end-to-end integrity mechanisms. This would ensure that proxies cannot request policies without having the UAC become aware of those policies.

As the response travels back to the UAC, proxies can insert MFOs that request modification of the session in the caller to callee direction.
Just like the MFOs in the forward direction, these are pushed into the response, and are formatted and interpreted identically to those in the request.

When the UAC receives this response, it can either reject or accept the policies. If it accepts, the ACK contains a copy of the MFOs from the response. If it rejects, the UAC ACKs, but it also sends a BYE. The BYE contains a reason code indicating that the call was terminated because of unacceptable MFOs. The BYE could also contain the list of MFOs from the 200 OK response.

Both endpoints then apply the media policies to the media streams they generate. This may involve, for example, sending media to an intermediary indicated in an MFO. Since the endpoints know about the full set of intermediaries, they have many options in the event of a failure (detected through an ICMP error, for example). The UA can try to send the media to the next intermediary on the path. Or, if the MFO specifies the intermediaries as a FQDN instead of an IP address, the UA can attempt to use DNS to find an alternative, and begin routing media through that.

The same mechanism could be repeated in a re-INVITE, allowing for mid-session modification of policies.

This framework meets the requirements outlined in Section 3:

1. The solution allows proxies to request specific media policies. This is accomplished through the insertion of MFOs into the requests and the responses.

2. Since the solution does not require modification of the bodies or the headers of the request, it works with end-to-end encryption and authentication.

3. The solution is well within the scope of the SIP specification. There is no modification of bodies, or even modification of headers inserted by the UA.

4. Since the solution does not require proxies to do anything but insert a header (no inspection or processing of the body), it requires much less processing than existing solutions.

5.
An explicit consent model is supported. The UAS can reject the policies requested for the media it generates, and it can learn about the policies requested for the media generated by the UAC. The UAC can reject the policies requested for the media it generates, and it can learn about the policies requested for the media generated by the UAS.

6. The solution does not depend on interpretation of the session description in the body.

7. Since the endpoints have complete knowledge of the media policies requested by the network, they can route around any failures by using an alternate (detected by DNS), or by sending the media to the next media intermediary on the path.

8. The solution does not require the SIP elements to be in the same domain as the media processing elements.

9. The framework supports a wide variety of media policies.

5 Supporting Media Intermediaries

In this section, we describe an initial protocol that instantiates the framework of Section 4 for insertion of media intermediaries. Media intermediaries are used for firewall and NAT traversal, enforcement of bandwidth usage, and so on.

This protocol is not complete. It is meant to convey the basic idea on the usage of the framework to instantiate a particular protocol.

5.1 Media-Stream Header

In this usage, the MIO is the IP address, port, and transport where the media stream is to be sent. This information is present in a new SIP header, the Media-Stream header. The header also contains an ID, which is a unique identifier for the stream.

   Media-Stream   = stream-info *(COMMA stream-info)
   stream-info    = discrete-type *(SEMI stream-params)
   stream-params  = address-param / port-param /
                    transport-param / id-param
   address-param  = "host" EQUAL (hostname / IPv4address /
                    IPv6reference)
   port-param     = "port" EQUAL port
   id-param       = "id" EQUAL token

The Media-Stream header is inserted by the UAC in an outgoing INVITE, and by the UAS in a 200 OK.
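
As an illustration only, the Media-Stream grammar above can be read by a small parser. The sketch below is not part of this draft: the names StreamInfo and parse_media_stream and the sample header value are hypothetical, and the parser simply splits on "," and ";" without any quoting rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StreamInfo:
    """One stream-info value from a Media-Stream header (hypothetical type)."""
    media_type: str                                   # discrete-type, e.g. "audio"
    params: Dict[str, str] = field(default_factory=dict)


def parse_media_stream(value: str) -> List[StreamInfo]:
    """Split a Media-Stream header value into its stream-info entries.

    Simplifying assumption: values are split on "," and ";" with no
    support for quoting or escaping.
    """
    streams = []
    for item in value.split(","):
        media_type, *raw_params = [part.strip() for part in item.split(";")]
        if not media_type:
            raise ValueError("stream-info is missing its media type")
        params = {}
        for raw in raw_params:
            name, sep, val = raw.partition("=")
            if not sep or not val:
                raise ValueError(f"malformed parameter: {raw!r}")
            params[name.lower()] = val
        streams.append(StreamInfo(media_type.lower(), params))
    return streams


# Hypothetical header value, for illustration only.
example = "audio;id=a1;host=192.0.2.10;port=5004, video;id=v1;host=192.0.2.10;port=5006"
for stream in parse_media_stream(example):
    print(stream.media_type, stream.params["id"],
          stream.params["host"], stream.params["port"])
```

Keeping the parameters in a plain dictionary means that parameters this sketch does not know about are preserved rather than rejected.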

An example Media-Stream header:

   Media-Stream: audio;id=7736ai;host=192.2.0.3;port=8876,
     video;id=hha9s8sd0;host=192.2.0.3;port=8878

This specifies two media streams: an audio stream and a video stream. Both streams are sent to 192.2.0.3, but the audio is sent to port 8876 and the video to port 8878. These parameters would match the values in the SDP in the body.

5.2 Media-Middlebox Header

In this usage of the framework, the MFO is the address, port, and transport of a media intermediary to be used for a particular stream. It is conveyed in a new SIP header, the Media-Middlebox header. This header contains, for a particular media stream (identified by the ID from the Media-Stream header), the address and port of the middlebox, the domain that has requested insertion of the middlebox, and a loose source routing protocol to reach that middlebox.

   Media-Middlebox      = intermediary *(COMMA intermediary)
   intermediary         = stream-id *(SEMI intermediary-params)
   stream-id            = token
   intermediary-params  = address-param / port-param /
                          transport-param / lroute-param /
                          domain-param
   lroute-param         = "route" EQUAL route-protocols
   route-protocols      = "ip-in-ip" / "ip-loose" / "media-specific"
   domain-param         = "domain" EQUAL host

The loose routing parameter requires some further discussion. The purpose of the Media-Middlebox header is for a proxy to tell the UA to send the media for a particular stream through the IP address and port of the intermediary. Instead of merely sending the media there, the UA can specify a source route, which touches that intermediary, but also any other intermediaries and then the final recipient. Thus, if there are N hops, including the final recipient, there needs to be a way for the media stream to specify N destinations. This can be done in several ways:

ip-in-ip: IP-in-IP tunneling [8] can be used to specify N hops of media traversal.
The ultimate destination is specified in the destination IP address of the innermost packet. Each additional hop adds another layer of encapsulation, with the address of that hop as the destination IP address of the encapsulating packet.

ip-loose: IP provides a loose routing mechanism that allows the sender of an IP datagram to specify a set of IP addresses that are to be visited on the way before reaching the final destination.

media-specific: Media protocols can provide their own loose routing mechanism. If that is the case, the loose routing mechanism of that protocol is used. As an example, the IM Transport Protocol (IMTP) [9] uses SIP MESSAGE requests for sending IM. SIP provides its own loose routing mechanisms with the Route header. These can be used to direct the MESSAGE through the set of intermediaries.

In the absence of a loose-routing mechanism, the media is instead just sent to the first media intermediary listed in the header.

5.3 Reverse-MM-Policy

The Reverse-MM-Policy header conveys the middleboxes used in the path of media towards the recipient. This header is informational only. It is reflected in the 200 OK response and in the ACK request. Its syntax is identical to the Media-Middlebox header.

5.4 UAC Behavior

5.4.1 Generating the Request

A UAC that supports this extension MUST insert a Supported header into an INVITE or UPDATE request with the option tag "middlebox". This indicates support for this extension, and willingness to let the network specify media intermediaries.

For each media stream being set up or modified by the request, there SHOULD be a Media-Stream header. The media type, address, port, and transport for the header SHOULD be copied from the media type, connection address, port, and transport in the session description in the request. The UAC MUST include an id attribute for each media stream. This attribute MUST have a value that is unique within the session description.
As a result, the session identifier (from the o line in SDP), along with the stream id attribute, specifies a globally unique identifier for a media stream.

5.4.2 Processing the Response

If the response is a 200 OK, it may contain a Require header with the value of "middlebox". In this case, the UAC is requested to use a media intermediary. There will be a Media-Stream header for each media stream in use for the session. The UAC SHOULD verify that these match the media streams from the session description. If they do not, the response may have been tampered with, and the UA SHOULD terminate the session with BYE (after ACKing, of course).

If they do match, the UA checks for a Media-Middlebox header. It MUST traverse the list of Media-Middlebox header field values in reverse order. For each header field value, it looks for a matching id amongst the values of the Media-Stream header field. If there is a match, the identity of the intermediary is "pushed" into a stack associated with that media stream. When this process completes, the UAC will have a set of intermediaries to visit for each media stream. If this set of intermediaries is not acceptable, the UAC SHOULD ACK and then BYE the call. The BYE MAY contain a Reason header [10] indicating that the call was terminated because of unacceptable intermediaries.

TBD: Specify the code, phrases, and a way to convey the specific objection.

The 200 OK response will also contain the set of intermediaries that will be used on the media path from the callee to the UAC. This will be present in the Reverse-MM-Policy header in the 200 OK. If this is not acceptable, the UAC SHOULD ACK and then BYE the call. The BYE MAY contain a Reason header [11] indicating that the call was terminated because of unacceptable intermediaries.

If the set of intermediaries is acceptable, when the UAC sends media on a stream, it sends it to the top intermediary in the stack.
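
The reverse-order traversal described in this section can be sketched as follows. This is a minimal illustration rather than a normative algorithm: the function names and the dictionary representation of a Media-Middlebox value (keys "stream_id" and "address") are assumptions made for the example.

```python
from typing import Dict, List, Optional


def build_intermediary_stacks(stream_ids: List[str],
                              middlebox_values: List[dict]) -> Dict[str, List[dict]]:
    """Build one stack of intermediaries per media stream.

    middlebox_values lists the Media-Middlebox header field values in the
    order they appear in the message (topmost first). Each value is a dict
    with at least a "stream_id" key (hypothetical representation).
    """
    stacks: Dict[str, List[dict]] = {sid: [] for sid in stream_ids}
    # Traverse in reverse order so that the topmost header value is pushed
    # last and therefore ends up on top of the stack.
    for value in reversed(middlebox_values):
        stack = stacks.get(value["stream_id"])
        if stack is not None:          # ignore values for unknown streams
            stack.append(value)
    return stacks


def top_intermediary(stack: List[dict]) -> Optional[dict]:
    """Return the intermediary the UA sends media to, or None if there is none."""
    return stack[-1] if stack else None


# Hypothetical values: two intermediaries requested for stream "a1".
values = [
    {"stream_id": "a1", "address": "192.0.2.1"},   # topmost header value
    {"stream_id": "a1", "address": "192.0.2.2"},
]
stacks = build_intermediary_stacks(["a1", "v1"], values)
print(top_intermediary(stacks["a1"]))
print(top_intermediary(stacks["v1"]))
```

A stream with an empty stack has no requested intermediary, so its media would go directly to the address in its Media-Stream value.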
The media is sent using the transport protocol and loose routing mechanism (if any) specified.

The ACK generated by the UAC SHOULD contain a Reverse-MM-Policy header field. This header field contains the same value as the Media-Middlebox header field from the 200 OK.

5.5 UAS Behavior

5.5.1 Receiving the INVITE or UPDATE

When the UAS receives an INVITE request, the request may have a Require header indicating that the UAS must understand the media intermediary extension in order to process the request. In that case, the request will contain a Media-Stream header and a Media-Middlebox header.

For each value in the Media-Stream header field, the UAS matches the stream with its counterpart in the session description in the body. Assuming it will otherwise generate an answer to the offer in the INVITE, the UAS discards any Media-Stream header field values corresponding to media streams disabled (by setting the port to zero) in the SDP in the answer. The resulting set of Media-Stream header field values is called the working set.

The UAS then begins processing the values of the Media-Middlebox header in reverse order. For each value, the UAS finds the matching stream in the working set (the match is based on the id attribute in the Media-Middlebox value). The Media-Middlebox value is then pushed into a stack associated with the matching value from the working set. When the process is complete, there is a stack of intermediaries specified for each media stream accepted by the UAS.

If the set of middleboxes is not acceptable to the UAS, it MAY reject the request with a TBD response code. This response can contain Warning headers indicating the specific reasons for rejection.

If the set of middleboxes is acceptable, the UAS generates an answer (in the 2xx, or a reliable provisional response [12]). This response contains a Reverse-MM-Policy header that mirrors the value of the Media-Middlebox header from the request.
The response also contains a Media-Stream header, containing a value for each stream used in the answer. The response MUST contain a Require header with the value "middlebox" in order to indicate that media policies were applied to the request.

When the UAS sends media, it sends it to the top middlebox in the stack, using the address, port, transport, and optionally loose route specified by that policy.

5.5.2 Receiving the ACK

The ACK request will contain a Reverse-MM-Policy header that informs the UAS of the media policies used to route media from the caller to itself. If this set is not acceptable, the UAS MAY generate a BYE to end the session.

5.6 Proxy Behavior

5.6.1 Receiving a Request

When a proxy receives an INVITE or UPDATE request with a Supported header with the value middlebox, it knows it can attempt to use media policies on this request. To do so, it inserts a value into the Media-Middlebox header (adding the header field if not present) at the top for each stream it wishes to apply media processing for. The streams are identified with the Media-Stream header in the request. The proxy MAY insert multiple media policies for the same stream.

The proxy MAY insert a Require header into the request, with the value "middlebox", if it insists that the UAS understand the extension in order to continue with the session. If the result is a 420 response, the proxy SHOULD retry the request without the media policy.

5.6.2 Receiving a Response

When a proxy receives a response to an INVITE or UPDATE request that contained a Supported header with the value middlebox, and the response contains a Require header with the value middlebox, the proxy MAY insert values into the Media-Middlebox header (adding the header field if not present) at the top, for each stream it wishes to apply processing for. The streams are identified with the Media-Stream header in the response.
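
The insertion step described in Sections 5.6.1 and 5.6.2 can be sketched as below. The sketch is illustrative only: the function name push_media_middlebox and the representation of headers as a dictionary of value lists are hypothetical, and the parameter name "address" follows the examples in Section 6 (the grammar in Section 5.1 spells this parameter "host").

```python
from typing import Dict, List, Optional


def push_media_middlebox(headers: Dict[str, List[str]],
                         stream_id: str,
                         address: str,
                         route: Optional[str] = None,
                         domain: Optional[str] = None) -> Dict[str, List[str]]:
    """Insert one Media-Middlebox value at the top of the header.

    headers maps a header name to its list of values, topmost first
    (hypothetical representation of a SIP message's headers).
    """
    parts = [stream_id, f"address={address}"]
    if route is not None:
        parts.append(f"route={route}")
    if domain is not None:
        parts.append(f"domain={domain}")
    # setdefault adds the header field if it is not present; insert(0, ...)
    # places the new value above any values pushed by earlier proxies.
    headers.setdefault("Media-Middlebox", []).insert(0, ";".join(parts))
    return headers


# Hypothetical message headers with one value already pushed by another proxy.
headers = {"Media-Middlebox": ["a1;address=192.0.2.2"]}
push_media_middlebox(headers, "a1", "192.0.2.1",
                     route="ip-in-ip", domain="example.com")
print(headers["Media-Middlebox"])
```

Because the proxy only prepends a header value, it never needs to parse or rewrite the session description in the body.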
The proxy MAY insert multiple media policies for the same stream.

6 Example Call Flows

The framework and the protocol are best explained through some examples. We provide three example flows here.

6.1 Example I: IP-in-IP NAT

This configuration is shown in Figure 2. The caller, UA 1, is on the public Internet. It wishes to call a user, UA 2, at sip:user2@foo.com. The foo.com domain is running on a net-10 network. The network has a single multi-homed proxy server, and it has a multi-homed router for media processing. The router has a public interface of 1.2.3.4.

               ................................
               .                              .
               .                              .
           +--------+                         .
           |        |                         .
           | Proxy  |               net10     .
           |        |               Network   .
           |        |                         .
           +--------+                         .
               .                              .
               .                              .
           +--------+                         .
           | Multi  |                         .
           |Homed   |                         .
           | Router |                         .
           |        |                         .
           +--------+                         .
+--------+     .                   +--------+ .
|        |     .                   |        | .
|  UA    |     .                   |  UA    | .
|  1     |     .                   |  2     | .
|        |     .                   |        | .
+--------+     .foo.com            +--------+ .
               ................................

Figure 2: IP-in-IP NAT Configuration

The flow for the call is shown in Figure 3. In message 1, the caller sends an INVITE. This INVITE looks like, in part:

   INVITE sip:user2@foo.com SIP/2.0
   Supported: middlebox
   Media-Stream: audio;address=9.8.7.6;port=1288;id=fxx9;transport=udp
   Content-Type: application/sdp
   Content-Length: ...

v=0 o=alice 2890844526 2890844526 IN IP4 host.anywhere.com s= c=IN IP4 9.8.7.6 J.Rosenberg [Page 16] Internet Draft Session Policy May 2, 2002 UA 1 Proxy router UA 2 | | | | |(1) INVITE | | | |MS:audio 9.8.7.6:1288 | | |------------------>| | | | | | | | | | | |(2) 100 | | | |<------------------| | | | | | | | |(3) INVITE | | | |MS:audio 9.8.7.6:1288 | | |-------------------------------------->| | | | | | |(4) 200 OK | | | |MS:audio 10.0.1.1:7788 | | |<--------------------------------------| |(5) 200 OK | | | |MS:audio 10.0.1.1:7788 | | |MM:audio 1.2.3.4;ipinip | | |<------------------| | | | | | | | | | | |(6) ACK | | | |------------------>| | | | | | | | | | | | |(7) ACK | | | |-------------------------------------->| | | | | |(8) IPinIP | | | |inner 10.0.1.1:7788| | | |-------------------------------------->| | | | | | | | | | | | |(9) RTP | | | |------------------>| | | | | | | | | | | |(10) RTP | | | |<------------------| | | | | | | | | |(11) RTP | | | |<--------------------------------------| | | | | | | | | | | | | | | | | | Figure 3: IP-in-IP Flow J.Rosenberg [Page 17] Internet Draft Session Policy May 2, 2002 t=0 0 m=audio 1288 RTP/AVP 0 This is passed to the foo.com proxy. The proxy does not require the specific usage of an intermediary for media from the callee (who is within foo.com) to the caller. Therefore, it merely proxies the request after a registration lookup. This request (3) arrives at the UAS. The UAS decides to accept the session. It generates a 200 OK with its own Media-Stream headers (4), which looks like, in part: SIP/2.0 200 OK Supported: middlebox Media-Stream: audio;address=10.0.1.1;port=7788;id=jhh7;transport=udp Content-Type: application/sdp Content-Length: ... v=0 o=bob 2890887s 2890686626 IN IP4 10.0.1.1 s= c=IN IP4 10.0.1.1 t=0 0 m=audio 7788 RTP/AVP 0 This is received by the proxy. The proxy knows it needs to have media destined for this UA pass through the multi-homed router. 
To do that, it requests the caller to use IP-in-IP encapsulation. So, it adds a Media-Middlebox header to the response (5):

   SIP/2.0 200 OK
   Require: middlebox
   Media-Middlebox: jhh7;address=1.2.3.4;route="ip-in-ip"
   Media-Stream: audio;address=10.0.1.1;port=7788;id=jhh7;transport=udp
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=bob 2890887s 2890686626 IN IP4 10.0.1.1
   s=
   c=IN IP4 10.0.1.1
   t=0 0
   m=audio 7788 RTP/AVP 0

This arrives at the UAC. The UAC generates an ACK (6), which contains a Reverse-MM-Policy header which mirrors the Media-Middlebox header from the 200 OK:

   ACK sip:ua2@10.0.1.1 SIP/2.0
   Route: sip:1.2.3.3
   Reverse-MM-Policy: jhh7;address=1.2.3.4;route="ip-in-ip"

The UAC then sends media. To do so, it generates an IP datagram with the destination IP address 1.2.3.4. The protocol is IP-in-IP. The inner datagram is a UDP packet, with destination 10.0.1.1, port 7788. This packet is sent to 1.2.3.4 (8), and arrives at the router. The router decapsulates the packet, and forwards the innermost packet. This packet is destined for 10.0.1.1, which is reachable from the router's internal interface. The router sends it there (9), and the media arrives at UA 2.

In the reverse direction, the callee sends packets to 9.8.7.6. These pass through the router, which NATs the source address, and forwards them on to the caller.

The most interesting aspect of this flow is that there was no MIDCOM protocol needed at all! There is no state stored in either the proxy, or in the router. This is because the "state", in this case, the binding between a public address and private one, has been pushed to the end systems, and sent back into the network through the IP-in-IP encapsulation. This mechanism can be considered a cross between RSIP [13] (which also uses tunneling) and midcom (which has proxies modifying messages).

The drawbacks of the use of IP-in-IP tunneling here are clear.
First, there is an additional overhead of at least 20 bytes per packet for the additional IP header. The second drawback is the slow-path processing which is likely to be seen at the router for decapsulation and forwarding. This may limit the volume of traffic that can be supported on any router. Interestingly, this problem is easily resolved through load balancing. Instead of including an IP address in the Media-Middlebox header, the proxy can include a domain name which contains multiple SRV records, one for each router being used. The clients can perform a randomized selection amongst the records, distributing the load across routers with very little additional overhead. Failover is provided in the same way. If the IP-in-IP packet generates an ICMP error, the caller knows that the intermediary failed. It can then use a different DNS record for an alternate. This results in highly robust and scalable operation.

Another drawback of this approach, however, is that it doesn't provide any media policy enforcement, per se. That is, it is useful strictly for NAT. No firewall or policy enforcement is provided. Indeed, an attacker can send packets into the private network, without call setup. They merely send an IP-in-IP packet, with the outermost address equal to the router interface, and the innermost destination address that of the host which is to be communicated with. To provide firewall mechanisms while retaining the stateless mechanisms of this approach, it is necessary to use different encapsulation protocols. Such protocols would provide encapsulation, and also allow for the presentation of authorization tokens, handed out by the proxy to the UAs, that permit specific packet processing in the router. This would effectively be a generalization of the call authorization tokens described in [14].

It is no coincidence that the routing of media operates in a similar fashion to the SIP routing of the ACK.
The ACK has a destination of sip:ua2@10.0.1.1, carried in the request-URI, but an intermediate hop (carried in the Route header) of sip:1.2.3.3. The proxy can remain stateless because the ultimate destination is encapsulated within the ACK message it receives from the caller. The same is true for the router, which can also remain stateless, with no storage of bindings.

6.2 Example II: Traditional MIDCOM

This example is similar to the first example, but no IP-in-IP encapsulation is done. Rather, the proxy obtains bindings through MIDCOM.

The configuration is shown in Figure 4. The caller, UA1, is on the public Internet. It wishes to call a user, UA2, situated behind a NAT in the foo.com domain.

The call flow is shown in Figure 5. In message 1, the caller sends an INVITE. This INVITE looks like, in part:

   INVITE sip:user2@foo.com SIP/2.0
   Supported: middlebox
   Media-Stream: audio;address=9.8.7.6;port=1288;id=fxx9;transport=udp
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
   s=
   c=IN IP4 9.8.7.6
   t=0 0
   m=audio 1288 RTP/AVP 0

                  ................................
                  .                              .
                  .                              .
              +--------+                         .
              |        |                         .
              | Proxy  |         net10           .
              |        |         Network         .
              |        |                         .
              +--------+                         .
                  .                              .
                  .                              .
              +--------+                         .
              |        |                         .
              |  NAT   |                         .
              |        |                         .
              |        |                         .
              +--------+                         .
   +--------+     .            +--------+        .
   |        |     .            |        |        .
   |   UA   |     .            |   UA   |        .
   |   1    |     .            |   2    |        .
   |        |     .            |        |        .
   +--------+     .foo.com     +--------+        .
                  ................................

   Figure 4: Traditional Midcom Configuration

This is passed to the foo.com proxy. The proxy does not require the specific usage of an intermediary for media from the callee (who is within foo.com) to the caller. Therefore, it merely proxies the request after a registration lookup. This request (3) arrives at the UAS. The UAS decides to accept the session.
It generates a 200 OK with its own Media-Stream headers (4), which looks like, in part:

   SIP/2.0 200 OK
   Supported: middlebox
   Media-Stream: audio;address=10.0.1.1;port=7788;id=jhh7;transport=udp
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=bob 2890887s 2890686626 IN IP4 10.0.1.1
   s=
   c=IN IP4 10.0.1.1
   t=0 0
   m=audio 7788 RTP/AVP 0

   UA 1                  Proxy                   NAT                  UA 2
   |                      |                      |                      |
   |(1) INVITE            |                      |                      |
   |MS:audio 9.8.7.6:1288 |                      |                      |
   |--------------------->|                      |                      |
   |                      |                      |                      |
   |(2) 100               |                      |                      |
   |<---------------------|                      |                      |
   |                      |                      |                      |
   |                      |(3) INVITE            |                      |
   |                      |MS:audio 9.8.7.6:1288 |                      |
   |                      |-------------------------------------------->|
   |                      |                      |                      |
   |                      |(4) 200 OK            |                      |
   |                      |MS:audio 10.0.1.1:7788|                      |
   |                      |<--------------------------------------------|
   |                      |                      |                      |
   |                      |(5) Allocate          |                      |
   |                      |10.0.1.1:7788         |                      |
   |                      |--------------------->|                      |
   |                      |                      |                      |
   |                      |(6) Binding=          |                      |
   |                      |1.2.3.4:8876          |                      |
   |                      |<---------------------|                      |
   |(7) 200 OK            |                      |                      |
   |MS:audio 10.0.1.1:7788|                      |                      |
   |MM:audio 1.2.3.4:8876 |                      |                      |
   |<---------------------|                      |                      |
   |                      |                      |                      |
   |(8) ACK               |                      |                      |
   |--------------------->|                      |                      |
   |                      |                      |                      |
   |                      |(9) ACK               |                      |
   |                      |-------------------------------------------->|
   |(10) RTP              |                      |                      |
   |destIP=               |                      |                      |
   |1.2.3.4:8876          |                      |                      |
   |-------------------------------------------->|                      |
   |                      |                      |(11) RTP              |
   |                      |                      |destIP=               |
   |                      |                      |10.0.1.1:7788         |
   |                      |                      |--------------------->|
   |                      |                      |                      |

   Figure 5: Traditional Midcom Flow

This is received by the proxy. The proxy knows it needs to have media destined for this UA pass through the NAT. To do that, it uses a midcom-type of protocol, and requests a NAT binding for 10.0.1.1:7788 (5). The NAT returns a binding (6), which is 1.2.3.4:8876. The proxy inserts a Media-Middlebox header into the 200 OK (7), containing this address as a media intermediary:
   SIP/2.0 200 OK
   Require: middlebox
   Media-Middlebox: jhh7;address=1.2.3.4;port=8876
   Media-Stream: audio;address=10.0.1.1;port=7788;id=jhh7;transport=udp
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=bob 2890887s 2890686626 IN IP4 10.0.1.1
   s=
   c=IN IP4 10.0.1.1
   t=0 0
   m=audio 7788 RTP/AVP 0

This arrives at the UAC. The UAC generates an ACK (8), which contains a Reverse-MM-Policy header which mirrors the Media-Middlebox header from the 200 OK:

   ACK sip:ua2@10.0.1.1 SIP/2.0
   Route: sip:1.2.3.3
   Reverse-MM-Policy: jhh7;address=1.2.3.4;port=8876

The UAC then sends media. Since there is no loose routing mechanism specified, the UAC assumes that the network can properly route the media from the first intermediary to the final recipient. So, it sends its RTP packets to 1.2.3.4:8876 (10). These packets arrive at the NAT. The NAT translates the address to 10.0.1.1:7788, and sends the media to the called party (11).

There is an interesting benefit in this case. One of the problems with the flow of Figure 5 is that it might not work if the caller and callee are in the same domain. In that case, the media would go from the caller, to the NAT, and theoretically turn back around and go to the called party. This is referred to as the intra-realm case [15]. Many NATs will not properly turn the packet around. In the flow here, though, both the caller and callee will know the private IP address of their peers (present in both the SDP and the Media-Stream header). In the event the media fails when routed through the intermediary, both parties can try to send the media directly, since they have enough information to do so.

6.3 Example III: SIP Message Sessions

This example is similar to the first example. However, the INVITE is used to set up an IM session [2]. The messages within the IM session are sent using the SIP MESSAGE request [3].
That draft discusses a similar approach for handling intermediaries to the one described here, but uses media-specific parameters within the SDP. Here, the MESSAGE requests in the session are routed through a SIP proxy using "media-specific" source routing specified by the Media-Middlebox header. In this case, the media is a SIP request, and therefore, it uses SIP's loose routing capabilities.

The call flow is shown in Figure 6. In message 1, the caller sends an INVITE. This INVITE looks like, in part:

   INVITE sip:user2@foo.com SIP/2.0
   Supported: middlebox
   Media-Stream: message;address=9.8.7.6;id=fxx9;transport=tcp
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
   s=
   c=IN IP4 9.8.7.6
   t=0 0
   m=message 5060 SIP
   a=user:alice

   UA 1              Proxy               NAT                 UA 2
   |(1) INVITE         |                   |                   |
   |MS:message 9.8.7.6 |                   |                   |
   |user=alice         |                   |                   |
   |------------------>|                   |                   |
   |                   |                   |                   |
   |(2) 100            |                   |                   |
   |<------------------|                   |                   |
   |                   |(3) INVITE         |                   |
   |                   |MS:message 9.8.7.6 |                   |
   |                   |user=alice         |                   |
   |                   |-------------------------------------->|
   |                   |(4) 200 OK         |                   |
   |                   |MS:message 10.0.1.1|                   |
   |                   |user=bob           |                   |
   |(5) 200 OK         |<--------------------------------------|
   |MS:message 10.0.1.1|                   |                   |
   |user=bob           |                   |                   |
   |MM:1.2.3.4         |                   |                   |
   |<------------------|                   |                   |
   |                   |                   |                   |
   |(6) ACK            |                   |                   |
   |------------------>|                   |                   |
   |                   |                   |                   |
   |                   |(7) ACK            |                   |
   |                   |-------------------------------------->|
   |                   |                   |                   |
   |(8) MESSAGE bob@10.0.1.1               |                   |
   |Route=1.2.3.4      |                   |                   |
   |------------------>|                   |                   |
   |                   |                   |                   |
   |                   |(9) MESSAGE bob@10.0.1.1               |
   |                   |-------------------------------------->|
   |                   |                   |                   |

   Figure 6: Call Flow for MESSAGE Sessions

This is passed to the foo.com proxy. The proxy does not require the specific usage of an intermediary for messages from the callee (who is within foo.com) to the caller.
Therefore, it merely proxies the request after a registration lookup. This request (3) arrives at the UAS. The UAS decides to accept the session. It generates a 200 OK with its own Media-Stream headers (4), which looks like, in part:

   SIP/2.0 200 OK
   Supported: middlebox
   Media-Stream: message;address=10.0.1.1;id=jhh7;transport=tcp
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=bob 2890887s 2890686626 IN IP4 10.0.1.1
   s=
   c=IN IP4 10.0.1.1
   t=0 0
   m=message 5060 SIP
   a=user:bob

This is received by the proxy. The proxy knows that the MESSAGE requests cannot go directly to Bob; they need to pass through an intermediary. In this case, it's the proxy itself. So, the proxy inserts a Media-Middlebox header, indicating itself as the intermediary, using a media-specific loose routing mechanism (5):

   SIP/2.0 200 OK
   Require: middlebox
   Media-Middlebox: jhh7;address=1.2.3.4;route="media-specific";transport=tcp
   Media-Stream: message;address=10.0.1.1;id=jhh7;transport=tcp
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=bob 2890887s 2890686626 IN IP4 10.0.1.1
   s=
   c=IN IP4 10.0.1.1
   t=0 0
   m=message 5060 SIP
   a=user:bob

This arrives at the UAC. The UAC generates an ACK (6), which contains a Reverse-MM-Policy header which mirrors the Media-Middlebox header from the 200 OK:

   ACK sip:ua2@10.0.1.1 SIP/2.0
   Route: sip:1.2.3.4
   Reverse-MM-Policy: jhh7;address=1.2.3.4;route="media-specific";transport=tcp

The UAC then sends an IM. To do so, it constructs a SIP MESSAGE request. The request URI is constructed from the SDP in the 200 OK (which matches the Media-Stream header in the 200 OK), and is equal to sip:bob@10.0.1.1. It then constructs a loose route using the SIP Route headers. There is a single intermediary, a proxy at sip:1.2.3.4. The MESSAGE sent by the caller looks like (8):

   MESSAGE sip:bob@10.0.1.1;transport=tcp SIP/2.0
   Route: sip:1.2.3.4;transport=tcp;lr

This is received by the proxy, which pops the Route header, and forwards it to the recipient, Bob (9).
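The construction of message 8 can be sketched as follows. This is a rough illustration, not a definitive implementation: the helper names parse_params and build_message_lines are hypothetical, the parsing is a plain split on ";" rather than a full SIP header parser, and emitting one Route line per Media-Middlebox value (topmost first) is an assumption drawn from the single-intermediary example above.

```python
from typing import Dict, List, Tuple


def parse_params(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'first;name=val;...' into its first token and a parameter dict."""
    first, *rest = value.split(";")
    params: Dict[str, str] = {}
    for item in rest:
        name, _, val = item.partition("=")
        params[name.strip()] = val.strip().strip('"')
    return first.strip(), params


def build_message_lines(user: str, media_stream: str,
                        middleboxes: List[str]) -> List[str]:
    """Build the request line and Route headers for a MESSAGE in the session.

    media_stream is the peer's Media-Stream value; middleboxes holds the
    Media-Middlebox values for that stream, topmost first.
    """
    _, stream = parse_params(media_stream)
    uri = f"sip:{user}@{stream['address']};transport={stream['transport']}"
    lines = [f"MESSAGE {uri} SIP/2.0"]
    for value in middleboxes:
        _, hop = parse_params(value)
        # ;lr marks the hop as a loose router, as in the example MESSAGE.
        lines.append(
            f"Route: sip:{hop['address']};transport={hop['transport']};lr")
    return lines


lines = build_message_lines(
    "bob",
    "message;address=10.0.1.1;id=jhh7;transport=tcp",
    ['jhh7;address=1.2.3.4;route="media-specific";transport=tcp'],
)
print("\n".join(lines))
```

With the values from the 200 OK in message 5, the sketch yields the same request line and Route header as the MESSAGE shown for message 8.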
Of course, the intermediary for the MESSAGE request need not be the same as the proxy handling the SIP. The two coincide only in this example.

7 Author's Addresses

   Jonathan Rosenberg
   dynamicsoft
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936
   email: jdrosen@dynamicsoft.com

8 Normative References

[1] J. Rosenberg, H. Schulzrinne, et al., "SIP: Session initiation protocol," Internet Draft, Internet Engineering Task Force, Feb. 2002. Work in progress.

[2] B. Campbell and J. Rosenberg, "SIP instant message sessions," Internet Draft, Internet Engineering Task Force, July 2001. Work in progress.

[3] J. Rosenberg, "Using MESSAGE for IM sessions," Internet Draft, Internet Engineering Task Force, May 2002. Work in progress.

9 Informative References

[4] P. Srisuresh, J. Kuthan, J. Rosenberg, A. Molitor, and A. Rayhan, "Middlebox communication architecture and framework," Internet Draft, Internet Engineering Task Force, Mar. 2002. Work in progress.

[5] D. Kutscher, J. Ott, and C. Bormann, "Session description and capability negotiation," Internet Draft, Internet Engineering Task Force, Mar. 2002. Work in progress.

[6] A. Vemuri and J. Peterson, "SIP for telephones (SIP-t): Context and architectures," Internet Draft, Internet Engineering Task Force, Mar. 2002. Work in progress.

[7] S. Floyd and L. Daigle, "IAB architectural and policy considerations for open pluggable edge services," RFC 3238, Internet Engineering Task Force, Jan. 2002.

[8] C. Perkins, "IP encapsulation within IP," RFC 2003, Internet Engineering Task Force, Oct. 1996.

[9] J. Rosenberg et al., "A proposal for IM transport," Internet Draft, Internet Engineering Task Force, Nov. 2001. Work in progress.

[10] H. Schulzrinne, D. Oran, and G. Camarillo, "The reason header field for the session initiation protocol," Internet Draft, Internet Engineering Task Force, Apr. 2002. Work in progress.

[11] H. Schulzrinne, D. Oran, and G. Camarillo, "The reason header field for the session initiation protocol," Internet Draft, Internet Engineering Task Force, Mar. 2002. Work in progress.

[12] J. Rosenberg and H. Schulzrinne, "Reliability of provisional responses in SIP," Internet Draft, Internet Engineering Task Force, Feb. 2002. Work in progress.

[13] M. Borella, J. Lo, D. Grabelsky, and G. Montenegro, "Realm specific IP: framework," RFC 3102, Internet Engineering Task Force, Oct. 2001.

[14] W. Marshall et al., "SIP extensions for media authorization," Internet Draft, Internet Engineering Task Force, Mar. 2002. Work in progress.

[15] C. Aoun and S. Sen, "Identifying intra-realm calls and avoiding media tromboning," Internet Draft, Internet Engineering Task Force, Feb. 2002. Work in progress.

Full Copyright Statement

Copyright (c) The Internet Society (2002). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

The IETF has been notified of intellectual property rights claimed in regard to some or all of the specification contained in this document. For more information consult the online list of claimed rights.