MMUSIC                                                  J. Rosenberg
Internet-Draft                                         Cisco Systems
Expires: January 18, 2006                              July 17, 2005

Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for Offer/Answer Protocols
draft-ietf-mmusic-ice-05

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 18, 2006.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

This document describes a methodology for Network Address Translator (NAT) traversal for multimedia session signaling protocols, such as the Session Initiation Protocol (SIP). This methodology is called Interactive Connectivity Establishment (ICE). ICE makes use of existing protocols, such as Simple Traversal of UDP Through NAT (STUN) and Traversal Using Relay NAT (TURN). ICE makes use of STUN in a peer-to-peer cooperative fashion, allowing participants to discover, create and verify mutual connectivity.

Rosenberg Expires January 18, 2006 [Page 1]
Internet-Draft ICE July 2005

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview of ICE
   4.  Sending the Initial Offer
   5.  Receipt of the Offer and Generation of the Answer
   6.  Processing the Answer
   7.  Common Procedures
     7.1  Gathering Candidates
     7.2  Encoding Candidates into SDP
     7.3  Prioritizing the Transport Addresses and Choosing an Active One
     7.4  Connectivity Checks
       7.4.1  UDP Connectivity Checks
         7.4.1.1  Send Validation
         7.4.1.2  Receive Validation
         7.4.1.3  Learning New Candidates from Connectivity Checks
           7.4.1.3.1  On Receipt of a Binding Request
           7.4.1.3.2  On Receipt of a Binding Response
       7.4.2  TCP Connectivity Checks
         7.4.2.1  Connection Establishment
         7.4.2.2  Sending STUN Binding Requests
         7.4.2.3  Receiving STUN Requests
     7.5  Promoting a Valid Candidate to Active
       7.5.1  Minimum Requirements
       7.5.2  Suggested Algorithm
     7.6  Subsequent Offer/Answer Exchanges
       7.6.1  Sending of an Offer
       7.6.2  Receiving the Offer and Sending an Answer
       7.6.3  Receiving the Answer
     7.7  Binding Keepalives
     7.8  Sending Media
   8.  Interactions with Forking
   9.  Interactions with Preconditions
   10. Example
   11. Grammar
   12. Security Considerations
   13. IANA Considerations
   14. IAB Considerations
     14.1  Problem Definition
     14.2  Exit Strategy
     14.3  Brittleness Introduced by ICE
     14.4  Requirements for a Long Term Solution
     14.5  Issues with Existing NAPT Boxes
   15. Acknowledgements
   16. References
     16.1  Normative References
     16.2  Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1. Introduction

A multimedia session signaling protocol is a protocol that exchanges control messages between a pair of agents for the purposes of establishing the flow of media traffic between them. This media flow is distinct from the flow of control messages, and may take a different path through the network. Examples of such protocols are the Session Initiation Protocol (SIP) [3], the Real Time Streaming Protocol (RTSP) [16] and the International Telecommunication Union (ITU) H.323. These protocols, by nature of their design, are difficult to operate through Network Address Translators (NAT).
Because their purpose in life is to establish a flow of packets, they tend to carry IP addresses within their messages, which is known to be problematic through NAT [17]. The protocols also seek to create a media flow directly between participants, so that there is no application layer intermediary between them. This is done to reduce media latency, decrease packet loss, and reduce the operational costs of deploying the application. However, this is difficult to accomplish through NAT. A full treatment of the reasons for this is beyond the scope of this specification.

Numerous solutions have been proposed for allowing these protocols to operate through NAT. These include Application Layer Gateways (ALGs), the Middlebox Control Protocol [18], Simple Traversal of UDP through NAT (STUN) [1], Traversal Using Relay NAT (TURN) [14], and Realm Specific IP [19] [20], along with session description extensions needed to make them work, such as the Session Description Protocol (SDP) [7] attribute for the RTP Control Protocol (RTCP) [2]. Unfortunately, these techniques all have pros and cons which make each one optimal in some network topologies, but a poor choice in others. The result is that administrators and implementors are making assumptions about the topologies of the networks in which their solutions will be deployed. This introduces complexity and brittleness into the system. What is needed is a single solution which is flexible enough to work well in all situations.

This specification provides that solution for protocols based on the offer/answer model, RFC 3264 [4]. It is called Interactive Connectivity Establishment, or ICE. ICE makes use of STUN and TURN, but uses them in a specific methodology which avoids many of the pitfalls of using any one alone.

2. Terminology

Several new terms are introduced in this specification:

Peer: From the perspective of one of the agents in a session, its peer is the other agent. Specifically, from the perspective of the offerer, the peer is the answerer. From the perspective of the answerer, the peer is the offerer.

Transport Address: The combination of an IP address and port.

Local Transport Address: A local transport address is a transport address that has been allocated from the operating system on the host. This includes transport addresses obtained through Virtual Private Networks (VPNs) and transport addresses obtained through Realm Specific IP (RSIP) [19] (which lives at the operating system level). Transport addresses are typically obtained by binding to an interface.

m/c line: The media and connection lines in the SDP, which together hold the transport address used for the receipt of media.

Derived Transport Address: A derived transport address is a transport address which is derived from a local transport address. The derived transport address is related to the associated local transport address in that packets sent to the derived transport address are received on the socket bound to its associated local transport address. Derived addresses are obtained using protocols like STUN and TURN, and more generally, any UNSAF protocol [21].

Candidate Transport Address: A transport address advertised by an agent in an offer or answer. A candidate transport address can be either a local transport address or a derived transport address.

Peer Derived Transport Address: A peer derived transport address is a derived transport address learned from a STUN server running within a peer in a media session.

TURN Derived Transport Address: A derived transport address obtained from a TURN server.
STUN Derived Transport Address: A derived transport address obtained from a STUN server whose address has been provisioned into the agent. This, by definition, excludes peer derived transport addresses.

Candidate: A sequence of candidate transport addresses that form an atomic set for usage with a particular media stream. In the case of RTP, there are two candidate transport addresses per candidate: one for RTP, and another for RTCP. Connectivity is verified to all of the candidate transport addresses within a candidate before that candidate is used. The transport addresses that compose a candidate are all of the same type: local, STUN derived, TURN derived or peer derived.

Local Candidate: A candidate whose transport addresses are local transport addresses.

STUN Candidate: A candidate whose transport addresses are STUN derived transport addresses.

TURN Candidate: A candidate whose transport addresses are TURN derived transport addresses.

Peer Candidate: A candidate whose transport addresses are peer derived transport addresses.

Active Candidate: The candidate that is in use for exchange of media. This is the one that an agent places in the m/c line of an offer or answer.

3. Overview of ICE

ICE makes the fundamental assumption that clients exist in a network of segmented connectivity. This segmentation is the result of a number of addressing realms to which a client can simultaneously be connected. We use "realms" here in the broadest sense. A realm is defined purely by connectivity. Two clients are in the same realm if, when they exchange the addresses each has in that realm, they are able to send packets to each other. This includes IPv6 and IPv4 realms, which actually use different address spaces, in addition to private networks connected to the public Internet through NAT.
The key assumption in ICE is that a client cannot know, a priori, which address realms it shares with any peer it may wish to communicate with. Therefore, in order to communicate, it has to try connecting to addresses in all of the realms.

    Agent A           TURN,STUN Servers              Agent B
       |(1) Gather Addresses  |                         |
       |--------------------->|                         |
       |(2) Offer             |                         |
       |----------------------------------------------->|
       |                      |(3) Gather Addresses     |
       |                      |<------------------------|
       |(4) Answer            |                         |
       |<-----------------------------------------------|
       |(5) Media             |                         |
       |<-----------------------------------------------|
       |(6) Media             |                         |
       |----------------------------------------------->|
       |(7) STUN Checks       |                         |
       |<-----------------------------------------------|
       |(8) STUN Checks       |                         |
       |----------------------------------------------->|
       |(9) Offer             |                         |
       |----------------------------------------------->|
       |(10) Answer           |                         |
       |<-----------------------------------------------|
       |(11) Media            |                         |
       |<-----------------------------------------------|
       |(12) Media            |                         |
       |----------------------------------------------->|

                              Figure 1

The basic flow of operation for ICE is shown in Figure 1. Before the offerer establishes a session, it obtains local transport addresses from its operating system on as many interfaces as it has access to. These interfaces can include IPv4 and IPv6 interfaces, in addition to Virtual Private Network (VPN) interfaces or ones associated with RSIP. For media protocols that support both UDP and TCP (such as the Real Time Transport Protocol (RTP) [22], which can run over either), it obtains both TCP and UDP transport addresses. In addition, the agent obtains derived transport addresses from each local transport address using protocols such as STUN and TURN. Each local and derived transport address becomes a candidate for receipt of media traffic.
The agent will choose one of its candidate transport addresses as its initial media transport address for inclusion in the connection and media lines in the offer. This transport address will be utilized for media traffic while connectivity is verified to all of the candidates. Since these checks may take time to execute, media clipping will occur if the media transport address is not reachable by the peer. To minimize the probability of clipping, the transport address that is most likely to work is chosen. This is normally a TURN derived transport address, but others can be utilized based on local policy.

Each candidate transport address (including the one being used as the media transport address) is listed in an a=candidate attribute in the offer. Each candidate is given a preference. Preference is a matter of local policy, but typically, lowest preference would be given to transport addresses learned from a TURN server (i.e., TURN derived transport addresses). Each candidate is also assigned a distinct ID, called a transport ID (tid). The offer is then sent to the answerer. This specification does not address the issue of how the signaling messages themselves traverse NAT. It is assumed that signaling protocol specific mechanisms are used for that purpose.

The answerer follows a similar process to the one the offerer followed; it obtains addresses from local interfaces, obtains derived transport addresses from those, and the combination becomes its set of candidate transport addresses. It picks one as its initial media transport address and places it into the m/c line in the answer, and then lists all of them in the a=candidate attributes in the answer, along with a preference and tid.

Once the offer/answer exchange has completed, each agent sends media from its media transport address to the media transport address of its peer.
This media stream may or may not work, depending on whether or not the media transport address is reachable. In parallel with the transmission of media, connectivity checks begin. These checks make use of STUN messages sent from each candidate to each candidate of the peer. They allow each agent to determine whether it can send packets from a particular candidate to a candidate from its peer, and whether packets can be sent back. If, after a certain period of time, an agent determines that a pair of candidates works, and has a higher priority than the transport addresses currently in use for media (perhaps because the ones in use don't work), it sends a new offer that "promotes" its candidate into the m/c line. This causes the media traffic to switch to this new transport address.

4. Sending the Initial Offer

When an agent wishes to begin a session by sending an initial offer, it starts by gathering transport addresses, as described in Section 7.1. This will produce a set of candidates, including local ones, STUN derived ones, and TURN derived ones.

This process of gathering candidates can actually happen at any time before sending the initial offer. An agent can pre-gather transport addresses, using a user interface cue (such as picking up the phone, or entry into an address book) as a hint that communication is imminent. Doing so eliminates any additional perceivable call setup delays due to address gathering.

When it comes time to offer communication, the agent determines a priority for each candidate and identifies the active candidate that will be used for receipt of media, as described in Section 7.3.

The next step is to construct the offer message. For each media stream, the agent places its candidates into a=candidate attributes in the offer and puts its active candidate into the m/c line. The process for doing this is described in Section 7.2. The offer is then sent.

5. Receipt of the Offer and Generation of the Answer

Upon receipt of the offer message, the agent checks if the offer contains any a=candidate attributes. If it does, the offerer supports ICE. In that case, the answerer starts gathering candidates, as described in Section 7.1, and prioritizes them, as described in Section 7.3. This processing is done immediately on receipt of the offer, to prepare for the case where the user should accept the call, or early media needs to be generated. By gathering candidates while the user is being alerted to the request for communication, session establishment delays due to that gathering can be eliminated.

At some point, the answerer will decide to accept or reject the communication. A rejection terminates ICE processing. In the case of acceptance, the answer is constructed, and if the offerer supported ICE, the candidates are encoded into the SDP as described in Section 7.2. The answer is then sent. If the offerer supported ICE, the answerer begins its connectivity checks as described in Section 7.4. In addition, and regardless of whether the offerer supported ICE, the answerer can begin sending media packets as it normally would. It sends media according to the procedures in Section 7.8.

6. Processing the Answer

There are two possible cases for processing of the answer. If the answerer did not support ICE, the answer will not contain any a=candidate attributes. As a result, the offerer knows that it cannot perform its connectivity checks. In this case, it proceeds with normal media processing as if ICE were not in use. The procedures for sending media, described in Section 7.8, MUST be followed, however.

If the answer contains candidates, it implies that the answerer supported ICE. In that case, the offerer begins connectivity checks as described in Section 7.4. It also starts sending media, using the candidate in the m/c line, based on the procedures described in Section 7.8.

7. Common Procedures

This section discusses procedures that are common between offerer and answerer.

7.1 Gathering Candidates

An agent gathers candidates when it believes that communication is imminent. For offerers, this occurs before sending an offer (Section 4). For answerers, it occurs before sending an answer (Section 5).

Each candidate is composed of a series of transport addresses of the same type. In the case of RTP, the candidate is composed of either one or two transport addresses. Normally there are two: one for RTP, and one for RTCP. However, if RTCP is not in use, a candidate will only contain a single transport address.

The first step is to gather local candidates. Local candidates are obtained by binding to ephemeral ports on an interface (physical or virtual, including VPN interfaces) on the host. Specifically, for each UDP-only media stream the agent wishes to use, the agent SHOULD obtain a set of candidates (one for each interface) by binding to N ephemeral UDP ports on each interface, where N is the number of transport addresses needed for the candidate. For RTP, N is typically two. For each TCP-only media stream the agent wishes to use, the agent SHOULD obtain a set of candidates by binding to N ephemeral TCP ports on each interface. For media streams that can support either UDP or TCP, the agent SHOULD obtain a set of candidates by binding to N ephemeral UDP and N ephemeral TCP ports on each interface. If a host has K local interfaces, this will result in K candidates for each UDP stream (requiring K*N transport addresses), K candidates for each TCP stream (requiring K*N transport addresses), and 2K candidates for streams that support UDP and TCP (requiring 2*K*N transport addresses).

Media streams carried using the Real Time Transport Protocol (RTP) [22] can run over TCP [27].
As such, it is RECOMMENDED that both UDP and TCP candidates be obtained. Transmission of real time media over UDP is generally preferred to TCP. However, many network environments, for better or for worse, permit only TCP traffic. Obtaining a TCP candidate, and then using it in conjunction with a TURN relay as described below, allows ICE to make use of TCP media only when UDP connectivity is non-existent, as it may be in these restricted environments. However, providers of real-time communications services may decide that it is preferable to have no media at all than to have media over TCP. To allow for choice, it is RECOMMENDED that agents be configurable with whether they obtain TCP candidates for real time media.

Having it be configurable, and then configuring it to be off, is far better than not having the capability at all. An important goal of this specification is to provide a single mechanism that can be used across all types of endpoints. As such, it is preferable to account for provider and network variation through configuration, instead of hard-coded limitations in an implementation. Furthermore, network characteristics and connectivity assumptions can, and will, change over time. Just because an agent is communicating with a server on the public network today doesn't mean that it won't need to communicate with one behind a NAT tomorrow. Just because an agent is behind a full cone NAT today doesn't mean that tomorrow its user won't pick up the agent and take it to a public network access point where there is a symmetric NAT or one that only allows outbound TCP. The way to handle these cases and build a reliable system is for agents to implement a diverse set of techniques for allocating addresses, so that at least one of them is almost certainly going to work in any situation.
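The local-candidate counts given earlier in this section (K candidates per UDP-only or TCP-only stream, 2K for a stream that supports both, each needing N transport addresses) can be checked with a small planning function. This is a minimal sketch, not part of ICE itself: `LocalCandidate`, `plan_local_candidates`, `ports_needed` and the `stream_kind` values are hypothetical names, the interface addresses are assumed to be known already, and nothing is actually bound to a socket.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LocalCandidate:
    """One local candidate: N transport addresses on one interface (hypothetical model)."""
    interface: str    # IP address of a physical or virtual (e.g. VPN) interface
    transport: str    # "UDP" or "TCP"
    components: int   # N: 2 for RTP plus RTCP, 1 if RTCP is not in use


# Transports to bind for each kind of media stream (names are assumptions).
TRANSPORTS = {"udp": ("UDP",), "tcp": ("TCP",), "both": ("UDP", "TCP")}


def plan_local_candidates(interfaces: List[str], stream_kind: str,
                          n: int) -> List[LocalCandidate]:
    """Plan one candidate per interface and per transport the stream allows."""
    return [LocalCandidate(ifc, t, n)
            for ifc in interfaces
            for t in TRANSPORTS[stream_kind]]


def ports_needed(candidates: List[LocalCandidate]) -> int:
    """Total number of ephemeral ports the agent would have to bind."""
    return sum(c.components for c in candidates)


if __name__ == "__main__":
    # K = 3 interfaces (documentation addresses), N = 2 for RTP + RTCP.
    ifaces = ["192.0.2.10", "10.0.0.5", "2001:db8::10"]
    for kind in ("udp", "tcp", "both"):
        cands = plan_local_candidates(ifaces, kind, 2)
        print(kind, len(cands), ports_needed(cands))
```

With three interfaces and N = 2, this prints 3 candidates and 6 ports for a UDP-only or TCP-only stream, and 6 candidates and 12 ports for a stream that supports both, matching K, K*N, 2K and 2*K*N.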
Implementors should consider very carefully any assumptions that they make about deployments before electing not to implement one of the mechanisms for address allocation. In particular, implementors should consider whether the elements in the system may be mobile, and connect through different networks with different connectivity. They should also consider whether endpoints which are under their control, in terms of location and network connectivity, would always be under their control. Only in cases where there isn't now, and never will be, endpoint mobility or nomadicity of any sort, should a technique be omitted.

Once the agent has obtained local candidates, it obtains candidates with derived transport addresses. Agents which serve end users directly, such as softphones, hardphones, terminal adaptors and so on, MUST implement STUN and SHOULD use it to obtain STUN candidates. These devices SHOULD implement and SHOULD use TURN to obtain TURN candidates. They MAY implement and MAY use other protocols that provide derived transport addresses, such as TEREDO [25]. As with TCP, usage of STUN and TURN is at SHOULD strength to allow for provider variation. If either is not to be used, it is RECOMMENDED that it still be implemented and simply disabled through configuration, so that it can be re-enabled through configuration if conditions change in the future.

Agents which represent network servers under the control of a service provider, such as gateways to the telephone network, media servers, or conferencing servers that are targeted at deployment only in networks with public IP addresses, MAY use STUN, TURN or other similar protocols to obtain candidates. Why would these types of endpoints even bother to implement ICE? The answer is that such an implementation greatly facilitates NAT traversal for endpoints that connect to it.
The ability to process STUN connectivity checks allows the network server to obtain peer derived transport addresses that can be used to provide relay-free traversal of symmetric NAT for endpoints that connect to it. Furthermore, implementation of the STUN connectivity checks allows NAT bindings along the way to be kept open. ICE also provides numerous security properties that are independent of NAT traversal, and would benefit any multimedia endpoint. See Section 12 for a discussion of these benefits.

To obtain STUN candidates (which are always UDP), the client takes a local UDP candidate, and for each configured STUN server, produces a STUN candidate. It is anticipated that clients may have a multiplicity of STUN servers configured in network environments where there are multiple layers of NAT, and that layering is known to the provider of the client. To produce the STUN candidate from the local candidate, it follows the procedures of Section 9 of RFC 3489 [1] for each local transport address in the local candidate. It obtains a shared secret from the STUN server and then initiates a Binding Request transaction from the local transport address to that server. The Binding Response will provide the client with its STUN derived transport address in the MAPPED-ADDRESS attribute. If the client had K local candidates, this will produce S*K STUN candidates, where S is the number of configured STUN servers.

To obtain UDP TURN candidates, the client takes a local UDP candidate, and for each configured TURN server, produces a TURN candidate. It is anticipated that clients may have a multiplicity of TURN servers configured in network environments where there are multiple layers of NAT, and that layering is known to the provider of the client. To produce the TURN candidate from the local candidate, it follows the procedures of Section 8 of [14] for each local transport address in the local candidate.
It initiates an Allocate Request transaction from the local transport address to that server. The Allocate Response will provide the client with its TURN derived transport address in the MAPPED-ADDRESS attribute. If the client had K local candidates, this will produce S*K UDP TURN candidates, where S is the number of configured TURN servers.

To obtain TURN derived TCP candidates, the client takes a local TCP candidate, and for each configured TURN server, produces a TCP TURN candidate. It is anticipated that clients may have a multiplicity of TURN servers configured in network environments where there are multiple layers of NAT, and that layering is known to the provider of the client. To produce the TURN candidate from the local candidate, it iterates through the local transport addresses in the local candidate, and for each one, initiates a TCP connection to the TURN server from the same interface as the local transport address. It is not necessary to initiate the connection from the actual port in the local transport address. Following the procedures of Section 8 of [14], it initiates an Allocate Request transaction over the connection. The Allocate Response will provide the client with its TCP TURN derived transport address in the MAPPED-ADDRESS attribute. If the client had K local TCP candidates, this will produce S*K TCP TURN candidates, where S is the number of configured TURN servers.

7.2 Encoding Candidates into SDP

For each candidate to be placed into the SDP, the agent includes a series of a=candidate attributes as media-level attributes, one for each transport address in the candidate. Each of the transport addresses for the same candidate MUST have the same candidate-id value. The candidate-id values of different candidates MUST be unique within that media stream.
Using a simple sequence number, incrementing by one for each candidate for a media stream, meets these requirements. The transport, unicast-address and port of the attribute are set to those for the candidate. The qvalue is set to the priority of this candidate (note that, for RTP, the RTP and RTCP transport addresses MUST have equal priority values). The tid MUST be chosen randomly with 128 bits of randomness. The tid is chosen only when the transport address is placed into the SDP for the first time; subsequent offers or answers within the same session containing that same transport address would use the same tid used previously.

The tid serves as a unique identifier for each transport address. It also gets combined, through concatenation, with the tid of a peer candidate to form the username and password that are placed in the STUN checks between the peers. This allows the STUN message to uniquely identify the pairing whose connectivity it is checking.

The tid is needed as a unique identifier because the IP address within the candidate fails to provide that uniqueness as a consequence of NAT. Consider agents A, B, and C. A and B are within private enterprise 1, which is using 10.0.0.0/8. C is within private enterprise 2, which is also using 10.0.0.0/8. As it turns out, B and C both have IP address 10.0.1.1. A sends an offer to C. C, in its answer, provides A with its transport addresses. In this case, that's 10.0.1.1:8866 and 10.0.1.1:8877. As it turns out, B is in a session at that same time, and is also using 10.0.1.1:8866 and 10.0.1.1:8877. This means that B is prepared to accept STUN messages on those ports, just as C is. A will send STUN requests to 10.0.1.1:8866 and 10.0.1.1:8877. However, these do not go to C as expected. Instead, they go to B. If B just replied to them, A would believe it has connectivity to C, when in fact it has connectivity to a completely different user, B.
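The tid rules above, and the A, B and C scenario just described, can be exercised with a short sketch. It assumes, for illustration only, that a tid is rendered as 32 hexadecimal characters and that the STUN username is the target's tid followed by the sender's tid; this section only says the two tids are concatenated, so that order, the encoding, and the names `TidTable`, `stun_username` and `accepts` are all hypothetical. Only Python's standard `secrets` module is relied on.

```python
import secrets
from typing import Dict, Iterable, Tuple

# A transport address: (IP address, port).
TransportAddress = Tuple[str, int]


class TidTable:
    """Hypothetical per-agent table mapping local transport addresses to tids."""

    def __init__(self) -> None:
        self._tids: Dict[TransportAddress, str] = {}

    def tid_for(self, addr: TransportAddress) -> str:
        # Chosen once, with 128 bits of randomness, when the address first goes
        # into SDP; later offers and answers in the same session reuse it.
        if addr not in self._tids:
            self._tids[addr] = secrets.token_hex(16)  # 16 bytes = 128 bits
        return self._tids[addr]

    def tids(self) -> Iterable[str]:
        return list(self._tids.values())


def stun_username(target_tid: str, sender_tid: str) -> str:
    # Assumption: target's tid first, then the sender's tid. The same string
    # would also serve as the password.
    return target_tid + sender_tid


def accepts(username: str, my_tids: Iterable[str],
            peer_tids: Iterable[str]) -> bool:
    """Accept a Binding Request only if it names a pairing this agent knows."""
    peers = list(peer_tids)
    return any(stun_username(mine, theirs) == username
               for mine in my_tids for theirs in peers)


if __name__ == "__main__":
    a, b, c = TidTable(), TidTable(), TidTable()
    tid_a = a.tid_for(("10.0.0.7", 5000))   # A's candidate (hypothetical address)
    tid_c = c.tid_for(("10.0.1.1", 8866))   # C's RTP address, advertised to A
    b.tid_for(("10.0.1.1", 8866))           # B happens to use the same address
    b_peer = TidTable().tid_for(("10.0.3.3", 6000))  # B's real peer (hypothetical)

    username = stun_username(tid_c, tid_a)  # A's check toward 10.0.1.1:8866
    print(accepts(username, b.tids(), [b_peer]))  # B does not know it: False
    print(accepts(username, c.tids(), [tid_a]))   # C would accept it: True
```

Because B never advertised C's tid, the username A builds cannot match anything in B's table, so B rejects the check even though the IP address and port are identical.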
To fix this, the tid takes on the role of a unique identifier. C provides A with an identifier for its transport address, and A provides one to C. A concatenates these two identifiers and uses the result as the username and password in its STUN query to 10.0.1.1:8866. This STUN query arrives at B. However, the username is unknown to B, and so the request is rejected. A treats the rejected STUN request as if there were no connectivity to C (which is actually true). Therefore, the error is avoided.

An unfortunate consequence of the non-uniqueness of IP addresses is that, in the above example, B might not even be an ICE agent. It could be any host, and the port to which the STUN packet is directed could be any ephemeral port on that host. If there is an application listening on this socket for packets, and it is not prepared to handle malformed packets for whatever protocol is in use, the operation of that application could be affected. Fortunately, since the ports exchanged in SDP are ephemeral and usually drawn from the dynamic or registered range, the odds are good that the port is not used to run a server on host B, but rather is the client side of some protocol. This decreases the probability of hitting a port in use, due to the transient nature of port usage in this range. However, the possibility of a problem does exist, and network deployers should be prepared for it.

Note that, because there are separate transport addresses for RTP and RTCP, each will have a distinct tid.

The active candidate is placed into the m/c lines of the SDP. For RTP streams, this is done by placing the RTP address and port into the c and m lines in the SDP, respectively. If the agent is utilizing RTCP, it MUST encode the RTCP address and port using the a=rtcp attribute as defined in RFC 3605 [2]. If RTCP is not in use, the agent MUST signal that using b=RS:0 and b=RR:0 as defined in RFC 3556 [8].
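The m/c line rules above can be sketched as a function that emits the media-level SDP lines carrying an RTP stream's active candidate. This is a minimal illustration, assuming an RTP/AVP stream and a single address type; `active_candidate_sdp` is a hypothetical helper, and the a=candidate lines are deliberately left out because their syntax is not reproduced here. The output follows the m=, c=, b=, a= order that SDP requires within a media description.

```python
from typing import List, Optional, Sequence, Tuple


def active_candidate_sdp(media: str,
                         rtp: Tuple[str, int],
                         payload_types: Sequence[int],
                         rtcp: Optional[Tuple[str, int]] = None,
                         addrtype: str = "IP4") -> List[str]:
    """Return the media-level SDP lines that carry the active candidate.

    rtp is the (address, port) of the RTP transport address; rtcp is the
    RTCP transport address, or None when RTCP is not in use.
    """
    rtp_addr, rtp_port = rtp
    fmts = " ".join(str(pt) for pt in payload_types)
    lines = [
        f"m={media} {rtp_port} RTP/AVP {fmts}",   # RTP port goes in the m line
        f"c=IN {addrtype} {rtp_addr}",            # RTP address goes in the c line
    ]
    if rtcp is None:
        # No RTCP: zero RTCP bandwidth for senders and receivers (RFC 3556).
        lines += ["b=RS:0", "b=RR:0"]
    else:
        rtcp_addr, rtcp_port = rtcp
        # RTCP address and port signalled explicitly with a=rtcp (RFC 3605).
        lines.append(f"a=rtcp:{rtcp_port} IN {addrtype} {rtcp_addr}")
    return lines


if __name__ == "__main__":
    # C's candidate from the example above: RTP on 8866, RTCP on 8877.
    print("\n".join(active_candidate_sdp("audio", ("10.0.1.1", 8866), [0],
                                         rtcp=("10.0.1.1", 8877))))
```

For C's candidate this yields `m=audio 8866 RTP/AVP 0`, `c=IN IP4 10.0.1.1` and `a=rtcp:8877 IN IP4 10.0.1.1`; passing `rtcp=None` replaces the last line with `b=RS:0` and `b=RR:0`.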
For media streams that are inherently TCP-based (as opposed to ones where TCP is a fallback and would be listed as a candidate but not as the initial active address), the connections MUST be signaled using comedia [13], and those connections MUST be in "holdconn" mode. This has the effect of suspending connection attempts via the comedia mechanisms, allowing ICE to open the connections instead. These connections are then taken out of holdconn mode when the ICE procedures complete and an updated offer/answer exchange takes place that promotes one of the existing ICE-established connections to active. Note that this increases the post-dial delay for TCP-oriented media, but brings with it substantial security and NAT traversal benefits.

7.3 Prioritizing the Transport Addresses and Choosing an Active One

The prioritization process takes the set of candidates and associates each with a priority. This priority reflects the desire that the agent has to receive media on that address, and is assigned as a value from 0 to 1 (1 being most preferred). Priorities are ordinal, so that their significance is only meaningful relative to other candidates for a particular media stream.

This specification makes no normative recommendations on how the prioritization is done. However, some useful guidelines are suggested on how such a prioritization can be determined.

One criterion for choosing one candidate over another is whether or not that candidate involves the use of a relay; that is, if media is sent to that candidate, whether the media will first transit a relay before being received. TURN candidates make use of relays (the TURN server), as do any local candidates associated with a VPN server. When media is transited through a relay, it can increase the latency between transmission and reception. It can increase packet loss, because of the additional router hops that may be taken.
It may increase the cost of providing service, since media will be routed in and right back out of a relay run by the provider. If these concerns are important, candidates with this property can be listed with lower priority.

Another criterion for choosing one candidate over another is IP address family. ICE works with both IPv4 and IPv6. It therefore provides a transition mechanism that allows dual-stack hosts to prefer connectivity over IPv6, but to fall back to IPv4 in case the v6 networks are disconnected (due, for example, to a failure in a 6to4 relay) [24]. It can also help with hosts that have both a native IPv6 address and a 6to4 address. In such a case, higher priority could be afforded to the native v6 address, followed by the 6to4 address, followed by a native v4 address. This allows a site to obtain and begin using native v6 addresses immediately, yet still fall back to 6to4 addresses when communicating with agents in other sites that do not yet have native v6 connectivity.

Another criterion for choosing one candidate over another is security. If a user is a telecommuter, and therefore connected to both their corporate network and a local home network, they may prefer their voice traffic to be routed over the VPN in order to keep it on the corporate network when communicating within the enterprise, but use the local network when communicating with users outside of the enterprise.

Another criterion for choosing one address over another is topological awareness. This is most useful for candidates which make use of relays (including TURN and VPN). In those cases, if an agent has preconfigured or dynamically discovered knowledge of the topological proximity of the relays to itself, it can use that knowledge to give closer relays higher priority.

Finally, the transport protocol itself is a criterion for choosing one candidate over another.
If a particular media stream can run over UDP or TCP, the UDP candidates might be preferred over the TCP candidates. This allows ICE to use the lower-latency UDP connectivity if it exists, but fall back to TCP if UDP doesn't work.

Once the candidates have been prioritized, one is selected as the active one. This is the candidate that will be used for the actual exchange of media until it is replaced by an updated offer or answer. Since the ICE connectivity checks can take a few seconds to execute, media clipping can occur if this candidate doesn't work. The active candidate will also be used to receive media from ICE-unaware peers. As such, it is RECOMMENDED that it be chosen based on the likelihood that it will work with the peer being contacted.

Unfortunately, it is difficult to ascertain which candidate that might be. As an example, consider a user within an enterprise. To reach non-ICE-capable agents within the enterprise, a local candidate has to be used, since the enterprise policies may prevent communication between elements using a relay on the public network. However, when communicating with peers outside of the enterprise, a TURN-based candidate from a publicly accessible TURN server is needed.

Indeed, the difficulty in picking just one address that will work is the problem that motivated the development of this specification in the first place. As such, it is RECOMMENDED that the default address be a TURN candidate from a TURN server providing public IP addresses. Furthermore, ICE is only truly effective when it is supported on both sides of the session. It is therefore most prudent to deploy it to close-knit communities as a whole, rather than piecemeal. In the example above, this would mean that ICE would ideally be deployed throughout the enterprise, rather than just to parts of it.
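The draft deliberately leaves prioritization open. As one possible reading of the guidelines above, the following sketch ranks candidates with hypothetical weights (relay use first, then address family, then transport) and scales the result into the 0 to 1 range. Following the RECOMMENDED default, it then picks a TURN candidate as the active one even though that candidate ranks lowest, which illustrates that the active choice and the priority ordering serve different purposes. The weights, field names and candidate labels are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    transport: str  # "UDP" or "TCP"
    family: str     # "v6-native", "v6-6to4" or "v4"
    kind: str       # "host", "stun", "turn" or "vpn"


# Hypothetical weights: relay use dominates, then address family, then transport.
FAMILY_PREF = {"v6-native": 2, "v6-6to4": 1, "v4": 0}
TRANSPORT_PREF = {"UDP": 1, "TCP": 0}
MAX_SCORE = 100 + 10 * 2 + 1


def score(c: Candidate) -> int:
    relayed = c.kind in ("turn", "vpn")
    return (0 if relayed else 100) + 10 * FAMILY_PREF[c.family] + TRANSPORT_PREF[c.transport]


def prioritize(cands: list[Candidate]) -> dict[str, float]:
    # Scale into [0, 1]; only the ordering between candidates is meaningful.
    return {c.name: score(c) / MAX_SCORE for c in cands}


def choose_active(cands: list[Candidate]) -> Candidate:
    # RECOMMENDED default: a TURN candidate; otherwise the best-scoring one.
    turn = [c for c in cands if c.kind == "turn"]
    return max(turn or cands, key=score)


cands = [
    Candidate("host-v6", "UDP", "v6-native", "host"),
    Candidate("host-v4", "UDP", "v4", "host"),
    Candidate("host-v4-tcp", "TCP", "v4", "host"),
    Candidate("turn-v4", "UDP", "v4", "turn"),
]
prio = prioritize(cands)
active = choose_active(cands)
print(active.name)  # turn-v4
```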
7.4 Connectivity Checks

Once the offer/answer exchange has completed, both agents will have a set of candidates for each media stream. Each agent forms a set of pairings for each media stream by combining each of its UDP candidates with each of the UDP candidates of its peer, and by combining each of its TCP candidates with each of the TCP candidates of its peer. If candidates for other transport protocols were signaled through the offer/answer exchange, a pairing is performed between each of those as well.

If an offer/answer exchange took place for a session composed of an audio and a video stream, and each stream had two UDP and two TCP candidates from each agent, there would be 16 pairings, 8 for audio and 8 for video. Each set of eight would consist of four UDP pairings and four TCP pairings. Note that there is no requirement that the number of candidates from each peer be the same. One agent can offer two UDP candidates for a media stream, and the answer can contain three UDP candidates for the same media stream. In that case, there would be six UDP pairings.

Each candidate has a number of transport addresses. In the case of RTP, there are either one or two. Within the pairing, the transport addresses of each candidate are linked together one-to-one to form a transport address pair. In the case of RTP, the result will be either one or two transport address pairs: one for RTP, and possibly another for RTCP.

The relationship between a candidate, transport address, pairing and transport address pair is shown in Figure 2. This figure shows the pairing as seen by the agent that owns the candidate {A,B}. The candidate owned by that agent is called the native candidate, and the one owned by its peer is the remote candidate. As the figure shows, there is one pairing between two candidates, and two transport address pairs ({A,C} and {B,D}).
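A short sketch of the pairing rules, assuming candidates are represented as hypothetical (id, transport, transport addresses) tuples. It reproduces the counts given above (eight pairings per stream in the audio/video example, six UDP pairings for two offered and three answered candidates) and links transport addresses one-to-one as in Figure 2.

```python
from itertools import product

# Hypothetical representation: (candidate id, transport, transport addresses)
Cand = tuple[str, str, list[str]]


def pairings(native: list[Cand], remote: list[Cand]) -> list[tuple[Cand, Cand]]:
    # Pair every native candidate with every remote candidate of the same transport.
    return [(n, r) for n, r in product(native, remote) if n[1] == r[1]]


def transport_address_pairs(n: Cand, r: Cand) -> list[tuple[str, str]]:
    # Link transport addresses one-to-one (RTP with RTP, RTCP with RTCP);
    # zip stops at the shorter list, so a side without RTCP yields one pair.
    return list(zip(n[2], r[2]))


offer = [("o1", "UDP", ["A", "B"]), ("o2", "UDP", ["A2", "B2"]),
         ("o3", "TCP", ["A3"]), ("o4", "TCP", ["A4"])]
answer = [("a1", "UDP", ["C", "D"]), ("a2", "UDP", ["C2", "D2"]),
          ("a3", "TCP", ["C3"]), ("a4", "TCP", ["C4"])]

audio = pairings(offer, answer)
print(len(audio))  # 8 (4 UDP + 4 TCP); the same again for video gives 16
print(transport_address_pairs(offer[0], answer[0]))  # [('A', 'C'), ('B', 'D')]
```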
If one of the candidates had only one transport address (in the case where RTCP was not being used by one agent), there would be only one transport address pair, {A,C}.

Each transport address is associated with a tid. Furthermore, each transport address pair is associated with an ID, the transport address pair ID. This ID is equal to the concatenation of the tid of the native transport address with the tid of the remote transport address. This means that the identifiers are different for each agent. For the agent that owns {A,B}, the transport address pair ID is WY for the first transport address pair, and XZ for the second. For the agent that owns {C,D}, it would be reversed: YW for the first transport address pair, and ZX for the second.

[Figure 2: The pairing as seen by the agent owning native candidate {A,B}. Native transport address A (tid=W) and remote transport address C (tid=Y) form the transport address pair with ID=WY; native B (tid=X) and remote D (tid=Z) form the pair with ID=XZ. The associated local transport addresses are A for A, J for B, K for C, and D for D.]

The figure also shows that each transport address has an associated local transport address.
The associated local transport address is the local transport address at which the agent will receive packets sent to the transport address. For a local transport address, its associated local transport address is the same; that is the case for transport addresses A and D in the diagram. For STUN-derived and TURN-derived transport addresses, however, they are not the same: the associated local transport address is the one from which the STUN or TURN transport address was derived.

Next, each agent begins sending connectivity checks for each transport address pair. The procedure differs for UDP and TCP.

7.4.1 UDP Connectivity Checks

An agent considers a UDP pairing validated when all of its transport address pairs have been validated. A transport address pair is validated when the agent has successfully completed a STUN Binding Request transaction from its native transport address to the corresponding remote transport address, and has received a STUN Binding Request on its native transport address, sent from the remote transport address. This ensures that packets can flow in each direction. Because validation of a transport address pair involves a STUN transaction in each direction, a pair can be in one of five states: unknown, invalid, send-valid, receive-valid and valid. Each transport address pair starts in the unknown state.

7.4.1.1 Send Validation

To validate a transport address pair in the send direction, an agent needs to complete a successful STUN Binding Request transaction. This means it needs to send a Binding Request from its native transport address to the remote transport address, and receive a successful Binding Response back. For UDP-based transport addresses, an agent initiates a STUN Binding Request transaction by sending the request from its native transport address to the remote transport address.
The meaning of "sending from its native transport address" is clear in the case of a local transport address: the request is sent such that the source IP address and port of the packet are equal to that local transport address. However, the meaning is different for STUN-derived and TURN-derived transport addresses. For a STUN-derived transport address, the request is sent from the local transport address used to derive that STUN address. For a TURN-derived transport address, the request is sent through the TURN server using TURN mechanisms (the SEND primitive). Sending the request through the TURN server necessarily requires that the client send it from the local transport address used to derive the TURN transport address.

The Binding Request sent by the agent MUST contain the USERNAME attribute. This attribute MUST be set to the transport address pair ID of the corresponding transport address pair as seen by its peer. Thus, for the first transport address pair in the example above, if the agent on the left sends the STUN Binding Request, the USERNAME will have the value YW. The request MAY contain the MESSAGE-INTEGRITY attribute, computed according to RFC 3489 procedures. The MESSAGE-INTEGRITY The Binding Request MUST NOT contain the CHANGE-REQUEST or RESPONSE-ADDRESS attribute.

Each of these STUN transactions will produce either a timeout or a response. If the response is a 420, 500, or 401, the agent should try again as described in RFC 3489. Either initially, or after such a retry, the STUN transaction might produce a non-recoverable failure response (error codes 400, 431, or 600) or a failure result inapplicable to this usage of STUN and thus unrecoverable (432, 433). If this happens, the transport address pair and its corresponding candidate are considered invalid.
If the STUN transaction produces a 430 error or times out, the client SHOULD retry with a new STUN Binding Request transaction. The 430 response code, as described below, is generated when the server doesn't recognize the STUN username because the Binding Request was received prior to the receipt of the answer. Its occurrence is the result of a race in which the Binding Request arrives before the answer. This is remedied by retrying, which allows time for the "slower" answer to be received. These retry transactions carry the same USERNAME value as the original Binding Request, and differ only in their STUN transaction ID. If these retries have not produced a success response after Tg seconds, the transport address pair is considered invalid. Tg SHOULD be configurable. It is RECOMMENDED that it default to 50 seconds. This is a reasonable approximation of the maximum SIP transaction duration.

If the STUN transaction succeeds for a UDP transport address pair (producing a success response), and the pair was previously in the receive-valid state, it is considered valid. If the pair was previously in the unknown state, it is considered send-valid.

If a transport address pair is send-valid or valid, an agent MUST generate a new STUN Binding Request transaction every Tr seconds. These transactions ensure that NAT bindings for the transport address pair remain open while the candidate is under consideration. They can also be used to keep the bindings alive when the candidate is promoted to active, as described in Section 7.7. Tr SHOULD be configurable, and SHOULD default to 15 seconds. Each new Binding Request transaction is processed according to the procedures in this section. It is possible for a previously valid candidate to later be invalidated by a subsequent STUN transaction. This happens in cases where the NAT bindings expire.
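The state transitions for a UDP transport address pair, combining the send-side rules above with the receive-side rules of Section 7.4.1.2, can be sketched as a small state machine. This is an illustrative model under stated assumptions: the class, method and return-value names are hypothetical, the caller is assumed to track the time elapsed since the first Binding Request for the pair, and STUN message encoding, retransmission and the Tr refresh timer are left out.

```python
from enum import Enum
from typing import Optional


class PairState(Enum):
    UNKNOWN = "unknown"
    INVALID = "invalid"
    SEND_VALID = "send-valid"
    RECEIVE_VALID = "receive-valid"
    VALID = "valid"


RETRY_PER_RFC3489 = {401, 420, 500}  # try again as described in RFC 3489
FATAL = {400, 431, 432, 433, 600}    # non-recoverable or inapplicable to ICE
TG_SECONDS = 50.0                    # RECOMMENDED default for Tg


class TransportAddressPair:
    def __init__(self, native_tid: str, remote_tid: str) -> None:
        self.native_tid = native_tid
        self.remote_tid = remote_tid
        self.state = PairState.UNKNOWN

    @property
    def pair_id(self) -> str:
        # This agent's view: native tid then remote tid (WY in Figure 2).
        return self.native_tid + self.remote_tid

    @property
    def outgoing_username(self) -> str:
        # USERNAME is the pair ID as the peer sees it (YW in Figure 2).
        return self.remote_tid + self.native_tid

    def on_send_success(self) -> None:
        # Other states are left unchanged (e.g. a keepalive on a valid pair).
        if self.state is PairState.RECEIVE_VALID:
            self.state = PairState.VALID
        elif self.state is PairState.UNKNOWN:
            self.state = PairState.SEND_VALID

    def on_receive_success(self) -> None:
        if self.state is PairState.SEND_VALID:
            self.state = PairState.VALID
        elif self.state is PairState.UNKNOWN:
            self.state = PairState.RECEIVE_VALID

    def on_failure(self, code: Optional[int], elapsed: float) -> str:
        """Next step after a failed check; code=None means a timeout."""
        if code in FATAL:
            self.state = PairState.INVALID
            return "invalid"
        if code in RETRY_PER_RFC3489:
            return "retry-rfc3489"
        if code is None or code == 430:
            if elapsed >= TG_SECONDS:
                self.state = PairState.INVALID
                return "invalid"
            return "retry"  # new STUN transaction ID, same USERNAME
        raise ValueError(f"unexpected STUN error code {code}")


def handle_binding_request(pairs: dict[str, TransportAddressPair],
                           username: str, answer_received: bool) -> Optional[int]:
    """Receive side: return an error code, or None to send a success response.

    answer_received is always True for an answerer; only an offerer can
    receive checks before the answer arrives.
    """
    if not answer_received:
        return 430  # the peer will retry until the answer arrives
    pair = pairs.get(username)  # USERNAME must equal one of this agent's pair IDs
    if pair is None:
        return 400
    pair.on_receive_success()
    return None
```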
7.4.1.2 Receive Validation

As a result of providing a list of candidates in its offer or answer, an ICE implementation will receive STUN Binding Request messages. An agent MUST be prepared to receive STUN Binding Requests on each local transport address from the moment it sends an offer or answer that contains a candidate with that local transport address. Similarly, it MUST be prepared to receive STUN Binding Requests on a local transport address from the moment it sends an offer or answer that contains a STUN or TURN candidate derived from a local candidate containing that local transport address. It can cease listening for STUN messages on that local transport address after reliably sending an updated offer or answer which does not include any candidates equal to or derived from that local transport address.

Here, "reliably" means that the agent knows that the offer or answer was received by its peer. This knowledge is based on the protocol carrying the offer/answer exchanges. In the case of SIP, if the offer is in an INVITE, the agent knows this was received by its peer when a 200 OK or reliable provisional response [9] is received with the answer. If the offer is in a reliable provisional response, the agent knows it was reliably received when the PRACK arrives. If an answer is in a 200 OK response, the agent knows this was received when the ACK is received.

The agent does not need to provide STUN service on any other IP address or port, unlike the STUN usage described in [1]. That usage runs the service on multiple ports in order to support the change flags. However, those flags are not needed with ICE, and the server SHOULD reject, with a 400 response, any STUN requests with these flags set. The CHANGED-ADDRESS attribute in a Binding Response is set to the transport address on which the server is running.
Furthermore, there is no need to support TLS or to be prepared to receive Shared Secret Request messages. Those messages are used to obtain shared secrets to be used with Binding Requests. However, with ICE, a shared secret is not needed. The tids that are exchanged and used to form the STUN USERNAME attribute do not actually require the security properties associated with a shared secret in order for ICE to operate securely; this is because ICE security is bootstrapped off of the protocol carrying the offer/answer exchanges.

One of the candidates will be in use as the active candidate. For the transport addresses comprising that candidate, the agent will receive both STUN requests and media packets on its associated local transport addresses. The agent MUST be able to disambiguate them. In the case of RTP/RTCP, this disambiguation is easy. RTP and RTCP packets start with the bits 0b10 (v=2). The first two bits in STUN are always 0b00. This disambiguation also works for packets sent using Secure RTP [23], since the RTP header is in the clear. Disambiguating STUN from other media stream protocols may be more complicated. However, it is always possible, with arbitrarily high probability, by selecting an appropriately random username (see below).

The STUN Binding Request can only be usefully processed once an offer/answer exchange has completed. As a result, if an offerer receives a STUN Binding Request message prior to the receipt of an answer to its offer, it MUST reject the request with a 430 response. This will cause the answerer to retry, and gives time for the answer (which is in transit) to arrive at the offerer. If the offer/answer exchange has completed, the agent MUST follow the procedures defined in RFC 3489 and verify that the USERNAME attribute is known to the server.
Here, this is done by comparing the USERNAME attribute against the transport address pair identifiers for each transport address pair as seen by that agent. If there is no match, the STUN Binding Request generates a 400 response. If there is a match, the resulting transport address pair is called the matching transport address pair. The user agent proceeds with the processing of the request and generation of a response as per RFC 3489. In addition, if the state of that transport address pair was previously unknown, it changes to receive-valid. If the state was previously send-valid, it moves to valid.

An agent will continue to receive periodic STUN transactions as long as it has listed its transport address in an a=candidate attribute. It MUST process those transactions according to this section. It is possible that a transport address pair that was previously valid may become invalidated as a result of a subsequent failed STUN transaction.

7.4.1.3 Learning New Candidates from Connectivity Checks

ICE makes use of candidate addresses learned through protocols like STUN, as described in Section 7.1. These addresses are learned when STUN requests are sent to configured STUN servers. However, the peer-to-peer STUN connectivity checks can themselves provide additional candidates that ICE can make use of. This happens when two agents are separated by a symmetric NAT. When the agent behind the symmetric NAT sends a Binding Request to the other agent (which can have a public address or be behind any type of NAT except for symmetric), the symmetric NAT will create a new NAT binding for this Binding Request. Because of the properties of symmetric NAT, that binding can be used by the agent on the public side of the symmetric NAT to send packets back to the agent behind the symmetric NAT.

To do this, ICE agents dynamically learn new candidates by examining the source IP addresses and MAPPED-ADDRESS attributes in STUN Binding Requests and Responses, respectively.
If they don't match any existing candidates, a new candidate is added. This candidate corresponds to the new IP address and port created by the symmetric NAT, and is a new point of contact for the agent behind the symmetric NAT. Since that candidate is only reachable from the specific IP address and port to which the STUN request was sent, the new candidate is paired up with that transport address on the other agent. Since all candidates need to have properties such as tids, priorities and candidate IDs, these are computed algorithmically, so that both agents can determine them from the STUN message alone. The specific procedures on receipt of a Binding Request and Response for accomplishing this are described here.

7.4.1.3.1 On Receipt of a Binding Request

When a STUN Binding Request is received which generates a success response, the source IP address and port of that request are compared against all existing remote transport addresses. If there is no match, the agent creates a new remote candidate, and adds a transport address to it. It sets the IP address and port of this new remote transport address to the source IP address and port of the incoming Binding Request.

Since this is a new candidate transport address, it requires a new tid. The agent creates one algorithmically, by concatenating the tid of the remote transport address in the matching transport address pair (recall that the matching transport address pair is the one whose transport address pair ID matched the username of the incoming Binding Request) with the string representation of the source IP address and port from the incoming Binding Request. This string representation is defined using the grammar for "hostport" from RFC 3261 [3], which defines the familiar notation of the IP address and port separated by a colon.
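A sketch of the tid construction for a peer-derived transport address. Because both agents compute this tid independently, they must produce identical strings; this sketch canonicalizes the address with Python's ipaddress module and brackets IPv6 literals, as the RFC 3261 "hostport" grammar does for IPv6 references. The draft does not say how addresses are normalized, so that choice, the helper names and the example address are assumptions. The same helper would serve the Binding Response case of Section 7.4.1.3.2, with the native tid of the pair being validated as the base.

```python
import ipaddress


def hostport(ip: str, port: int) -> str:
    # RFC 3261 "hostport": IPv6 literals are bracketed, e.g. [2001:db8::1]:5000.
    addr = ipaddress.ip_address(ip)  # also canonicalizes the textual form
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"{host}:{port}"


def peer_derived_tid(base_tid: str, ip: str, port: int) -> str:
    # base_tid: remote tid of the matching pair (Binding Request case), or the
    # native tid of the pair being validated (Binding Response case).
    return base_tid + hostport(ip, port)


# A Binding Request from 192.0.2.7:40000 matched the pair whose remote tid is "Y".
new_tid = peer_derived_tid("Y", "192.0.2.7", 40000)
new_pair_id = "W" + new_tid  # paired only with native transport address A (tid "W")
print(new_tid)      # Y192.0.2.7:40000
print(new_pair_id)  # WY192.0.2.7:40000
```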
The priority of the new candidate MUST be set to the priority of the remote candidate in the matching transport address pair. There is no need to compute the candidate ID for this new candidate. Though this is a valid transport address, the agent does not pair it up with each of its own transport addresses. Rather, it pairs it up only with the native transport address from the matching transport address pair. This creates a new transport address pair. Since connectivity has been verified in the receive direction, the agent sets its state to receive-valid. As with all other transport address pairs, the agent will attempt to validate send capabilities by sending a STUN Binding Request according to the procedures in Section 7.4.1.1.

It is important to note that this process creates a new remote transport address, not a whole new remote candidate. For a whole remote candidate to come into existence, all of its component transport addresses must come into existence, and all must have been obtained as a result of STUN Binding Requests between transport address pairs in the same pairing.

As an example, consider the pairing in Figure 2. If the peer is behind a symmetric NAT, the Binding Request sent from C to A might produce a new remote transport address for RTP. To create a full candidate, a STUN Binding Request from D to B also has to create a new remote transport address, to be used for RTCP. If this were to happen, the resulting set of relationships is shown in Figure 3. To simplify the diagram, associated local transport address relationships have been omitted. Notice how the tids of the new remote candidate have been constructed by concatenating the tids of the original remote candidate with the newly discovered transport addresses, here {R,S}.

[Figure 3: Native candidate {A,B} (tids W and X), remote candidate {C,D} (tids Y and Z), and peer-derived remote candidate {R,S} (tids YR and ZS). Transport address pairs: {A,C} with ID=WY, {B,D} with ID=XZ, {A,R} with ID=WYR, and {B,S} with ID=XZS.]

7.4.1.3.2 On Receipt of a Binding Response

When an agent receives a successful Binding Response, it examines the MAPPED-ADDRESS attribute in that response. If the MAPPED-ADDRESS does not match any of the existing candidate transport addresses, it represents a new peer-derived transport address. The agent creates a new local candidate, and adds a transport address to it. It sets the IP address and port of this new native transport address to the IP address and port that were present in the MAPPED-ADDRESS attribute of the Binding Response.

Since this is a new candidate transport address, it requires a new tid. The agent creates one algorithmically, by concatenating the tid of the native transport address in the transport address pair that was being validated by the Binding Request with the string representation of the IP address and port from the MAPPED-ADDRESS attribute. This string representation is defined using the grammar for "hostport" from RFC 3261 [3], which defines the familiar notation of the IP address and port separated by a colon.
The priority of the new candidate MUST be set to the priority of the native candidate that was being validated by the Binding Request. The agent SHOULD assign a new candidate ID to this candidate. Though this is a valid transport address, the agent does not pair it up with each of the remote transport addresses. Rather, it pairs it up only with the remote transport address from the transport address pair that was being validated. This creates a new transport address pair. Since connectivity has been verified in the send direction, the agent sets its state to send-valid. As with all other transport address pairs, the agent will attempt to validate receive capabilities by waiting for a STUN Binding Request according to the procedures in Section 7.4.1.2.

It is important to note that this process creates a new native transport address, not a whole new candidate. For a whole native candidate to come into existence, all of its component transport addresses must come into existence, and all must have been obtained as a result of STUN Binding Requests between transport address pairs in the same pairing.

7.4.2 TCP Connectivity Checks

7.4.2.1 Connection Establishment

Because of the connection-oriented nature of TCP, the connectivity checks work differently. After the offer/answer exchange completes, each agent will have a set of TCP candidates at which it is waiting to receive a connection, and it will have a similar set from its peer. Thus, a pairing of TCP candidates allows for the possibility of TCP connections in each direction.

Unlike the UDP checks, where the STUN packets are sent from the native transport addresses to the remote ones, the TCP connections are not opened from the native TCP transport addresses to the remote ones. Doing so would be a simultaneous open, an unusual condition that would either fail or, at best, result in a single TCP connection.
Rather, ICE attempts two connections, one in each direction, and uses one of them if both happen to succeed. To accomplish this, each agent will attempt to open a connection to the remote transport address in each transport address pair, and do so "from" its native transport address. Here, however, "from" means something different than in the UDP case. If the native transport address is a local transport address, the agent opens the TCP connection from the same IP interface used to obtain the local transport address, but from a different, ephemeral port. Indeed, that port MUST NOT be the same as the port in the local transport address. If the native transport address is a TURN-derived TCP transport address, no attempt is made to open a connection at all. TURN-derived TCP transport addresses can only be used in passive mode.

As such, for each TCP transport address pair, there will be zero, one, or two connection attempts. If both transport addresses in the pair are TURN-derived, there will be zero (both sides passive). If one of the transport addresses is local and the other is TURN-derived, there will be one connection attempt. The agent owning the local transport address will be in active mode, and the agent owning the TURN-derived one will be in passive mode. If both are local transport addresses, there will be two attempts, and each agent will act in active mode.

Because a transport address pair can produce multiple connections, validity becomes a property of the TCP connection itself. A transport address pair is considered valid if at least one valid connection has been established within it. An entire pairing is valid if all of its transport address pairs are valid.

7.4.2.2 Sending STUN Binding Requests

Once the connection is established, the agent which opened the connection (that is, acted in active mode) sends a STUN Binding Request over that connection.
STUN Binding Requests as described in RFC 3489 are normally sent only over UDP, but when used in conjunction with ICE for connectivity checks, they are also sent over TCP.

This unusual operation requires some explanation. At first glance, a successful TCP connection ought to be sufficient. Clearly, connectivity is established, as TCP packets were exchanged in both directions via the TCP handshake. While that is true, the STUN Binding Requests serve many purposes, only one of which is to literally test connectivity. The STUN requests also serve as a correlation vehicle, allowing the agent to match the source of a connection attempt with the offer/answer signaling driving the entire mechanism. For example, in the case of a forked SIP INVITE carrying an offer, the UAC may receive two connection attempts to each of its passive TCP addresses, one from each branch of the fork. These are readily disambiguated by the STUN Binding Request which will follow, as the tid in the USERNAME tells the UAC which branch initiated the connection.

More importantly, however, the STUN Binding Request is an essential part of the security properties of ICE. Without it, an entity eavesdropping on the signaling messages would be able to deny service or hijack media connections, and preventing such attacks would require encryption of the offer/answer exchanges (using a mechanism like SIPS [3]). However, when a STUN Binding Request exchange is added, these attacks are completely foiled without the need for SIPS, raising the overall security of ICE substantially at minimal cost. These properties of ICE are discussed thoroughly in Section 12.

As such, once an agent has actively opened a TCP connection to the remote agent, it sends a STUN Binding Request over that connection. Recall that STUN messages include length indicators, allowing them to be framed over a connection-oriented transport protocol.
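Because STUN messages carry a length field, framing them on a TCP byte stream is straightforward. The sketch below assumes the RFC 3489 header layout (16-bit message type, 16-bit message length that excludes the 20-byte header, 128-bit transaction ID); the function names are hypothetical, and the USERNAME value is a placeholder rather than a real transport address pair ID.

```python
import os
import struct

STUN_HEADER_LEN = 20  # message type (2) + message length (2) + transaction ID (16)


def binding_request(username: bytes, txid: bytes) -> bytes:
    """Minimal RFC 3489 Binding Request (0x0001) carrying a USERNAME (0x0006)."""
    if len(txid) != 16 or len(username) % 4 != 0:
        raise ValueError("need a 16-byte transaction ID and a 4-byte-aligned USERNAME")
    attr = struct.pack("!HH", 0x0006, len(username)) + username
    return struct.pack("!HH", 0x0001, len(attr)) + txid + attr


def split_stun_frames(buffer: bytearray) -> list[bytes]:
    """Remove and return every complete STUN message at the front of buffer."""
    frames = []
    while len(buffer) >= STUN_HEADER_LEN:
        _msg_type, body_len = struct.unpack_from("!HH", buffer, 0)
        total = STUN_HEADER_LEN + body_len
        if len(buffer) < total:
            break  # partial message: wait for more bytes from the connection
        frames.append(bytes(buffer[:total]))
        del buffer[:total]
    return frames


msg = binding_request(b"YWYW", os.urandom(16))
buf = bytearray(msg + msg[:10])  # one full message plus the start of a second
print(len(split_stun_frames(buf)), len(buf))  # 1 10
```

A partial message stays in the buffer until the rest arrives, which is the behavior a stream transport requires.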
The Binding Request MUST contain the USERNAME attribute. This attribute MUST be set to the transport address pair ID of the corresponding transport address pair as seen by its peer. Thus, for the first transport address pair in Figure 2, if the agent on the left sends the STUN Binding Request, the USERNAME will have the value YW. The request MAY contain the MESSAGE-INTEGRITY attribute, computed according to RFC 3489 procedures. The MESSAGE-INTEGRITY The Binding Request MUST NOT contain the CHANGE-REQUEST or RESPONSE-ADDRESS attribute.

The STUN Binding Request message SHOULD NOT be retransmitted over the connection. The STUN transaction will produce either a timeout or a response. If the response is a 420, 500, or 401, the agent should try again as described in RFC 3489. Either initially, or after such a retry, the STUN transaction might produce a non-recoverable failure response (error codes 400, 431, or 600) or a failure result inapplicable to this usage of STUN and thus unrecoverable (432, 433). If this happens, the connection is considered invalid. If the STUN transaction produces a 430 error or times out, the client SHOULD retry with a new STUN Binding Request transaction. The 430 response code is the result of a lost race between the Binding Request and the answer. This is remedied by retrying, which allows the "slower" answer to be received. These retry transactions carry the same USERNAME value as the original Binding Request, and differ only in their STUN transaction ID. If these retries have not produced a success response after Tg seconds, the connection is considered invalid. Tg SHOULD be configurable. It is RECOMMENDED that it default to 50 seconds. This is a reasonable approximation of the maximum SIP transaction duration. If the STUN Binding Request generates a successful response, the connection over which it was sent is considered valid.
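As a rough sketch of the response handling above, the following Python function maps the outcome of one Binding Request transaction to valid, invalid, or retry. It assumes `elapsed` is measured from the first Binding Request on the connection (the text does not say exactly when Tg starts), and treating unlisted error codes as failures is an assumption, not something this section specifies.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    VALID = "valid"
    INVALID = "invalid"
    RETRY = "retry"


RETRY_PER_RFC3489 = {401, 420, 500}        # try again as described in RFC 3489
UNRECOVERABLE = {400, 431, 432, 433, 600}  # connection becomes invalid
TG_DEFAULT = 50.0  # seconds; Tg SHOULD be configurable


def classify_result(error_code: Optional[int], timed_out: bool,
                    elapsed: float, tg: float = TG_DEFAULT) -> Outcome:
    """Classify one Binding Request transaction on an actively opened connection.

    error_code is None for a success response. elapsed is assumed to be the
    time since the first Binding Request sent on this connection.
    """
    if timed_out or error_code == 430:
        # Race with the answer (430) or timeout: retry with the same USERNAME
        # and a new transaction ID until Tg seconds have passed.
        return Outcome.RETRY if elapsed < tg else Outcome.INVALID
    if error_code is None:
        return Outcome.VALID
    if error_code in RETRY_PER_RFC3489:
        return Outcome.RETRY
    if error_code in UNRECOVERABLE:
        return Outcome.INVALID
    return Outcome.INVALID  # unlisted codes: treated as failures (assumption)
```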
Furthermore, the agent stores the IP address and port from the MAPPED-ADDRESS attribute in the STUN Binding Response. This is called the "apparent" native transport address for the active side of the connection. It will be used later if this connection is used for media transport.

Once a connection is valid, the agent which initiated the connection MUST generate a new STUN Binding Request transaction every Tr seconds. This transaction ensures that NAT bindings for the connection remain open while the connection is under consideration as a candidate. Tr SHOULD be configurable, and SHOULD default to 15 seconds. Each new Binding Request transaction is processed according to the procedures in this section. It is possible for a previously valid candidate to later be invalidated by a subsequent STUN transaction; this happens in cases where the NAT bindings expire. Note that, unlike the UDP case, STUN is sent only while a connection is not active for media. If the connection is used as the active connection for media, STUN MUST NOT be sent.

7.4.2.3 Receiving STUN Requests

When an agent has acted as the passive side of a TCP connection, it will receive a STUN Binding Request over that connection. One of the candidates will be in use as the active candidate. For the transport addresses comprising that candidate, the agent will receive both STUN requests and media packets on its associated local transport addresses. The agent MUST be able to disambiguate them. In the case of RTP/RTCP, this disambiguation is easy: RTP and RTCP packets start with the bits 0b10 (v=2), while the first two bits in STUN are always 0b00. This disambiguation also works for packets sent using Secure RTP [23], since the RTP header is in the clear. Disambiguating STUN from other media stream protocols may be more complicated. However, it is always possible, with arbitrarily high probability, by selecting an appropriately random username (see below).
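The two-bit test described above can be sketched in Python as follows. It only distinguishes STUN from RTP/RTCP (including SRTP, whose header is in the clear); the function name and return labels are illustrative, and other media protocols would need additional checks.

```python
def classify_packet(packet: bytes) -> str:
    """Return "stun", "rtp", or "unknown" based on the first two bits."""
    if not packet:
        return "unknown"
    top_two_bits = packet[0] >> 6
    if top_two_bits == 0b10:   # RTP/RTCP version 2
        return "rtp"
    if top_two_bits == 0b00:   # RFC 3489 message types begin with two zero bits
        return "stun"
    return "unknown"
```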
The STUN Binding Request can only be usefully processed once an offer/answer exchange has completed. As a result, if an offerer receives a STUN Binding Request message prior to the receipt of an answer to its offer, it MUST reject the request with a 430 response. This will cause the answerer to retry, and gives the answer (which is in transit) time to arrive at the offerer. If the offer/answer exchange has completed, the agent MUST follow the procedures defined in RFC 3489 and verify that the USERNAME attribute is known to the server. Here, this is done by comparing the USERNAME attribute against the transport address pair identifiers for each transport address pair as seen by that agent. If there is no match, the STUN Binding Request generates a 400 response. If there is a match, the resulting transport address pair is called the matching transport address pair. The user agent proceeds with the processing of the request and generation of a response as per RFC 3489. In addition, the agent stores the source IP address and port of the Binding Request, and associates it with the connection. This address is called the "apparent" remote transport address for this connection.

An agent will continue to receive periodic STUN transactions as long as it has listed its transport address in an a=candidate attribute. It MUST process those transactions according to this section. It is possible that a transport address pair that was previously valid may become invalidated as a result of a subsequent failed STUN transaction.

Note that, unlike the UDP case, there will never be simultaneous transmission of media and STUN packets over TCP connections. This is because the connection is listed as on hold according to comedia procedures, and no media will be transmitted. ICE will establish the connections as described here.
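The request-handling rules of this subsection can be sketched as follows. `PassiveConnection`, `pair_ids`, and the use of `None` to mean "send a success response" are hypothetical conventions for this sketch; building the actual response is left to RFC 3489 processing.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]


@dataclass
class PassiveConnection:
    """Hypothetical state for a TCP connection accepted in passive mode."""
    matching_pair: Optional[str] = None
    apparent_remote: Optional[Address] = None


def handle_binding_request(username: str, source: Address,
                           conn: PassiveConnection, exchange_complete: bool,
                           pair_ids: Dict[str, str]) -> Optional[int]:
    """Return an error code to send, or None to send a success response.

    pair_ids maps each transport address pair ID, as seen by this agent,
    to a hypothetical local name for that pair.
    """
    if not exchange_complete:
        # The offerer has not yet received the answer: 430 makes the peer retry.
        return 430
    if username not in pair_ids:
        return 400  # USERNAME unknown
    conn.matching_pair = pair_ids[username]
    conn.apparent_remote = source  # the "apparent" remote transport address
    return None
```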
Once established, an updated offer/answer exchange can promote those connections to active usage through the comedia "existing" mechanism, as described below. The additional offer/answer exchange provides a barrier synchronization point at which a TCP connection switches from ICE control to control by the media sources and sinks. Once it is active, STUN packets will no longer be sent on the connection.

7.5 Promoting a Valid Candidate to Active

7.5.1 Minimum Requirements

As the STUN connectivity checks run, they will result in the validation of pairings. Once validated, a pairing can be used by promoting it to active. This promotion occurs by placing the transport addresses for the native candidate of the pairing into the m/c line and sending an updated offer. An agent MAY promote a candidate associated with any validated pairing at any time, as long as the candidate had been provided in the series of a=candidate attributes in the most recent offer (in other words, an agent cannot validate a candidate, omit that candidate from the a=candidate attributes of an offer, and then later generate a new offer that promotes the candidate to active). The procedures for doing so are described here. Any candidates which the agent would like to retain as valid candidates are also included in a=candidate lines in the offer. It SHOULD include any candidates learned from the peer-to-peer discovery processing of Section 7.4.1.3, and SHOULD include any candidates of higher priority than the one just promoted to active. It SHOULD omit candidates of lower priority than the one being promoted to active. It SHOULD omit any candidate for which all pairings that include that candidate have become invalid. If a candidate is omitted, and that candidate was a TURN-derived transport address, the agent SHOULD de-allocate the address from the TURN server.
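A minimal Python sketch of the candidate selection above, assuming a hypothetical `Candidate` record in which a larger `priority` value is more preferred. Where the SHOULDs overlap (a peer-learned candidate of lower priority), this sketch keeps the candidate; that reading is an assumption. The promoted candidate itself is always kept, since it remains listed for STUN keepalives.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical view of one native candidate during promotion."""
    cid: int
    priority: float               # larger value = more preferred (assumption)
    learned_from_peer: bool       # discovered during peer-to-peer checks
    all_pairings_invalid: bool    # every pairing with this candidate failed


def candidates_for_offer(cands: List[Candidate],
                         promoted: Candidate) -> List[Candidate]:
    """Select the candidates to list in a=candidate lines of the updated offer."""
    selected = []
    for c in cands:
        if c is promoted:
            selected.append(c)    # the new active candidate stays listed
        elif c.all_pairings_invalid:
            continue              # SHOULD omit: nothing left to validate
        elif c.learned_from_peer or c.priority > promoted.priority:
            selected.append(c)    # SHOULD include
        # otherwise: not higher priority than the promoted one, so omitted
    return selected
```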
If a local candidate was omitted, along with all of its derived transport addresses, local operating system resources for that candidate SHOULD be de-allocated. Once it has decided on the set of candidates to provide in the updated offer, the agent constructs the offer and follows the procedures in Section 7.6, which defines general subsequent offer/answer processing.

7.5.2 Suggested Algorithm

ICE leaves substantial variability to implementors around when an agent decides to generate a new offer. However, there are good ways to do this, and bad ways. Perhaps the worst algorithm possible would be to generate a new offer every time a candidate with higher priority than the active one becomes valid. This algorithm will likely result in a large number of offer/answer exchanges in rapid succession, many of which will produce "glare" as each agent independently initiates an exchange. This will consume CPU and network resources for little benefit.

Rather, the ideal algorithm strikes a balance between usage of network resources and the desire to use the ideal pair of candidates. The following algorithm provides a good tradeoff, and usage of this algorithm is RECOMMENDED. The algorithm results in a bounded number of additional offer/answer exchanges after the initial one (never more than two, and frequently one or zero), and it almost never produces a glare condition.

Once the initial offer/answer exchange completes, media will flow, though not optimally (where optimal is defined by the policies used to set the priorities of the candidates), as long as the candidate that is active has been validated. Thus, the objective of the algorithm is to quickly make sure that there is a valid path for media (to avoid clipping), and then do a single offer/answer exchange to use the highest-priority pairing that was validated. After the initial offer/answer exchange, each agent sets a timer Tu.
This timer SHOULD have a configurable baseline value, which SHOULD default to 3 seconds. The actual timer is set to this baseline, plus a time value chosen uniformly between -1 and 1 seconds. This randomizes the actual timer so that it does not fire simultaneously at both agents. In addition, each agent monitors the status of the active pairing. If the active media stream is UDP-based, the status of the active candidates is equal to the status of the pairing with matching transport addresses. In the case of TCP-based media, the active media stream is never active initially, since it always begins in the "holdconn" state.

If, when Tu fires, the active pairing has not been validated, and there exists at least one pairing that has been validated, the agent generates a new offer. This offer promotes its highest-priority candidate with a validated pairing to the active candidate. If no pairings have been validated when the timer fires, the agent waits until one is validated, and once that happens, sets a timer to fire randomly between 0 and 2 seconds. When the timer fires, a new offer is generated that promotes the candidate from this validated pairing to active. If the active pairing is validated when the timer fires, the agent does nothing at this time. If a new offer is to be sent, the agent includes the new active candidate in the a=candidate attribute list. It also includes all candidates with higher priority than the one that is active, including ones it learned from the connectivity checks themselves.

At this point, media is flowing successfully, since a valid candidate is active. However, it may not be optimal. So, the next stage of the algorithm is to let the connectivity checks continue. If those checks indicate that a pairing between the highest-priority candidate of each agent has been validated, each agent sets a timer whose value is randomly set between 0 and 2 seconds.
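The timer arithmetic and the decision taken when Tu fires can be sketched as follows. The action strings are hypothetical labels, and `rng` is passed in so the randomization can be seeded in tests; this illustrates the recommended algorithm and is not a complete state machine.

```python
import random
from typing import Optional

TU_BASELINE = 3.0  # seconds; SHOULD be configurable


def tu_delay(rng: random.Random, baseline: float = TU_BASELINE) -> float:
    """Tu: the baseline plus a uniform offset in [-1, 1] seconds."""
    return baseline + rng.uniform(-1.0, 1.0)


def promotion_delay(rng: random.Random) -> float:
    """Short randomized timer used before a promoting offer: [0, 2] seconds."""
    return rng.uniform(0.0, 2.0)


def on_tu_fired(active_validated: bool,
                best_validated: Optional[str]) -> str:
    """Action when Tu fires; best_validated names the highest-priority
    candidate with a validated pairing, or None if nothing is validated."""
    if active_validated:
        return "wait"  # media already flows; let the checks continue
    if best_validated is not None:
        return "offer:" + best_validated
    return "await-first-validation"  # then start promotion_delay()
```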
When the timer fires, a new offer is generated that promotes the candidate from this validated pairing to active. Otherwise, when the connectivity checks have all concluded, such that no pairing exists in the invalid state, each agent sets a timer whose value is randomly set between 0 and 2 seconds. When the timer fires, a new offer is generated that promotes the candidate from the valid pairing with the highest priority to active.

7.6 Subsequent Offer/Answer Exchanges

An offer/answer exchange within a session can occur at any time, whether it is the result of the algorithm described in Section 7.5.2, or because one of the agents wishes to add or remove a media stream, add a codec, and so on.

7.6.1 Sending of an Offer

The a=candidate attributes within a subsequent offer have the same meaning they do in an initial offer. They are a request for the peer to attempt (or continue to attempt, if the candidate was provided previously) a connectivity check using STUN from each of its own candidates. As such, an a=candidate attribute is included in subsequent offers when (1) connectivity checks to that candidate haven't concluded yet, or (2) the checks have concluded and the candidate is currently active; in that case, STUN is used to keep the bindings active.

If an agent sends an offer which omits a candidate it had previously sent to its peer, it MUST cease connectivity checks from that candidate. Any pairings that include the absent native candidate are discarded. Any STUN transactions in progress from that candidate are immediately terminated: no further retransmissions take place, and no further transactions from that candidate will be made. If a TCP connection was opened to or from that candidate, and that connection is not listed as the active one in the offer, the connection is torn down. The offer MAY contain a new active candidate in the m/c line.
If the new active transport address is UDP, the candidate is encoded into the updated offer as described in Section 7.2. The transport addresses constituting the candidate SHOULD also be listed in a=candidate attributes, so that STUN can be used as an ongoing keepalive.

If the new active transport address is TCP, it is more complicated. Recall that each TCP connection is opened from one of the agents to the other, such that, for each connection, one agent has the active role and the other the passive role. The ICE mechanisms allow the active agent to choose a specific connection for use in an offer, so long as the agent has used a different ephemeral port for each connection it initiated (which is almost always the case). If, however, an agent was in the passive role, it cannot choose a specific connection. Rather, it can choose a specific native transport address, which may have been used to receive multiple connections. This asymmetric behavior brings with it some important security properties, which are discussed in Section 12.

If the agent was the active one and established the connection, it includes its apparent native transport address in the m/c line of the SDP (recall that this address was discovered via the STUN exchange over the connection). Note that this departs from the SHOULD-strength recommendation in comedia that the port number sent by the entity which initiated the connection be '9'. The actual port number is present to facilitate identification of the connection. The a=setup attribute MUST be present and MUST contain the value "active". The a=connection attribute MUST be present and MUST have the value "existing".

If the agent was the passive one and was the recipient of the connection, it includes its transport address in the m/c line of the SDP.
In this case, that address will be the same as the one it had placed into the a=candidate line of the SDP. The a=setup attribute MUST be present and MUST contain the value "passive". The a=connection attribute MUST be present and MUST have the value "existing".

7.6.2 Receiving the Offer and Sending an Answer

If an agent receives an updated offer with a=candidate attributes, it checks to see if it already knows about the listed candidates. This is done by comparing the tid with the candidates it had received in the previous offer or answer from the peer. If the tid is already known, processing for that candidate continues as if no offer had been made. Any connectivity checks in progress continue, and any ongoing STUN keepalives continue. If a candidate which had been listed previously is no longer present in the offer, this tells the answerer to cease connectivity checks to it. Any pairings that include the absent remote candidate are discarded. Any STUN transactions in progress to that candidate are immediately terminated: no further retransmissions take place, and no further transactions to that candidate will be made. If a TCP connection was opened to or from that candidate, and that connection is not listed as the active one in the offer, the connection is torn down.

The agent then sends its answer. Like the offerer, it can add or remove candidates from its answer. If it removes candidates from its answer, it ceases STUN connectivity checks from those candidates, and any pairings that include those candidates are discarded. Any STUN transactions in progress from those candidates are immediately terminated: no further retransmissions take place, and no further transactions from those candidates will be made. If a TCP connection was opened to or from one of those candidates, and that connection is not listed as the active one in the answer, the connection is torn down.
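The tid comparison described above amounts to a set difference. A minimal Python sketch, assuming each of the peer's candidates is represented only by its tid (a simplification; real state would also carry addresses and pairings):

```python
from typing import Set, Tuple


def diff_candidates(known_tids: Set[str],
                    received_tids: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split the peer's candidates, keyed by tid, into
    (continuing, removed, new) relative to the previous offer or answer."""
    continuing = known_tids & received_tids  # checks and keepalives carry on
    removed = known_tids - received_tids     # stop checks, discard pairings
    new = received_tids - known_tids         # start checks per Section 7.4
    return continuing, removed, new
```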
After transmission of the answer, there may be a set of candidates which were new in the offer, and a set that were new in the answer. The agent begins connectivity checks as described in Section 7.4, pairing each new candidate in its answer with all candidates in the offer, and each new candidate in the offer with all of its candidates in the answer.

The m/c line may have also changed, indicating a new active candidate. If the m/c line contains a UDP stream, the agent begins sending media to the transport addresses listed there. In addition, it checks to see if those transport addresses correspond to a remote candidate in a valid pairing. So long as the remote agent has offered up a candidate that has been validated by ICE, this should be the case. Indeed, there may be multiple valid pairings containing the transport addresses in the m/c line as the remote candidate. In that case, the agent MUST choose the pairing whose native candidate has the highest priority. It MUST place this candidate in the m/c line. Transmission of media occurs as defined in Section 7.8.

If the m/c line has changed, and now indicates a new TCP candidate, the agent examines it. The comedia "a=connection" attribute will normally be present and contain the value "existing". If it is not present, or is present with a value of "new", comedia procedures are followed, as the peer has apparently abandoned ICE operation for this media stream. Assuming it contains a value of "existing", the agent looks at whether the a=setup attribute is present. If its value is "active", a connection that was initiated by the remote agent is to be used. The agent examines the transport address in the m/c line and looks for a matching value among the apparent remote transport addresses of existing connections. If it matches multiple connections (though it should normally match just one), one of those connections is chosen.
The native transport address of that connection is then placed into the m/c line of the answer. If no existing connections were matched, an error has occurred. The agent SHOULD respond with "holdconn", and then generate its own offer with a connection to the peer which it believes is valid.

If the a=setup attribute had a value of "passive", a connection that was initiated by the agent itself is to be used. The agent examines the transport address in the m/c line and looks for a matching value amongst the remote transport addresses in valid pairings. If multiple pairings match, it MUST choose the one whose native transport address has the highest priority. The apparent native transport address associated with an active connection initiated by the agent is then placed into the m/c line, and that TCP connection is used to send and receive media. If no pairings match, an error has occurred. The agent SHOULD respond with "holdconn", and then generate its own offer with a connection to the peer which it believes is valid.

7.6.3 Receiving the Answer

If an agent receives an answer with a=candidate attributes, it checks to see if it already knows about the listed candidates. This is done by comparing the tid with the candidates it had received in the previous offer or answer from the peer. If the tid is already known, processing for that candidate continues unaffected by the answer. Any connectivity checks in progress continue, and any ongoing STUN keepalives continue. If a candidate which had been listed previously is no longer present in the answer, this tells the offerer to cease connectivity checks to it. Any pairings that include the absent remote candidate are discarded. Any STUN transactions in progress to that candidate are immediately terminated: no further retransmissions take place, and no further transactions to that candidate will be made.
If a TCP connection was opened to or from that candidate, and that connection is not listed as the active one in the answer, the connection is torn down.

Furthermore, there may be a set of candidates which were new in the offer, and a set that were new in the answer. The agent begins connectivity checks as described in Section 7.4, pairing each new candidate in its offer with all candidates in the answer, and each new candidate in the answer with all of its candidates in the offer.

The m/c line may have also changed, indicating a new active candidate. If the m/c line contains a UDP stream, the agent begins sending media to the transport addresses listed there, as defined in Section 7.8. It sends from the transport address in the m/c line it had signaled in the offer.

If the m/c line has changed, and now indicates a new TCP candidate, the agent examines it. If the agent had, in its offer, indicated the desire to use a specific connection that it had initiated, it would have used the a=connection attribute with the value "existing" and the a=setup attribute with the value "active", and placed its apparent native transport address in the m/c line. In that case, the m/c line in the answer will normally have the a=connection attribute with the value "existing", which means that the remote agent agrees with the usage of that connection. The transport addresses in the m/c line should correspond to the remote transport addresses to which the agent had initiated its connection. If so, that connection is used.

If the agent had, in its offer, indicated the desire to use any connection that had been established to a specific native transport address, it would have, in its offer, used the a=connection attribute with the value "existing" and the a=setup attribute with the value "passive", and placed that address in the m/c line.
In that case, the m/c line in the answer will normally have the a=connection attribute with the value "existing" and the a=setup attribute with the value "active". The transport address in the m/c line will correspond to the apparent remote transport address. The agent MUST scan its existing connections to the native transport address it had advertised in the offer, and find the one whose apparent remote transport address matches the m/c line in the answer. If there is a match, that connection is used for sending media. If there is no match, an error has occurred.

7.7 Binding Keepalives

Once the candidates are promoted to active, and media begins flowing, it is still necessary to keep the bindings alive at intermediate NATs for the duration of the session. Normally, the RTP packets themselves meet this objective. However, several cases merit further discussion. Firstly, in some RTP usages, such as SIP, the media streams can be "put on hold". This is accomplished by using the SDP "sendonly" or "inactive" attributes, as defined in RFC 3264 [4]. RFC 3264 directs implementations to cease transmission of media in these cases. However, doing so may cause NAT bindings to time out, and media won't be able to come off hold. Secondly, some RTP payload formats, such as the payload format for text conversation [28], may send packets so infrequently that the interval exceeds the NAT binding timeouts. Thirdly, if silence suppression is in use, long periods of silence may cause media transmission to cease for long enough that NAT bindings time out.

To prevent these problems, ICE implementations MUST continue to list their active transport addresses as candidates in a=candidate lines. As a consequence, STUN packets will be transmitted periodically, independently of the transmission (or lack thereof) of media packets. This provides a media-independent, RTP-independent, and codec-independent solution for keeping the NAT bindings alive.
If an ICE implementation is communicating with one that does not support ICE, keepalives MUST still be sent. In that case, it is RECOMMENDED that an agent support the RTP No-Op payload format [15], and send it at least once every 20 seconds if media is not otherwise being sent. This No-Op MUST be sent even if the media stream is inactive or recvonly.

7.8 Sending Media

When an agent sends media packets, it MUST send them from the same IP address and port it has advertised in the m/c line. This provides a property known as symmetry, which is an essential facet of NAT traversal. In the case of a STUN-derived transport address, this means that the RTP packets are sent from the local transport address used to obtain the STUN address. In the case of a TURN-derived transport address, this means that media packets are sent through the TURN server (using the TURN SEND primitive). For local transport addresses, media is sent from that local transport address. This symmetric behavior MUST be followed by an agent even if its peer in the session doesn't support ICE.

8. Interactions with Forking

SIP allows INVITE requests carrying offers to fork, which means that they are delivered to multiple user agents. Each of those user agents then provides an answer to the offer in the INVITE. The result is that a single offer generated by the UAC produces multiple answers. ICE interacts very well with forking. Indeed, ICE fixes some of the problems associated with forking. Once the offer/answer exchange has completed, the UAC will have an answer from each UAS that received the INVITE. The ICE connectivity checks that ensue will carry tids that correlate each of those checks (and thus their corresponding source IP address and port or TCP connection) with a specific remote user agent.
As these checks happen before any media is transmitted, ICE allows a UAC to disambiguate subsequent media traffic and correlate that traffic with a particular remote UA. When SIP is used without ICE, the incoming media traffic cannot be disambiguated without an additional offer/answer exchange.

9. Interactions with Preconditions

Because ICE involves multiple addresses and pre-session activities, its interactions with preconditions [10] merit further discussion. Quality of Service (QoS) preconditions, which are defined in RFC 3312, apply only to the IP addresses and ports listed in the m/c lines in an offer/answer. If ICE changes the address and port where media is received, this change is reflected in the m/c lines of a new offer/answer. As such, it appears like any other re-INVITE would, and is fully treated in RFC 3312, which applies without regard to the fact that the m/c lines are changing due to ICE negotiations occurring "in the background".

ICE also has (purposeful) interactions with connectivity preconditions [12]. As described there, the precondition is satisfied once ICE has verified that there exists a valid path of connectivity for each media stream to which the precondition applies. More specifically, it is satisfied when there is at least one valid UDP transport address pairing or TCP connection for such a media stream. Furthermore, when a subsequent offer is made to promote one of those valid transport address pairings or connections into the m/c line, the precondition is marked as met in that same offer/answer exchange.

10. Example

In the example that follows, messages are labeled with "message name A,B" to mean a message from transport address A to B. For STUN requests, this is followed by curly brackets enclosing the username (which is also the password). For STUN responses, this is followed by square brackets containing the value of MAPPED-ADDRESS.
The example shows a flow of two agents where one is behind a full cone NAT, and the other is behind a symmetric NAT. TODO: Fill in. This is a big complicated flow!

11. Grammar

This specification defines a new SDP attribute, called "candidate". The candidate attribute MUST be present within a media block of the SDP. It contains a transport address for a candidate that can be used for connectivity checks. There MAY be multiple candidate attributes in a media block. The syntax of this attribute is:

   candidate-attribute = "candidate" ":" candidate-id SP tid SP
                         transport SP
                         qvalue SP          ;qvalue from RFC 3261
                         addr SP port SP    ;addr, port from RFC 2327
   transport           = "UDP" / "TCP" / transport-extension
   transport-extension = token
   candidate-id        = 1*DIGIT
   tid                 = non-ws-string

The candidate-id is used to group together the transport addresses for a particular candidate. It MUST be a positive integer whose value is less than (2^31 - 1). It MUST have the same value for all transport addresses within the same candidate. It MUST have a different value for transport addresses within different candidates for the same media stream.

The tid production contains an identifier, chosen with 128 bits of randomness, that identifies the transport address. The tids of a pair of transport addresses are combined to form the username and password of a STUN request from one transport address to the other.

The transport production indicates the transport protocol for the candidate. This can be either UDP or TCP. Extensibility is provided to allow for future transport protocols to be used with ICE, such as the Datagram Congestion Control Protocol (DCCP) [26].

The addr production is from RFC 2327, and contains the IPv4 or IPv6 address of the candidate. The port production contains its port.

12. Security Considerations

There are numerous threats in a system using ICE.
This section overviews these threats and discusses how they are mitigated. STUN itself introduces many security considerations, which receive an extensive treatment in RFC 3489. STUN is used within ICE in two ways: first, as a technique for address gathering, and second, as a peer-to-peer connectivity check. All of the security considerations of RFC 3489 apply directly to the former usage. However, the latter usage, as a peer-to-peer connectivity check, is sufficiently different that a discussion of its security considerations is appropriate.

It remains the case that many attacks are rooted in a single primitive: an attacker attempts to inject a STUN response with an invalid MAPPED-ADDRESS attribute. In the usages of STUN described in RFC 3489, this injection can occur as a result of compromises of STUN servers, attacks on the DNS, rogue NATs, injection of faked responses coupled with a DoS attack, and replaying modified requests. With peer-to-peer STUN, compromises of STUN servers are not much of a concern, since the STUN servers are embedded in endpoints and distributed throughout the network. Thus, compromising the STUN server is equivalent to compromising the endpoint, and if that happens, far more problematic attacks are possible than those against ICE. Similarly, DNS attacks are irrelevant, since STUN servers are not discovered via DNS; they are signaled via SIP. Rogue NATs, injection of fake responses, and replaying modified requests can all be handled in ICE with the countermeasures discussed below.

Consider an attacker that intercepts a STUN packet used for connectivity checks, and replays it using its own source address. If successful, this would fool an endpoint into thinking that this faked source address was a valid destination for media (recall that the source transport address of received STUN packets is used as a potential candidate address).
However, the recipient of the replayed packet will not simply send media to that candidate. It will first verify the candidate with a STUN connectivity check. This check will be sent to the faked source address, and if there is no answer, the address will not be used. The attacker cannot answer the STUN request without access to the username and password, which are exchanged as part of the signaling. Thus, if the signaling is protected as recommended above, the attacker cannot obtain the username or password.

If an attacker instead intercepts and replays STUN packets used for the purposes of unilateral allocation, a similar result occurs. The target of the attack will be fooled into thinking it has a STUN-derived transport address that it does not actually have. Its peer will perform a connectivity check to this address, which will fail. The attacker cannot force this check to succeed without access to the username and password, which are protected. Thus, this address will not be used.

In the worst case, an attacker can generate enough traffic that none of the valid STUN checks or unilateral allocations succeed. This would result in a service disruption. However, this attack is no worse than any pure packet-flood disruption attack launched against any other protocol. Such attacks cannot be prevented by any protocol means.

If an attacker could intercept and modify the contents of the offer or answer messages, it could disrupt the session, divert the media, and otherwise take control over the session. This attack is prevented by encryption, authentication, and message integrity of the signaling channel used for ICE. SIP-based implementations of ICE SHOULD use the sips URI scheme when transporting SDP with ICE information, and MAY use S/MIME [3].

13. IANA Considerations

This specification defines one new SDP attribute per the procedures of Appendix B of RFC 2327.
The required information for the registration is:

   Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

   Attribute Name: candidate

   Long Form: candidate

   Type of Attribute: media level

   Charset Considerations: The attribute is not subject to the charset attribute.

   Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and provides one of many possible candidate addresses for communication. These addresses are validated with an end-to-end connectivity check using Simple Traversal of UDP Through NAT (STUN).

   Appropriate Values: See Section 11 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

14. IAB Considerations

The IAB has studied the problem of "Unilateral Self Address Fixing", which is the general process by which an agent attempts to determine its address in another realm on the other side of a NAT through a collaborative protocol reflection mechanism [21]. ICE is an example of a protocol that performs this type of function. Interestingly, the process for ICE is not unilateral, but bilateral, and the difference has a significant impact on the issues raised by the IAB. The IAB has mandated that any protocols developed for this purpose document a specific set of considerations. This section meets those requirements.

14.1 Problem Definition

From RFC 3424, any UNSAF proposal must provide:

   Precise definition of a specific, limited-scope problem that is to be solved with the UNSAF proposal. A short term fix should not be generalized to solve other problems; this is why "short term fixes usually aren't".

The specific problems being solved by ICE are:

   Provide a means for two peers to determine the set of transport addresses which can be used for communication.

   Provide a means for resolving many of the limitations of other UNSAF mechanisms by wrapping them in an additional layer of processing (the ICE methodology).
   Provide a means for an agent to determine an address that is reachable by another peer with which it wishes to communicate.

14.2 Exit Strategy

From RFC 3424, any UNSAF proposal must provide:

   Description of an exit strategy/transition plan. The better short term fixes are the ones that will naturally see less and less use as the appropriate technology is deployed.

ICE itself doesn't easily get phased out. However, it is useful even in a globally connected Internet, for example as a means for detecting whether a router failure has temporarily disrupted connectivity. What ICE does do is help phase out other UNSAF mechanisms. ICE effectively selects amongst those mechanisms, prioritizing ones that are better and deprioritizing ones that are worse. Local IPv6 addresses can be preferred. As NATs begin to dissipate as IPv6 is introduced, derived transport addresses from other UNSAF mechanisms simply never get used, because higher-priority connectivity exists. Therefore, the servers get used less and less, and can eventually be removed when their usage goes to zero.

Indeed, ICE can assist in the transition from IPv4 to IPv6. It can be used to determine whether to use IPv6 or IPv4 when two dual-stack hosts communicate with SIP (IPv6 gets used). It can also allow a network with both 6to4 and native v6 connectivity to determine which address to use when communicating with a peer.

14.3 Brittleness Introduced by ICE

From RFC 3424, any UNSAF proposal must provide:

   Discussion of specific issues that may render systems more "brittle". For example, approaches that involve using data at multiple network layers create more dependencies, increase debugging challenges, and make it harder to transition.

ICE actually removes brittleness from existing UNSAF mechanisms. In particular, traditional STUN (the usage described in RFC 3489) has several points of brittleness.
One of them is the discovery process, which requires an agent to try to classify the type of NAT it is behind. This process is error-prone. With ICE, that discovery process is simply not used. Rather than unilaterally assessing the validity of an address, ICE determines its validity dynamically by measuring connectivity to a peer. The process of determining connectivity is very robust. The only potential problem is that bilaterally fixed addresses through STUN can expire if traffic does not keep them alive. However, that is substantially less brittle than the STUN discovery mechanisms.

Another point of brittleness in STUN, TURN, and any other unilateral mechanism is its absolute reliance on an additional server. ICE makes use of a server for allocating unilateral addresses, but allows agents to connect directly if possible. Therefore, in some cases, a call can still proceed when ICE is used even if a STUN or TURN server fails.

Another point of brittleness in traditional STUN is that it assumes that the STUN server is on the public Internet. Interestingly, with ICE, that is not necessary. There can be a multitude of STUN servers in a variety of address realms. ICE will discover the one that has provided a usable address.

The most troubling point of brittleness in traditional STUN is that it doesn't work in all network topologies. In cases where there is a shared NAT between each agent and the STUN server, traditional STUN may not work. With ICE, that restriction can be lifted.

Traditional STUN also introduces some security considerations. Fortunately, those security considerations are also mitigated by ICE.

14.4 Requirements for a Long Term Solution

From RFC 3424, any UNSAF proposal must provide:

   Identify requirements for longer term, sound technical solutions -- contribute to the process of finding the right longer term solution.

Our conclusions from STUN remain unchanged.
However, we believe ICE actually helps, because it can be part of the long term solution.

14.5 Issues with Existing NAPT Boxes

From RFC 3424, any UNSAF proposal must provide:

   Discussion of the impact of the noted practical issues with existing, deployed NA[P]Ts and experience reports.

A number of NAT boxes are now being deployed into the market which try to provide "generic" ALG functionality. These generic ALGs hunt for IP addresses, either in text or binary form within a packet, and rewrite them if they match a binding. This will interfere with proper operation of any UNSAF mechanism, including ICE.

15. Acknowledgements

The authors would like to thank Douglas Otis, Francois Audet and Magnus Westerlund for their comments and input.

16. References

16.1 Normative References

[1] Rosenberg, J., Weinberger, J., Huitema, C., and R. Mahy, "STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)", RFC 3489, March 2003.

[2] Huitema, C., "Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP)", RFC 3605, October 2003.

[3] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[4] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[5] Zopf, R., "Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)", RFC 3389, September 2002.

[6] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, January 2004.

[7] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[8] Casner, S., "Session Description Protocol (SDP) Bandwidth Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 3556, July 2003.

[9] Rosenberg, J. and H.
Schulzrinne, "Reliability of Provisional Responses in Session Initiation Protocol (SIP)", RFC 3262, June 2002.

[10] Camarillo, G., Marshall, W., and J. Rosenberg, "Integration of Resource Management and Session Initiation Protocol (SIP)", RFC 3312, October 2002.

[11] Camarillo, G., "The Alternative Network Address Types Semantics (ANAT) for the Session Description Protocol (SDP) Grouping Framework", draft-ietf-mmusic-anat-02 (work in progress), October 2004.

[12] Andreasen, F., "Connectivity Preconditions for Session Description Protocol Media Streams", draft-ietf-mmusic-connectivity-precon-00 (work in progress), May 2005.

[13] Yon, D., "Connection-Oriented Media Transport in the Session Description Protocol (SDP)", draft-ietf-mmusic-sdp-comedia-10 (work in progress), November 2004.

[14] Rosenberg, J., "Traversal Using Relay NAT (TURN)", draft-rosenberg-midcom-turn-07 (work in progress), February 2005.

[15] Andreasen, F., "A No-Op Payload Format for RTP", draft-ietf-avt-rtp-no-op-00 (work in progress), May 2005.

16.2 Informative References

[16] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998.

[17] Senie, D., "Network Address Translator (NAT)-Friendly Application Design Guidelines", RFC 3235, January 2002.

[18] Srisuresh, P., Kuthan, J., Rosenberg, J., Molitor, A., and A. Rayhan, "Middlebox communication architecture and framework", RFC 3303, August 2002.

[19] Borella, M., Lo, J., Grabelsky, D., and G. Montenegro, "Realm Specific IP: Framework", RFC 3102, October 2001.

[20] Borella, M., Grabelsky, D., Lo, J., and K. Taniguchi, "Realm Specific IP: Protocol Specification", RFC 3103, October 2001.

[21] Daigle, L. and IAB, "IAB Considerations for UNilateral Self-Address Fixing (UNSAF) Across Network Address Translation", RFC 3424, November 2002.

[22] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 3550, July 2003.

[23] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[24] Carpenter, B. and K. Moore, "Connection of IPv6 Domains via IPv4 Clouds", RFC 3056, February 2001.

[25] Huitema, C., "Teredo: Tunneling IPv6 over UDP through NATs", draft-huitema-v6ops-teredo-05 (work in progress), April 2005.

[26] Kohler, E., "Datagram Congestion Control Protocol (DCCP)", draft-ietf-dccp-spec-11 (work in progress), March 2005.

[27] Lazzaro, J., "Framing RTP and RTCP Packets over Connection-Oriented Transport", draft-ietf-avt-rtp-framing-contrans-05 (work in progress), January 2005.

[28] Hellstrom, G., "RTP Payload for Text Conversation", draft-ietf-avt-rfc2793bis-09 (work in progress), August 2004.

Author's Address

   Jonathan Rosenberg
   Cisco Systems
   600 Lanidex Plaza
   Parsippany, NJ 07054
   US

   Phone: +1 973 952-5000
   Email: jdrosen@cisco.com
   URI:   http://www.jdrosen.net

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.