MMUSIC                                                      J. Rosenberg
Internet-Draft                                             Cisco Systems
Expires: December 28, 2006                                 June 26, 2006

    Interactive Connectivity Establishment (ICE): A Methodology for
  Network Address Translator (NAT) Traversal for Offer/Answer Protocols
                       draft-ietf-mmusic-ice-09

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 28, 2006.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This document describes a protocol for Network Address Translator (NAT) traversal for multimedia session signaling protocols based on the offer/answer model, such as the Session Initiation Protocol (SIP). This protocol is called Interactive Connectivity Establishment (ICE). ICE makes use of the Simple Traversal of UDP through NAT (STUN), applying its binding discovery, connectivity check and relay usages.

Rosenberg               Expires December 28, 2006               [Page 1]

Internet-Draft                    ICE                          June 2006

Table of Contents

   1.  Introduction
   2.  Overview of ICE
   3.  Terminology
   4.  Sending the Initial Offer
   5.  Receipt of the Offer and Generation of the Answer
   6.  Processing the Answer
   7.  Common Procedures
     7.1.  Gathering Candidates
     7.2.  Prioritizing the Candidates and Choosing an Operating One
     7.3.  Encoding Candidates into SDP
     7.4.  Forming Candidate Pairs
     7.5.  Ordering the Candidate Pairs
     7.6.  Performing the Connectivity Checks
     7.7.  Sending a Binding Request for Connectivity Checks
     7.8.  Receiving a Binding Request for Connectivity Checks
     7.9.  Promoting a Candidate to Operating
     7.10. Learning New Candidates from Connectivity Checks
       7.10.1. On Receipt of a Binding Request
       7.10.2. On Receipt of a Binding Response
     7.11. Subsequent Offer/Answer Exchanges
       7.11.1. Sending of a Subsequent Offer
       7.11.2. Receiving the Offer and Sending an Answer
       7.11.3. Receiving the Answer
     7.12. Binding Keepalives
     7.13. Sending Media
     7.14. Receiving Media
   8.  Guidelines for Usage with SIP
   9.  Interactions with Forking
   10. Interactions with Preconditions
   11. Examples
     11.1. Basic Example
     11.2. Advanced Example
   12. Grammar
   13. Security Considerations
     13.1. Attacks on Connectivity Checks
     13.2. Attacks on Address Gathering
     13.3. Attacks on the Offer/Answer Exchanges
     13.4. Insider Attacks
       13.4.1. The Voice Hammer Attack
       13.4.2. STUN Amplification Attack
   14. IANA Considerations
     14.1. candidate Attribute
     14.2. remote-candidate Attribute
     14.3. ice-pwd Attribute
   15. IAB Considerations
     15.1. Problem Definition
     15.2. Exit Strategy
     15.3. Brittleness Introduced by ICE
     15.4. Requirements for a Long Term Solution
     15.5. Issues with Existing NAPT Boxes
   16. Acknowledgements
   17. References
     17.1. Normative References
     17.2. Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

RFC 3264 [4] defines a two-phase exchange of Session Description Protocol (SDP) messages [5] for the purposes of establishment of multimedia sessions.
This offer/answer mechanism is used by protocols such as the Session Initiation Protocol (SIP) [2].

Protocols using offer/answer are difficult to operate through Network Address Translators (NAT). Because their purpose is to establish a flow of media packets, they tend to carry IP addresses within their messages, which is known to be problematic through NAT [15]. The protocols also seek to create a media flow directly between participants, so that there is no application layer intermediary between them. This is done to reduce media latency, decrease packet loss, and reduce the operational costs of deploying the application. However, this is difficult to accomplish through NAT. A full treatment of the reasons for this is beyond the scope of this specification.

Numerous solutions have been proposed for allowing these protocols to operate through NAT. These include Application Layer Gateways (ALGs), the Middlebox Control Protocol [17], Simple Traversal of UDP through NAT (STUN) [14] and its revision [12], the STUN Relay Usage [13], and Realm Specific IP [18] [19] along with session description extensions needed to make them work, such as the Session Description Protocol (SDP) [5] attribute for the Real Time Control Protocol (RTCP) [1]. Unfortunately, these techniques all have pros and cons which make each one optimal in some network topologies, but a poor choice in others. The result is that administrators and implementors are making assumptions about the topologies of the networks in which their solutions will be deployed. This introduces complexity and brittleness into the system. What is needed is a single solution which is flexible enough to work well in all situations.

This specification provides that solution for media streams established by signaling protocols based on the offer-answer model. It is called Interactive Connectivity Establishment, or ICE.
ICE makes use of STUN and its relay extension, commonly called TURN, but uses them in a specific methodology which avoids many of the pitfalls of using any one alone.

2.  Overview of ICE

A typical architecture for an ICE deployment is shown in Figure 1. The figure shows two endpoints (known as agents in RFC 3264 terminology) which we call L and R (for left and right, which helps visualize call flows). Both L and R are behind a NAT. The type of NAT and its properties are unknown. Indeed, it is not known whether the agent is behind a NAT at all, or whether there are multiple NATs between it and the network.

Agents L and R are capable of engaging in an offer/answer exchange [4] by which they can exchange SDP messages, whose purpose is to set up a media session between L and R. Of course, the offer/answer exchange itself must be capable of traversing the NAT. Such traversal is facilitated through signaling elements such as SIP servers, and is outside the scope of this specification. Different solutions are applied for traversal of the signaling that carries the offer/answer exchange, and for the media set up by that offer/answer exchange. This is because of the vastly different requirements on latency, packet loss, and overall bandwidth between the signaling and media. For example, usage of a signaling intermediary, such as a SIP proxy, as a relay for all signaling at all times, is acceptable, whereas usage of relays at all times for media is highly undesirable.

In addition to the agents, a SIP server and NATs, ICE is typically used in concert with STUN servers in the network. Each agent can have its own STUN server, or they can be the same.

                          +-------+
                          | SIP   |
       +-------+          | Srvr  |          +-------+
       | STUN  |          |       |          | STUN  |
       | Srvr  |          +-------+          | Srvr  |
       |       |                             |       |
       +-------+                             +-------+

       +--------+                           +--------+
       |  NAT   |                           |  NAT   |
       +--------+                           +--------+

       +-------+                             +-------+
       | Agent |                             | Agent |
       |   L   |                             |   R   |
       |       |                             |       |
       +-------+                             +-------+

                           Figure 1

Prior to initiating an offer, the offering agent (L in this example) starts by performing a process known as address gathering. This process allows the agent to obtain one or more transport addresses, one or more of which might be viable addresses at which the agent can receive incoming media packets from the other agent, which we call its peer. A transport address is just the combination of an IP address and port. With ICE, an agent will actually provide its peer with all of its possible transport addresses, and ICE will figure out which one to actually use.

Naturally, one viable transport address is one obtained directly from a local interface the agent has towards the network. Such a transport address is called a local transport address. The local interface could be one on a local layer 2 network technology, such as Ethernet or WiFi, or it could be one that is obtained through a tunnel mechanism, such as a Virtual Private Network (VPN) or Mobile IP (MIP). In all cases, these appear to the agent as a local interface from which ports (and thus transport addresses) can be allocated.

If an agent is multihomed, it can obtain a transport address from each interface. Depending on the location of the peer on the IP network, the agent may be reachable through one of those interfaces, or through another. Consider, for example, an agent which has a local interface to a private net 10 network, and also to the public Internet.
A transport address from the net 10 interface will be directly reachable when communicating with a peer on the same private net 10 network, while a transport address from the public interface will be directly reachable when communicating with a peer on the public Internet. Rather than trying to guess which interface will work prior to sending an offer, the offering agent includes both transport addresses in its offer.

Indeed, when using a media technology like the Real Time Transport Protocol (RTP), an agent needs two transport addresses on each interface - one for the RTP, and one for the Real Time Control Protocol (RTCP). Other media technologies may require a multiplicity of transport addresses to be used and treated as a bundle. Each of these transport addresses is called a component. There are two components in an RTP stream - the RTP itself, and the RTCP.

In ICE, the set of transport addresses that represent an atomic grouping on which communication is possible is called a candidate. In the example so far, the agent would obtain two candidates - one from the net 10 interface, and one from the interface on the public Internet. Each candidate would contain two transport addresses, corresponding to each of the two components.

Once the agent has obtained local transport addresses, it uses STUN to obtain additional transport addresses. To do this, it would send a STUN Binding Request, using the Binding Discovery Usage [12] or the Relay Usage [13] from a local transport address, to its STUN server. It is assumed that the address of the STUN server is configured, or learned in some way. Indeed, an agent might even have multiple STUN servers.

As a consequence of communicating with the STUN server, the agent can learn potentially two new types of transport addresses - server reflexive transport addresses and relayed transport addresses.
The relationship of these addresses to the local transport address is shown in Figure 2.

                To Internet

                    |
                    |
                    |  /------------  Relayed
                    | /               Address
                +--------+
                |        |
                |  STUN  |
                | Server |
                |        |
                +--------+
                    |
                    |
                    | /------------  Server
                    |/               Reflexive
              +------------+         Address
              |    NAT     |
              +------------+
                    |
                    | /------------  Local
                    |/               Address
                +--------+
                |        |
                | Agent  |
                |        |
                +--------+

                 Figure 2

The local transport address is resident on the agent itself. Through either the Binding Discovery Usage or the Relay Usage, the agent can discover its server reflexive transport address. This is the address on the public side of the NAT, facing the STUN server. It is the transport address allocated to the agent on the public side of the NAT as a consequence of the transmission of the STUN request through the NAT, to the STUN server. The NAT will allocate a binding, mapping this server reflexive transport address to the local transport address. Packets received at the NAT, targeted towards the server reflexive transport address, will have their destination address rewritten to the local transport address by the NAT, and then be forwarded to the agent. When there are multiple NATs between the agent and the STUN server, the STUN request will create a binding on each NAT, but only the outermost server reflexive transport address will be discovered by the agent.

In addition, through the Relay Usage, the agent can request that the STUN server itself allocate a transport address from one of its local interfaces, and establish a binding that maps that transport address (called a relayed transport address, naturally) towards the source transport address of the STUN request, which will actually be equal to the server reflexive transport address allocated by the outermost NAT. Consequently, packets sent to the relayed transport address will be routed by the IP network towards the STUN server.
The STUN server will receive them, rewrite the destination address to be equal to the server reflexive transport address, and forward them. They will then arrive at the NAT, where the destination address is rewritten once again, and the packet is finally forwarded to the agent at its local address.

Because the server reflexive transport addresses and relayed transport addresses are obtained from a local transport address, they are said to be derived transport addresses: they are derived from (and ultimately map to) their associated local transport address.

During the process of address gathering, the agent will obtain as many transport addresses of a given type as are needed for the media session. For example, with RTP, two transport addresses are needed for a candidate. The agent will obtain two server reflexive transport addresses (each derived from a local transport address), and they would be used to constitute a server reflexive candidate. The local transport addresses make up a local candidate, and the relayed transport addresses make up a relayed candidate.

          Server                             Server
          Reflexive                          Reflexive
          Candidate                          Candidate
       ..............                     ..............
       .            .                     .            .
       .  +-+  +-+  .                     .  +-+  +-+  .
       .  | |  | |  .                     .  | |  | |  .
       .  +-+  +-+  .                     .  +-+  +-+  .
       .   ^    ^   .                     .   ^    ^   .
       ....|....|....                     ....|....|....
           |    |                             |    |
           |    |                             |    |
       ....|....|....                     ....|....|....
       .   |    |   .                     .   |    |   .
       .  +-+  +-+  .  Local              .  +-+  +-+  .  Local
       .  | |  | |  .  Candidate          .  | |  | |  .  Candidate
       .  +-+  +-+  .                     .  +-+  +-+  .
       .   |    |   .                     .   |    |   .
       ....|....|....                     ....|....|....
           |    |                             |    |
           |    |                             |    |
       ....|....|....                     ....|....|....
       .   V    V   .                     .   V    V   .
       .  +-+  +-+  .                     .  +-+  +-+  .
       .  | |  | |  .                     .  | |  | |  .
       .  +-+  +-+  .                     .  +-+  +-+  .
       .            .                     .            .
       ..............                     ..............
          Relayed                            Relayed
          Candidate                          Candidate

       Legend
       ------

       +-+
       | |    Transport Address
       +-+

       --->   Derived From

       ...
       . .    Candidate
       ...

                           Figure 3

The relationship between these various transport addresses and candidates is shown pictorially in Figure 3. The figure shows our example agent with two local interfaces, each of which provides a pair of transport addresses, making up two local candidates. From each of those two local candidates, a server reflexive candidate and a relayed candidate are derived.

Once the agent has completed gathering its candidates, it assigns each a candidate identifier, called the candidate ID. The candidate ID is a random number used to uniquely identify each candidate, and is used in the connectivity checks discussed below. The components of each candidate are ordered numerically, starting at one, such that each transport address has a component ID. For example, in an RTP candidate there are two components, component ID 1 and component ID 2. Each transport address is therefore uniquely identified by a combination of its candidate ID and its component ID. The combination of the two is called, unsurprisingly, a transport address ID, or tid for short.

The agent will place all of its candidates in an offer, using a new SDP attribute called the candidate attribute. This attribute contains the actual transport address, the candidate ID and component ID, and a q-value. The q-value is used for the agent to prioritize its candidates. An agent will typically prefer to receive media at particular candidates over other candidates, based on local policy. For example, an agent would normally prefer to receive interactive voice RTP packets at its local candidate as opposed to its relayed candidate, due to the extra latency incurred by traveling through the relay. The candidate attribute will also include an indicator of the type of candidate (server reflexive, local, relayed), and its related transport address.
For server reflexive transport addresses, the related transport address is the local transport address from which it was derived. For relayed transport addresses, the related transport address is the server reflexive address towards the relay. The related transport address for reflexive candidates is used by the ICE algorithm itself, as explained below. For relayed candidates, the related transport address is not used by ICE directly; it is useful for diagnostic purposes and for Quality of Service mechanisms that require knowledge of addresses closer to the agent.

Finally, the agent chooses one of its candidates for inclusion in the m and c lines (called the m/c-line collectively). Assuming that candidate is verified as functional by the ICE connectivity checks described below, this is the actual IP address and port to which media will be sent. The candidate selected for inclusion in the m/c-line of an offer (or an answer) is called the operating candidate, since it is the one that is the in-use destination for receipt of media traffic.

Once the operating candidate is chosen, the agent sends the offer. Through the wonders of SIP or other signaling protocols, this offer is delivered to the peer, which must now select its answer. To create the answer, the answering agent starts by gathering addresses, in exactly the same way the offerer did. It includes those as candidates in its answer, and selects one as the operating candidate, just like the offerer did. It then sends the answer.

Each agent then pairs up each of its candidates with the candidates of its peer. From the perspective of the offerer, the set of candidates it sent in its offer are called its native candidates, and the ones received in the answer are the remote candidates.
Similarly, from the perspective of the answerer, the set of candidates it sent in its answer are the native candidates, and the ones received in the offer are the remote candidates. Both agents pair up each of their native candidates with each of the remote candidates, producing a set of candidate pairs. If there were N native candidates and M remote candidates, there will be N*M candidate pairs.

Within each candidate pair, the transport addresses themselves are paired up one for one, resulting in transport address pairs as well. The transport addresses are paired up such that they have identical component IDs. Each transport address pair has an ID, called the transport address pair ID, formed by concatenating the transport address IDs of its two transport addresses.

Once the pairing is done, the transport address pairs are ordered in such a way that both the offerer and answerer will end up with the same order. This ordering is done by using the q-values each side provided, along with the candidate IDs to help break ties.

Then, each side begins a process known as connectivity checks. Connectivity checks are STUN transactions, using the connectivity check usage of STUN, sent from the native transport address to the remote transport address of a particular transport address pair. If an agent sends a STUN request and gets a successful response, the transport address pair is said to be Receive Valid, or Recv Valid for short, since the agent knows that its peer was able to receive a packet. If an agent receives a request and sends a response, the transport address pair is said to be Send Valid, since the agent knows that its peer was able to send it a packet. When transactions in both directions complete, the transport address pair is said to be Valid. The idea behind ICE is that if a transport address pair is valid, it means that agents were able to successfully exchange IP packets in both directions.
Consequently, any media packets, which are sent to and from exactly the same IP addresses and ports, should also work, since they don't differ in their IP addresses or ports.

It's important to point out that, when used with ICE, an agent will always send and receive media on the same transport address. That is, if an agent includes a transport address of 192.0.2.1:2444 (meaning an IP address of 192.0.2.1 and port of 2444) in its SDP for receiving RTP packets (and also STUN connectivity checks), it will not only receive STUN requests and RTP packets on this transport address, it will also send STUN requests and RTP packets from this transport address. This property, known as symmetric RTP, is essential for proper operation of ICE. Peer reflexive transport addresses, discussed further below, will generally only work when symmetric RTP is used. Symmetric RTP is also key for keeping NAT bindings alive.

Since there can be quite a few transport address pairs to check, performing all of the connectivity checks in parallel can cause substantial load on the network. Instead, each agent will start at the top of the ordered list it created, and every 50ms, begin a new connectivity check.

In order to successfully process a STUN connectivity check, an agent must be able to correlate the STUN request or response with the transport address pair whose connectivity the STUN message is meant to validate. To perform this correlation, the STUN connectivity checks contain a USERNAME attribute formed in a special way. In particular, the USERNAME contains the actual transport address pair ID, which, as described above, is formed by concatenating the transport address IDs of its two transport addresses. The USERNAME is used in conjunction with an authentication and message integrity operation on the STUN message that requires a password.
This password is conveyed in the offer/answer exchange, and is a random number valid only for the duration of the media session. This ensures that, if the signaling channel carrying the offer/answer exchange is secure, the agent can be certain that its STUN connectivity checks are taking place with the agent which responded to the signaling.

Because each agent is receiving STUN requests on the same IP address and port that media will later be sent to, each agent is effectively acting as its own mini STUN server, implementing the connectivity check usage described in [12]. Like all STUN servers, when the agent sends a STUN response to a request, the response includes the XOR-MAPPED-ADDRESS attribute that contains the source IP address and port that the request came from. In certain deployment scenarios, and in particular where one of the agents is behind a NAT whose address and port mapping properties are address and port dependent [32], this source IP address and port may differ from the server reflexive ones allocated by the peer during the address gathering phase. This source IP address and port, conveyed in the XOR-MAPPED-ADDRESS attribute of the STUN response, therefore constitutes a new transport address, called a peer reflexive transport address, which can be used for communications.

                                    +-------+
                                    | STUN  |
                                    | Srvr  |
                                    |       |
                                    +-------+
                                        ^
                                        |
                                        |
                                        |
                                        |
           +--------------------------+ |
           |                     NAT-2| |NAT-1
           |                        +-----------+
           |                        |  APD NAT  |
           |                        +-----------+
           |                            |   |
           |                             \  |
           V L1                           \ |R1
       +-------+                        +-------+
       | Agent |                        | Agent |
       |   L   |                        |   R   |
       |       |                        |       |
       +-------+                        +-------+

                           Figure 4

Consider the example of Figure 4. The agent on the left, agent L, has a single interface and is not behind a NAT. Consequently, it ends up with a single candidate with a single transport address (normally two for RTP, but we'll consider just one for ease of explanation), transport address L1.
It sends an offer to agent R, which is behind one of these Address and Port Dependent (APD) mapping NATs. Agent R has a local transport address R1, and obtains a server reflexive transport address from its STUN server, transport address NAT-1.

Now, when agent R sends a connectivity check from its local transport address (R1) to L's local transport address (L1), this check will traverse the NAT. The connectivity check itself will create a new mapping in the NAT and be allocated a new binding on the NAT - NAT-2. This STUN request arrives at L, which generates a STUN response containing transport address NAT-2. Agent R, noticing that this is not the same as its other two transport addresses, treats this as a new peer reflexive transport address.

This new peer reflexive transport address is paired up with the remote transport address from which it was learned, that is, the one on which the peer's STUN server answered (transport address L1 in the example above). This becomes a new transport address pair, and connectivity checks are run on it as well.

Once all of the transport address pairs in a candidate pair have been validated, that candidate pair is ready to be used. Media starts being sent on it immediately, and the offerer will send an updated offer, now containing the agent's half of the validated candidate pair in the m/c-line. This is called "promoting a candidate to operating". The updated offer only contains a single candidate attribute - the one for the operating candidate. It also contains an attribute, called the remote-candidate attribute, which tells the answerer the remote candidate in the validated candidate pair. The answerer uses this attribute, along with its own view on the states of the candidate pairs, to place a candidate in the m/c-line and populate the candidate attributes in its answer.
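
As a rough illustration of the connectivity check bookkeeping described in this overview, the following Python sketch tracks the Recv Valid and Send Valid state of a transport address pair, and flags a peer reflexive transport address when the address reported in XOR-MAPPED-ADDRESS matches none of the agent's known transport addresses. The names TransportAddressPair and learn_peer_reflexive, and all IP addresses and ports, are hypothetical and are not defined by this specification; no STUN messages are actually sent.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# A transport address is the combination of an IP address and a port.
TransportAddress = Tuple[str, int]


@dataclass
class TransportAddressPair:
    """Validity state for one transport address pair (hypothetical model)."""
    native: TransportAddress
    remote: TransportAddress
    recv_valid: bool = False  # our request received a successful response
    send_valid: bool = False  # we received a request and sent a response

    def on_success_response(self) -> None:
        self.recv_valid = True

    def on_request_answered(self) -> None:
        self.send_valid = True

    @property
    def valid(self) -> bool:
        # Valid once transactions in both directions have completed.
        return self.recv_valid and self.send_valid


def learn_peer_reflexive(
    mapped: TransportAddress, known: Set[TransportAddress]
) -> Optional[TransportAddress]:
    """Return `mapped` if it is not already known, otherwise None."""
    return None if mapped in known else mapped


# Figure 4 scenario seen from agent R, with made-up addresses.
R1 = ("10.0.1.1", 5000)       # agent R's local transport address
NAT_1 = ("192.0.2.10", 6000)  # server reflexive, learned from the STUN server
NAT_2 = ("192.0.2.10", 6001)  # binding created by the check towards L
L1 = ("192.0.2.1", 2444)      # agent L's local transport address

pair = TransportAddressPair(native=R1, remote=L1)
pair.on_success_response()    # R's check to L1 succeeded
pair.on_request_answered()    # R answered a check from L
new_address = learn_peer_reflexive(NAT_2, {R1, NAT_1})
```

In this sketch new_address is NAT-2, which agent R would then pair with L1 and check in turn; a response reporting NAT-1 or R1 would yield nothing new.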

It is important to understand that, when ICE is in use, media is not sent to a candidate without validation, even if that candidate appears in the m/c-line. This is in order to avoid denial-of-service attacks. In particular, without ICE, an offerer can send an offer to another agent, and list the IP address and port of a target in the offer. If the agent is an automaton that answers a call automatically, it will do so and then proceed to send media to the target. This provides substantial packet amplification. ICE fixes this by requiring that an agent never send media packets unless it has sent a STUN message towards the target of the RTP packets, and received a reply from that target. See Section 7.13 for details.

A summary of this overall behavior is shown in the basic call flow in Figure 5.

   Agent A               STUN Servers               Agent B
      |(1) Gather Addresses |                     |
      |-------------------->|                     |
      |(2) Offer            |                     |
      |------------------------------------------>|
      |                     |(3) Gather Addresses |
      |                     |<--------------------|
      |(4) Answer           |                     |
      |<------------------------------------------|
      |(5) STUN Check       |                     |
      |<------------------------------------------|
      |(6) STUN Check       |                     |
      |------------------------------------------>|
      |(7) Media            |                     |
      |<------------------------------------------|
      |(8) Media            |                     |
      |------------------------------------------>|
      |(9) Offer            |                     |
      |------------------------------------------>|
      |(10) Answer          |                     |
      |<------------------------------------------|

                           Figure 5

3.  Terminology

Several new terms are introduced in this specification:

Agent: As defined in RFC 3264, an agent is the protocol implementation involved in the offer/answer exchange. There are two agents involved in an offer/answer exchange.

Peer: From the perspective of one of the agents in a session, its peer is the other agent. Specifically, from the perspective of the offerer, the peer is the answerer. From the perspective of the answerer, the peer is the offerer.

Transport Address: The combination of an IP address and port.

Local Transport Address: A local transport address is a transport address that has been allocated from the operating system on the host. This includes transport addresses obtained through Virtual Private Networks (VPNs) and transport addresses obtained through Realm Specific IP (RSIP) [18] (which lives at the operating system level). Transport addresses are typically obtained by binding to an interface.

m/c line: The media and connection lines in the SDP, which together hold the transport address used for the receipt of media.

Derived Transport Address: A derived transport address is a transport address which is obtained from a local transport address. The derived transport address is related to the associated local transport address in that packets sent to the derived transport address are received on the socket bound to its associated local transport address. Derived addresses are obtained using protocols like STUN, and more generally, any UNSAF protocol [20].

Reflexive Transport Address: As defined in [12], a derived transport address learned by a client which identifies that client as seen by another host on an IP network, typically a STUN server. When there is an intervening NAT between the client and the other host, the reflexive transport address represents the binding allocated to the client on the public side of the NAT. Reflexive transport addresses are learned from the XOR-MAPPED-ADDRESS attribute in STUN Binding Responses and Allocate Responses [13].

Server Reflexive Transport Address: A server reflexive transport address is a reflexive address that is reflected off of a server, distinct from the peer, whose address is configured or learned by the client prior to an offer/answer exchange.

Peer Reflexive Transport Address: A peer reflexive transport address is a reflexive address that is reflected off of the peer.
Peer reflexive transport addresses are learned by connectivity checks.

Relayed Transport Address: A derived transport address that terminates on a server, and is forwarded towards the client. The STUN Allocate Request, defined as part of the STUN relay usage [13], can be used to obtain a relayed transport address, for example.

Associated Local Transport Address: When a peer sends a packet to a transport address, the associated local transport address is the local transport address at which those packets will actually arrive. For a local transport address, its associated local transport address is the same as the local transport address itself. For reflexive and relayed transport addresses, however, they are not the same. The associated local transport address is the one from which the reflexive or relayed transport address was derived.

Candidate: A sequence of transport addresses that form an atomic set for usage with a particular media session. Here, atomic means that all of the transport addresses in the candidate need to work before the candidate will be used for actual media transport. In the case of RTP, there can be one or more transport addresses per candidate. In the most common case, there are two - one for RTP, and another for RTCP. If the agent doesn't use RTCP, there would be just one. If Generic Forward Error Correction (FEC) [16] is in use, there may be more than two. The transport addresses that compose a candidate are all of the same type - local, server reflexive, peer reflexive or relayed.

Local Candidate: A candidate whose transport addresses are local transport addresses.

Server Reflexive Candidate: A candidate whose transport addresses are server reflexive transport addresses.

Peer Reflexive Candidate: A candidate whose transport addresses are peer reflexive transport addresses.

Relayed Candidate: A candidate whose transport addresses are relayed transport addresses.
Generating Candidate: The candidate from which a peer reflexive candidate is derived.

Operating Candidate: The candidate that is in use for exchange of media. This is the one that an agent places in the m/c line of an offer or answer.

Candidate ID: An identifier for a candidate.

Component: When a media stream, and as a consequence, its candidate, require several IP addresses and ports to work atomically, each of the constituent IP addresses and ports represents a component of that media stream. For example, RTP-based media streams typically have two components - one for RTP, and one for RTCP.

Component ID: An integer, starting with one within each candidate and incrementing by one for each component, which identifies the component.

Transport Address ID (tid): An identifier for a transport address, formed by concatenating the candidate ID with the component ID, separated by a colon (":").

Candidate Pair: The combination of a candidate from one agent along with a candidate from its peer.

Native Candidate: From the perspective of each agent, the candidate in a candidate pair which represents a set of addresses obtained by that agent.

Remote Candidate: From the perspective of each agent, the candidate in a candidate pair which represents the set of addresses obtained by that agent's peer.

Transport Address Pair: The combination of the transport address for one component of a candidate with the transport address of the same component for the matching candidate in a candidate pair.

Transport Address Pair ID: An identifier for a transport address pair. Formed by concatenating the native transport address ID with the remote transport address ID, separated by a colon (":").

Matching Transport Address Pair: When a STUN Binding Request is received on a local transport address, the matching transport address pair is the transport address pair whose connectivity is being checked by that Binding Request.
Candidate Pair Priority Ordering: An ordering of candidate pairs based on a combination of the qvalues of each candidate and the candidate IDs of each candidate.

Candidate Pair Check Ordering: An ordering of candidate pairs that is similar to the candidate pair priority ordering, except that the operating candidate appears at the top of the list, regardless of its priority.

Transport Address Pair Check Ordering: An ordering of transport address pairs that determines the sequence of connectivity checks performed for the pairs.

Transport Address Pair Count: The number of transport address pairs in a candidate pair. This is equal to the minimum of the number of transport addresses in the native candidate and the number of transport addresses in the remote candidate.

4. Sending the Initial Offer

When an agent wishes to begin a session by sending an initial offer, it starts by gathering transport addresses, as described in Section 7.1. This will produce a set of candidates, including local ones, server reflexive ones, and relayed ones.

This process of gathering candidates can actually happen at any time before sending the initial offer. An agent can pre-gather transport addresses, using a user interface cue (such as picking up the phone, or entry into an address book) as a hint that communications is imminent. Doing so eliminates any additional perceivable call setup delays due to address gathering.

When it comes time to offer communications, the agent determines a priority for each candidate and identifies the operating candidate that will be used for receipt of media, as described in Section 7.2.

The next step is to construct the offer message. For each media stream, it places its candidates into a=candidate attributes in the offer and puts its operating candidate into the m/c line. The process for doing this is described in Section 7.3. The offer is then sent.

5. Receipt of the Offer and Generation of the Answer

Upon receipt of the offer message, the agent checks if the offer contains any a=candidate attributes. If the offer does, the offerer supports ICE. In that case, it starts gathering candidates, as described in Section 7.1, and prioritizes them as described in Section 7.2. This processing is done immediately on receipt of the offer, to prepare for the case where the user should accept the call, or early media needs to be generated. By gathering candidates (and performing connectivity checks) while the user is being alerted to the request for communications, session establishment delays are reduced.

The agent then constructs its answer, encoding its candidates into a=candidate attributes and including the operating one in the m/c-line, as described in Section 7.3.

The agent then forms candidate pairs as described in Section 7.4. These are ordered as described in Section 7.5. The agent then begins connectivity checks, as described in Section 7.6. It follows the logic in Section 7.10 on receipt of Binding Requests and responses to learn new candidates from the checks themselves. Transmission of media is performed according to the procedures in Section 7.13.

6. Processing the Answer

There are two possible cases for processing of the answer. If the answerer did not support ICE, the answer will not contain any a=candidate attributes. As a result, the offerer knows that it cannot perform its connectivity checks. In this case, it proceeds with normal media processing as if ICE were not in use. However, it SHOULD send media with the symmetric property described in Section 7.13, and follow the keepalive procedures in Section 7.12.

If the answer contains candidates, it implies that the answerer supports ICE. The offerer then forms candidate pairs as described in Section 7.4. These are ordered as described in Section 7.5.
The agent then begins connectivity checks, as described in Section 7.6. It follows the logic in Section 7.10 on receipt of Binding Requests and responses to learn new candidates from the checks themselves. Transmission of media is performed according to the procedures in Section 7.13.

7. Common Procedures

This section discusses procedures that are common between offerer and answerer.

7.1. Gathering Candidates

An agent gathers candidates when it believes that communications is imminent. For offerers, this occurs before sending an offer (Section 4). For answerers, it occurs before sending an answer (Section 5).

Each candidate has one or more components, each of which is associated with a sequence number, starting at 1 for the first component of each candidate, and incrementing by 1 for each additional component within that candidate. These components represent a set of transport addresses for which connectivity must be validated. For a particular media stream, all of the candidates SHOULD have the same number of components. The number of components needed is a function of the type of media stream. All of the components in a candidate MUST be of the same type - server reflexive, relayed, or local, and obtained from the same server in the case of server reflexive or relayed candidates. For local candidates, each component MUST be obtained from the same interface. For server reflexive and relayed candidates, each component MUST be derived from a component with the same component ID, all of which come from a single local candidate.

For traditional RTP-based media streams, it is RECOMMENDED that there be two components per candidate - one for RTP and one for RTCP. The component with the component ID of 1 MUST be RTP, and the one with component ID of 2 MUST be RTCP.
If an agent doesn't implement RTCP, it SHOULD have a single component for the RTP stream (which will have a component ID of 1 by definition). Each component of a candidate has a single transport address.

The first step is to gather local candidates. Local candidates are obtained by binding to ports (typically ephemeral) on an interface (physical or virtual, including VPN interfaces) on the host. The process for gathering local candidates depends on the transport protocol. Procedures are specified here for UDP. Extensions to ICE that define procedures for other transport protocols MUST specify how local transport addresses are gathered.

For each UDP media stream the agent wishes to use, the agent SHOULD obtain a set of candidates (one for each interface) by binding to N UDP ports on each interface, where N is the number of components needed for the candidate. For RTP, N is typically two. If a host has K local interfaces, this will result in K candidates for each UDP stream, requiring K*N local transport addresses.

Once the agent has obtained local candidates, it obtains candidates with derived transport addresses. The process for gathering derived candidates depends on the transport protocol. Procedures are specified here for UDP. Extensions to ICE that define procedures for other transport protocols MUST specify how derived transport addresses are gathered.

Agents which serve end users directly, such as softphones, hardphones, terminal adapters and so on, MUST implement the STUN Binding Discovery usage and SHOULD use it to obtain server reflexive candidates. These devices SHOULD implement the STUN Relay usage, and SHOULD use its Allocate request to obtain both server reflexive and relayed candidates. They MAY implement and MAY use other protocols that provide derived transport addresses, such as TEREDO [29].
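The local gathering step described above (binding N UDP ports on each of K interfaces, yielding K candidates and K*N local transport addresses) can be sketched as follows. This is a minimal illustration, not part of the specification: it assumes IPv4, assumes the caller already knows the interface addresses (interface discovery is not shown), and the names LocalCandidate and gather_local_candidates are hypothetical.

```python
import socket
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalCandidate:
    """One local candidate: all components come from the same interface."""
    interface_ip: str
    sockets: List[socket.socket]  # index i holds component ID i + 1

    def transport_addresses(self) -> List[Tuple[str, int]]:
        # Each component has exactly one (IP address, port) transport address.
        return [s.getsockname() for s in self.sockets]


def gather_local_candidates(interface_ips: List[str],
                            n_components: int = 2) -> List[LocalCandidate]:
    """Bind N ephemeral UDP ports on each interface (hypothetical helper).

    Returns K candidates for K interfaces, using K*N sockets in total.
    """
    candidates = []
    for ip in interface_ips:
        socks = []
        for _ in range(n_components):
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            s.bind((ip, 0))  # port 0 asks the OS for an ephemeral port
            socks.append(s)
        candidates.append(LocalCandidate(ip, socks))
    return candidates
```

For an RTP stream on a host with a single interface, gather_local_candidates(["127.0.0.1"]) would return one candidate with two transport addresses, component ID 1 for RTP and component ID 2 for RTCP. The sockets are kept open because connectivity checks and media later arrive on them.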
The requirement to use the relay usage is at SHOULD strength to allow for provider variation. If it is not to be used, it is RECOMMENDED that it be implemented and just disabled through configuration, so that it can be re-enabled through configuration if conditions change in the future.

Agents which represent network servers under the control of a service provider, such as gateways to the telephone network, media servers, or conferencing servers that are targeted at deployment only in networks with public IP addresses, MAY use the STUN Binding Discovery usage and relay usage, or other similar protocols to obtain candidates.

Why would these types of endpoints even bother to implement ICE? The answer is that such an implementation greatly facilitates NAT traversal for clients that connect to it. Consider a PC softphone behind a NAT whose mapping policy is address and port dependent. The softphone initiates a call through a gateway that implements ICE. The gateway doesn't obtain any server reflexive or relayed transport addresses, but it implements ICE, and consequently, is prepared to receive STUN connectivity checks on its local transport addresses. The softphone will send a STUN connectivity check to that local transport address, causing the NAT to allocate a new binding for the softphone. The connectivity check will inform the softphone of this address, allowing it to be used by the gateway as a peer reflexive remote candidate. This allows direct media transmission between the gateway and softphone, without the need for relays. Furthermore, implementation of the STUN connectivity checks allows for NAT bindings along the way to be kept open. ICE also provides numerous security properties that are independent of NAT traversal, and would benefit any multimedia endpoint. See Section 13 for a discussion on these benefits.
Obtaining derived candidates requires transmission of packets which have the effect of creating bindings on NAT devices between the client and the STUN servers. Experience has shown that many NAT devices have upper limits on the rate at which they will create new bindings. Furthermore, transmission of these packets on the network makes use of bandwidth and needs to be rate limited by the agent. As a consequence, a client SHOULD pace its STUN transactions, such that the start of each new transaction occurs at least Ta seconds after the start of the previous transaction. The value of Ta SHOULD be configurable, and SHOULD have a default of 50ms. Note that this pacing applies only to the start of a new transaction; pacing of retransmissions within a STUN transaction is governed by the retransmission rules defined by STUN.

Derived candidates can be obtained from the STUN Binding Discovery usage or the STUN Relay usage. The latter is preferred since it will provide the client with both a server reflexive and a relayed transport address with a single transaction. It is possible that some STUN servers will only support the Relay usage or only the Binding Discovery usage, in which case a client might be configured with different servers depending on the usage.

To obtain both server reflexive and relayed candidates using the STUN Relay Usage, the client takes a local UDP candidate, and for each configured STUN server, produces both candidates. It is anticipated that clients may have a multiplicity of STUN servers configured or discovered in network environments where there are multiple layers of NAT, and where that layering is known to the provider of the client.

To obtain these candidates, for each configured STUN server, the client initiates an Allocate Request transaction using the procedures of Section 8.1.2 of [13] from each transport address of a particular local candidate.
The Allocate Response will provide the client with its server reflexive transport address (obtained from the XOR-MAPPED-ADDRESS attribute) and its relayed transport address in the RELAY-ADDRESS attribute. Indeed, these two transport addresses are related to each other. The relay will forward packets received on the relayed transport address towards that server reflexive transport address. As such, the server reflexive transport address is said to be the associated server reflexive transport address for that relayed address.

Once the Allocate requests have given a client a relayed transport address for all transport addresses in a relayed candidate, there is no reason for a client to obtain further relayed candidates through the same STUN server. Thus, if there are other local candidates from which the client has not yet obtained a relayed transport address, the client SHOULD NOT bother to obtain them. Instead, it SHOULD use the STUN Binding Discovery usage and obtain just server reflexive addresses from that STUN server. The order in which local candidates are tried against the STUN server to obtain relayed candidates is a matter of local policy.

To obtain server reflexive candidates using the STUN Binding Discovery usage, the client takes a local UDP candidate, and for each configured STUN server, produces a server reflexive candidate. To produce the server reflexive candidate from the local candidate, it follows the procedures of Section 12.2 of [12] for each local transport address in the local candidate. The Binding Response will provide the client with its server reflexive transport address. If the client had K local candidates, this will produce S*K server reflexive candidates, where S is the number of STUN servers.

Since a client will pace its STUN transactions (both Binding and Allocate requests) at a total rate of one new transaction every Ta seconds, it will take a certain amount of time to complete the address gathering phase.
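The arithmetic behind that gathering time can be made concrete with a small sketch. It assumes one STUN transaction per local transport address per server (K local candidates, N components each, S servers), uses the 50ms default for Ta, and computes only the earliest possible start time of each transaction; retransmissions and response latency are ignored, so the result is a lower bound. The function names are hypothetical.

```python
def gathering_transactions(num_local_candidates: int,
                           num_components: int,
                           num_stun_servers: int) -> int:
    """Transactions needed: one per local transport address per STUN server."""
    return num_local_candidates * num_components * num_stun_servers


def server_reflexive_candidates(num_local_candidates: int,
                                num_stun_servers: int) -> int:
    """S*K server reflexive candidates, as stated in the text."""
    return num_stun_servers * num_local_candidates


def transaction_start_times_ms(num_transactions: int, ta_ms: int = 50) -> list:
    """Earliest start (in ms) of each transaction when starts are Ta apart."""
    return [i * ta_ms for i in range(num_transactions)]


# Example: K = 2 interfaces, N = 2 components (RTP and RTCP), S = 3 servers.
total = gathering_transactions(2, 2, 3)           # 12 transactions
starts = transaction_start_times_ms(total)        # 0, 50, ..., 550 ms
```

With these inputs the last transaction cannot start before 550 ms after the first, and the client ends up with 6 server reflexive candidates.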
It is RECOMMENDED that implementations have a configurable upper bound on the total amount of time allotted to address gathering. Any transactions not completed at that point SHOULD be abandoned, but MAY continue and be used in an updated offer once they complete. A default value of 5s is RECOMMENDED. Since the total number of allocations that could be done (based on the number of STUN servers and local interfaces) might exceed this value, clients SHOULD prioritize their local candidates and STUN servers, performing transactions from the highest priority local candidates to the highest priority STUN servers first. A STUN server would typically be higher priority if it supports the STUN Relay Usage, since such a server provides two transport addresses with one transaction.

Once the allocations are complete, any redundant candidates are discarded. Candidate A is redundant with candidate B if the transport addresses of each component match, and each component of their associated local candidates match. For example, consider a set of candidates with a single component. One candidate is a local candidate, and its one component has a transport address of 10.0.1.1:4458. A reflexive transport address is derived from this local transport address, producing 10.0.1.1:4458. These two candidates are identical, and also have identical associated local transport addresses, so they are redundant.

   +----------+
   | STUN Srvr|
   +----------+
        |
        |
      -----
    //     \\
   |         |
   | B:net10 |
   |         |
    \\     //
      -----
        |
        |
   +----------+
   |   NAT    |
   +----------+
        |
        |
      -----
    //     \\
   |    A    |
   |192.168/16 |
   |         |
    \\     //
      -----
        |
        |
        |192.168.1.1       -----
   +----------+          //     \\          +----------+
   |          |         |         |         |          |
   | Offerer  |---------| C:net10 |---------| Answerer |
   |          |10.0.1.1 |         | 10.0.1.2|          |
   +----------+          \\     //          +----------+
                           -----

                          Figure 6

Consider the more complicated case of Figure 6.
In this case, the offerer is multi-homed. It has one interface, 10.0.1.1, on network C, which is a net 10 private network. The Answerer is on this same network. The offerer is also connected to network A, which is 192.168/16. The offerer has an interface of 192.168.1.1 on this network. There is a NAT on this network, natting into network B, which is another net10 private network, but not connected to network C. There is a STUN server on network B. The offerer obtains a local transport address on its interface on network C (10.0.1.1:2498) and a local transport address on its interface on network A (192.168.1.1:3344). It performs a STUN query to its configured STUN server from 192.168.1.1:3344. This query passes through the NAT, which happens to assign the binding 10.0.1.1:2498. The STUN server reflects this in the STUN Binding Response. Now, the offerer has obtained a candidate with a transport address it already has (10.0.1.1:2498), but from a new interface. It therefore keeps it. When it performs its connectivity checks, the offerer will end up sending packets from both interfaces, and those sent from its interface on network C will succeed.

7.2. Prioritizing the Candidates and Choosing an Operating One

The prioritization process takes the set of candidates for a particular media stream and associates each with a priority. This priority reflects the desire that the agent has to receive media at that candidate, and is assigned as a value from 0 to 1 (1 being most preferred). Priorities are a property of a candidate, and thus shared across all components of a candidate. Priorities are ordinal, so that their significance is only meaningful relative to other candidates from that agent for a particular media stream. Candidates MAY have the same priority. However, it is RECOMMENDED that each candidate have a distinct priority. Doing so improves the efficiency of ICE.

This specification makes no normative statements on how the prioritization is done.
However, some useful guidelines are suggested on how such a prioritization can be determined.

One criterion for choosing one candidate over another is whether or not that candidate involves the use of an intermediary. That is, if media is sent to that candidate, will the media first transit an intermediate server before being received? Relayed candidates are clearly one type of candidate that involves an intermediary. Another is local candidates associated with a VPN server. When media is transited through an intermediary, it can increase the latency between transmission and reception. It can increase the packet losses, because of the additional router hops that may be taken. It may increase the cost of providing service, since media will be routed in and right back out of an intermediary run by the provider. If these concerns are important, candidates with this property can be listed with lower priority.

Another criterion for choosing one candidate over another is IP address family. ICE works with both IPv4 and IPv6. It therefore provides a transition mechanism that allows dual-stack hosts to prefer connectivity over IPv6, but to fall back to IPv4 in case the v6 networks are disconnected (due, for example, to a failure in a 6to4 relay) [23]. It can also help with hosts that have both a native IPv6 address and a 6to4 address. In such a case, higher priority could be afforded to the native v6 address, followed by the 6to4 address, followed by a native v4 address. This allows a site to obtain and begin using native v6 addresses immediately, yet still fall back to 6to4 addresses when communicating with agents in other sites that do not yet have native v6 connectivity.

Another criterion for choosing one candidate over another is security.
If a user is a telecommuter, and therefore connected to their corporate network and a local home network, they may prefer their voice traffic to be routed over the VPN in order to keep it on the corporate network when communicating within the enterprise, but use the local network when communicating with users outside of the enterprise.

Another criterion for choosing one address over another is topological awareness. This is most useful for candidates that make use of relays. In those cases, if an agent has preconfigured or dynamically discovered knowledge of the topological proximity of the relays to itself, it can use that to select closer relays with higher priority.

There may be transport-specific reasons for preferring one candidate over another. In such a case, specifications defining usage of ICE with other transport protocols SHOULD document such considerations.

Once the candidates have been prioritized, one may be selected as the operating one. This is the candidate that will be used for actual exchange of media if and when it is validated, until a higher priority candidate is validated. The operating candidate will also be used to receive media from ICE-unaware peers. As such, it is RECOMMENDED that one be chosen based on the likelihood of that candidate to work with the peer that is being contacted. Unfortunately, it is difficult to ascertain which candidate that might be. As an example, consider a user within an enterprise. To reach non-ICE capable agents within the enterprise, a local candidate has to be used, since the enterprise policies may prevent communication between elements using a relay on the public network. However, when communicating to peers outside of the enterprise, a relayed candidate from a publicly accessible STUN server is needed.
Indeed, the difficulty in picking just one address that will work is the whole problem that motivated the development of this specification in the first place. As such, it is RECOMMENDED that the operating candidate be a relayed candidate from a STUN server providing public IP addresses in response to an Allocate request. Furthermore, ICE is only truly effective when it is supported on both sides of the session. It is therefore most prudent to deploy it to close-knit communities as a whole, rather than piecemeal. In the example above, this would mean that ICE would ideally be deployed completely within the enterprise, rather than just to parts of it.

An additional consideration for selection of the operating candidate is the switching of media stream destinations between the initial offer and the subsequent offer. The operating candidate pair in the initial offer is validated first, and if that validation succeeds, media will immediately begin to flow between the pair. When the ICE checks complete and yield a higher priority candidate pair, media will begin to flow to it (there will also be an updated offer/answer exchange that changes the operating candidate). This will result in a change in the destination of the media packets. This may also cause a different path for the media packets. That path might have different delay and jitter characteristics. As a consequence, the jitter buffers may see a glitch, causing possible media artifacts. If these issues are a concern, the initial offer MAY omit an operating candidate. This is done by including an m/c-line with an a=inactive attribute. In such a case, an updated offer will need to be sent immediately when communicating with an ICE-unaware agent, setting an operating candidate.

There may be transport-specific reasons for selection of an operating candidate. In such a case, specifications defining usage of ICE with other transport protocols SHOULD document such considerations.

7.3. Encoding Candidates into SDP

For each candidate for a media stream, the agent includes a series of a=candidate attributes as media-level attributes, one for each component in the candidate.

Each candidate has a unique identifier, called the candidate ID. The candidate ID MUST be chosen randomly and contain at least 24 bits of randomness. This means that a candidate ID must be at least 4 characters long, since each character in the base64 alphabet used for candidate IDs contains at most 6 bits of randomness. A candidate ID MAY be longer than 4 characters, and different candidate IDs MAY have different lengths. It is chosen only when the candidate is placed into the SDP for the first time; subsequent offers or answers within the same session containing that same candidate MUST use the same candidate ID used previously.

24 bits is sufficient because the candidate ID is not providing security (the much more random password is). Its sole purpose is to make it highly unlikely that both the offerer and answerer select the same value for a candidate for the same media stream. Different values for the candidate ID are required to break ties in the procedure that is used to order the candidate pairs.

Each component of the candidate has an identifier, called the component ID. The component ID is a sequence number. For each candidate, it starts at one, and increments by one for each component. As discussed below, ICE will perform connectivity checks such that, between a pair of candidates, checks only occur between transport addresses with the same component ID. As a consequence, if one candidate has three components, and it is paired with a candidate that has two, there will only be two transport address pairs and two connectivity checks.

ICE will work without a standardized mapping between the components of a media stream and the numerical value of the component ID.
This allows ICE to be used with media streams with multiple components without development of standards around such a mapping. However, a specific mapping has been defined in this specification for RTP - component ID 1 corresponds to RTP, and component ID of 2 corresponds to RTCP. Like the candidate ID, the component ID is assigned at the time the candidate is first placed into the SDP; subsequent offers or answers within the same session containing that same candidate MUST use the same component ID used previously.

The transport, addr and port of the a=candidate attribute (all defined in Section 12) are set to the transport protocol, unicast address and port of the transport address. A Fully Qualified Domain Name (FQDN) for a host MAY be used in place of a unicast address. In that case, when receiving an offer or answer containing an FQDN in an a=candidate attribute, the FQDN is looked up in the DNS using an A or AAAA record, and the resulting IP address is used for the remainder of ICE processing. The qvalue is set to the priority of the candidate, and MUST be the same for all components of the candidate.

The agent MUST include a type for the transport address by populating the candidate-types production with the appropriate value - "local" for local transport addresses, "srflx" for server reflexive candidates, and "relay" for relayed candidates. If the transport address is server reflexive, the agent MUST include the rel-addr and rel-port productions containing the associated local transport address for that server reflexive transport address.

There are environments in which the policy of an agent is such that it never provides local transport addresses in its offers or answers, for fear of revealing internal topology to external hosts.
In such cases, an agent MAY include a random transport address instead, as long as it is the same transport address for all server reflexive candidates derived from the same actual local transport address. This is because the transport address in the rel-addr and rel-port productions is used by the ICE algorithm itself for correlation purposes.

If the transport address is relayed, the agent SHOULD include the rel-addr and rel-port productions, containing the associated server reflexive transport address. When a relayed address is obtained from a STUN relay, the associated server reflexive transport address is the value from the XOR-MAPPED-ADDRESS that was returned in the same STUN response which provided the relayed address to the agent. Though not used directly with ICE, the rel-addr and rel-port attributes are essential for proper functioning of QoS mechanisms, such as those defined by 3GPP and PacketCable.

The rel-addr and rel-port productions MUST NOT be present for a local transport address.

All of the candidates for a media stream share a password that is used for securing the STUN connectivity checks. The password will be used to process the MESSAGE-INTEGRITY attribute for STUN requests received by the agent. The password for candidates for different media streams MAY be the same, or MAY be different. This password MUST be chosen randomly with 128 bits of randomness (though it can be longer than 128 bits). This password is contained in the a=ice-pwd attribute, present as a session or media level attribute. Since each character of the ice-pwd attribute can represent six bits of randomness, the ice-pwd attribute will always be at least 22 characters long. New passwords MUST be selected for each new session, even if the transport address from a previous session is being recycled.

The combination of candidate ID and component ID uniquely identifies each transport address.
As a consequence, each transport address has a unique identifier, called the transport address ID. The transport address ID is formed by concatenating the candidate ID with the component ID, separated by a colon (":"). The transport address ID is not explicitly encoded in the SDP; it is derived from the candidate ID and component ID, which are present in the SDP. The usage of the colon as a separator allows the candidate ID and component ID to be extracted from the transport address ID, since the colon is not a valid character for the candidate ID.

The transport address ID gets combined, through further concatenation, with the transport address ID of a transport address from the remote candidate (separated again by another colon) to form the username that is placed in the STUN checks between the peers. This allows the STUN message to uniquely identify the pairing whose connectivity it is checking.

The transport address ID is needed as a unique identifier because the IP address within the candidate fails to provide that uniqueness as a consequence of NAT. Consider agents A, B, and C. A and B are within private enterprise 1, which is using 10.0.0.0/8. C is within private enterprise 2, which is also using 10.0.0.0/8. As it turns out, B and C both have IP address 10.0.1.1. A sends an offer to C. C, in its answer, provides A with its transport addresses. In this case, those are 10.0.1.1:8866 and 10.0.1.1:8877. As it turns out, B is in a session at that same time, and is also using 10.0.1.1:8866 and 10.0.1.1:8877. This means that B is prepared to accept STUN messages on those ports, just as C is. A will send a STUN request to 10.0.1.1:8866 and another to 10.0.1.1:8877. However, these do not go to C as expected. Instead, they go to B. If B just replied to them, A would believe it has connectivity to C, when in fact it has connectivity to a completely different user, B.
To fix this, the transport address ID takes on the role of a unique identifier. C provides A with an identifier for its transport address, and A provides one to C. A concatenates these two identifiers (with a colon between) and uses the result as the username in its STUN query to 10.0.1.1:8866. This STUN query arrives at B. However, the username is unknown to B, and so the request is rejected. A treats the rejected STUN request as if there were no connectivity to C (which is actually true). Therefore, the error is avoided.

An unfortunate consequence of the non-uniqueness of IP addresses is that, in the above example, B might not even be an ICE agent. It could be any host, and the port to which the STUN packet is directed could be any ephemeral port on that host. If there is an application listening on this socket for packets, and it is not prepared to handle malformed packets for whatever protocol is in use, the operation of that application could be affected. Fortunately, since the ports exchanged in SDP are ephemeral and usually drawn from the dynamic or registered range, the odds are good that the port is not used to run a server on host B, but rather is the agent side of some protocol. This decreases the probability of hitting a port that is in use, due to the transient nature of port usage in this range. However, the possibility of a problem does exist, and network deployers should be prepared for it. Note that this is not a problem specific to ICE; stray packets can arrive at a port at any time for any type of protocol, especially ones on the public Internet. As such, this requirement is just restating a general design guideline for Internet applications - be prepared for unknown packets on any port.

The operating candidate, if there is one, is placed into the m/c lines of the SDP. For RTP streams, this is done by placing the RTP address and port into the c and m lines in the SDP respectively.
If the agent is utilizing RTCP, it MUST encode its address and port using the a=rtcp attribute as defined in RFC 3605 [1]. If RTCP is not in use, the agent MUST signal that using b=RS:0 and b=RR:0 as defined in RFC 3556 [6].

If there is no operating candidate, the agent MUST include an a=inactive attribute. The media address and port in the m/c-line are inconsequential, since they won't be used.

Encoding of candidates may involve transport protocol specific considerations. There are none for UDP. However, extensions that define usage of ICE with other transport protocols SHOULD specify any special encoding considerations.

Once an offer or answer is sent, an agent MUST be prepared to receive both STUN and media packets on each candidate. As discussed in Section 7.13, media packets can be sent to a candidate prior to its promotion to operating.

7.4. Forming Candidate Pairs

Once the offer/answer exchange has completed, both agents will have a set of candidates for each media stream. Each agent forms a set of candidate pairs for each media stream by combining each of its candidates with each of the candidates of its peer. Candidates can be paired up only if their transport protocols are identical.

Each candidate has a number of components, each of which has a transport address. Within a candidate pair, the components themselves are paired up such that transport addresses with the same component ID are combined to form a transport address pair. If one candidate has more components than the other, those extra components will not be part of a transport address pair, won't be validated, and will effectively be treated as if they weren't included in the candidate pair in the first place.

For example, if an offer/answer exchange took place for a session composed of an audio and a video stream, and each agent had two candidates per media stream, there would be 8 candidate pairs, 4 for audio and 4 for video.
For each of the 8 candidate pairs, there would be two transport address pairs - one for RTP, and one for RTCP.

The relationships between a candidate, candidate pair, transport address, transport address pair and component are shown in Figure 7. This figure shows the relationships as seen by the agent that owns the candidate with candidate ID "L". This candidate has two components with transport addresses A and B respectively. This candidate is called the native candidate, since it is the one owned by the agent in question. The candidate owned by its peer is called the remote candidate. As the figure shows, there is a single candidate pair, and two components in each candidate. The native candidate has a candidate ID of "L", and the remote candidate has a candidate ID of "R". Since the two component IDs are 1 and 2, candidate "L" has two transport addresses with transport address IDs of "L:1" and "L:2" respectively. Similarly, candidate "R" has two transport addresses with transport address IDs of "R:1" and "R:2" respectively. Note that these candidate IDs are not actually legal since they are not sufficiently random. However, we use "L" and "R" to keep the figures readable.

Furthermore, each transport address pair is associated with an ID, the transport address pair ID. This ID is equal to the concatenation of the transport address ID of the native transport address with the transport address ID of the remote transport address, separated by a colon. This means that the identifiers are seen differently by each agent. For the agent that owns candidate "L", there are two transport address pairs. One contains transport addresses "L:1" and "R:1", with a transport address pair ID of "L:1:R:1". The other contains transport addresses "L:2" and "R:2", with a transport address pair ID of "L:2:R:2".
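The identifier scheme described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not a normative encoding: the function names are hypothetical, and the candidate IDs "L" and "R" are the readable stand-ins used in this section rather than legal random IDs.

```python
def transport_address_id(candidate_id: str, component_id: int) -> str:
    """Join a candidate ID and a component ID with a colon."""
    if ":" in candidate_id:
        # The colon is reserved as the separator, so it cannot appear here.
        raise ValueError("candidate ID must not contain ':'")
    return f"{candidate_id}:{component_id}"


def transport_address_pair_id(native_tid: str, remote_tid: str) -> str:
    """Pair ID as seen by the agent that owns the native transport address."""
    return f"{native_tid}:{remote_tid}"


def split_pair_id(pair_id: str):
    """Recover ((native candidate, component), (remote candidate, component))."""
    native_cand, native_comp, remote_cand, remote_comp = pair_id.split(":")
    return (native_cand, int(native_comp)), (remote_cand, int(remote_comp))


left = transport_address_id("L", 1)
right = transport_address_id("R", 1)
print(transport_address_pair_id(left, right))  # view of the agent owning "L"
print(transport_address_pair_id(right, left))  # view of the agent owning "R"
```

Because the colon cannot appear in a candidate ID, splitting on it is enough to recover all four parts of a pair ID.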
For the agent that owns candidate "R", the identifiers for these two transport address pairs are reversed; they would be "R:1:L:1" for the first one and "R:2:L:2" for the second.

           ...............................................
           .                                             .
           .  .............             .............    .
           .  .  tid=L:1  .             .  tid=R:1  .    .
           .  .    --     .             .    --     .    .
component  .  .   | A|----------------------| C|    .    .  component
  id=1     .  .    --     . Transport   .    --     .    .    id=1
           .  .           . Address     .           .    .
           .  .           . Pair        .           .    .
           .  .           . id=L:1:R:1  .           .    .
           .  .           .             .           .    .
           .  .  tid=L:2  .             .  tid=R:2  .    .
           .  .    --     .             .    --     .    .
component  .  .   | B|----------------------| D|    .    .  component
  id=2     .  .    --     . Transport   .    --     .    .    id=2
           .  .           . Address     .           .    .
           .  .           . Pair        .           .    .
           .  .           . id=L:2:R:2  .           .    .
           .  .           .             .           .    .
           .  .............             .............    .
           .     Native                    Remote        .
           .     Candidate                 Candidate     .
           .     id=L                      id=R          .
           .                                             .
           ...............................................

                          Candidate Pair

                             Figure 7

If a candidate pair was created as a consequence of an offer generated by an agent, then that agent is said to be the offerer of that candidate pair and all of its transport address pairs. Similarly, the other agent is said to be the answerer of that candidate pair and all of its transport address pairs. As a consequence, each agent has a particular role, either offerer or answerer, for each transport address pair. This role is important; when a candidate pair is to be promoted to operating, the offerer is the one which performs the updated offer.

7.5. Ordering the Candidate Pairs

Recall that when each candidate is encoded into SDP, it contains a qvalue between 0 and 1, with 1 being the highest priority. Peer reflexive candidates, learned through the procedures described in Section 7.10, also have a priority between 0 and 1. For each media stream, the native candidates are ordered based on their qvalues, with higher qvalues coming first.
Candidates with the same qvalue are ordered based on candidate ID, using reverse ASCII sort order. For example, the candidate with candidate ID "lagDx" sorts before the candidate with ID "bad79", and both of those follow the candidate with ID "m8zz". The usage of a reverse ASCII sort order is important; as discussed in Section 13, it allows peer-derived candidates to be preferred over native ones. The result of these ordering rules will be an ordered list of candidates. The first candidate in this list is given a sequence number of 1, the next is given a sequence number of 2, and so on.

This same procedure is done for the remote candidates. The result is that each candidate pair has two sequence numbers, one for the native candidate, and one for the remote candidate. First, all of the candidate pairs for which the smaller of the two sequence numbers equals 1 are taken. Then, all of those for which the smaller of the two sequence numbers equals 2 are taken, and so on. Pairs that share the same value for their smaller sequence number are ordered by the larger of their two sequence numbers (smallest first). For pairs that share the same value for their smaller sequence number and the same value for their larger sequence number, the larger of the two candidate IDs in each pair is selected, and the pairs are ordered in reverse ASCII order of that candidate ID, largest first. The resulting ordering of candidate pairs is called the candidate pair priority ordered list.

As an example, consider two agents, A and B. One offers two candidates for a media stream with candidate IDs of "g9g9" and "8888", with qvalues of 1.0 and 0.8 respectively. The other answers with three candidates with candidate IDs of "h8h8", "6565" and "klkl", with qvalues of 0.3, 0.2 and 0.1 respectively. The following table shows the rank ordering of the six candidate pairs.
The column labeled "Max SN" is the larger of the two sequence numbers in the candidate pair, and "Min SN" is the smaller. The column labeled "Max Cand. ID" is the value of the larger of the two candidate IDs in the candidate pair.

Order  A       A        A      B       B        B      Max  Min  Max
       Cand.   Cand.    Cand.  Cand.   Cand.    Cand.  SN   SN   Cand.
       ID      q-value  SN     ID      q-value  SN               ID
---------------------------------------------------------------------
1      g9g9    1.0      1      h8h8    0.3      1      1    1    h8h8
2      8888    0.8      2      h8h8    0.3      1      2    1    h8h8
3      g9g9    1.0      1      6565    0.2      2      2    1    g9g9
4      g9g9    1.0      1      klkl    0.1      3      3    1    klkl
5      8888    0.8      2      6565    0.2      2      2    2    8888
6      8888    0.8      2      klkl    0.1      3      3    2    klkl

The candidate pair priority ordered list is then used to obtain an ordered list of transport address pairs, on which the agent will, in order, attempt to send STUN connectivity checks. This list, called the transport address pair check ordered list, is very similar to the candidate pair priority ordered list, but differs in two important respects. Firstly, the candidate pairs matching the operating candidate pair (there can actually be more than one) get promoted to the top of the list. This allows the operating candidate pair to be validated first. Secondly, many of the checks would be redundant, and a filtering algorithm is used to eliminate these redundant checks.

Ordering of candidates may involve transport protocol specific considerations. There are none for UDP. However, extensions that define usage of ICE with other transport protocols SHOULD specify any special ordering considerations.

To form the transport address pair check ordered list, the candidate pair priority ordered list is first modified by taking the candidate pairs corresponding to the operating candidate pair, and promoting them to the top of the list. A candidate pair matches the operating candidate pair when its native and remote transport addresses match the native and remote transport addresses in the m/c-line, respectively.
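The candidate pair priority ordering defined earlier in this section can be sketched as follows. This is a minimal illustration, assuming candidates are supplied as a mapping from candidate ID to qvalue; the function names are hypothetical, and promotion of the operating candidate pair and pruning are not covered here. The two-pass sort relies on Python's stable sorting: pairs are first ordered by the larger candidate ID (largest first), then by the pair of sequence numbers.

```python
from itertools import product


def sequence_numbers(candidates):
    """Map candidate ID -> sequence number (1 = highest priority).

    `candidates` maps candidate ID -> qvalue. Higher qvalues come first;
    ties are broken by reverse ASCII order of the candidate ID.
    """
    ordered = sorted(candidates, key=lambda cid: (candidates[cid], cid),
                     reverse=True)
    return {cid: sn for sn, cid in enumerate(ordered, start=1)}


def order_candidate_pairs(native, remote):
    """Return (native ID, remote ID) tuples in priority order."""
    native_sn = sequence_numbers(native)
    remote_sn = sequence_numbers(remote)

    def sn_key(pair):
        a, b = native_sn[pair[0]], remote_sn[pair[1]]
        return (min(a, b), max(a, b))  # smaller SN first, then larger SN

    pairs = list(product(native, remote))
    pairs.sort(key=max, reverse=True)  # tie-breaker: larger candidate ID
    pairs.sort(key=sn_key)             # stable, so the tie-breaker survives
    return pairs


agent_a = {"g9g9": 1.0, "8888": 0.8}
agent_b = {"h8h8": 0.3, "6565": 0.2, "klkl": 0.1}
for rank, pair in enumerate(order_candidate_pairs(agent_a, agent_b), start=1):
    print(rank, *pair)
```

Run against the example candidates, the sketch yields the same six-row ordering as the table in this section.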
In unusual circumstances, more than one candidate pair may match the operating candidate pair. In such a case, they should be promoted such that the higher priority candidate pairs appear first. In addition, it is possible that none of the candidate pairs match the operating candidate pair. In that case, no candidate pairs are promoted.

Within each candidate pair there will be a set of transport address pairs, one for each component ID. Those pairs are ordered by component ID. The result is an absolute ordering of all transport address pairs for a media stream, sorted first by the order of their candidate pairs (with the exception of the operating candidate), followed by the order of their component IDs. This ordering is used as the start of the transport address pair check ordering.

The next step is to remove redundant transport address pairs. Starting at the top of the list, the agent moves down from one transport address pair to the next. If a transport address pair under consideration has the same remote transport address as a previous pair, based on transport address ID comparisons, and the native transport address from that previous pair has the same origination transport address as the one under consideration (based on IP address and port comparison), the one under consideration is removed from the list. The origination transport address is the address that the agent would send from in order to emit a packet with that native transport address as a source transport address. For a local transport address, the origination transport address is equal to that local transport address. For a server reflexive transport address, the origination transport address is equal to the local transport address from which it was derived. For relayed addresses, packets are emitted by explicitly sending them through the relay. Consequently, the origination transport address is equal to the relayed address.
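The pruning step described above can be sketched as a single pass over the ordered list. This is only an illustration under stated assumptions: the Pair record and its field names are hypothetical, the origination transport address is assumed to have been resolved already into an (IP, port) tuple, and remote transport addresses are compared by their transport address IDs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pair:
    native_tid: str               # e.g. "L:1"
    remote_tid: str               # e.g. "R:1"
    origination: Tuple[str, int]  # (IP, port) the agent actually sends from


def prune_redundant(pairs: List[Pair]) -> List[Pair]:
    """Drop pairs whose check would repeat an earlier source/destination."""
    seen = set()
    kept = []
    for pair in pairs:
        key = (pair.remote_tid, pair.origination)
        if key in seen:
            continue  # same remote address and same origination: redundant
        seen.add(key)
        kept.append(pair)
    return kept


ordered = [
    Pair("L:1", "R:1", ("10.0.1.1", 5000)),  # local transport address
    Pair("S:1", "R:1", ("10.0.1.1", 5000)),  # server reflexive, same origin
    Pair("L:1", "Q:1", ("10.0.1.1", 5000)),  # different remote address
]
print([(p.native_tid, p.remote_tid) for p in prune_redundant(ordered)])
```

In this example the server reflexive pair is dropped because it would be sent from the same origination transport address to the same remote transport address as the pair before it.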
After the agent has gone through the entire list, the result is the transport address pair check ordered list. The pairs that get removed are redundant since the agent would send a STUN connectivity check using the same source and destination addresses as a previous check. Consequently, the connectivity check will provide no information to the remote agent except for the transport address pair ID it is associated with. These checks turn out to be unnecessary due to the STUN processing rules outlined below.

7.6. Performing the Connectivity Checks

Connectivity checks are a STUN usage defined in [12]. They are performed by sending peer-to-peer STUN Binding Requests. These checks result in a transport address pair progressing through a state machine that captures the progress of the connectivity checks. The specific state machine and the procedures for the connectivity checks are specific to the transport protocol. This specification defines rules for UDP. The state machine processing described in this section MUST be followed by agents. Extensions to ICE that describe other transport protocols SHOULD describe the state machine and the procedures for connectivity checks.

The set of states for a transport address pair visited by the offerer and answerer is summarized in Figure 9. Note that this state machine exists for all transport address pairs, including ones pruned from the transport address pair check ordered list.

State        Event          Action        Next State
-----------------------------------------------------
(start)      Start          -             Waiting
Waiting      Selected       Send Req      Testing
Waiting      Match Req      Send Req      Send-Valid
Waiting      Match Res      -             Recv-Valid
Waiting      Miss           -             Invalid
Testing      Match Req      Re-Xmit Req   Send-Valid
Testing      Match Res      -             Recv-Valid
Testing      Error or Miss  -             Invalid
Recv-Valid   Timer Tr       Send Req      Recv-Valid
Recv-Valid   Match Req      -             Valid
Recv-Valid   Error, Miss    -             Invalid
Send-Valid   Match Res      -             Valid
Send-Valid   Error, Miss    -             Invalid
Valid        Timer Tr       Send Req      Valid
Valid        Error, Miss    -             Invalid

                     Figure 9

The state machine has six states - Waiting, Testing, Recv-Valid, Send-Valid, Valid and Invalid. In the Waiting state, the agent is waiting to send or receive a connectivity check for the pair. In the Testing state, the agent has sent a connectivity check and is awaiting a response. In the Recv-Valid state, the agent knows that its peer can receive packets from it on this transport address pair. In the Send-Valid state, the agent knows that its peer can send packets to it. In the Valid state, the agent knows that its peer can both send packets to it and receive packets from it.

Initially, all transport address pairs start in the Waiting state. In this state, the agent waits for one of three events - a chance to send a Binding Request, receipt of a Binding Request, or receipt of a Binding Response.

Since there is an instance of the state machine for each transport address pair, Binding Requests and responses need to be matched to the specific state machine for which they were meant to apply.
As described below, the Binding Request may not be a match for the transport address pair it was meant to validate. To find the transport address pair it was meant to validate, called the target transport address pair, the agent examines the USERNAME of the incoming Binding Request. The USERNAME directly contains the transport address pair ID for the pair it was meant to validate. Binding Responses are matched to their requests using the STUN transaction ID, and then mapped to the transport address pair from that.

For each media stream, the agent starts a new connectivity check for a transport address pair every Tb*RND seconds. Tb SHOULD scale linearly with the number of media streams, so that the pace of connectivity checks overall is invariant to the number of media streams. Consequently, it is RECOMMENDED that Tb have a default value of N*50ms, where N is the number of media streams. RND is a random number chosen uniformly between 0.7 and 1.3, and it helps to avoid synchronization between the transmission of connectivity checks for different media streams. On average, if there are N media streams, the checks across all media streams will be paced out at a total of N/Tb checks per second. The check is started for the first transport address pair in the transport address pair check ordered list that is in the Waiting state. The "Selected" event is passed to the state machine for this transport address pair, causing it to be moved to the Testing state. The agent then sends a connectivity check using a STUN Binding Request, as outlined in Section 7.7.

Once a STUN connectivity check begins, the processing of the check follows the rules for STUN. Specifically, retransmits of STUN requests are done as specified in [12], and furthermore, if a transaction fails and needs to be retried, that retry can happen rapidly, as described below.
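The pacing rule for starting new checks can be sketched as follows. This is a minimal illustration, assuming the RECOMMENDED default of 50 ms per media stream; the function names and the injectable random source are hypothetical conveniences, not part of the specification.

```python
import random

DEFAULT_BASE_MS = 50.0  # RECOMMENDED per-stream base interval from the text


def tb_seconds(num_streams: int, base_ms: float = DEFAULT_BASE_MS) -> float:
    """Tb scales linearly with the number of media streams (N * 50 ms)."""
    if num_streams < 1:
        raise ValueError("at least one media stream is required")
    return num_streams * base_ms / 1000.0


def next_check_delay(num_streams: int, rng=random) -> float:
    """Delay before the next check on one media stream: Tb * RND."""
    rnd = rng.uniform(0.7, 1.3)  # RND is uniform between 0.7 and 1.3
    return tb_seconds(num_streams) * rnd


def aggregate_rate(num_streams: int) -> float:
    """Average checks per second across all media streams: N / Tb."""
    return num_streams / tb_seconds(num_streams)


print(tb_seconds(2))        # Tb for two media streams, in seconds
print(aggregate_rate(2))    # overall rate, independent of N
print(next_check_delay(2))  # one randomized delay for a single stream
```

With the default Tb, the aggregate rate works out to the same value for any number of media streams, which is the invariance the text asks for.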
Such a retry doesn't "count" against the average rate limit of 1/Tb checks per second per media stream. In addition, the keepalives that are generated for a valid pair do not count against the rate limit either. The rate limit applies strictly to the start of connectivity checks for a transport address pair that has been newly signaled through an offer/answer exchange.

When an agent receives a Binding Request which, per the processing rules of Section 7.8, produces a successful response, the agent examines the source transport address of the request. If the native transport address was relayed, this would be the source as seen by the relay. For the STUN relay usage, that source transport address will be present in the REMOTE-ADDRESS attribute of a STUN Data Indication message, if the Binding Request was delivered through a Data Indication. If the Binding Request was not encapsulated in a Data Indication, that source transport address is equal to the current active destination for the STUN relay session.

If the source transport address matches the remote transport address of the target transport address pair, the Binding Request is considered to be a match for the target transport address pair. Consequently, a Match Req event is passed to the state machine for the target transport address pair. If the state machine was in the Waiting or Testing state, the state machine moves into the Send-Valid state. If it was previously in the Waiting state, the agent sends a connectivity check of its own for the target transport address pair, as outlined in Section 7.7. If it was in the Testing state, it retransmits a Binding Request for the transaction in progress. This retransmission is one that would not normally occur based on the procedures in [12]. ICE "prods" the STUN transaction state machine to send an extra retransmit, in addition to the one which is scheduled to be sent next.
This helps speed up bidirectional connectivity verification when one agent is behind a NAT with an address and port dependent filtering behavior [32].

If the source transport address in the Binding Request was not a match for the remote transport address, the Binding Request is considered to be a miss for the target transport address pair. Consequently, a Miss event is passed to the state machine of the target transport address pair, and it immediately moves into the Invalid state. Typically, the source transport address won't match when there was a NAT between the sender and receiver with an address and port dependent mapping property, though there are other cases in which this can happen.

Though it was a miss for the target transport address pair, the connectivity check may have been a match for a different transport address pair. To determine this, the agent checks the source transport address of the Binding Request against all of the other remote transport addresses of transport address pairs for the same media stream that use the same transport protocol and share the same native transport address (based on transport address ID comparison) as the target. Of those that match (assuming at least one matches), it refines the set further by selecting only those for which the origination transport address of the remote transport address matches the origination transport address of the remote transport address in the target transport address pair. The origination transport address for a remote transport address is obtained from information signaled in the SDP, and depends on the type. For a local transport address, the origination address equals that local transport address. For a server reflexive transport address, the origination address is obtained from the related address information provided in the SDP.
For a relayed transport address, the origination transport address equals that relayed transport address. For these three types, the type is signaled in the SDP. For a peer derived transport address, the origination address is the same as the origination address of the generating transport address.

If there was a match (there can only be either one or zero matches), this match is called the alternate. In many cases, the alternate transport address pair will not be in the transport address pair check ordered list; it will have been one of the ones pruned. Indeed, this is why it was pruned - a check on the remaining transport address pairs can serve to validate it. The state machine for the alternate is passed the Match Req event. If it was in the Waiting state, this causes it to move into the Send-Valid state, and a connectivity check is generated for the alternate transport address pair. It may have been in the Testing state, in which case it moves into the Send-Valid state, and the agent retransmits the Binding Request for the transaction in progress. If it was in the Recv-Valid state, this causes it to move into the Valid state.

If no alternate could be found, it means that a new remote transport address and corresponding origination transport address have been discovered. In this case, the agent follows the procedures of Section 7.10.1 to create a new transport address pair and state machine for it.

If the Binding Request didn't generate a success response, an Error event is passed to the state machine of the target, causing it to move into the Invalid state.

If the agent receives a successful response to its STUN request, it examines the transport address in the XOR-MAPPED-ADDRESS attribute of the response. This will be a peer reflexive transport address.
If the peer reflexive transport address matches (based on IP address and port comparison) the native transport address of the target transport address pair, a Match Res event is passed to the state machine of the target. If the state machine was in the Testing state, the state machine moves into the Recv-Valid state. If it was in the Send-Valid state, it moves into the Valid state.

If, however, the transport addresses didn't match, a Miss event is passed to the state machine of the target, and it immediately moves into the Invalid state. The agent checks the peer reflexive transport address against all of the other native transport addresses for transport address pairs for the same media stream with the same transport protocol and the same remote transport address (based on comparison of transport address ID) as the target. Of those that match (assuming at least one matches), it refines the set further by selecting only those for which the origination transport address of the native transport address matches the origination address of the native transport address in the target transport address pair. The resulting transport address pair (there can be only zero or one) is called the alternate. In many cases, the alternate transport address pair will not be in the transport address pair check ordered list; it will have been one of the ones pruned. The state machine for the alternate is passed the Match Res event. If it was in the Waiting state, this causes it to move into the Recv-Valid state. It may have been in the Testing state, in which case it also moves into the Recv-Valid state. If it was in the Send-Valid state, this causes it to move into the Valid state.

If no alternate could be found, the Binding Response will create a new peer reflexive transport address, and the procedures of Section 7.10.2 are followed to create a new transport address pair and state machine for it.
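The per-pair state machine described in this section can be sketched as a small transition table. This is only an illustration under stated assumptions: events are represented as plain strings named after the events in the text, the action is returned as a label rather than performed, and leaving the state unchanged for an event that the text does not cover is a choice made here, not something the specification states.

```python
from enum import Enum


class State(Enum):
    WAITING = "Waiting"
    TESTING = "Testing"
    RECV_VALID = "Recv-Valid"
    SEND_VALID = "Send-Valid"
    VALID = "Valid"
    INVALID = "Invalid"


# (current state, event) -> (next state, action to perform or None)
TRANSITIONS = {
    (State.WAITING, "Selected"): (State.TESTING, "Send Req"),
    (State.WAITING, "Match Req"): (State.SEND_VALID, "Send Req"),
    (State.WAITING, "Match Res"): (State.RECV_VALID, None),
    (State.TESTING, "Match Req"): (State.SEND_VALID, "Re-Xmit Req"),
    (State.TESTING, "Match Res"): (State.RECV_VALID, None),
    (State.RECV_VALID, "Match Req"): (State.VALID, None),
    (State.SEND_VALID, "Match Res"): (State.VALID, None),
    (State.RECV_VALID, "Timer Tr"): (State.RECV_VALID, "Send Req"),
    (State.VALID, "Timer Tr"): (State.VALID, "Send Req"),
}


def step(state: State, event: str):
    """Return (next state, action) for one event on one pair."""
    if event in ("Miss", "Error"):
        return State.INVALID, None  # a miss or error invalidates the pair
    # Assumption: events not covered by the text leave the state as it is.
    return TRANSITIONS.get((state, event), (state, None))


state = State.WAITING
for event in ("Selected", "Match Res", "Match Req"):
    state, action = step(state, event)
    print(event, "->", state.value, "| action:", action)
```

Keeping the transitions in a table makes it straightforward to compare the sketch line by line against the prose.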
In any state, if the STUN transaction results in an error, the state machine moves into the Invalid state. A STUN transaction produces an "error" based on the processing in Section 7.7, which indicates which STUN response codes constitute an error as far as ICE processing is concerned.

If a transport address pair is in the Recv-Valid or Valid state, an agent MUST generate a new STUN Binding Request transaction every Tr seconds. This transaction ensures that NAT bindings for the transport address pair remain open while the candidate is under consideration. The transaction is performed as outlined in Section 7.7. These transactions can also be used to keep the NAT bindings alive when the candidate is promoted to operating, as described in Section 7.12. Tr SHOULD be configurable, and SHOULD default to 15 seconds. These STUN transactions are processed in the same way as any other, and can result in new peer derived transport addresses, or can fail and cause the transport address pair to be invalidated.

The candidate pair itself has a state, which is derived from the states of its transport address pairs. If at least one of the transport address pairs in a candidate pair is in the Invalid state, the state of the candidate pair is considered to be Invalid. If the candidate pair enters this state, an agent moves the state machines for all of the other transport address pairs in this candidate pair into the Invalid state as well. This will ensure that connectivity checks never start for those transport address pairs. Furthermore, if checks are already in progress for one of those transport address pairs, the agent ceases them. If all of the transport address pairs making up the candidate pair are Valid, the candidate pair is considered Valid.
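The derivation of a candidate pair's state from the states of its transport address pairs, as laid out in this paragraph, can be sketched as a single function. This is only an illustration: states are represented as plain strings, the function name is hypothetical, and rejecting an empty list of transport address pairs is an assumption made here.

```python
def candidate_pair_state(pair_states):
    """Derive a candidate pair's state from its transport address pairs."""
    states = set(pair_states)
    if not states:
        raise ValueError("a candidate pair has at least one transport address pair")
    if "Invalid" in states:
        return "Invalid"          # one invalid pair invalidates the whole pair
    if states == {"Valid"}:
        return "Valid"
    if states <= {"Valid", "Recv-Valid"}:
        return "Recv-Valid"       # all Valid or Recv-Valid, at least one Recv-Valid
    if states <= {"Valid", "Send-Valid"}:
        return "Send-Valid"       # all Valid or Send-Valid, at least one Send-Valid
    if states == {"Waiting"}:
        return "Waiting"
    if states <= {"Waiting", "Testing"}:
        return "Testing"          # all Waiting or Testing, at least one Testing
    return "Indeterminate"


print(candidate_pair_state(["Valid", "Recv-Valid"]))
print(candidate_pair_state(["Recv-Valid", "Send-Valid"]))
```

The order of the tests matters: the Invalid check comes first because a single invalid transport address pair overrides every other combination.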
If all of the transport address pairs making up the candidate pair are either Valid or Recv-Valid, and at least one is Recv-Valid, the candidate pair is considered to be Recv-Valid. If all of the transport address pairs making up the candidate pair are either Valid or Send-Valid, and at least one is Send-Valid, the candidate pair is considered to be Send-Valid. If all of the transport address pairs in a candidate pair are in the Waiting state, the candidate pair is in the Waiting state. If all of the transport address pairs in the candidate pair are either in the Waiting or Testing states, and at least one is in the Testing state, the state of the candidate pair is Testing. Otherwise, the state of the candidate pair is considered Indeterminate.

A candidate itself also has a state. If a candidate is present in at least one valid candidate pair, that candidate is said to be valid. If all of the candidate pairs containing that candidate are invalid, the candidate itself is invalid. Otherwise, the candidate's state is Indeterminate.

7.7. Sending a Binding Request for Connectivity Checks

An agent performs a connectivity check on a transport address pair by sending a STUN Binding Request from its native transport address to the remote transport address. Sending from its native transport address is done by sending it from its origination transport address. As mentioned above, the origination transport address depends on the type of transport protocol and the type of transport address (local, reflexive, or relayed). This specification defines the meaning for UDP. Specifications defining other transport protocols must define what this means for them.

For UDP-based local transport addresses, sending from the local transport address has the meaning one would expect - the request is sent such that the source IP address and port equal that of the local transport address.
For reflexive transport addresses, it is sent by sending from the associated local transport address used to derive that reflexive address. For relayed transport addresses, it is sent by using STUN mechanisms to send the request through the STUN relay (using the Send request). Sending the request through the STUN relay server necessarily requires that the request be sent from the client, using the local transport address used to derive the relayed transport address.

The Binding Request sent by the agent MUST contain the USERNAME attribute. This attribute MUST be set to the transport address pair ID of the corresponding transport address pair as seen by its peer. Thus, for the first transport address pair in Figure 7, if the agent on the left sends the STUN Binding Request, the USERNAME will have the value R:1:L:1. If the agent on the right sends the STUN Binding Request, the USERNAME will have the value L:1:R:1. To be clear, the USERNAME that is used is NOT the one seen locally, but rather the one as seen by its peer. The request SHOULD contain the MESSAGE-INTEGRITY attribute, computed according to [12]. The key used as input to the HMAC is the password provided by the peer for this remote transport address. This password will be identical for all remote transport addresses for the same media stream.

Note that all ICE implementations are required to be compliant to [12], as opposed to the older [14]. Consequently, all connectivity checks will contain the magic cookie in the STUN header, and cause the STUN server embedded in each ICE implementation to include XOR-MAPPED-ADDRESS attributes in the response, rather than MAPPED-ADDRESS.

Once created, the STUN transaction is linked to the transport address pair so that, when the response is received, the state machine on the linked transport address pair can be updated. The STUN transaction will generate either a timeout, or a response.
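The USERNAME rule above can be illustrated with a short sketch. It assumes, as in the examples of this section, that a transport address pair ID is the native transport address ID followed by the remote transport address ID, separated by a colon. The function names are hypothetical, and computation of MESSAGE-INTEGRITY per [12] is not shown.

```python
def local_pair_id(native_tid: str, remote_tid: str) -> str:
    """Transport address pair ID as seen by the local agent."""
    return f"{native_tid}:{remote_tid}"


def username_for_request(native_tid: str, remote_tid: str) -> str:
    """USERNAME for an outgoing Binding Request.

    This is the pair ID as seen by the peer, so the two
    transport address IDs appear in the opposite order.
    """
    return f"{remote_tid}:{native_tid}"


# The agent on the left owns L:1, the agent on the right owns R:1.
left_username = username_for_request("L:1", "R:1")
right_username = username_for_request("R:1", "L:1")
```

Under these assumptions, the USERNAME sent by one agent is exactly the pair ID that the receiving agent computes locally, which is what allows the receiver to find the matching transport address pair.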
If the response is a 420, 500, or 401, the agent should try again as described in [12] (as mentioned above, it need not wait the roughly Tb seconds to try again). Either initially, or after such a retry, the STUN transaction might produce a non-recoverable failure response or a failure result inapplicable to this usage of STUN and thus unrecoverable. If this happens, an error event is generated into the state machine, and the transport address pair enters the invalid state.

If the STUN transaction times out, the client SHOULD NOT retry. The only reason a retry might succeed is if there was severe packet loss during the check, or the answer was significantly delayed, also due to packet loss. However, STUN Binding Request transactions run for 9.5 seconds, which is well beyond the typical tolerance for a session establishment. The retries come with a penalty of additional traffic, which can be used to launch DoS attacks (see Section 13.4.2). The only reason to not follow the SHOULD NOT is if the agent has adjusted the STUN transaction timers to be more aggressive.

If the Binding Response is a 200, the agent SHOULD check for the MESSAGE-INTEGRITY attribute and verify it, as discussed in [12]. Indeed, this check SHOULD be done for all responses. This will result in the response being discarded (eventually leading to a timeout), if the integrity check fails.

7.8. Receiving a Binding Request for Connectivity Checks

As a result of providing a list of candidates in its offer or answer, an agent will receive STUN Binding Request messages. An agent MUST be prepared to receive STUN Binding Requests on each local transport address from the moment it sends an offer or answer that contains a candidate with that local transport address.
Similarly, it MUST be prepared to receive STUN Binding Requests on a local transport address the moment it sends an offer or answer that contains a candidate derived from that local transport address. It can cease listening for STUN messages on that local transport address after sending an updated offer or answer which does not include any candidates with transport addresses that are equal to or derived from that local transport address.

As discussed in [12], since the username and password for STUN requests are exchanged through another mechanism - here, ICE - the Shared Secret Request mechanism is not needed and need not be implemented by agents that provide the connectivity check usage.

One of the candidates may be in use as the operating candidate, or may become promoted to the operating candidate in the next offer/answer exchange as a consequence of a successful validation. In either case, both media and STUN packets will be sent to the transport addresses comprising that candidate, causing both to be received on the associated local transport addresses. The agent MUST be able to disambiguate them. This is done trivially by looking for the STUN magic cookie as the value of the second 32-bit word in the packet. If present, it identifies a STUN packet.

Processing of the Binding Request proceeds in two steps. The first is generation of the response, and the second is ICE-specific processing.

Generation of the response follows the general procedures of [12], and is independent of the state machinery described in Section 7.6. The USERNAME is considered valid if one of the candidate IDs sent in an offer or answer is a prefix of the USERNAME (this will always be the case, even for peer reflexive candidates), and for the component indicated in the USERNAME, the associated local transport address matches the local transport address on which the request was received.
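The disambiguation of STUN and media packets described above can be sketched as follows. This is a minimal illustration that assumes the magic cookie value 0x2112A442 defined by the STUN specification and a packet held in a bytes object. The function name is hypothetical.

```python
import struct

# Fixed magic cookie value carried in the STUN header.
STUN_MAGIC_COOKIE = 0x2112A442


def is_stun_packet(packet: bytes) -> bool:
    """Return True if the second 32-bit word equals the magic cookie."""
    if len(packet) < 8:
        return False
    # Network byte order, unsigned 32-bit word at byte offset 4.
    (second_word,) = struct.unpack_from("!I", packet, 4)
    return second_word == STUN_MAGIC_COOKIE


# A Binding Request header: type, length, cookie, 96-bit transaction ID.
stun_header = struct.pack("!HHI12s", 0x0001, 0, STUN_MAGIC_COOKIE, bytes(12))
# An RTP-like header whose second word is a timestamp, not the cookie.
media_header = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678)
```

A packet that passes this test is handed to the STUN processing described in this section. Anything else is treated as media.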
The password associated with that candidate ID, which was provided by the agent to its peer, is used to verify the MESSAGE-INTEGRITY attribute, if one was present in the request. If the USERNAME is not valid, the agent generates a 430. Otherwise, the success response will include the XOR-MAPPED-ADDRESS attribute, which is used for learning new candidates, as described in Section 7.10. The XOR-MAPPED-ADDRESS attribute is constructed using the source IP address and port of the Binding Request. For Binding Requests received over relayed transport addresses, this MUST be the source IP address and port of the Binding Request when it arrived at the relay, prior to forwarding towards the agent. That source transport address will be present in the REMOTE-ADDRESS attribute of a STUN Data Indication message, if the Binding Request was delivered through a Data Indication. If the Binding Request was not encapsulated in a Data Indication, that source address is equal to the current active destination for the STUN relay session.

The ICE processing involves changes to the state machine for a transport address pair. This processing cannot be done until the initial offer/answer exchange has completed. As a consequence, if the offerer received a Binding Request that generated a success response, but had not yet received the answer to its offer, it waits for the answer, and when it arrives, then performs the ICE processing. The agent takes the entire contents of the USERNAME, and compares them against the transport address pair identifiers as seen by that agent for each transport address pair. If there is no match, nothing is done - this should never happen for compliant implementations. If there is a match, the resulting transport address pair is called the matching transport address pair.
The state machine for the matching transport address pair is then updated based on the receipt of a STUN Binding Request, and the resulting actions described in Section 7.6 are undertaken.

An agent will continue to receive periodic STUN connectivity checks on a local transport address as long as it had listed that transport address, or one derived from it, in an a=candidate attribute in its most recent offer or answer and the transport address is for UDP. Whether STUN keepalives are used for other transport protocols is defined by the specifications for that transport protocol. The agent processes any such transactions according to this section. It is possible that a transport address pair that was previously valid may become invalidated as a result of a subsequent failed STUN transaction.

7.9. Promoting a Candidate to Operating

As a consequence of the connectivity checks, each agent will change the states for each transport address pair, and consequently, for the candidate pairs. When a candidate pair enters the valid state, and the agent is in the role of offerer for that candidate pair, the agent follows the logic in this section. The rules only apply to the offerer of a candidate pair in order to eliminate the possibility of both agents simultaneously offering an update to promote a candidate to operating.

The agent locates the candidate pair in the candidate pair priority ordered list. If it is the highest priority candidate pair, the agent SHOULD send an updated offer immediately as described in Section 7.11.1. If it is not the highest priority candidate pair, and the states of all higher priority candidate pairs are Invalid, the agent SHOULD send an updated offer immediately. If it is not the highest priority candidate pair, and the state of at least one of the higher priority candidate pairs is Indeterminate, the agent does nothing.
Tests have yet to begin for higher priority candidate pairs. If it is not the highest priority candidate pair, and none of the higher priority candidate pairs have a state of Indeterminate, the agent starts a timer, called the wait-state timer, but only if this timer is not already running. The timer is set to fire in Tws seconds. Tws SHOULD be configurable, and SHOULD have a default of Tws = max(0, 200ms - N*Tb), where N is the number of components for the candidates for this media stream. The 200ms allows for a single STUN retransmission (which takes 100ms) and an RTT of 100ms. This timer allows for a higher priority connectivity check to complete, in the event its STUN Binding Request was lost or delayed in the network. Note that the timer goes to zero as the number of components increases. If, prior to the wait-state timer firing, another connectivity check completes and a candidate pair is validated, there is no need to reset or cancel the timer. Once the timer fires, the agent SHOULD issue an updated offer as described in Section 7.11.1. This updated offer will use the highest priority candidate pair in Valid state when the timer fires.

7.10. Learning New Candidates from Connectivity Checks

ICE makes use of reflexive addresses, which are addresses that inform an agent of its transport address as seen by another host. An initial offer or answer generated by an agent includes server reflexive addresses, which are learned from a configured or discovered STUN server in the network. However, the connectivity checks themselves can inform an agent of reflexive addresses, and in particular, ones that are reflexive towards its peer. These are called peer reflexive candidates.

A new peer reflexive candidate is typically observed when two agents are separated by a NAT with the address-dependent or address and port dependent mapping properties [32].
However, in unusual topologies, peer reflexive candidates can be observed even when there are only NATs with the endpoint independent mapping property. Because STUN and the media packets are sent on the same port, regardless of the filtering properties of the NAT (whether endpoint independent, address dependent, or address and port dependent), this reflexive address can be used by the peer for sending STUN and media packets back towards the agent.

To obtain and use these peer reflexive transport addresses, ICE agents MUST perform the additional processing on the receipt of STUN Binding Requests and responses described in the following two subsections. These procedures are not just applied in the (hopefully increasingly rare) case of address and port dependent mapping NATs. They are also needed for behave-compliant NATs [32].

7.10.1. On Receipt of a Binding Request

The procedures in this section are followed when an agent receives a STUN Binding Request matched to a target transport address pair whose source transport address (where the source is the one seen by the relay for requests received on a relayed transport address) doesn't match any of the existing remote transport addresses, or where the source matches, but the origination transport address does not. This source address and its associated origination transport address become a new remote transport address.

To use it, that source transport address needs to be associated with a candidate (called a peer-derived candidate). In this case, however, the candidate isn't signaled through an offer/answer exchange; it is constructed dynamically from information in the STUN request. Like all other candidates, the peer-derived candidate has a candidate ID. The candidate ID is derived from the candidate IDs of the target candidate pair.
In particular, the candidate ID is constructed by concatenating the remote candidate ID with the native candidate ID (without the colon). The password for the new candidate equals that of the remote candidate ID in the target candidate pair (note that this password would be the same for all remote candidates for the same media line).

When the STUN Binding Request is received, the agent constructs the candidate ID for the peer reflexive candidate, and checks to see if that candidate exists. It may already exist if it had been constructed as a consequence of a previous application of this logic on receipt of a Binding Request from a different remote transport address of the same new peer reflexive candidate. If there is not yet a peer reflexive candidate with that candidate ID, the agent creates it, and assigns it the newly computed candidate ID. The priority of the peer-derived candidate is set to the priority of its generating candidate. The generating candidate is the one that the new peer derived candidate comes from - the remote candidate in the target candidate pair. Note that, at this time, the peer derived candidate has no transport addresses in it.

The remote candidate is then paired up with a native candidate. However, unlike the procedures of Section 7.5, which pair up each remote candidate with each native candidate, this peer reflexive candidate is only paired up with the native candidate from the candidate pair from which it was derived. This creates a new candidate pair. This new candidate pair is inserted into the candidate pair priority ordered list based on the ordering rules defined in Section 7.5. Note that no entries are added to the transport address pair check ordered list. Recall that, for each candidate pair, one agent plays the role of offerer, and the other of answerer. For a peer-reflexive candidate, the role is identical to that of its generating candidate.
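The creation of the peer-derived candidate on receipt of a Binding Request can be sketched as follows. The Candidate type, its fields, and the function name are hypothetical, as are the example candidate IDs. Insertion into the candidate pair priority ordered list is reduced to appending to a plain list, so the ordering rules of Section 7.5 are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    cid: str
    password: str
    priority: float
    # Component ID -> (IP address, port); empty on creation.
    addresses: dict = field(default_factory=dict)


def peer_derived_on_request(native, remote, remote_candidates, pairs):
    """Find or create the peer-derived remote candidate for a target pair."""
    # Remote candidate ID followed by native candidate ID, no colon.
    new_id = remote.cid + native.cid
    derived = remote_candidates.get(new_id)
    if derived is None:
        # Password and priority come from the generating (remote) candidate.
        derived = Candidate(new_id, remote.password, remote.priority)
        remote_candidates[new_id] = derived
        # Paired only with the native candidate of the target pair.
        pairs.append((native.cid, new_id))
    return derived


native = Candidate("NATV", "native-pw", 0.5)
remote = Candidate("RMTE", "remote-pw", 0.8)
remote_candidates = {remote.cid: remote}
pairs = [(native.cid, remote.cid)]
derived = peer_derived_on_request(native, remote, remote_candidates, pairs)
```

A second request for the same target candidate pair finds the existing peer-derived candidate rather than creating another one, which corresponds to the "newly created or not" case handled next.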
Newly created or not, the agent extracts the component ID from the matching transport address pair, and sees if a transport address with that same component ID exists in the peer reflexive candidate. If it does, the agent does nothing further. This can happen in unusual cases when there is a NAT reboot in the middle of a STUN transaction, causing two requests in the same transaction to produce two different transport addresses. If there is no transport address with the same component ID in the peer reflexive candidate, the agent adds a transport address to the peer reflexive candidate. This transport address is equal to the source IP address and port from the incoming STUN Binding Request (and in the case of Binding Request received on a relayed transport address, the one seen by the relay), and has a transport protocol equal to that of the incoming STUN request. It is assigned the component ID equal to the component ID in the target transport address pair. This new transport address will have a transport address ID, equal to the concatenation of the candidate ID for this new candidate, and the component ID, separated by a colon. The type of the transport address is considered to be peer reflexive, though this is never signaled through SDP and so there is no candidate-types value defined for it.

Recall that each transport address is associated with an origination transport address. For server reflexive candidates, the origination transport address is signaled through SDP. For peer reflexive transport addresses, it is inherited from the origination transport address of the generating transport address. If the generating transport address was a local transport address, then the origination transport address is that transport address.
If the generating transport address was server reflexive, the origination transport address is the related transport address that was signaled for that server reflexive candidate. If the generating transport address was relayed, the origination transport address is the relayed transport address itself. Whether and how other candidate attributes defined by extensions are inherited depends on the extension.

The newly added transport address is paired up with the native transport address with the same component ID. Initially, the peer reflexive candidate will start with a single transport address and a single transport address pair. More are added as the connectivity checks for the original candidate pair take place.

Figure 10 provides a pictorial representation of the peer reflexive candidate (the one with id=RL) and its pairing with the native candidate with ID L. The candidate with ID R is the generating candidate. The peer reflexive candidate is effectively an alternate for that generating candidate, but is only paired with a specific native candidate. Note that, for a particular generating candidate, there can be many peer derived candidates, up to one for each native candidate. Also note that candidate IDs with values "L" and "R" and "RL" are not actually permitted, since all candidate IDs must be at least four characters long. These shortened candidate IDs are used to keep the figure readable.

          .............                .............
          .  tid=L:1  .                .  tid=R:1  .
 component.    --     .   id=L:1:R:1   .    --     .component
   id=1   .   | A|-------------------------| C|    .  id=1
          .    -- -------+             .    --     .
          .           .  |             .           . Generating
          .           .  |             .           . Candidate
          .  tid=L:2  .  |             .  tid=R:2  .
 component.    --     .  | id=L:2:R:2  .    --     .component
   id=2   .   | B|-------C-----------------| D|    .  id=2
          .    -- -----+ |             .    --     .
          .............| |             .............
             Native    | |                Remote
            Candidate  | |               Candidate
              id=L     | |                 id=R
                       | |
                       | |             .............
                       | |             . tid=RL:1  .
                       | | id=L:1:RL:1 .    --     .component
                       | +-----------------| C|    .  id=1
                       |               .    --     .
                       |               .           . Peer Derived
                       |               .           . Candidate
                       |               . tid=RL:2  .
                       |  id=L:2:RL:2  .    --     .component
                       +-------------------| D|    .  id=2
                                       .    --     .
                                       .............
                                          Remote
                                         Candidate
                                           id=RL

                           Figure 10

The new transport address pair has a state machine associated with it. The state that is entered, and actions to take as a consequence, are specific to the transport protocol. For UDP, the procedures are defined here. Extensions that define processing for other transport protocols SHOULD describe the behavior.

For UDP, the state machine enters the Send-Valid state. Effectively, the Binding Request just received "counts" as a validation in this direction, even though it was formally done for a different transport address pair. In addition, the agent generates a Binding Request for the new transport address pair, as described in Section 7.7. Processing of the response follows the logic described in Section 7.6.

As with all candidate pairs, the state of this new candidate pair is derived from the states of its transport address pairs. Until the number of transport address pairs in the candidate pair equals the transport address pair count of the candidate pair from which it is derived, the state of the candidate pair is Indeterminate. Once they are equal, the state is derived just like any other candidate pair.

7.10.2. On Receipt of a Binding Response

The procedures on receipt of a Binding Response are nearly identical to those for receipt of a Binding Request as described above.

The procedures in this section are followed when an agent receives a STUN Binding Response matched to a transport address pair whose XOR-MAPPED-ADDRESS doesn't match any of the existing native transport addresses. The XOR-MAPPED-ADDRESS becomes a new native transport address.

To use it, the XOR-MAPPED-ADDRESS needs to be associated with a candidate (called a peer-derived candidate).
In this case, however, the candidate isn't signaled through an offer/answer exchange; it is constructed dynamically from information in the STUN response. Like all other candidates, the peer-derived candidate has a candidate ID. The candidate ID is derived from the candidate IDs of the target candidate pair. In particular, the candidate ID is constructed by concatenating the native candidate ID with the remote candidate ID (without the colon). The password for the new candidate equals that of the native candidate ID in the matching candidate pair (note that this password would be the same for all native candidates for the same media line).

When the Binding Response is received, the agent constructs the candidate ID that represents the peer reflexive candidate, and checks to see if that candidate exists. It may already exist if it had been constructed as a consequence of a previous application of this logic on receipt of a Binding Response for a different transport address pair of the same candidate pair. If there is not yet a peer reflexive candidate with that candidate ID, the agent creates it, and assigns it the newly computed candidate ID. The priority of the peer-derived candidate is set to the priority of its generating candidate - the native candidate in the target transport address pair. Note that, at this time, the peer derived candidate has no transport addresses in it.

The native candidate is then paired up with a remote candidate. However, unlike the procedures of Section 7.5, which pair up each native candidate with each remote candidate, this peer reflexive candidate is only paired up with the remote candidate from the target candidate pair. This creates a new candidate pair. This new candidate pair is inserted into the candidate pair priority ordered list based on the ordering rules defined in Section 7.5.
Note that no entries are added to the transport address pair check ordered list. Recall that, for each candidate pair, one agent plays the role of offerer, and the other of answerer. For a peer-reflexive candidate, the role is identical to that of its generating candidate.

Newly created or not, the agent extracts the component ID from the target transport address pair, and sees if a transport address with that same component ID exists in the peer reflexive candidate. If it does, the agent does nothing further. This can happen in unusual cases when there is a NAT reboot in the middle of a STUN transaction, causing two requests in the same transaction to produce two different transport addresses. If there is no transport address with the same component ID in the peer reflexive candidate, the agent adds a transport address to the peer reflexive candidate. This transport address is equal to the XOR-MAPPED-ADDRESS from the incoming STUN Binding Response, and has a transport protocol equal to the one used for the Binding Response. It is assigned the component ID equal to the component ID in the matching transport address pair. This transport address will have a transport address ID, equal to the concatenation of the candidate ID for this new candidate, and the component ID, separated by a colon. The type of the transport address is considered to be peer reflexive, though this is never signaled through SDP and so there is no candidate-types value defined for it.

Recall that each transport address is associated with an origination transport address. For server reflexive candidates, the origination transport address is signaled through SDP. For peer reflexive transport addresses, it is inherited from the origination transport address of the generating transport address. If the generating transport address was a local transport address, then the origination transport address is that transport address.
If the generating transport address was server reflexive, the origination transport address is the related transport address that was signaled for that server reflexive candidate. If the generating transport address was relayed, the origination transport address is the relayed transport address itself. Whether and how other candidate attributes defined by extensions are inherited depends on the extension.

The newly added transport address is paired up with the remote transport address with the same component ID. Initially, the peer reflexive candidate will start with a single transport address and a single transport address pair. More are added as the connectivity checks for the original candidate pair take place.

The new transport address pair has a state machine associated with it. The state that is entered, and actions to take as a consequence, are specific to the transport protocol. For UDP, the procedures are defined here. Extensions that define processing for other transport protocols SHOULD describe the behavior.

For UDP, the state machine enters the Recv-Valid state. Effectively, the Binding Response just received "counts" as a validation in this direction, even though it was formally done for a different candidate pair. The peer will likely generate a Binding Request for this candidate pair; processing of the request follows the logic described in Section 7.6.

As with all candidate pairs, the state of this new candidate pair is derived from the states of its transport address pairs. Until the number of transport address pairs in the candidate pair equals the transport address pair count of the candidate pair from which it is derived, the state of the candidate pair is Indeterminate. Once they are equal, the state is derived just like any other candidate pair.

7.11. Subsequent Offer/Answer Exchanges

An agent MAY issue an updated offer at any time.
This updated offer may be sent for reasons having nothing to do with ICE processing (for example, the addition of a video stream in a multimedia session), or it may be due to a change in ICE-related parameters. For example, if an agent acquires a new candidate after the initial offer/answer exchange, it may seek to add it. However, agents SHOULD follow the logic described in Section 7.9 to determine when to send an updated offer as a consequence of promoting a candidate to operating.

If there are any aspects of this processing that are specific to the transport protocol, those SHOULD be called out in ICE extensions that define operation with other transport protocols. There are no additional considerations for UDP.

7.11.1. Sending of a Subsequent Offer

The offer MAY contain a new operating candidate in the m/c line. This candidate SHOULD be the native candidate from the highest priority candidate pair in the candidate pair priority ordered list whose state is Valid. If there are no candidate pairs in this state, the highest one whose state is Send-Valid or Recv-Valid SHOULD be used. If there are no candidate pairs in these states, the candidate pair that is most likely to work with this peer, as described in Section 7.2, SHOULD be used. The candidate is encoded into the m/c line in an updated offer as described in Section 7.3. Note that, while peer-derived candidates never appear in a=candidate attributes (only their generating candidates appear there), a peer-derived candidate can appear in the m/c line if it has been selected for usage for media.

If the candidate pair whose native candidate was encoded into the m/c-line was Valid, Send-Valid or Recv-Valid, the agent MUST include an a=remote-candidate attribute into the offer. This attribute MUST contain the candidate ID of the remote candidate in the candidate pair.
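As a rough illustration, the choice of the candidate pair whose native candidate goes into the m/c line, and of the candidate ID carried in a=remote-candidate, might look like the sketch below. The CandidatePair type and the function name are hypothetical. The fallback to the candidate most likely to work (Section 7.2) is not modeled and is represented by returning None.

```python
from typing import NamedTuple, Optional


class CandidatePair(NamedTuple):
    priority: float
    state: str       # e.g. "Valid", "Send-Valid", "Recv-Valid", "Testing"
    native_id: str
    remote_id: str


def select_operating_pair(pairs) -> Optional[CandidatePair]:
    """Pick the pair whose native candidate goes into the m/c line."""
    ordered = sorted(pairs, key=lambda p: p.priority, reverse=True)
    # First preference: Valid. Second: Send-Valid or Recv-Valid.
    for acceptable in ({"Valid"}, {"Send-Valid", "Recv-Valid"}):
        for pair in ordered:
            if pair.state in acceptable:
                return pair
    # None: caller falls back to Section 7.2, no a=remote-candidate.
    return None


pairs = [
    CandidatePair(0.9, "Testing", "NAT1", "REM1"),
    CandidatePair(0.7, "Send-Valid", "NAT2", "REM1"),
    CandidatePair(0.5, "Valid", "NAT3", "REM2"),
]
chosen = select_operating_pair(pairs)
```

When a pair is returned, its native_id identifies the candidate for the m/c line and its remote_id is the value for the a=remote-candidate attribute.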
It is used by the recipient of the offer in selecting its candidate for the answer. Because the native candidate in the m/c-line will typically be Valid, Send-Valid or Recv-Valid in every offer after the initial one, the a=remote-candidate attribute will typically be used in all subsequent offers.

The a=candidate attributes within a subsequent offer have the same meaning as they do in an initial offer. They are a request for the peer to attempt (or continue to attempt if the candidate was provided previously) a connectivity check using STUN from each of its own candidates.

When an updated offer is sent, there are several dispositions regarding the candidates:

retained: A candidate is retained if the candidate ID for the candidate is included in the new offer, and matches the candidate ID for a candidate in the previous offer or answer from the agent. In this case, all of the information about the candidate - its qvalue and components, and the IP addresses, ports, and transport protocols of its components - MUST be the same as the previous offer or answer from the agent. If the agent wants to change them, this is accomplished by changing the candidate ID as well. That will have the effect of removing the old candidate and adding a new one with the updated information.

removed: A candidate is removed if its candidate ID appeared in a previous offer or answer, and that candidate ID is not present in the new offer.

added: A candidate is added if its candidate ID appears in the new offer, but was not present in a previous offer or answer from that agent.

The following rules are used to determine the disposition of each of the current native candidates in the new offer:

o If a candidate is invalid, and all peer reflexive candidates generated from it are invalid as well, it SHOULD be removed.

o If the candidate in the m/c-line is valid, all other lower priority candidates SHOULD be removed.
This has the effect of stopping connectivity checks of other candidates. This recommendation would not be followed if an agent wanted to keep a candidate ready for usage if, for some reason, the operating candidate later became invalid.

o If the candidate in the m/c-line is valid, and it is not peer reflexive, that candidate MUST be retained. If the candidate in the m/c-line is peer reflexive, its generating candidate MUST be retained, even if it is itself invalid.

o If the candidate in the m/c-line has not been validated, all other candidates that are not invalid, or candidates for whom their derived candidates are not invalid, SHOULD be retained.

o Peer reflexive candidates MUST NOT be added; they continue to be used as long as their generating candidate was retained. Peer derived candidates are learned exclusively through the STUN connectivity checks.

A new candidate MAY be added. This can happen when the candidate is a new one, learned since the previous offer/answer exchange, and it has a higher priority than the currently operating candidate. It can also occur when an agent wishes to restart checks for a transport address it had tried previously. Effectively, changing the candidate ID value in an updated offer will "restart" connectivity checks for that candidate.

If a candidate is removed, the agent takes the following steps once the offer is sent:

1. The agent eliminates any candidate pairs whose native candidate equalled the candidate that was removed. Equality is based on comparison of candidate IDs.

2. The agent eliminates any candidate pairs that had a native candidate that is a peer reflexive candidate generated from the candidate that was removed.

3. The candidate pairs that are eliminated are removed from the candidate pair priority ordered list. Their corresponding transport address pairs are removed from the transport address pair check ordered list.
       As a consequence of this, if connectivity checks had not yet begun for the candidate pair, they will not begin. If a transport address pair had been pruned from the transport address pair check ordered list because it was redundant with one of the transport address pairs which was just removed, that transport address pair is added back to the list.

   4.  If connectivity checks were already in progress for transport addresses in a candidate pair that was removed, the agent SHOULD immediately terminate them. No further retransmissions take place, and no further transactions from that candidate will be made.

   5.  If the removed candidate was a relayed candidate, the agent SHOULD de-allocate its transport addresses from the STUN relay if it is not using those resources elsewhere.

If a local candidate was removed, and all of its derived candidates were also removed (including any peer reflexive candidates), local operating system resources for each of the transport addresses in the local candidate SHOULD be de-allocated, as long as the agent is not using those resources elsewhere. The resources may be in use elsewhere if they were included in an initial offer which generated multiple answers (as can happen with SIP forking). In such a case, a subsequent offer which removes the candidate will not imply its removal from the other branches; each becomes a separate offer/answer relationship.

Subsequent offers MUST contain a=ice-pwd attributes that specify the password for the candidates for each media stream. If any of the candidates for a particular m-line are the same as in the previous offer, the ICE password for that m-line MUST be the same. If all of the candidates for a particular m-line are different from the previous offer, the ICE password for that m-line MAY be different.
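
As an informal illustration of the disposition rules and the password constraint above, the following Python sketch classifies candidates purely by comparing candidate IDs. It is not part of this specification. The function names classify_candidates and password_may_change, and the representation of an offer's candidates as a collection of candidate ID strings, are assumptions made for the example.

```python
def classify_candidates(previous_ids, new_ids):
    """Split candidate IDs into retained, removed and added sets.

    previous_ids: candidate IDs from the agent's previous offer or answer.
    new_ids:      candidate IDs listed in the new offer.
    Both arguments are hypothetical inputs: any iterable of ID strings.
    """
    previous_ids = set(previous_ids)
    new_ids = set(new_ids)
    return {
        "retained": previous_ids & new_ids,  # ID present in both
        "removed": previous_ids - new_ids,   # ID dropped from the new offer
        "added": new_ids - previous_ids,     # ID seen for the first time
    }


def password_may_change(previous_ids, new_ids):
    """True only if no candidate for the m-line is carried over.

    If any candidate is retained, the ICE password for that m-line
    has to stay the same, so this returns False.
    """
    return not classify_candidates(previous_ids, new_ids)["retained"]


result = classify_candidates(["L1", "L2", "L3"], ["L2", "L4"])
print(sorted(result["retained"]), sorted(result["removed"]), sorted(result["added"]))
print(password_may_change(["L1", "L2", "L3"], ["L2", "L4"]))
```

Because changing any property of a candidate requires a new candidate ID, a changed candidate shows up here as one removal plus one addition, which matches the "restart" behavior described earlier.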
Note that it is permissible to use a session-level attribute in one offer, but to provide the same password as a media-level attribute in a subsequent offer. This is not a change in password, just a change in its representation.

7.11.2. Receiving the Offer and Sending an Answer

To generate the answer, the answerer has to decide which transport addresses to include in the m/c-line, and which to include in candidate attributes. The first step in the process is to look for the a=remote-candidate attribute in the offer.

The a=remote-candidate attribute exists to eliminate a race condition between the updated offer and the response to the STUN Binding Request that moved a candidate into the Valid state. This race condition is shown in Figure 11. On receipt of message 5, agent A can move its transport address pair state machine into the Valid state. It sends a STUN response to the request (message 6), but this is lost. Agent A proceeds with an updated offer (message 7), which is received at agent B. As far as agent B is concerned, the transport address pair is still in the Send-Valid state. It will move into the Valid state only on receipt of the STUN response in message 10. Thus, upon receipt of the offer, agent B cannot determine which candidate to include in its answer. To eliminate this condition, the identity of the validated candidate is included in the offer itself. Note, however, that the answerer will not send media until it has received this STUN response.

Agent A               Network               Agent B
   |(1) Offer            |                     |
   |------------------------------------------>|
   |(2) Answer           |                     |
   |<------------------------------------------|
   |(3) STUN Req.        |                     |
   |------------------------------------------>|
   |(4) STUN Res.        |                     |
   |<------------------------------------------|
   |(5) STUN Req.        |                     |
   |<------------------------------------------|
   |(6) STUN Res.        |                     |
   |-------------------->|                     |
   |                     |Lost                 |
   |(7) Offer            |                     |
   |------------------------------------------>|
   |(8) Answer           |                     |
   |<------------------------------------------|
   |(9) STUN Req.        |                     |
   |<------------------------------------------|
   |(10) STUN Res.       |                     |
   |------------------------------------------>|

                       Figure 11

If the a=remote-candidate attribute is present, the agent examines the transport addresses in the m/c-line of the offer. It compares these with the transport addresses in the remote candidates of all candidate pairs. If there is no match, no further processing of the a=remote-candidate attribute is done. If there is at least one match, the agent compares the native candidate ID of each matching pair with the value of the a=remote-candidate attribute. If there is a match, that candidate pair is selected. For each transport address pair in that candidate pair, if the state of the transport address pair is Send-Valid, the agent considers the state to be Valid just for the purpose of constructing the answer. In particular, it will impact selection of the candidate for the m/c-line and the set of additional candidates to include or exclude from the answer. However, the actual state MUST remain Send-Valid. This state will be used to determine when it is safe to send media. Keeping it at Send-Valid is necessary to protect against DoS attacks.

Note that the a=remote-candidate attribute SHOULD NOT be included in the answer, and if included, will just be ignored by the offerer, since it is not used in any processing of the answer.

Rules for choosing transport addresses for the m/c-line are as follows. The agent examines the transport addresses in the m/c-line of the offer. It compares these with the transport addresses in the remote candidates of candidate pairs whose states are Valid.
If there is a matching candidate pair in that state, the pair with the highest priority MUST be chosen, and the native candidate from that pair used as the operating candidate. If there were no matching candidate pairs in the Valid state (possibly because the transport addresses in the m/c-line in the offer didn't match any of the remote candidates), the candidate that is most likely to work with this peer, as described in Section 7.2, SHOULD be used. Note that this candidate may be Valid as a consequence of being temporarily changed to such by the a=remote-candidate attribute.

Like the offerer, the answerer can decide, for each of its candidates, whether they are retained or removed. The same rules defined in Section 7.11.1 for determining their disposition apply to the answerer. Similarly, if a candidate is removed, the same rules in Section 7.11.1 regarding removal of candidate pairs and freeing of resources apply. As with selection of the candidate for the m/c-line, the state of one of the candidates may be Valid as a consequence of being temporarily changed to such by the a=remote-candidate attribute.

Once the answer is sent, the answerer will have the set of native and remote candidates before this offer/answer exchange, and the set of native and remote candidates afterwards. A peer derived candidate continues to be used as long as its generating parent continues to be used. The agent then pairs up the native and remote candidates which were added or retained. This leads to a set of current candidate pairs.

If a candidate pair existed previously, but as a consequence of the offer/answer exchange, it no longer exists, the agent takes the following steps:

   1.  The candidate pair is removed from the candidate pair priority ordered list. Its corresponding transport address pairs are removed from the transport address pair check ordered list. As a consequence of this, if connectivity checks had not yet begun for the candidate pair, they will not begin.
       If a transport address pair had been pruned from the transport address pair check ordered list because it was redundant with one of the transport address pairs which was just removed, that transport address pair is added back to the list.

   2.  If connectivity checks were already in progress for that candidate pair, the agent SHOULD immediately terminate any STUN transactions in progress from that candidate. No further retransmissions take place, and no further transactions from that candidate will be made.

   3.  If the agent receives a STUN Binding Request for that candidate pair, however, processing occurs as defined in Section 7.8.

If a candidate pair existed previously, and continues to exist, no changes are made; any STUN transactions in progress for that candidate pair continue, it remains on the candidate pair priority ordered list, and its transport address pairs remain on the transport address pair check ordered list.

If a candidate pair is new (because either its native candidate is new, or its remote candidate is new, or both), the agent takes the role of answerer for this candidate pair. The new candidate pair is inserted into the candidate pair priority ordered list, and the transport address pair check ordered list is rederived. STUN connectivity checks will start for them based on the logic described in Section 7.6.

7.11.3. Receiving the Answer

Once the answer is received, the offerer will have the set of native and remote candidates before this offer/answer exchange, and the set of native and remote candidates afterwards. It then follows the same logic described in Section 7.11.2, pairing up the candidates, removing candidate pairs that are no longer in use, and beginning processing for ones that are new.

7.12. Binding Keepalives

Once a candidate is promoted to operating, and media begins flowing, it is still necessary to keep the bindings alive at intermediate NATs for the duration of the session. Normally, the media stream packets themselves (e.g., RTP) meet this objective. However, several cases merit further discussion. Firstly, in some RTP usages, such as SIP, the media streams can be "put on hold". This is accomplished by using the SDP "sendonly" or "inactive" attributes, as defined in RFC 3264 [4]. RFC 3264 directs implementations to cease transmission of media in these cases. However, doing so may cause NAT bindings to time out, and media won't be able to come off hold.

Secondly, some RTP payload formats, such as the payload format for text conversation [31], may send packets so infrequently that the interval exceeds the NAT binding timeouts.

Thirdly, if silence suppression is in use, long periods of silence may cause media transmission to cease long enough for NAT bindings to time out.

To prevent these problems, ICE implementations MUST continue to list their operating candidate in a=candidate lines for UDP-based media streams. As a consequence of this, STUN packets will be transmitted periodically, independently of the transmission (or lack thereof) of media packets. These will be received on the same IP address and port as the media streams. The agent determines whether the packet is media or STUN by looking for the magic cookie in bits 32-63 of the data. If present, it indicates that the packet is STUN, and if not, indicates that it is media. This provides a media independent, RTP independent, and codec independent solution for keeping the NAT bindings alive. However, an ICE implementation MUST be prepared for the transport address received in an m/c-line to not correspond to any a=candidate attributes.
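
The STUN-versus-media check described above can be sketched as follows. This is an illustration only, not normative text. The cookie value 0x2112A442 is not stated in this section; it is assumed here from the revised STUN specification. The function name is_stun_packet is hypothetical.

```python
import struct

# Assumption: magic cookie value taken from the revised STUN specification;
# this section only says the cookie occupies bits 32-63 of the packet.
STUN_MAGIC_COOKIE = 0x2112A442


def is_stun_packet(data: bytes) -> bool:
    """Return True if bits 32-63 (bytes 4..7) hold the STUN magic cookie."""
    if len(data) < 8:
        return False  # too short to contain the cookie field
    (cookie,) = struct.unpack_from("!I", data, 4)  # network byte order
    return cookie == STUN_MAGIC_COOKIE


# A 20-byte STUN-like header: type, length, cookie, 12-byte transaction ID.
stun_like = struct.pack("!HHI", 0x0001, 0, STUN_MAGIC_COOKIE) + bytes(12)
# An RTP-like header: V=2, sequence number 1, timestamp 160, SSRC.
rtp_like = bytes([0x80, 0x00, 0x00, 0x01]) + struct.pack("!II", 160, 0x12345678)

print(is_stun_packet(stun_like), is_stun_packet(rtp_like))
```

In an RTP packet, bytes 4 through 7 carry the timestamp, so a media packet is misclassified only if its timestamp happens to equal the cookie value.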
If an ICE implementation is communicating with one that does not support ICE, keepalives MUST still be sent. Indeed, these keepalives are essential even if neither endpoint implements ICE. As such, this specification defines keepalive behavior generally, for endpoints that support ICE, and those that do not.

All endpoints MUST send keepalives for each media session. These keepalives MUST be sent regardless of whether the media stream is currently inactive, sendonly, recvonly or sendrecv. The keepalive SHOULD be sent using a format which is supported by its peer. ICE endpoints allow for STUN-based keepalives for UDP streams, and as such, STUN keepalives MUST be used when an agent is communicating with a peer that supports ICE. An agent can determine that its peer supports ICE by the presence of the a=candidate attributes for each media session. If the peer does not support ICE, the choice of a packet format for keepalives is a matter of local implementation. A format which allows packets to easily be sent in the absence of actual media content is RECOMMENDED. Examples of formats which readily meet this goal are RTP No-Op [28] and RTP comfort noise [24]. If the peer doesn't support any formats that are particularly well suited for keepalives, an agent SHOULD send RTP packets with an incorrect version number, or some other form of error which would cause them to be discarded by the peer.

STUN-based keepalives will be sent periodically every Tr seconds as a consequence of the rules in Section 7.7. If STUN keepalives are not in use (because the peer does not support ICE), an agent SHOULD ensure that a media packet is sent every Tr seconds. If one is not sent as a consequence of normal media communications, a keepalive packet using one of the formats discussed above SHOULD be sent.

7.13. Sending Media

When an agent receives an offer and sends an answer, or when it receives an answer to an offer it sent, it begins connectivity checks. If there is a candidate that corresponds to the m/c-line, these checks will include validation of the operating candidate pair. In that case, an agent SHOULD NOT send media on the operating candidate pair until that candidate pair has reached the Valid or Recv-Valid state. This is to help prevent a denial-of-service attack, described in Section 13. Once the operating candidate pair reaches the Valid or Recv-Valid state, an agent MAY start sending media to that candidate pair. If there is no candidate that corresponds to the m/c-line, the m/c-line cannot be validated, and media is sent to it as described in RFC 3264 [4].

Under normal conditions, there will be a candidate for the m/c-line. Indeed, ICE itself requires that an agent include one. However, actual SIP deployments have seen usage of network intermediaries which manipulate the m/c-line of offers and answers. Should such elements ignore the candidate attributes, it would manifest itself like an agent which did not include a candidate for the m/c-line. For this reason, this use case is explicitly supported by ICE.

Offer/answer exchanges are used with protocols, like SIP, which require media to be sent "early", from the answerer to the offerer, prior to completion of the initial offer/answer exchange. It is highly desirable (and sometimes necessary) for this early media to use the candidate pair ultimately selected by ICE connectivity checks. For this reason, ICE provides an early media mechanism that allows for a candidate pair to be used in one direction prior to its promotion to operating in a subsequent offer/answer exchange. Note that, with ICE, early media pertains to media sent to a candidate pair until its promotion to operating in a subsequent offer/answer exchange.
This is a broader definition than is used in [26], which defines early media as media sent prior to acceptance of a call.

As a consequence of the connectivity checks, an agent will change the states for each transport address pair, and consequently, for the candidate pairs. When a candidate pair becomes Valid or Recv-Valid, and there is a candidate pair for the m/c-line, and the candidate pair is not equal to the operating candidate pair, and the agent is in the role of answerer for that candidate pair, the agent checks the position of that pair in the candidate pair priority ordered list. If it is the first, the agent selects this candidate pair for early media. If this candidate pair is not the first on the candidate pair priority ordered list, but is higher priority than the operating candidate pair, and the early media wait-state timer has not yet been set, the agent sets this timer to Tws seconds. Though the early media wait-state timer has the same value as the wait-state timer described in Section 7.9, these are different timers and indeed are set by different entities. The early media wait-state timer allows for a higher priority connectivity check to complete, in the event its STUN Binding Request or Response was lost or delayed in the network. If, prior to the early media wait-state timer firing, another connectivity check completes and a candidate pair enters the Valid or Recv-Valid states, there is no need to reset or cancel the timer. Once the timer fires, the agent SHOULD select the highest priority candidate pair in the Valid or Recv-Valid state for which the agent has the role of answerer, and use that candidate pair for early media. ICE processing will ensure that, under almost all circumstances, the candidate pair selected by the answerer for early media will also be the one selected by the offerer for eventual promotion to operating.
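
The early media selection logic above can be summarized in a short Python sketch. It is a simplified illustration, not part of this specification. The CandidatePair class, the function names, the string return values, and the representation of the priority ordered list as a Python list with the highest priority pair first are all assumptions made for the example. The timer itself (Tws seconds) is left to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional

USABLE_STATES = {"Valid", "Recv-Valid"}


@dataclass(eq=False)  # compare pairs by identity, not by field values
class CandidatePair:
    name: str
    state: str
    answerer: bool  # True if the agent has the answerer role for this pair


def on_pair_usable(pair: CandidatePair, ordered: List[CandidatePair],
                   operating: Optional[CandidatePair], timer_set: bool) -> str:
    """Decide what to do when `pair` becomes Valid or Recv-Valid.

    `ordered` is assumed to contain both `pair` and `operating`.
    Returns "select", "start_timer" or "none" (hypothetical labels).
    """
    if operating is None or pair is operating:
        return "none"
    if pair.state not in USABLE_STATES or not pair.answerer:
        return "none"
    if ordered[0] is pair:
        return "select"  # first in the list: use for early media now
    if ordered.index(pair) < ordered.index(operating) and not timer_set:
        return "start_timer"  # wait Tws for a higher priority check
    return "none"


def on_timer_fired(ordered: List[CandidatePair]) -> Optional[CandidatePair]:
    """Pick the highest priority usable pair for which the agent is answerer."""
    for pair in ordered:
        if pair.state in USABLE_STATES and pair.answerer:
            return pair
    return None


p1 = CandidatePair("P1", "Send-Valid", True)
p2 = CandidatePair("P2", "Valid", True)
p3 = CandidatePair("P3", "Valid", True)  # operating pair in this example
pairs = [p1, p2, p3]
print(on_pair_usable(p2, pairs, p3, False), on_timer_fired(pairs).name)
```

Note that an already running timer is neither reset nor cancelled when a further pair becomes usable; the sketch models this by returning "none" when timer_set is true.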
The early media state implies that the answerer knows that this candidate pair is to be used, but the offerer doesn't know yet that it will eventually be validated. It is for this reason that the candidate pair can be used for early media.

If a candidate pair is selected for early media, an agent MAY send media on that candidate pair, even if it is not the same as the operating candidate pair. However, to deal with cases in which the offerer and answerer do not agree on the eventual selection of this candidate for promotion to operating (a rare but possible case), the agent MUST discontinue using the candidate pair for sending media Tlo seconds after the next opportunity its peer would have to send an updated offer. In the case of an answer delivered in a 200 OK to an offer in a SIP INVITE (regardless of whether that same answer appeared in an earlier unreliable provisional response), this would be Tlo seconds after receipt of the ACK. Tlo SHOULD be configurable and SHOULD have a default of 5 seconds. This time represents the amount of time it should take the offerer to perform its connectivity checks, arrive at the same conclusion about the viability of the early candidate, and then generate an updated offer promoting it to operating. If, after Tlo seconds, no updated offer arrives, the answerer MUST cease using the early candidate. Media MAY be sent to the operating candidate pair if it is in the Valid or Recv-Valid state.

If an updated offer does arrive prior to the expiration of the timer, the agent MUST execute the procedures in Section 7.11.2, which will result in the selection of a candidate for the m/c-line in the answer. At that point, the procedures of this section SHOULD be restarted by the answerer. This implies that the operating candidate pair, if Valid or Recv-Valid, will be used.
If a higher priority candidate pair subsequently enters the Valid or Recv-Valid state, it may end up being used as an early candidate.

To use a candidate pair, whether it is early or operating, an agent sends media to the IP addresses and ports of the components in the remote candidate, and sends that media from the IP addresses and ports of the components in the native candidate. Transport addresses are paired up based on component ID. For example, if a remote candidate has two components R1 and R2, and the native candidate has two components L1 and L2, media packets are sent from L1 to R1 and from L2 to R2. This provides a property known as symmetry. This symmetric behavior MUST be followed by an agent even if its peer in the session doesn't support ICE.

The definition of sending media "from" a particular transport address depends on the type of transport address. In the case of a server reflexive transport address, this means that the RTP packets are sent from the local transport address used to obtain the STUN address. In the case of a relayed transport address, this means that media packets are sent through the relay server (for STUN relays, this would be using the Send request). For local transport addresses, media is sent from that local transport address. For peer reflexive transport addresses, media is sent from the local transport address used to obtain the reflexive address.

ICE has interactions with jitter buffer adaptation mechanisms. An RTP stream can begin using one candidate, and switch to another one. The newer candidate may result in RTP packets taking a different path through the network - one with different delay characteristics. As discussed below, agents are encouraged to re-adjust jitter buffers when there are changes in source or destination address. Furthermore, many audio codecs use the marker bit to signal the beginning of a talkspurt, for the purposes of jitter buffer adaptation.
For such codecs, it is RECOMMENDED that the sender set the marker bit when an agent switches transmission of media from one candidate pair to another.

7.14. Receiving Media

ICE implementations MUST be prepared to receive media on a candidate pair if it is in the role of offerer for that candidate pair, even if that candidate pair is not currently operating. This is a consequence of the early media mechanism described in the previous section.

If an agent determines that its peer supports ICE (an offerer knows this when the answer contains a=candidate attributes), it SHOULD discard any media packets received on a candidate pair prior to the candidate pair entering the Send-Valid state. This helps eliminate certain attacks, as discussed in Section 13. Note that, in cases of forking, an agent may get multiple answers to its offer, each from a different peer. Consequently, it would only discard media packets received on a candidate pair once it has determined that all forked targets support ICE.

It is RECOMMENDED that, when an agent receives an RTP packet with a new source or destination IP address for a particular media stream, the agent re-adjust its jitter buffers.

RFC 3550 [21] describes an algorithm in Section 8.2 for detecting SSRC collisions and loops. These algorithms are based, in part, on seeing different source IP addresses and ports with the same SSRC. However, when ICE is used, such changes will naturally occur as the media streams switch between candidates. An agent will be able to determine that a media stream is from the same peer as a consequence of the STUN exchange that precedes media transmission. Thus, if there is a change in source IP address and port, but the media packets come from the same peer agent, this SHOULD NOT be treated as an SSRC collision.

8. Guidelines for Usage with SIP

SIP [2] makes use of the offer/answer model, and is one of the primary targets for usage of ICE. SIP allows for offer/answer exchanges to occur in many different combinations of messages, including INVITE/200 OK and 200 OK/ACK. When support for reliable provisional responses (RFC 3262 [11]) and UPDATE (RFC 3311 [25]) is added, additional combinations of messages can be used for offer/answer exchanges. As such, this section provides some guidance on good ways to make use of SIP with ICE.

ICE requires a series of STUN-based connectivity checks to take place between endpoints. These checks start from the answerer on generation of its answer, and start from the offerer when it receives the answer. These checks can take time to complete, and as such, the selection of messages to use with offers and answers can affect perceived user latency. Two latency figures are of particular interest. These are the post-pickup delay and the post-dial delay. The post-pickup delay refers to the time between when a user "answers the phone" and when any speech they utter can be delivered to the caller. The post-dial delay refers to the time between when a user enters the destination address of the called party, and when ringback begins as a consequence of having successfully started ringing the phone of the called party.

To reduce post-dial delays, it is RECOMMENDED that the caller begin gathering candidates prior to actually sending its initial INVITE. This can be started upon user interface cues that a call is pending, such as activity on a keypad or the phone going offhook.

To reduce post-pickup delays, ICE allows for media to be sent from the answerer to the offerer on a candidate pair, prior to its promotion to operating. However, this requires the answerer to have generated its answer and sent it. In most cases, it will require this answer to be received by the offerer.
The reason is that connectivity checks or RTP packets from the answerer to the offerer will not be forwarded by NATs towards the offerer until the offerer has established a permission in the NAT by generating a packet towards the answerer. For this reason, if an offer is received in an INVITE request, the UAS SHOULD immediately gather its candidates and then generate an answer in a provisional response. When reliable provisional responses are not used, the SDP in the provisional response is the answer, and that exact same answer reappears in the 200 OK. To deal with possible losses of the provisional response, it SHOULD be retransmitted until some indication of receipt. This indication can either be through PRACK [11], or through the receipt of a STUN Binding Request with a correct username and password. Even if PRACK is not used, the provisional response SHOULD be retransmitted using the exponential backoff described in [11]. Furthermore, once the answer has been sent, the agent SHOULD begin its connectivity checks. Once a candidate reaches the Valid or Recv-Valid state, the UAS has a known-valid path for media packets towards the UAC. This point is called the connected point in ICE.

Once the UAS reaches the connected point, media can be sent from the UAS towards the UAC without any additional delays. However, between the receipt of the INVITE and the connected point, any media that needs to be sent towards the caller (such as SIP early media [26]) cannot be transmitted. For this reason, implementations MAY choose to delay alerting the called party until the connected point is reached. In the case of a PSTN gateway, this would mean that the setup message into the PSTN is delayed until the connected point. Doing this increases the post-dial delay, but has the effect of eliminating 'ghost rings'. Ghost rings are cases where the called party hears the phone ring, picks up, but hears nothing and cannot be heard.
This technique works without requiring support for, or usage of, preconditions [7], since it is a local decision. It also has the benefit of guaranteeing that not a single packet of early media will get clipped. If an agent chooses to delay local alerting in this way, it SHOULD generate a 180 response once alerting begins.

A slight variation of this approach is to wait for a connectivity check to succeed to a higher priority candidate pair than the operating one. This allows for the agent to only ever send media, early or otherwise, to a single candidate, which will work better with jitter buffers, at the expense of even greater post-dial delays.

Note that, prior to the promotion of a candidate pair to operating, the offerer will not be able to send using the candidate pair. When used with SIP, if the initial offer is sent in the INVITE, and the answer is sent in both the provisional response and the final 200 OK response, the offerer will not be able to send media until it sends a re-INVITE and receives the 200 OK response to that re-INVITE. This can take several hundred milliseconds. If this latency is an issue (it is generally not considered an issue for voice systems), reliable provisional responses [11] MAY be used, in which case an UPDATE [25] can be used to send an updated offer prior to the call being answered.

As discussed in Section 13, offer/answer exchanges SHOULD be secured against eavesdropping and man-in-the-middle attacks. To do that, the usage of SIPS [2] is RECOMMENDED when used in concert with ICE.

9. Interactions with Forking

SIP allows INVITE requests carrying offers to fork, which means that they are delivered to multiple user agents. Each of those user agents then provides an answer to the offer in the INVITE. The result is that a single offer generated by the UAC produces multiple answers.

ICE interacts very well with forking.
Indeed, ICE fixes some of the problems associated with forking. Once the offer/answer exchange has completed, the UAC will have an answer from each UAS that received the INVITE. The ICE connectivity checks that ensue will carry transport address pair IDs that correlate each of those checks (and thus their corresponding IP addresses and ports) with a specific remote user agent. As these checks happen before any media is transmitted, ICE allows a UAC to disambiguate subsequent media traffic by looking at the source IP address and port, and then correlate that traffic with a particular remote UA. When SIP is used without ICE, the incoming media traffic cannot be disambiguated without an additional offer/answer exchange.

10. Interactions with Preconditions

Because ICE involves multiple addresses and pre-session activities, its interactions with preconditions merit further discussion.

Quality of Service (QoS) preconditions, which are defined in RFC 3312 [7] and RFC 4032 [8], apply only to the IP addresses and ports listed in the m/c-lines in an offer/answer. If ICE changes the address and port where media is received, this change is reflected in the m/c-lines of a new offer/answer. As such, it appears like any other re-INVITE would, and is fully treated in RFC 3312 and RFC 4032, which apply without regard to the fact that the m/c-lines are changing due to ICE negotiations occurring "in the background".

However, usage of early candidates with QoS preconditions is NOT RECOMMENDED, since QoS will only be reserved for the candidate pair in the m/c-line. An agent SHOULD only send to the operating candidate (once it enters the Valid or Recv-Valid states) if QoS preconditions are used for a media session.

ICE also has (purposeful) interactions with connectivity preconditions [27]. Those interactions are described there.

11. Examples

This section provides two examples.
One is a very basic example, and the other is more elaborate. A common configuration and setup is used in both cases.

Two agents, L and R, are using ICE. Both agents have a single IPv4 interface. For agent L, it is 10.0.1.1, and for agent R, 192.0.2.1. Both are configured with a single STUN server each (indeed, the same one for each), which is listening for STUN requests at an IP address of 192.0.2.2 and port 3478. This STUN server supports both the Binding Discovery usage and the Relay usage. Agent L is behind a NAT, and agent R is on the public Internet. The public side of the NAT has an IP address of 192.0.2.3.

To facilitate understanding, transport addresses are listed using variables that have mnemonic names. The format of the name is entity-type-seqno, where entity refers to the entity whose interface the transport address is on, and is one of "L", "R", "STUN", or "NAT". The type is either "PUB" for transport addresses that are public, or "PRIV" for transport addresses that are private. Finally, seqno is a sequence number that is different for each transport address of the same type on a particular entity. Each variable has an IP address and port, denoted by varname.IP and varname.PORT, respectively, where varname is the name of the variable.

In addition, candidate IDs are also listed using variables that have mnemonic names. Agent L uses candidate ID L1 for its local candidate, L2 for its server reflexive candidate, and L3 for its relayed candidate. Agent R uses R1 for its local candidate and R2 for its relayed candidate. The password is LPASS for each candidate from agent L, and RPASS for each candidate from agent R.

The STUN server has advertised transport address STUN-PUB-1 (which is 192.0.2.2:3478) for both the binding discovery usage and the relay usage.

In the call flow itself, STUN messages are annotated with several attributes.
The "S=" attribute indicates the source transport address of the message. The "D=" attribute indicates the destination transport address of the message. The "MA=" attribute is used in STUN Binding Response messages, STUN Binding Response messages carried in a STUN Send Request or Data Indication, and in an Allocate Response, and refers to the reflexive transport address derived from the XOR-MAPPED-ADDRESS attribute. The "RA=" attribute is used in STUN Data Indications, and refers to the value of the REMOTE-ADDRESS attribute. The "U=" attribute is used in STUN Requests, and corresponds to the STUN USERNAME. The "DA=" attribute is used in STUN Send requests, and refers to the value of the DESTINATION-ADDRESS attribute. The "R=" attribute is used in Allocate responses, and it indicates the value of the RELAY-ADDRESS attribute. The call flow examples omit STUN authentication operations.

11.1. Basic Example

In this example, the NAT has an endpoint independent mapping property and an address dependent filtering property. Neither agent is using the STUN relay usage, only the binding discovery usage. As a consequence, agent L will end up with two candidates - a local candidate and a server reflexive candidate. Agent R will have one - a local candidate (the reflexive candidate will be identical to the local one, and thus discarded). The agents are seeking to communicate using a single RTP-based voice stream. RTCP is not used. As a consequence, each candidate has one component.

    L             NAT            STUN            R
    |RTP STUN alloc.              |              |
    |(1) STUN Req  |              |              |
    |S=$L-PRIV-1   |              |              |
    |D=$STUN-PUB-1 |              |              |
    |------------->|              |              |
    |              |(2) STUN Req  |              |
    |              |S=$NAT-PUB-1  |              |
    |              |D=$STUN-PUB-1 |              |
    |              |------------->|              |
    |              |(3) STUN Res  |              |
    |              |S=$STUN-PUB-1 |              |
    |              |D=$NAT-PUB-1  |              |
    |              |MA=$NAT-PUB-1 |              |
    |              |<-------------|              |
    |(4) STUN Res  |              |              |
    |S=$STUN-PUB-1 |              |              |
    |D=$L-PRIV-1   |              |              |
    |MA=$NAT-PUB-1 |              |              |
    |<-------------|              |              |
    |(5) Offer     |              |              |
    |------------------------------------------->|
    |              |              |              |RTP STUN alloc.
    |              |              |(6) STUN Req  |
    |              |              |S=$R-PUB-1    |
    |              |              |D=$STUN-PUB-1 |
    |              |              |<-------------|
    |              |              |(7) STUN Res  |
    |              |              |S=$STUN-PUB-1 |
    |              |              |D=$R-PUB-1    |
    |              |              |MA=$R-PUB-1   |
    |              |              |------------->|
    |(8) answer    |              |              |
    |<-------------------------------------------|
    |              |(9) Bind Req  |              |
    |              |S=$R-PUB-1    |              |
    |              |D=$NAT-PUB-1  |              |
    |              |<----------------------------|
    |              |Dropped       |              |
    |(10) Bind Req |              |              |
    |S=$L-PRIV-1   |              |              |
    |D=$R-PUB-1    |              |              |
    |------------->|              |              |
    |              |(11) Bind Req |              |
    |              |S=$NAT-PUB-1  |              |
    |              |D=$R-PUB-1    |              |
    |              |---------------------------->|
    |              |(12) Bind Res |              |
    |              |S=$R-PUB-1    |              |
    |              |D=$NAT-PUB-1  |              |
    |              |MA=$NAT-PUB-1 |              |
    |              |<----------------------------|
    |(13) Bind Res |              |              |
    |S=$R-PUB-1    |              |              |
    |D=$L-PRIV-1   |              |              |
    |MA=$NAT-PUB-1 |              |              |
    |<-------------|              |              |
    |RTP flows     |              |              |
    |              |(14) Bind Req |              |
    |              |S=$R-PUB-1    |              |
    |              |D=$NAT-PUB-1  |              |
    |              |<----------------------------|
    |(15) Bind Req |              |              |
    |S=$R-PUB-1    |              |              |
    |D=$L-PRIV-1   |              |              |
    |<-------------|              |              |
    |(16) Bind Res |              |              |
    |S=$L-PRIV-1   |              |              |
    |D=$R-PUB-1    |              |              |
    |MA=$R-PUB-1   |              |              |
    |------------->|              |              |
    |              |(17) Bind Res |              |
    |              |S=$NAT-PUB-1  |              |
    |              |D=$R-PUB-1    |              |
    |              |MA=$R-PUB-1   |              |
    |              |---------------------------->|
    |              |              |              |RTP flows

                              Figure 12

First, agent L obtains a server reflexive transport address for its RTP packets (messages 1-4). Recall that the NAT has the endpoint independent (address and port independent) mapping property.
Here, it creates a binding of NAT-PUB-1 for this UDP request, and this becomes the server reflexive transport address for RTP, the sole component of its server reflexive candidate.

With its two candidates, agent L prioritizes them, choosing the local candidate as highest priority, followed by the server reflexive candidate. It chooses its server reflexive candidate as the operating candidate, and encodes it into the m/c-line. The resulting offer (message 5) looks like (lines folded for clarity):

    v=0
    o=jdoe 2890844526 2890842807 IN IP4 $L-PRIV-1.IP
    s=
    c=IN IP4 $NAT-PUB-1.IP
    t=0 0
    a=ice-pwd:$LPASS
    m=audio $NAT-PUB-1.PORT RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    a=candidate:$L1 1 UDP 1.0 $L-PRIV-1.IP $L-PRIV-1.PORT typ local
    a=candidate:$L2 1 UDP 0.7 $NAT-PUB-1.IP $NAT-PUB-1.PORT typ srflx raddr
        $L-PRIV-1.IP rport $L-PRIV-1.PORT

The offer, with the variables replaced with their values, will look like (lines folded for clarity):

    v=0
    o=jdoe 2890844526 2890842807 IN IP4 10.0.1.1
    s=
    c=IN IP4 192.0.2.3
    t=0 0
    a=ice-pwd:asd88fgpdd777uzjYhagZg
    m=audio 45664 RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    a=candidate:8hhY 1 UDP 1.0 10.0.1.1 8998 typ local
    a=candidate:Bzo8 1 UDP 0.7 192.0.2.3 45664 typ srflx raddr
        10.0.1.1 rport 8998

This offer is received at agent R. Agent R will gather its server reflexive transport address (messages 6-7). Since R is not behind a NAT, this address is identical to the local transport address from which it was obtained, and thus does not represent a separate candidate. It therefore ends up with a single local candidate with a single component for RTP.
Its resulting answer looks like:

    v=0
    o=bob 2808844564 2808844564 IN IP4 $R-PUB-1.IP
    s=
    c=IN IP4 $R-PUB-1.IP
    t=0 0
    a=ice-pwd:$RPASS
    m=audio $R-PUB-1.PORT RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    a=candidate:$R1 1 UDP 1.0 $R-PUB-1.IP $R-PUB-1.PORT typ local

With the variables filled in:

    v=0
    o=bob 2808844564 2808844564 IN IP4 192.0.2.1
    s=
    c=IN IP4 192.0.2.1
    t=0 0
    a=ice-pwd:YH75Fviy6338Vbrhrlp8Yh
    m=audio 3478 RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    a=candidate:9uB6 1 UDP 1.0 192.0.2.1 3478 typ local

Next, agents L and R form candidate pairs, the candidate pair priority ordered list and the transport address pair check ordered list. The candidate pair priority ordered list will have two entries, and be identical for L and R. The highest priority one will be the one containing L2 and R1 (since it is the operating candidate pair), and the second one will be L1 and R1. The transport address pair check ordered list initially starts with two entries. For agent L, this will be L2:1:R1:1 and L1:1:R1:1. However, after the trimming operation, agent L will remove the second transport address pair, since it shares the same origination transport address as the first (L-PRIV-1 for both). Agent R, by contrast, will keep both transport address pairs.

Agent R begins its connectivity check (message 9) for transport address pair L2:1:R1:1 (note that, from its perspective, the transport address pair has the ID R1:1:L2:1, and this ID would appear in the USERNAME of STUN requests it receives). Since the filtering policy of the NAT is address dependent, the connectivity check is discarded.

When agent L gets the answer, it begins its connectivity check for L2:1:R1:1 (messages 10-13), which succeeds, placing the transport address pair and resulting candidate pair into the Recv-Valid state. L can now send media to R.
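As an illustration of how the transport address pair IDs above relate to the candidate attributes, the following is a minimal sketch in Python. It assumes the a=candidate field layout seen in the example offer and answer (ID, component, transport, priority, IP address, port); the names Candidate, parse_candidate and pair_id are hypothetical and do not come from this specification. The sketch performs no prioritization or trimming; it only shows that each agent writes its own candidate first in the pair ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cand_id: str    # candidate ID, e.g. "Bzo8" (the value of $L2)
    component: int  # component ID; 1 is RTP in these examples
    qvalue: float   # priority as written in the attribute
    ip: str
    port: int


def parse_candidate(line: str) -> Candidate:
    """Parse an a=candidate line laid out as in the example SDP (assumed layout)."""
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError(f"not a candidate attribute: {line!r}")
    fields = line[len(prefix):].split()
    cand_id, component, _transport, qvalue, ip, port = fields[:6]
    return Candidate(cand_id, int(component), float(qvalue), ip, int(port))


def pair_id(native: Candidate, remote: Candidate) -> str:
    """Transport address pair ID as seen by the agent owning `native`."""
    return f"{native.cand_id}:{native.component}:{remote.cand_id}:{remote.component}"


# Candidate lines from the example offer and answer, with values filled in
# and the folded srflx line joined back together.
offer = [
    "a=candidate:8hhY 1 UDP 1.0 10.0.1.1 8998 typ local",
    "a=candidate:Bzo8 1 UDP 0.7 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998",
]
answer = ["a=candidate:9uB6 1 UDP 1.0 192.0.2.1 3478 typ local"]

l_cands = [parse_candidate(line) for line in offer]
r_cands = [parse_candidate(line) for line in answer]

# The same pairs, named from agent L's side and from agent R's side.
l_view = [pair_id(lc, rc) for lc in l_cands for rc in r_cands]
r_view = [pair_id(rc, lc) for lc in l_cands for rc in r_cands]
```

Here the entry built from Bzo8 and 9uB6 corresponds to L2:1:R1:1 in agent L's view and to R1:1:L2:1 in agent R's view.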
When agent R receives the connectivity check (message 11), it is a match for the transport address pair, and the state of the transport address pair moves to Send-Valid. Agent R begins its connectivity checks (messages 14-17). When the check arrives at the NAT (message 14), it is permitted to pass since a permission was created towards R-PUB-1 as a consequence of message 10. This check arrives at agent L, which generates a success response (message 16), and updates the state of the transport address pair to Valid. This response arrives at agent R, which also updates the state of the transport address pair to Valid. Now, media can flow from agent R to agent L as well.

11.2. Advanced Example

In this more advanced example, the NAT has address and port dependent mapping and filtering properties. Both agents use the STUN relay usage in addition to the binding discovery usage. As a consequence, agent L will end up with three candidates - a local candidate, a relayed candidate, and a server reflexive candidate. Agent R will have two - a local candidate and a relayed candidate (the server reflexive candidate will equal the local candidate and thus not be used). The agents are seeking to communicate using a single RTP-based voice stream, but are using RTCP. As a consequence, each candidate has two components - one for RTP and one for RTCP.

    L             NAT            STUN            R
    |              |              |              |
    |RTP Alloc.    |              |              |
    |(1) Alloc Req |              |              |
    |S=L-PRIV-1    |              |              |
    |D=STUN-PUB-1  |              |              |
    |------------->|              |              |
    |              |(2) Alloc Req |              |
    |              |S=NAT-PUB-1   |              |
    |              |D=STUN-PUB-1  |              |
    |              |------------->|              |
    |              |(3) Alloc Res |              |
    |              |S=STUN-PUB-1  |              |
    |              |D=NAT-PUB-1   |              |
    |              |R=STUN-PUB-2  |              |
    |              |MA=NAT-PUB-1  |              |
    |              |<-------------|              |
    |(4) Alloc Res |              |              |
    |S=STUN-PUB-1  |              |              |
    |D=L-PRIV-1    |              |              |
    |R=STUN-PUB-2  |              |              |
    |MA=NAT-PUB-1  |              |              |
    |<-------------|              |              |
    |              |              |              |
    |RTCP Alloc.   |              |              |
    |Ta secs. later|              |              |
    |(5) Alloc Req |              |              |
    |S=L-PRIV-2    |              |              |
    |D=STUN-PUB-1  |              |              |
    |------------->|              |              |
    |              |(6) Alloc Req |              |
    |              |S=NAT-PUB-2   |              |
    |              |D=STUN-PUB-1  |              |
    |              |------------->|              |
    |              |(7) Alloc Res |              |
    |              |S=STUN-PUB-1  |              |
    |              |D=NAT-PUB-2   |              |
    |              |R=STUN-PUB-3  |              |
    |              |MA=NAT-PUB-2  |              |
    |              |<-------------|              |
    |(8) Alloc Res |              |              |
    |S=STUN-PUB-1  |              |              |
    |D=L-PRIV-2    |              |              |
    |R=STUN-PUB-3  |              |              |
    |MA=NAT-PUB-2  |              |              |
    |<-------------|              |              |
    |              |              |              |
    |(9) Offer     |              |              |
    |------------------------------------------->|
    |              |              |              |
    |              |              |              |RTP Alloc.
    |              |              |(10) Alloc Req|
    |              |              |S=R-PUB-1     |
    |              |              |D=STUN-PUB-1  |
    |              |              |<-------------|
    |              |              |(11) Alloc Res|
    |              |              |S=STUN-PUB-1  |
    |              |              |D=R-PUB-1     |
    |              |              |R=STUN-PUB-4  |
    |              |              |MA=R-PUB-1    |
    |              |              |------------->|
    |              |              |              |
    |              |              |              |RTCP Alloc.
    |              |              |              |Ta secs. later
    |              |              |(12) Alloc Req|
    |              |              |S=R-PUB-2     |
    |              |              |D=STUN-PUB-1  |
    |              |              |<-------------|
    |              |              |(13) Alloc Res|
    |              |              |S=STUN-PUB-1  |
    |              |              |D=R-PUB-2     |
    |              |              |R=STUN-PUB-5  |
    |              |              |MA=R-PUB-2    |
    |              |              |------------->|
    |              |              |              |
    |(14) answer   |              |              |
    |<-------------------------------------------|
    |              |              |              |
    |              |              |              |Validate
    |              |              |              |STUN-PUB-4 to STUN-PUB-2
    |              |              |(15) Send Ind |
    |              |              |S=R-PUB-1     |
    |              |              |D=STUN-PUB-1  |
    |              |              |DA=STUN-PUB-2 |
    |              |              |<-------------|
    |              |              |Bind Req.     |
    |              |              |S=STUN-PUB-4  |
    |              |              |D=STUN-PUB-2  |
    |              |              |U=L3:1:R2:1   |
    |              |              |Discard       |
    |              |              |              |
    |Validate      |              |              |
    |STUN-PUB-2 to STUN-PUB-4     |              |
    |(16) Send Ind |              |              |
    |S=L-PRIV-1    |              |              |
    |D=STUN-PUB-1  |              |              |
    |DA=STUN-PUB-4 |              |              |
    |------------->|              |              |
    |              |(17) Send Ind |              |
    |              |S=NAT-PUB-1   |              |
    |              |D=STUN-PUB-1  |              |
    |              |DA=STUN-PUB-4 |              |
    |              |------------->|              |
    |              |              |Bind Req.     |
    |              |              |S=STUN-PUB-2  |
    |              |              |D=STUN-PUB-4  |
    |              |              |U=R2:1:L3:1   |
    |              |              |(18) Data Ind |
    |              |              |S=STUN-PUB-1  |
    |              |              |D=R-PUB-1     |
    |              |              |RA=STUN-PUB-2 |
    |              |              |------------->|
    |              |              |(19) Send Ind |
    |              |              |S=R-PUB-1     |
    |              |              |D=STUN-PUB-1  |
    |              |              |DA=STUN-PUB-2 |
    |              |              |MA=STUN-PUB-2 |
    |              |              |<-------------|
    |              |              |Bind Res.     |
    |              |              |S=STUN-PUB-4  |
    |              |              |D=STUN-PUB-2  |
    |              |              |MA=STUN-PUB-2 |
    |              |(20) Data Ind |              |
    |              |S=STUN-PUB-1  |              |
    |              |D=NAT-PUB-1   |              |
    |              |RA=STUN-PUB-4 |              |
    |              |MA=STUN-PUB-2 |              |
    |              |<-------------|              |
    |(21) Data Ind |              |              |
    |S=STUN-PUB-1  |              |              |
    |D=L-PRIV-1    |              |              |
    |RA=STUN-PUB-4 |              |              |
    |MA=STUN-PUB-2 |              |              |
    |<-------------|              |              |
    |              |              |              |
    |              |              |              |Validate
    |              |              |              |STUN-PUB-4 to STUN-PUB-2
    |              |              |(22) Send Ind |
    |              |              |S=R-PUB-1     |
    |              |              |D=STUN-PUB-1  |
    |              |              |DA=STUN-PUB-2 |
    |              |              |<-------------|
    |              |              |Bind Req.     |
    |              |              |S=STUN-PUB-4  |
    |              |              |D=STUN-PUB-2  |
    |              |              |U=L3:1:R2:1   |
    |              |(23) Data Ind |              |
    |              |S=STUN-PUB-1  |              |
    |              |D=NAT-PUB-1   |              |
    |              |RA=STUN-PUB-4 |              |
    |              |<-------------|              |
    |(24) Data Ind |              |              |
    |S=STUN-PUB-1  |              |              |
    |D=L-PRIV-1    |              |              |
    |RA=STUN-PUB-4 |              |              |
    |<-------------|              |              |
    |(25) Send Ind |              |              |
    |S=L-PRIV-1    |              |              |
    |D=STUN-PUB-1  |              |              |
    |DA=STUN-PUB-4 |              |              |
    |MA=STUN-PUB-4 |              |              |
    |------------->|              |              |
    |              |(26) Send Ind |              |
    |              |S=NAT-PUB-1   |              |
    |              |D=STUN-PUB-1  |              |
    |              |DA=STUN-PUB-4 |              |
    |              |MA=STUN-PUB-4 |              |
    |              |------------->|              |
    |              |              |Bind Res.     |
    |              |              |S=STUN-PUB-2  |
    |              |              |D=STUN-PUB-4  |
    |              |              |MA=STUN-PUB-4 |
    |              |              |(27) Data Ind |
    |              |              |S=STUN-PUB-1  |
    |              |              |D=R-PUB-1     |
    |              |              |RA=STUN-PUB-2 |
    |              |              |MA=STUN-PUB-4 |
    |              |              |------------->|
    |              |              |              |
    |              |              |              |Validate
    |              |              |              |STUN-PUB-5 to STUN-PUB-3
    |              |              |(28) Send Ind |
    |              |              |S=R-PUB-2     |
    |              |              |D=STUN-PUB-1  |
    |              |              |DA=STUN-PUB-3 |
    |              |              |<-------------|
    |              |              |Bind Req.     |
    |              |              |S=STUN-PUB-5  |
    |              |              |D=STUN-PUB-3  |
    |              |              |U=L3:2:R2:2   |
    |              |              |Discard       |
    |              |              |              |
    |Validate      |              |              |
    |STUN-PUB-3 to STUN-PUB-5     |              |
    |(29) Send Ind |              |              |
    |S=L-PRIV-2    |              |              |
    |D=STUN-PUB-1  |              |              |
    |DA=STUN-PUB-5 |              |              |
    |------------->|              |              |
    |              |(30) Send Ind |              |
    |              |S=NAT-PUB-2   |              |
    |              |D=STUN-PUB-1  |              |
    |              |DA=STUN-PUB-5 |              |
    |              |------------->|              |
    |              |              |Bind Req.     |
    |              |              |S=STUN-PUB-3  |
    |              |              |D=STUN-PUB-5  |
    |              |              |U=R2:2:L3:2   |
    |              |              |(31) Data Ind |
    |              |              |S=STUN-PUB-1  |
    |              |              |D=R-PUB-2     |
    |              |              |RA=STUN-PUB-3 |
    |              |              |------------->|
    |              |              |(32) Send Ind |
    |              |              |S=R-PUB-2     |
    |              |              |D=STUN-PUB-1  |
    |              |              |DA=STUN-PUB-3 |
    |              |              |MA=STUN-PUB-3 |
    |              |              |<-------------|
    |              |              |Bind Res.     |
    |              |              |S=STUN-PUB-5  |
    |              |              |D=STUN-PUB-3  |
    |              |              |MA=STUN-PUB-3 |
    |              |(33) Data Ind |              |
    |              |S=STUN-PUB-1  |              |
    |              |D=NAT-PUB-2   |              |
    |              |RA=STUN-PUB-5 |              |
    |              |MA=STUN-PUB-3 |              |
    |              |<-------------|              |
    |(34) Data Ind |              |              |
    |S=STUN-PUB-1  |              |              |
    |D=L-PRIV-2    |              |              |
    |RA=STUN-PUB-5 |              |              |
    |MA=STUN-PUB-3 |              |              |
    |<-------------|              |              |
    |              |              |              |
    |              |              |              |Validate
    |              |              |              |STUN-PUB-5 to STUN-PUB-3
    |              |              |(35) Send Ind |
    |              |              |S=R-PUB-2     |
    |              |              |D=STUN-PUB-1  |
    |              |              |DA=STUN-PUB-3 |
    |              |              |<-------------|
    |              |              |Bind Req.     |
    |              |              |S=STUN-PUB-5  |
    |              |              |D=STUN-PUB-3  |
    |              |              |U=L3:2:R2:2   |
    |              |(36) Data Ind |              |
    |              |S=STUN-PUB-1  |              |
    |              |D=NAT-PUB-2   |              |
    |              |RA=STUN-PUB-5 |              |
    |              |<-------------|              |
    |(37) Data Ind |              |              |
    |S=STUN-PUB-1  |              |              |
    |D=L-PRIV-2    |              |              |
    |RA=STUN-PUB-5 |              |              |
    |<-------------|              |              |
    |(38) Send Ind |              |              |
    |S=L-PRIV-2    |              |              |
    |D=STUN-PUB-1  |              |              |
    |DA=STUN-PUB-5 |              |              |
    |MA=STUN-PUB-5 |              |              |
    |------------->|              |              |
    |              |(39) Send Ind |              |
    |              |S=NAT-PUB-2   |              |
    |              |D=STUN-PUB-1  |              |
    |              |DA=STUN-PUB-5 |              |
    |              |MA=STUN-PUB-5 |              |
    |              |------------->|              |
    |              |              |Bind Res.     |
    |              |              |S=STUN-PUB-3  |
    |              |              |D=STUN-PUB-5  |
    |              |              |MA=STUN-PUB-5 |
    |              |              |(40) Data Ind |
    |              |              |S=STUN-PUB-1  |
    |              |              |D=R-PUB-2     |
    |              |              |RA=STUN-PUB-3 |
    |              |              |MA=STUN-PUB-5 |
    |              |              |------------->|
    |              |              |              |
    |RTP flows     |              |              |
    |(41) Send Ind |              |              |
    |S=L-PRIV-1    |              |              |
    |D=STUN-PUB-1  |              |              |
    |DA=STUN-PUB-4 |              |              |
    |------------->|              |              |
    |              |(42) Send Ind |              |
    |              |S=NAT-PUB-1   |              |
    |              |D=STUN-PUB-1  |              |
    |              |DA=STUN-PUB-4 |              |
    |              |------------->|              |
    |              |              |RTP           |
    |              |              |S=STUN-PUB-2  |
    |              |              |D=STUN-PUB-4  |
    |              |              |(43) Data Ind |
    |              |              |S=STUN-PUB-1  |
    |              |              |D=R-PUB-1     |
    |              |              |RA=STUN-PUB-2 |
    |              |              |------------->|
    |              |              |              |
    |              |              |              |RTP flows
    |              |              |(44) Send Ind |
    |              |              |S=R-PUB-1     |
    |              |              |D=STUN-PUB-1  |
    |              |              |DA=STUN-PUB-2 |
    |              |              |<-------------|
    |              |              |RTP           |
    |              |              |S=STUN-PUB-4  |
    |              |              |D=STUN-PUB-2  |
    |              |(45) Data Ind |              |
    |              |S=STUN-PUB-1  |              |
    |              |D=NAT-PUB-1   |              |
    |              |RA=STUN-PUB-4 |              |
    |              |<-------------|              |
    |(46) Data Ind |              |              |
    |S=STUN-PUB-1  |              |              |
    |D=L-PRIV-1    |              |              |
    |RA=STUN-PUB-4 |              |              |
    |<-------------|              |              |
    |              |              |              |
    |Validate      |              |              |
    |L-PRIV-1 to R-PUB-1          |              |
    |(47) Bind Req.|              |              |
    |S=L-PRIV-1    |              |              |
    |D=R-PUB-1     |              |              |
    |U=R1:1:L1:1   |              |              |
    |------------->|              |              |
    |              |(48) Bind Req.|              |
    |              |S=NAT-PUB-3   |              |
    |              |D=R-PUB-1     |              |
    |              |U=R1:1:L1:1   |              |
    |              |---------------------------->|
    |              |(49) Bind Res.|              |
    |              |S=R-PUB-1     |              |
    |              |D=NAT-PUB-3   |              |
    |              |MA=NAT-PUB-3  |              |
    |              |<----------------------------|
    |(50) Bind Res.|              |              |
    |S=R-PUB-1     |              |              |
    |D=L-PRIV-1    |              |              |
    |MA=NAT-PUB-3  |              |              |
    |<-------------|              |              |
    |              |              |              |
    |              |              |              |Validate
    |              |              |              |R-PUB-1 to L-PRIV-1
    |              |(51) Bind Req.|              |
    |              |S=R-PUB-1     |              |
    |              |D=L-PRIV-1    |              |
    |              |U=L1:1:R1:1   |              |
    |              |<----------------------------|
    |              |Discard       |              |
    |              |              |              |
    |              |              |              |Validate
    |              |              |              |R-PUB-2 to L-PRIV-2
    |              |(52) Bind Req.|              |
    |              |S=R-PUB-2     |              |
    |              |D=L-PRIV-2    |              |
    |              |U=L1:2:R1:2   |              |
    |              |<----------------------------|
    |              |Discard       |              |
    |              |              |              |
    |Validate      |              |              |
    |L-PRIV-2 to R-PUB-2          |              |
    |(53) Bind Req.|              |              |
    |S=L-PRIV-2    |              |              |
    |D=R-PUB-2     |              |              |
    |U=R1:2:L1:2   |              |              |
    |------------->|              |              |
    |              |(54) Bind Req.|              |
    |              |S=NAT-PUB-4   |              |
    |              |D=R-PUB-2     |              |
    |              |U=R1:2:L1:2   |              |
    |              |---------------------------->|
    |              |(55) Bind Res.|              |
    |              |S=R-PUB-2     |              |
    |              |D=NAT-PUB-4   |              |
    |              |MA=NAT-PUB-4  |              |
    |              |<----------------------------|
    |(56) Bind Res.|              |              |
    |S=R-PUB-2     |              |              |
    |D=L-PRIV-2    |              |              |
    |MA=NAT-PUB-4  |              |              |
    |<-------------|              |              |
    |              |              |              |
    |              |              |              |Validate
    |              |              |              |R-PUB-1 to NAT-PUB-3
    |              |(57) Bind Req.|              |
    |              |S=R-PUB-1     |              |
    |              |D=NAT-PUB-3   |              |
    |              |U=L1R1:1:R1:1 |              |
    |              |<----------------------------|
    |(58) Bind Req.|              |              |
    |S=R-PUB-1     |              |              |
    |D=L-PRIV-1    |              |              |
    |U=L1R1:1:R1:1 |              |              |
    |<-------------|              |              |
    |(59) Bind Res.|              |              |
    |S=L-PRIV-1    |              |              |
    |D=R-PUB-1     |              |              |
    |MA=R-PUB-1    |              |              |
    |------------->|              |              |
    |              |(60) Bind Res.|              |
    |              |S=NAT-PUB-3   |              |
    |              |D=R-PUB-1     |              |
    |              |MA=R-PUB-1    |              |
    |              |---------------------------->|
    |              |              |              |
    |              |              |              |Validate
    |              |              |              |R-PUB-2 to NAT-PUB-4
    |              |(61) Bind Req.|              |
    |              |S=R-PUB-2     |              |
    |              |D=NAT-PUB-4   |              |
    |              |U=L1R1:2:R1:2 |              |
    |              |<----------------------------|
    |(62) Bind Req.|              |              |
    |S=R-PUB-2     |              |              |
    |D=L-PRIV-2    |              |              |
    |U=L1R1:2:R1:2 |              |              |
    |<-------------|              |              |
    |(63) Bind Res.|              |              |
    |S=L-PRIV-2    |              |              |
    |D=R-PUB-2     |              |              |
    |MA=R-PUB-2    |              |              |
    |------------->|              |              |
    |              |(64) Bind Res.|              |
    |              |S=NAT-PUB-4   |              |
    |              |D=R-PUB-2     |              |
    |              |MA=R-PUB-2    |              |
    |              |---------------------------->|
    |              |              |              |
    |(65) Offer    |              |              |
    |------------------------------------------->|
    |(66) Answer   |              |              |
    |<-------------------------------------------|

                              Figure 17

First, agent L obtains both server reflexive and relayed transport addresses for its RTP packets, using a STUN Allocate request, which will provide it with both types of addresses (messages 1-4). Recall that the NAT has the address and port dependent mapping property. Here, it creates a binding of NAT-PUB-1 for this UDP request, and this becomes the server reflexive transport address for RTP. The relayed transport address is STUN-PUB-2, allocated by the STUN server. Agent L repeats this process for RTCP (messages 5-8) Ta seconds later, and obtains NAT-PUB-2 as its server reflexive transport address for RTCP and STUN-PUB-3 for its relayed transport address.

With its three candidates, agent L prioritizes them, choosing the local candidate as highest priority, followed by the server reflexive candidate, followed by the relayed candidate. It chooses its relayed candidate as the operating candidate, and encodes it into the m/c-line. The resulting offer (message 9) looks like:

    v=0
    o=jdoe 2890844526 2890842807 IN IP4 $L-PRIV-1.IP
    s=
    c=IN IP4 $STUN-PUB-2.IP
    t=0 0
    a=ice-pwd:$LPASS
    m=audio $STUN-PUB-2.PORT RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    a=rtcp:$STUN-PUB-3.PORT
    a=candidate:$L1 1 UDP 1.0 $L-PRIV-1.IP $L-PRIV-1.PORT
    a=candidate:$L1 2 UDP 1.0 $L-PRIV-2.IP $L-PRIV-2.PORT
    a=candidate:$L2 1 UDP 0.7 $NAT-PUB-1.IP $NAT-PUB-1.PORT
    a=candidate:$L2 2 UDP 0.7 $NAT-PUB-2.IP $NAT-PUB-2.PORT
    a=candidate:$L3 1 UDP 0.3 $STUN-PUB-2.IP $STUN-PUB-2.PORT
    a=candidate:$L3 2 UDP 0.3 $STUN-PUB-3.IP $STUN-PUB-3.PORT

This offer is received at agent R.
Agent R will gather its server reflexive and relayed transport addresses for RTP from an Allocate request (messages 10-11). Since the server reflexive transport address matches its local transport address, no separate candidate is used for it. The agent then gathers its server reflexive and relayed transport addresses for RTCP (messages 12-13). It gives the local candidate a higher priority than the relayed candidate, and selects the relayed candidate as the operating candidate. Its resulting answer looks like:

    v=0
    o=bob 2808844564 2808844564 IN IP4 $R-PUB-1.IP
    s=
    c=IN IP4 $STUN-PUB-4.IP
    t=0 0
    a=ice-pwd:$RPASS
    m=audio $STUN-PUB-4.PORT RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    a=rtcp:$STUN-PUB-5.PORT
    a=candidate:$R1 1 UDP 1.0 $R-PUB-1.IP $R-PUB-1.PORT
    a=candidate:$R1 2 UDP 1.0 $R-PUB-2.IP $R-PUB-2.PORT
    a=candidate:$R2 1 UDP 0.3 $STUN-PUB-4.IP $STUN-PUB-4.PORT
    a=candidate:$R2 2 UDP 0.3 $STUN-PUB-5.IP $STUN-PUB-5.PORT

Next, agents L and R form candidate pairs and the transport address pair check ordered list. This list will start with the two components of the currently operating candidate pair, which is made up of the relayed candidates.

Agent R begins its checks (message 15). It will check connectivity between the operating candidate pair, starting with the first component, which is STUN-PUB-4 for agent R and STUN-PUB-2 for agent L. The state machine for that transport address pair moves to the Testing state. Since this is a relayed transport address for agent R, it utilizes the STUN Send Indication to deliver the Binding Request. The DESTINATION-ADDRESS is STUN-PUB-2. The STUN server will extract the content of the Send Indication, which is a STUN Binding Request, and deliver it to the destination, STUN-PUB-2. This request will be sent from the relayed address allocated to R, which is STUN-PUB-4.
As both interfaces are on the STUN server, this message is sent to itself (hence the lack of a message number in the sequence diagram above). Note that the USERNAME in the Binding Request is L3:1:R2:1, which represents the transport address pair ID. This message gets discarded by the STUN server since, as of yet, there are no permissions established for the STUN-PUB-2 allocation. However, it did have the side effect of establishing a permission on the STUN-PUB-4 binding, allowing incoming packets from STUN-PUB-2.

Once L gets the answer, it will attempt to validate the first transport address pair in the transport address pair check ordered list, which will be the operating candidate. The state machine for this transport address pair moves into the Testing state. Like agent R did, it will use the STUN Send Indication to send a STUN Binding Request from its relayed transport address, STUN-PUB-2, to STUN-PUB-4 (message 16). This packet traverses the NAT (message 17) and arrives at the STUN server. The STUN server will unwrap the contents of the packet and send them from STUN-PUB-2 to STUN-PUB-4. It will also, as a consequence, add a permission for STUN-PUB-4. The contents of the packet are a STUN Binding Request with USERNAME R2:1:L3:1 (note how this is the flip of the USERNAME in the Binding Request sent by agent R). This is also a packet from the STUN server to itself. However, now, the packet is not discarded, as a permission had been installed as a consequence of the "suicide packet" from agent R (a suicide packet is a packet that has no hope of traversing a far end NAT, but serves the purpose of enabling a permission in a near end NAT so that a packet from the peer can be returned). Thus, the STUN server will relay the received STUN request towards agent R (message 18). This is delivered as a STUN Data Indication.
Notice how the REMOTE-ADDRESS is STUN-PUB-2; this is important as it will be used to construct the STUN Binding Response.

Agent R will receive the Data Indication, and unwrap its contents to find the Binding Request. The state machine for this transport address pair is currently in the Testing state. It therefore moves into the Send-Valid state, and it generates a Binding Response. The XOR-MAPPED-ADDRESS in the Binding Response is constructed using the source IP address and port that were seen by the STUN server when the Binding Request arrived at STUN-PUB-4, which is the looped message between messages 17 and 18. This source address is STUN-PUB-2, which is the value of the REMOTE-ADDRESS attribute in message 18. Thus, the STUN Binding Response will contain STUN-PUB-2 in the XOR-MAPPED-ADDRESS, and is to be sent to STUN-PUB-2. To send the response, agent R takes the STUN Binding Response and encapsulates it in a STUN Send Indication, setting the DESTINATION-ADDRESS to STUN-PUB-2. This is shown in message 19.

The STUN server will receive this Send Indication, and unwrap its contents to find the STUN Binding Response. It sends the response to the value of the DESTINATION-ADDRESS attribute, from the relayed address allocated to R, which is STUN-PUB-4. This, once again, results in a looped message to itself, and it arrives at STUN-PUB-2. Now, however, there is a permission installed for STUN-PUB-4. The STUN server will therefore forward the packet to agent L. To do so, it constructs a STUN Data Indication containing the contents of the packet. It sets the REMOTE-ADDRESS to the source transport address of the packet it received (STUN-PUB-4), and forwards it to agent L (message 20). This traverses the NAT (message 21) and arrives at agent L. As a consequence of the receipt of a Binding Response, the state machine for this transport address pair moves to the Recv-Valid state. The agent also examines the XOR-MAPPED-ADDRESS of the STUN response.
It indicates STUN-PUB-2. This is the same as the native transport address of this transport address pair, and thus doesn't represent a new transport address that might have been learned.

Because of the receipt of message 18, the transport address pair moved from Testing to Send-Valid, causing R to attempt a retransmission of its STUN Binding Request that was lost (the contents of message 15 that were discarded by the STUN server due to lack of permission). This time, however, a permission has been installed and the retransmission will work. So, it sends the Binding Request again (message 22, identical to message 15). This is looped by the STUN server to itself again, but this time there is a permission in place when it arrives at STUN-PUB-2. As such, the request is forwarded towards agent L this time, in a STUN Data Indication (message 23). This traverses the NAT (message 24) and arrives at agent L. Agent L extracts the contents of the indication, which are a STUN Binding Request. This causes the state machine to move from Recv-Valid to Valid. It generates a STUN Binding Response, and sets the XOR-MAPPED-ADDRESS based on the value of the REMOTE-ADDRESS in message 24 (STUN-PUB-4). This Binding Response is sent to STUN-PUB-4, which is accomplished through a STUN Send Indication (message 25). This Send Indication traverses the NAT (message 26) and is received by the STUN server. Its contents are decapsulated, and sent to STUN-PUB-4, which is again a loop on the same host. This packet is then sent towards agent R in a Data Indication (message 27). The contents of the Data Indication are extracted, and the agent sees a successful Binding Response. It therefore moves the state machine from the Send-Valid state to the Valid state. At this point, the transport address pair is in the Valid state for both agents.
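The permission behavior that drives messages 15 through 24 can be sketched as follows. This is a toy model in Python, not an implementation of the STUN relay usage: the class RelaySketch and its methods are hypothetical names, and the only rules modeled are the two described above, namely that a Send Indication toward a destination installs a permission for that destination on the sender's allocation, and that a packet arriving at a relayed address is passed on only if a permission for its source exists.

```python
from typing import Dict, Set


class RelaySketch:
    """Toy model of one STUN relay that hosts several allocations."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}             # relayed address -> client address
        self.permissions: Dict[str, Set[str]] = {}  # relayed address -> permitted peers

    def allocate(self, client: str, relayed: str) -> None:
        self.owner[relayed] = client
        self.permissions[relayed] = set()

    def send(self, relayed_src: str, dest: str) -> str:
        """Handle a Send Indication from the owner of relayed_src toward dest."""
        # Sending toward dest installs a permission for dest on the sender's allocation.
        self.permissions[relayed_src].add(dest)
        if dest not in self.owner:
            return f"forwarded to {dest}"
        # dest is another allocation on this server: the packet loops back
        # and is checked against that allocation's permissions.
        if relayed_src in self.permissions[dest]:
            return f"Data Indication to {self.owner[dest]}, RA={relayed_src}"
        return "discarded"


relay = RelaySketch()
relay.allocate("NAT-PUB-1", "STUN-PUB-2")  # agent L, as seen through its NAT
relay.allocate("R-PUB-1", "STUN-PUB-4")    # agent R

results = [
    relay.send("STUN-PUB-4", "STUN-PUB-2"),  # message 15: no permission yet
    relay.send("STUN-PUB-2", "STUN-PUB-4"),  # messages 16-18
    relay.send("STUN-PUB-4", "STUN-PUB-2"),  # messages 19-21, and likewise 22-24
]
```

In this sketch the first call plays the role of the "suicide packet": it is dropped, but it leaves behind the permission that lets the second call through.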
Approximately Tb seconds after agent R sent message 15, agent R will start checks for the next transport address pair in its transport address pair check ordered list. This is the second component of the same candidate pair, used for RTCP. This sequence, messages 28 through 40, is identical to the one for RTP, and differs only in the specific transport addresses. Once that validation happens, the second transport address pair has been validated. The candidate pair moves into the Valid state, and both candidates are considered valid.

The operating candidate has now been validated, and media can begin to flow. It will do so through the STUN server; indeed, it is relayed "twice" through the STUN server. Even though there is a single STUN server, it is logically acting as two separate STUN servers. Indeed, had L and R used two separate STUN servers, media would be relayed through both STUN servers in a trapezoid configuration.

The actual media flows are shown as well. It is important to note that, since the ICE checks have not yet concluded on the candidate that will ultimately be used, no STUN Set Active Destination requests have been sent. As a consequence, media that is sent through the STUN servers has to be sent using STUN Send Indications. This introduces some overhead, but is a transient condition. In message 41, agent L sends an RTP packet to agent R using a Send Indication. It is sent to STUN-PUB-4. This traverses the NAT (message 42), and arrives at the STUN server. It is decapsulated, looped to itself, and arrives at STUN-PUB-4. From there, it is encapsulated in a Data Indication and sent to agent R (message 43). In the reverse direction, agent R will send an RTP packet using a STUN Send Indication (message 44), and send it to STUN-PUB-2. This is received by the STUN server, decapsulated, and sent to STUN-PUB-2 from STUN-PUB-4.
This is again a loop within the same host, arriving at STUN-PUB-2. The contents of the packet are sent to agent L through a STUN Data Indication (message 45), which traverses the NAT (message 46) to arrive at agent L. Since this call flow is already long enough, RTCP packet transmission is not shown.

Approximately Tb seconds after it sends message 29, agent L goes to the next transport address pair in its transport address pair check ordered list that is in the Waiting state. This will be the RTP component of the top priority candidate pair, which is L-PRIV-1 on agent L and R-PUB-1 on agent R. This is a local candidate for each agent. To perform the check, agent L sends a STUN Binding Request from L-PRIV-1 to R-PUB-1 (message 47). Note the USERNAME of R1:1:L1:1, which identifies this transport address pair. This traverses the NAT (message 48). Since the NAT has the address and port dependent mapping property, and this is a new destination IP address, the NAT allocates a new transport address on its public side, NAT-PUB-3, and places this in the source IP address and port. This packet arrives at agent R. Agent R finds a matching transport address pair in the Waiting state. The state machine transitions to the Send-Valid state. It sends the Binding Response, with an XOR-MAPPED-ADDRESS indicating NAT-PUB-3 (message 49), which traverses the NAT and arrives at agent L (message 50).

Agent R, in addition to sending the response, will also send a Binding Request. It is important to remember that this Binding Request is sent to the remote address in the transport address pair (L-PRIV-1), and NOT to the source IP address and port of the Binding Request (NAT-PUB-3); that will happen later. This attempt is shown in message 51. However, since L-PRIV-1 is private, the packet is discarded in the network.

Now, as a consequence of receiving message 48, agent R will have constructed a peer-derived candidate.
The candidate ID for this candidate is L1R1, and it initially contains a single transport address pair, NAT-PUB-3 and R-PUB-1. However, the candidate isn't usable until the other component gets added. Similarly, agent L will have constructed the same peer-derived candidate, with the same candidate ID and the same transport address pair.

Some Tb seconds after sending message 28, agent R will move to the next transport address pair in the transport address pair check ordered list whose state is Waiting. This is the RTCP component of the highest priority candidate pair. It will attempt a connectivity check, from R-PUB-2 to L-PRIV-2 (message 52). Since L-PRIV-2 is private, this message is discarded.

Some Tb seconds after sending message 47, agent L will move to the next transport address pair in the transport address pair check ordered list whose state is Waiting. This is the RTCP component of the highest priority candidate pair. It will attempt a connectivity check, from L-PRIV-2 to R-PUB-2 (message 53), which operates nearly identically to messages 47-50, with the exception of the specific addresses. Here, the NAT will create a new binding for the RTCP, NAT-PUB-4, and this transport address is new for both participants. On receipt of this Binding Request at agent R (message 54), agent R constructs the candidate ID for the peer-derived candidate, L1R1, and finds it already exists. As such, this new transport address is added, and the peer-derived candidate becomes complete and usable. Agent L does the same thing on receipt of message 56. This candidate will have the same priority as its generating candidate L1 (1.0), and is paired up with R1 (also at priority 1.0). Since L1R1 has the same priority as L1 itself, the ordering algorithm in Section 7.5 will use the reverse ASCII sort order of the candidate ID itself to determine order.
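The tie-break just described can be illustrated with a small sketch. It is not the Section 7.5 algorithm in full: the Candidate class, the order_candidates function and the extra candidate "L2" are hypothetical, priorities are treated as plain floats, and "reverse ASCII sort order" is assumed to mean a descending byte-wise comparison of the candidate ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    candidate_id: str  # e.g. "L1", or the peer-derived "L1R1"
    priority: float    # q-value between 0.0 and 1.0


def order_candidates(candidates):
    """Highest priority first; equal priorities fall back to descending ASCII order of the ID."""
    # Two stable sorts: apply the secondary key first, then the primary key.
    by_id = sorted(candidates, key=lambda c: c.candidate_id.encode("ascii"), reverse=True)
    return sorted(by_id, key=lambda c: c.priority, reverse=True)


# "L2" is a made-up lower-priority candidate, included only to show the primary key at work.
ordered = order_candidates([
    Candidate("L1", 1.0),
    Candidate("L2", 0.7),
    Candidate("L1R1", 1.0),
])
print([c.candidate_id for c in ordered])
```

Under these assumptions the peer-derived ID sorts ahead of its generating candidate's ID whenever the two priorities are equal, because the longer string compares greater than its own prefix.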
L1R1 is larger than L1, so the peer-derived candidate will come before its generating candidate. The peer-derived candidate pair will therefore rank ahead of the pair for its generating candidate, appearing just before it in the candidate pair priority ordered list.

As a consequence, after agent R sends message 55 and completes the peer-derived candidate, it will move the two transport addresses in the peer-derived candidate into the Send-Valid state, and send a Binding Request for each in rapid succession (agent L will have moved both into the Recv-Valid state upon receipt of message 56). The first of these connectivity checks is for the RTP component, from R-PUB-1 to NAT-PUB-3 (message 57). Note the USERNAME in the STUN Binding Request, L1R1:1:R1:1, which identifies the peer-derived transport address pair. This will successfully traverse the NAT and be delivered to agent L (message 58). The receipt of this request moves the state machine for this transport address pair from Recv-Valid to Valid, and a Binding Response is sent (message 59). This passes through the NAT and arrives at agent R (message 60). This causes its state machine to enter the Valid state as well. The reflexive transport address, R-PUB-1, is not new to agent R and thus does not result in the creation of a new peer-derived candidate.

Messages 61 through 64 show the same basic flow for RTCP. Upon receipt of message 64, both transport address pairs are Valid at both agents, causing the peer-derived candidate to become valid. Timer Tws is set at agent L, and fires without any higher priority candidate pairs becoming validated. At agent R, media can now be sent on this candidate pair from answerer (agent R) to offerer (agent L). Agent L sends an updated offer to promote the peer-derived candidate to operating.
This offer (message 65) looks like:

   v=0
   o=jdoe 2890844526 2890842808 IN IP4 $L-PRIV-1.IP
   s=
   c=IN IP4 $NAT-PUB-3.IP
   t=0 0
   a=ice-pwd:$LPASS
   m=audio $NAT-PUB-3.PORT RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   a=rtcp:$NAT-PUB-4.PORT
   a=remote-candidate:R1
   a=candidate:$L1 1 UDP 1.0 $L-PRIV-1.IP $L-PRIV-1.PORT
   a=candidate:$L1 2 UDP 1.0 $L-PRIV-2.IP $L-PRIV-2.PORT

There are several important things to note in this offer. Firstly, note how the m/c-line now contains NAT-PUB-3 and NAT-PUB-4, the peer-derived transport addresses it learned through the ICE processing. Secondly, note how there remains a candidate encoded into the a=candidate attributes. This is candidate L1, NOT candidate L1R1. Recall that the peer-derived candidates are never encoded into the SDP. Rather, their generating candidate is encoded. This will cause keepalives to take place for the generating candidate if valid (though it's not) and any of its derived candidates, which is what we want. Finally, notice the inclusion of the a=remote-candidate attribute. Since agent L doesn't know whether agent R received messages 60 or 64, it doesn't know whether the state of the candidate is Send-Valid or Valid at agent R. So, it has to tell agent R to use the candidate anyway, even if its state is only Send-Valid.

The answer generated by agent R looks like:

   v=0
   o=bob 2808844564 2808844565 IN IP4 $R-PUB-1.IP
   s=
   c=IN IP4 $R-PUB-1.IP
   t=0 0
   a=ice-pwd:$RPASS
   m=audio $R-PUB-1.PORT RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   a=rtcp:$R-PUB-2.PORT
   a=candidate:$R1 1 UDP 1.0 $R-PUB-1.IP $R-PUB-1.PORT
   a=candidate:$R1 2 UDP 1.0 $R-PUB-2.IP $R-PUB-2.PORT

With this, media can now flow directly between endpoints. The removal of the relayed candidates from the offer/answer exchange will cause the STUN relay allocations to be removed.

12. Grammar

This specification defines three new SDP attributes - the "candidate", "remote-candidate" and "ice-pwd" attributes.
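As a rough illustration of how these three attributes appear in practice, the sketch below scans the updated offer shown above (message 65) and collects them. The function name ice_attributes and the dictionary layout are hypothetical. It reads only the mandatory leading fields of a=candidate, ignores optional trailing fields, and does not distinguish session-level from media-level attributes, so it is a reading aid rather than a conforming parser.

```python
def ice_attributes(sdp):
    """Collect ice-pwd, remote-candidate and candidate attributes from SDP text."""
    found = {"ice-pwd": None, "remote-candidate": None, "candidates": {}}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=") or ":" not in line:
            continue
        name, value = line[2:].split(":", 1)
        if name in ("ice-pwd", "remote-candidate"):
            found[name] = value
        elif name == "candidate":
            fields = value.split(" ")
            cand_id, component = fields[0], int(fields[1])
            # Group transport addresses by candidate-id, keyed by component-id.
            found["candidates"].setdefault(cand_id, {})[component] = {
                "transport": fields[2],
                "qvalue": float(fields[3]),
                "address": fields[4],
                "port": fields[5],  # kept as text: the example uses placeholders
            }
    return found


OFFER = """\
v=0
o=jdoe 2890844526 2890842808 IN IP4 $L-PRIV-1.IP
s=
c=IN IP4 $NAT-PUB-3.IP
t=0 0
a=ice-pwd:$LPASS
m=audio $NAT-PUB-3.PORT RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=rtcp:$NAT-PUB-4.PORT
a=remote-candidate:R1
a=candidate:$L1 1 UDP 1.0 $L-PRIV-1.IP $L-PRIV-1.PORT
a=candidate:$L1 2 UDP 1.0 $L-PRIV-2.IP $L-PRIV-2.PORT
"""

attrs = ice_attributes(OFFER)
print(attrs["ice-pwd"], attrs["remote-candidate"], sorted(attrs["candidates"]["$L1"]))
```

Both a=candidate lines end up under the single candidate-id $L1, one entry per component, which mirrors how the candidate-id groups the RTP and RTCP transport addresses of one candidate.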
The candidate attribute is a media-level attribute only. It contains a transport address for a candidate that can be used for connectivity checks. There may be multiple candidate attributes in a media block. There is no requirement that a=candidate attributes which indicate components for the same candidate appear one right after the other or in component ID order.

The syntax of this attribute is defined using Augmented BNF as defined in RFC 4234 [9]:

   candidate-attribute   = "candidate" ":" candidate-id SP component-id SP
                           transport SP
                           qvalue SP               ;qvalue from RFC 3261
                           connection-address SP   ;from RFC 4566
                           port                    ;port from RFC 4566
                           [SP cand-type]
                           [SP rel-addr]
                           [SP rel-port]
                           *(SP extension-att-name SP
                                extension-att-value)

   transport             = "UDP" / transport-extension
   transport-extension   = token                   ; from RFC 3261
   candidate-id          = 1*base64-char
   base64-char           = ALPHA / DIGIT / "+" / "/"
   component-id          = 1*DIGIT
   cand-type             = "typ" SP candidate-types
   candidate-types       = "local" / "srflx" / "relay" / token
   rel-addr              = "raddr" SP connection-address
   rel-port              = "rport" SP port
   extension-att-name    = byte-string             ;from RFC 4566
   extension-att-value   = byte-string

The candidate-id is used to group together the transport addresses for a particular candidate. It MUST be constructed with at least 24 bits of randomness. It MUST have the same value for all transport addresses within the same candidate. It MUST have a different value for transport addresses within different candidates for the same media stream. The candidate-id uses a syntax that is defined to be equal to the base64 alphabet [3], which allows the candidate-id to be generated by performing a base64 encoding of a randomly generated value (note, however, that this does not mean that the candidate-id or password is base64 decoded when used in STUN messages). In addition, if content is base64 encoded to generate the candidate-id, it MUST NOT be padded with '='.
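One way to satisfy these rules is sketched below. The helper name random_token is hypothetical, and the sketch assumes that base64 encoding of random bytes is the chosen generation method; the specification permits it but does not require it. The same helper is shown producing a value sized for the 128-bit "ice-pwd" described later in this section.

```python
import base64
import secrets


def random_token(bits):
    """Return unpadded base64 text carrying `bits` bits of randomness (bits must be a multiple of 8)."""
    raw = secrets.token_bytes(bits // 8)
    # Strip any trailing '=' so the result stays within the base64-char production.
    return base64.b64encode(raw).decode("ascii").rstrip("=")


candidate_id = random_token(24)   # 3 bytes encode to 4 characters, so no padding arises
ice_pwd = random_token(128)       # 16 bytes encode to 22 characters once "==" is stripped
print(len(candidate_id), len(ice_pwd))
```

The 128-bit case is the one where stripping matters: 16 bytes is not a multiple of 3, so a standard encoder would otherwise append "==".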
Section 2.2 of RFC 3548 indicates that some base64 usages do not require padding, and it requests that such usages call out that fact. ICE is one such usage. This is because the data is never decoded.

The component-id is a positive integer, which identifies the specific component of the candidate. It MUST start at 1 and MUST increment by 1 for each component of a particular candidate.

The addr production is taken from [10], allowing for IPv4 addresses, IPv6 addresses and FQDNs. The port production is taken from RFC 4566 [5]. The token production is taken from RFC 3261 [2]. The transport production indicates the transport protocol for the candidate. This specification only defines UDP. However, extensibility is provided to allow for future transport protocols to be used with ICE, such as TCP or the Datagram Congestion Control Protocol (DCCP) [30].

The cand-type production encodes the type of transport address. This specification defines the values "local" for a local transport address, "srflx" for a server reflexive transport address, and "relay" for a relayed transport address. The set of candidate types is extensible for the future. Note that there is no value defined for peer reflexive transport addresses. This is because these transport addresses are never carried in the SDP itself; they are learned implicitly through connectivity checks. Inclusion of the candidate type is optional.

The rel-addr and rel-port productions convey information on related transport addresses. For a server reflexive transport address, the rel-addr and rel-port contain the associated local transport address. For a relayed transport address, the rel-addr and rel-port contain the server reflexive transport address towards the relay. If rel-addr is present, rel-port MUST be present, and if rel-port is present, rel-addr MUST be present. If the candidate type is "local", rel-addr and rel-port MUST NOT be present.
If the candidate type is "srflx" or "relay", both rel-addr and rel-port MUST be present.

The a=candidate attribute can itself be extended. The grammar allows for new name/value pairs to be added at the end of the attribute. An implementation MUST ignore any name/value pairs it doesn't understand.

The syntax of the "remote-candidate" attribute is defined using Augmented BNF as defined in RFC 4234 [9]:

   remote-candidate-att = "remote-candidate" ":" candidate-id

This attribute MUST be present in an offer when the candidate in the m/c-line is part of a candidate pair that is in the valid or partially valid state.

The syntax of the "ice-pwd" attribute is defined as:

   ice-pwd-att = "ice-pwd" ":" password
   password    = 1*base64-char

The "ice-pwd" attribute can appear at either the session-level or media-level. When present in both, the value in the media-level takes precedence. Thus, the value at the session level is effectively a default that applies to all media streams, unless overridden by a media-level value. It MUST have at least 128 bits of randomness. Like the candidate ID, its syntax is taken from the base64 alphabet, allowing the password to be generated from a base64 encoding of a 128 bit value. In addition, if content is base64 encoded to generate the password, it MUST NOT be padded with '='.

13. Security Considerations

There are several types of attacks possible in an ICE system. This section considers these attacks and their countermeasures.

13.1. Attacks on Connectivity Checks

An attacker might attempt to disrupt the STUN-based connectivity checks. Ultimately, all of these attacks fool an agent into thinking something incorrect about the results of the connectivity checks. The possible false conclusions an attacker can try to cause are:

False Invalid: An attacker can fool a pair of agents into thinking a candidate pair is invalid, when it isn't.
This can be used to cause an agent to prefer a different candidate (such as one injected by the attacker), or to disrupt a call by forcing all candidates to fail.

False Valid: An attacker can fool a pair of agents into thinking a candidate pair is valid, when it isn't. This can cause an agent to proceed with a session, but then not be able to receive any media.

False Peer-Derived Candidate: An attacker can cause an agent to discover a new peer-derived candidate, when it shouldn't have. This can be used to redirect media streams to a DoS target or to the attacker, for eavesdropping or other purposes.

False Valid on False Candidate: An attacker has already convinced an agent that there is a candidate with an address that doesn't actually route to that agent (for example, by injecting a false peer-derived candidate or false STUN-derived candidate). It must then launch an attack that forces the agents to believe that this candidate is valid.

Of the various techniques for creating faked STUN messages described in [12], many are not applicable for the connectivity checks. Compromises of STUN servers are not much of a concern, since the STUN servers are embedded in endpoints and distributed throughout the network. Thus, compromising the STUN server is equivalent to compromising the endpoint, and if that happens, far more problematic attacks are possible than those against ICE. Similarly, DNS attacks are irrelevant since STUN servers are not discovered via DNS; they are signaled via SIP. Injection of fake responses and relaying of modified requests can both be handled in ICE with the countermeasures discussed below.

To force the false invalid result, the attacker has to wait for the connectivity check for one of the agents to be sent. When it is, the attacker needs to inject a fake response with an unrecoverable error response, such as a 600.
This attack only needs to be launched against one of the agents in order to invalidate the candidate pair. However, since the candidate is, in fact, valid, the original request may reach the peer agent, and result in a success response. The attacker needs to force this packet or its response to be dropped, through a DoS attack, layer 2 network disruption, or other technique. If it doesn't do this, the success response will also reach the originator, alerting it to a possible attack. This will cause the agent to abandon the candidate, which is the desired result in any case. Fortunately, this attack is mitigated completely through the STUN message integrity mechanism. The attacker needs to inject a fake response, and in order for this response to be processed, the attacker needs the password. If the offer/answer signaling is secured, the attacker will not have the password.

Forcing the fake valid result works in a similar way. The attacker needs to wait for the Binding Request from each agent, and inject a fake success response. The attacker won't need to worry about disrupting the actual response since, if the candidate is not valid, it presumably wouldn't be received anyway. However, like the fake invalid attack, this attack is mitigated completely through the STUN message integrity and offer/answer security techniques.

Forcing the false peer-derived candidate result can be done either with fake requests or responses, or with replays. We consider the fake requests and responses case first. It requires the attacker to send a Binding Request to one agent with a source IP address and port for the false transport address. In addition, the attacker must wait for a Binding Request from the other agent, and generate a fake response with an XOR-MAPPED-ADDRESS attribute.
This attack is best launched against a candidate pair that is likely to be invalid, so the attacker doesn't need to contend with the actual responses to the real connectivity checks. Like the other attacks described here, this attack is mitigated by the STUN message integrity mechanisms and secure offer/answer exchanges.

Forcing the false peer-derived candidate result with packet replays is different. The attacker waits until one of the agents sends a Binding Request for one of the transport address pairs. It then intercepts this request, and replays it towards the other agent with a faked source IP address. It must also prevent the original request from reaching the remote agent, either by launching a DoS attack to cause the packet to be dropped, or forcing it to be dropped using layer 2 mechanisms. The replayed packet is received at the other agent, and accepted, since the integrity check passes (the integrity check cannot and does not cover the source IP address and port). It is then responded to. This response will contain an XOR-MAPPED-ADDRESS with the false transport address. It is sent to this false address. The attacker must then intercept it and relay it towards the originator.

The other agent will then initiate a connectivity check towards that transport address. This validation needs to succeed. This requires the attacker to force a false valid on a false candidate. Injection of fake requests or responses to achieve this goal is prevented using the integrity mechanisms of STUN and the offer/answer exchange. Thus, this attack can only be launched through replays. To do that, the attacker must intercept the Binding Request towards this false transport address, and replay it towards the other agent. Then, it must intercept the response and replay that back as well.

This attack is very hard to launch unless the attacker themself is identified by the fake transport address.
This is because it requires the attacker to intercept and replay packets sent by two different hosts. If both agents are on different networks (for example, across the public Internet), this attack can be hard to coordinate, since it needs to occur against two different endpoints on different parts of the network at the same time.

If the attacker themself is identified by the fake transport address, the attack is easier to coordinate. However, if SRTP is used [22], the attacker will not be able to play the media packets; they will only be able to discard them, effectively disabling the media stream for the call. However, this attack requires the attacker to disrupt packets in order to block the connectivity check from reaching the target. In that case, if the goal is to disrupt the media stream, it's much easier to just disrupt it with the same mechanism, rather than attack ICE.

13.2. Attacks on Address Gathering

ICE endpoints make use of STUN for gathering addresses from a STUN server in the network. This corresponds to the binding acquisition use case discussed in Section 10.1 of [12]. As a consequence, the attacks against STUN itself that are described in Section 12 of [12] can still be used against the STUN address gathering operations that occur in ICE.

However, the additional mechanisms provided by ICE actually counteract such attacks, making binding acquisition with STUN more secure when combined with ICE than without ICE.

Consider an attacker which is able to provide an agent with a faked XOR-MAPPED-ADDRESS in a STUN Binding Request that is used for address gathering. This is the primary attack primitive described in Section 12 of [12]. This address will be used as a STUN-derived candidate in the ICE exchange. For this candidate to actually be used for media, the attacker must also attack the connectivity checks, and in particular, force a false valid on a false candidate.
This attack is very hard to launch if the false address identifies a third party, and is prevented by SRTP if it identifies the attacker themself.

If the attacker elects not to attack the connectivity checks, the worst it can do is prevent the STUN-derived address from being used. However, if the peer agent has at least one address that is reachable by the agent under attack, the STUN connectivity checks themselves will provide a STUN-derived address that can be used for the exchange of media. Peer-derived candidates are preferred over the candidate they are generated from for this reason. As such, an attack solely on the STUN address gathering will normally have no impact on a call at all.

13.3. Attacks on the Offer/Answer Exchanges

An attacker that can modify or disrupt the offer/answer exchanges themselves can readily launch a variety of attacks with ICE. They could direct media to a target of a DoS attack, they could insert themselves into the media stream, and so on. These are similar to the general security considerations for offer/answer exchanges, and the security considerations in RFC 3264 [4] apply. These require techniques for message integrity and encryption for offers and answers, which are satisfied by the SIPS mechanism [2] when SIP is used. As such, the usage of SIPS with ICE is RECOMMENDED.

13.4. Insider Attacks

In addition to attacks where the attacker is a third party trying to insert fake offers, answers or STUN messages, there are several attacks possible with ICE when the attacker is an authenticated and valid participant in the ICE exchange.

13.4.1. The Voice Hammer Attack

The voice hammer attack is an amplification attack. In this attack, the attacker initiates sessions to other agents, and includes the IP address and port of a DoS target in the m/c-line of their SDP.
This causes substantial amplification; a single offer/answer exchange can create a continuing flood of media packets, possibly at high rates (consider video sources). This attack is not specific to ICE, but ICE can help provide remediation.

Specifically, if ICE is used, the agent receiving the malicious SDP will first perform connectivity checks to the target of media before sending it there. If this target is a third party host, the checks will not succeed, and media is never sent.

Unfortunately, ICE doesn't help if it's not used, in which case an attacker could simply send the offer without the ICE parameters. However, in environments where the set of clients is known, and limited to ones that support ICE, the server can reject any offers or answers that don't indicate ICE support.

13.4.2. STUN Amplification Attack

The STUN amplification attack is similar to the voice hammer. However, instead of voice packets being directed to the target, STUN connectivity checks are directed to the target. This attack is accomplished by having the offerer send an offer with a large number of candidates, say 50. The answerer receives the offer, and starts its checks, which are directed at the target, and consequently, never generate a response. The answerer will start a new connectivity check every 50ms, and each check is a STUN transaction consisting of 9 retransmits of a message 64 bytes in length. This produces a fairly substantial 92 kbps, just in STUN requests.

It is impossible to eliminate the amplification, but the volume can be reduced through a variety of heuristics. For example, agents can limit the number of candidates they'll accept in an offer or answer, they can increase the value of Tb, or exponentially increase Tb as time goes on. All of these ultimately trade off the time for the ICE exchanges to complete against the amount of traffic that gets sent.

14. IANA Considerations

This specification defines three new SDP attributes per the procedures of Section 8.2.4 of [5]. The required information for the registrations is included here.

14.1. candidate Attribute

   Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

   Attribute Name: candidate

   Long Form: candidate

   Type of Attribute: media level

   Charset Considerations: The attribute is not subject to the charset attribute.

   Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and provides one of many possible candidate addresses for communication. These addresses are validated with an end-to-end connectivity check using Simple Traversal of UDP through NAT (STUN).

   Appropriate Values: See Section 12 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

14.2. remote-candidate Attribute

   Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

   Attribute Name: remote-candidate

   Long Form: remote-candidate

   Type of Attribute: media level

   Charset Considerations: The attribute is not subject to the charset attribute.

   Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and provides the identity of the remote candidate that the offerer wishes the answerer to use in its answer.

   Appropriate Values: See Section 12 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

14.3. ice-pwd Attribute

   Contact Name: Jonathan Rosenberg, jdrosen@jdrosen.net.

   Attribute Name: ice-pwd

   Long Form: ice-pwd

   Type of Attribute: session level

   Charset Considerations: The attribute is not subject to the charset attribute.

   Purpose: This attribute is used with Interactive Connectivity Establishment (ICE), and provides the password used to protect STUN connectivity checks.

   Appropriate Values: See Section 12 of RFC XXXX [Note to RFC-ed: please replace XXXX with the RFC number of this specification].

15. IAB Considerations

The IAB has studied the problem of "Unilateral Self Address Fixing", which is the general process by which an agent attempts to determine its address in another realm on the other side of a NAT through a collaborative protocol reflection mechanism [20]. ICE is an example of a protocol that performs this type of function. Interestingly, the process for ICE is not unilateral, but bilateral, and the difference has a significant impact on the issues raised by the IAB. The IAB has mandated that any protocols developed for this purpose document a specific set of considerations. This section meets those requirements.

15.1. Problem Definition

From RFC 3424, any UNSAF proposal must provide:

   Precise definition of a specific, limited-scope problem that is to be solved with the UNSAF proposal. A short term fix should not be generalized to solve other problems; this is why "short term fixes usually aren't".

The specific problems being solved by ICE are:

   Provide a means for two peers to determine the set of transport addresses which can be used for communication.

   Provide a means for resolving many of the limitations of other UNSAF mechanisms by wrapping them in an additional layer of processing (the ICE methodology).

   Provide a means for an agent to determine an address that is reachable by another peer with which it wishes to communicate.

15.2. Exit Strategy

From RFC 3424, any UNSAF proposal must provide:

   Description of an exit strategy/transition plan. The better short term fixes are the ones that will naturally see less and less use as the appropriate technology is deployed.

ICE itself doesn't easily get phased out. However, it is useful even in a globally connected Internet, to serve as a means for detecting whether a router failure has temporarily disrupted connectivity, for example. ICE also helps prevent certain security attacks which have nothing to do with NAT.
However, what ICE does is help phase out other UNSAF mechanisms. ICE effectively selects amongst those mechanisms, prioritizing ones that are better, and deprioritizing ones that are worse. Local IPv6 addresses can be preferred. As NATs begin to disappear with the introduction of IPv6, derived transport addresses from other UNSAF mechanisms simply never get used, because higher priority connectivity exists. Therefore, the servers get used less and less, and can eventually be removed when their usage goes to zero.

Indeed, ICE can assist in the transition from IPv4 to IPv6. It can be used to determine whether to use IPv6 or IPv4 when two dual-stack hosts communicate with SIP (IPv6 gets used). It can also allow a network with both 6to4 and native v6 connectivity to determine which address to use when communicating with a peer.

15.3. Brittleness Introduced by ICE

From RFC 3424, any UNSAF proposal must provide:

   Discussion of specific issues that may render systems more "brittle". For example, approaches that involve using data at multiple network layers create more dependencies, increase debugging challenges, and make it harder to transition.

ICE actually removes brittleness from existing UNSAF mechanisms. In particular, traditional STUN (as described in [14]) has several points of brittleness. One of them is the discovery process which requires an agent to try to classify the type of NAT it is behind. This process is error-prone. With ICE, that discovery process is simply not used. Rather than unilaterally assessing the validity of the address, its validity is dynamically determined by measuring connectivity to a peer. The process of determining connectivity is very robust.

Another point of brittleness in STUN and any other unilateral mechanism is its absolute reliance on an additional server.
ICE makes use of a server for allocating unilateral addresses, but allows agents to directly connect if possible. Therefore, in some cases, the failure of a STUN server would still allow for a call to progress when ICE is used.

Another point of brittleness in traditional STUN is that it assumes that the STUN server is on the public Internet. Interestingly, with ICE, that is not necessary. There can be a multitude of STUN servers in a variety of address realms. ICE will discover the one that has provided a usable address.

The most troubling point of brittleness in traditional STUN is that it doesn't work in all network topologies. In cases where there is a shared NAT between each agent and the STUN server, traditional STUN may not work. With ICE, that restriction can be lifted.

Traditional STUN also introduces some security considerations. Fortunately, those security considerations are also mitigated by ICE.

Consequently, ICE serves to repair the brittleness introduced in other UNSAF mechanisms, and does not introduce any additional brittleness into the system.

15.4. Requirements for a Long Term Solution

From RFC 3424, any UNSAF proposal must provide:

   Identify requirements for longer term, sound technical solutions -- contribute to the process of finding the right longer term solution.

Our conclusions from STUN remain unchanged. However, we feel ICE actually helps because we believe it can be part of the long term solution.

15.5. Issues with Existing NAPT Boxes

From RFC 3424, any UNSAF proposal must provide:

   Discussion of the impact of the noted practical issues with existing, deployed NA[P]Ts and experience reports.

A number of NAT boxes are now being deployed into the market which try to provide "generic" ALG functionality. These generic ALGs hunt for IP addresses, either in text or binary form within a packet, and rewrite them if they match a binding.
This interferes with traditional STUN. However, the update to STUN [12] uses an encoding which hides these binary addresses from generic ALGs. Since [12] is required for all ICE implementations, this NAPT problem does not impact ICE.

Existing NAPT boxes have non-deterministic and typically short expiration times for UDP-based bindings. This requires implementations to send periodic keepalives to maintain those bindings. ICE uses a default of 15s, which is a very conservative estimate. Eventually, as NAT boxes become compliant with the BEHAVE requirements [32], this minimum keepalive will become deterministic and well-known, and the ICE timers can be adjusted. Having a way to discover the minimum keepalive interval would be far better still.

16. Acknowledgements

The authors would like to thank Flemming Andreasen, Rohan Mahy, Dean Willis, Dan Wing, Douglas Otis, Tim Moore, Francois Audet, Bill May and Philip Matthews for their comments and input. A special thanks goes to Magnus Westerlund for doing several detailed reviews on the various revisions of this specification. His input led to many substantive improvements in this document.

17. References

17.1. Normative References

   [1]   Huitema, C., "Real Time Control Protocol (RTCP) attribute in Session Description Protocol (SDP)", RFC 3605, October 2003.

   [2]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

   [3]   Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 3548, July 2003.

   [4]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

   [5]   Handley, M., "SDP: Session Description Protocol", draft-ietf-mmusic-sdp-new-26 (work in progress), January 2006.
[6] Casner, S., "Session Description Protocol (SDP) Bandwidth Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 3556, July 2003.
[7] Camarillo, G., Marshall, W., and J. Rosenberg, "Integration of Resource Management and Session Initiation Protocol (SIP)", RFC 3312, October 2002.
[8] Camarillo, G. and P. Kyzivat, "Update to the Session Initiation Protocol (SIP) Preconditions Framework", RFC 4032, March 2005.
[9] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 4234, October 2005.
[10] Olson, S., Camarillo, G., and A. Roach, "Support for IPv6 in Session Description Protocol (SDP)", RFC 3266, June 2002.
[11] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional Responses in Session Initiation Protocol (SIP)", RFC 3262, June 2002.
[12] Rosenberg, J., "Simple Traversal of UDP Through Network Address Translators (NAT) (STUN)", draft-ietf-behave-rfc3489bis-03 (work in progress), March 2006.
[13] Rosenberg, J., "Obtaining Relay Addresses from Simple Traversal of UDP Through NAT (STUN)", draft-ietf-behave-turn-00 (work in progress), March 2006.

17.2. Informative References

[14] Rosenberg, J., Weinberger, J., Huitema, C., and R. Mahy, "STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)", RFC 3489, March 2003.
[15] Senie, D., "Network Address Translator (NAT)-Friendly Application Design Guidelines", RFC 3235, January 2002.
[16] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for Generic Forward Error Correction", RFC 2733, December 1999.
[17] Srisuresh, P., Kuthan, J., Rosenberg, J., Molitor, A., and A. Rayhan, "Middlebox communication architecture and framework", RFC 3303, August 2002.
[18] Borella, M., Lo, J., Grabelsky, D., and G. Montenegro, "Realm Specific IP: Framework", RFC 3102, October 2001.
[19] Borella, M., Grabelsky, D., Lo, J., and K.
Taniguchi, "Realm Specific IP: Protocol Specification", RFC 3103, October 2001.
[20] Daigle, L. and IAB, "IAB Considerations for UNilateral Self-Address Fixing (UNSAF) Across Network Address Translation", RFC 3424, November 2002.
[21] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 3550, July 2003.
[22] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.
[23] Carpenter, B. and K. Moore, "Connection of IPv6 Domains via IPv4 Clouds", RFC 3056, February 2001.
[24] Zopf, R., "Real-time Transport Protocol (RTP) Payload for Comfort Noise (CN)", RFC 3389, September 2002.
[25] Rosenberg, J., "The Session Initiation Protocol (SIP) UPDATE Method", RFC 3311, October 2002.
[26] Camarillo, G. and H. Schulzrinne, "Early Media and Ringing Tone Generation in the Session Initiation Protocol (SIP)", RFC 3960, December 2004.
[27] Andreasen, F., "Connectivity Preconditions for Session Description Protocol Media Streams", draft-ietf-mmusic-connectivity-precon-02 (work in progress), June 2006.
[28] Andreasen, F., "A No-Op Payload Format for RTP", draft-ietf-avt-rtp-no-op-00 (work in progress), May 2005.
[29] Huitema, C., "Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs)", RFC 4380, February 2006.
[30] Kohler, E., Handley, M., and S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, March 2006.
[31] Hellstrom, G. and P. Jones, "RTP Payload for Text Conversation", RFC 4103, June 2005.
[32] Audet, F. and C. Jennings, "NAT Behavioral Requirements for Unicast UDP", draft-ietf-behave-nat-udp-07 (work in progress), June 2006.
Author's Address

   Jonathan Rosenberg
   Cisco Systems
   600 Lanidex Plaza
   Parsippany, NJ 07054
   US

   Phone: +1 973 952-5000
   Email: jdrosen@cisco.com
   URI:   http://www.jdrosen.net

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement

Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.