MIDCOM Working Group                                          A. Molitor
Internet Draft: Topology Considerations        Aravox Technologies, Inc.
                                                             August 2001

         Topology Considerations for IP Telephony MIDCOM Agents
             draft-molitor-midbox-telephony-topology-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

   This document describes several operational scenarios for IP Telephony and some ways in which a suitable MIDCOM Agent might interwork with one or more middleboxes to facilitate those scenarios. From these scenarios, certain minimum requirements for Agent knowledge of network topology will be derived.

   The object is to demonstrate that:

   1) A "pinhole" of policy in a middlebox is not a useful policy device unless some notion of the permitted physical interfaces is attached to it.

   2) In IP Telephony applications, it is always possible to provide this notion of permitted physical interfaces.

2. Introduction

2.1 IP Telephony

   The operations involved in setting up an IP telephone call are, regardless of protocol, more or less as follows:

   1) The originating device sends a message to a target device, routing the message through some intermediate devices.
      This message contains, among other things, an IP address and port number at which the originating device will receive RTP traffic carrying media from the target.

   2) The target device responds with an acknowledgement message containing, among other things, an IP address and port number at which the target device will receive RTP traffic carrying media from the originating device.

   3) RTP traffic flows between the two devices for the duration of the telephone call.

   It is important, even vital, to note that none of the messages carries any information whatsoever about the source IP addresses or ports of the media streams. It is, in general, perfectly legal for the media streams to originate from non-obvious IP addresses and ports (for example, an IP address different from any of the message source or destination IP addresses).

   Worth noting: RTP has the rather unusual property that it is truly a unidirectional stream of packets. There are no ACKs or replies. This gives it somewhat special properties from a firewalling and packet filtering point of view; in particular, a filter cannot recognize a media stream as legitimate by observing replies to it.

2.2 Some remarks on "direction"

   The word "direction" has several possible meanings in the context of a middlebox discussion:

   1) The "direction" of an individual packet, that is, the source and destination IP addresses contained in the packet.

   2) The "direction" in which a channel of communication between two IP endpoints was created. For example, one could reasonably consider a TCP connection as running "from" a client "to" the server, even though IP packets actually flow both ways. This definition could be considered equivalent to the "direction" (in the sense given in item 1) of the first packet sent.

   3) The path a packet takes through some sort of device, for example a middlebox.
      An individual packet might reasonably be considered as going in an "inwards" direction, for example, if it arrives on an interface connected to the global Internet and leaves on an interface connected to a private network.

   This document uses the third of these definitions. The first two, both of which are perfectly sensible and reasonable, are not used herein.

3. Terminology

   The non-RTP messages passing between IP telephony devices (both endpoints and intermediate "IP telephony aware" devices such as SIP proxies or H.323 gatekeepers) will be referred to collectively as "signaling" traffic. The RTP messages carrying media between two endpoints will be referred to collectively as "media traffic."

   The notion of "direction" when applied to a pinhole installed in a middlebox should be taken to mean a specification of which two middlebox ports (that is, physical interfaces) are involved, and which side of the pinhole's 5-tuple goes where. In the simple case of a two-port middlebox, this may be reduced to specifying a pinhole as a 5-tuple in which each of the two IP address and port pairs is designated "inside" or "outside". In larger middleboxes, the 5-tuple needs more annotation. For example, the following, in which PortA and PortB name two of the middlebox's ports, constitutes a 5-tuple annotated with a direction in this sense:

      PortA:    192.168.3.1  port 1234
      PortB:    209.46.41.66 port 57871
      Protocol: UDP

   An optional addition specifies whether traffic can flow in both directions between the two endpoints, or only from one to the other. This optional addition is orthogonal to the core notion of "direction" and is not discussed herein.

   It should be noted that this is by no means the only possible notation. There are many other schemes, equally or more semantically powerful, that cover the same requirements.

   The author apologizes for this usage of "direction"; it would more appropriately be called something with connotations of "path", but the author was unable to invent a suitable word.
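   As an illustration only, a pinhole annotated with "direction" in this sense might be modeled as below. This is a minimal Python sketch under stated assumptions: the names Endpoint, Packet, Pinhole, matches_tuple, and permits, the interface labels "inside" and "outside", and all addresses other than those in the example above are hypothetical, not MIDCOM protocol elements. Each pinhole covers one unidirectional media stream (the optional bidirectional addition is left out), and the source may be a wildcard because signaling does not carry media source addresses. The usage lines show a pinhole that an Agent believes leads to an outside endpoint but whose destination is really an internal host: the bare 5-tuple test admits an inbound packet to that host, while the direction test refuses it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """One side of a 5-tuple: an IP address and a port."""
    ip: str
    port: int


@dataclass(frozen=True)
class Packet:
    src: Endpoint
    dst: Endpoint
    protocol: str
    in_iface: str   # middlebox port the packet arrived on
    out_iface: str  # middlebox port the packet would leave by


@dataclass(frozen=True)
class Pinhole:
    """A pinhole for one media stream, annotated with the middlebox ports
    the stream enters and leaves by (hypothetical model).

    src is None when the media source is unknown, which is the usual case
    because signaling does not carry media source addresses.
    """
    protocol: str
    dst: Endpoint
    src: Optional[Endpoint]
    in_iface: str
    out_iface: str

    def matches_tuple(self, pkt: Packet) -> bool:
        """Bare 5-tuple test: protocol, addresses, and ports only."""
        return (pkt.protocol == self.protocol and pkt.dst == self.dst
                and (self.src is None or pkt.src == self.src))

    def permits(self, pkt: Packet) -> bool:
        """5-tuple test plus the direction (interface) test."""
        return (self.matches_tuple(pkt)
                and pkt.in_iface == self.in_iface
                and pkt.out_iface == self.out_iface)


# Hypothetical case: the "outside" media address named in signaling is
# really an internal host, so the Agent asked for an outward pinhole to it.
hole = Pinhole("UDP", Endpoint("192.168.3.20", 5000), None,
               in_iface="inside", out_iface="outside")
probe = Packet(Endpoint("203.0.113.9", 4444), Endpoint("192.168.3.20", 5000),
               "UDP", in_iface="outside", out_iface="inside")
print(hole.matches_tuple(probe))  # True
print(hole.permits(probe))        # False
```

   The design choice worth noting is that the interfaces are part of the match itself, so a pinhole cannot be satisfied by traffic entering from the wrong side, whatever addresses it carries.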
   In general, other terminology will follow the ad hoc standards used by the MIDCOM working group and IP telephony practitioners, to the best of the author's ability.

4. Scenarios

   In this section some simple usage scenarios are laid out, with some discussion of the various issues of topology that arise in them.

4.1. Small Enterprise

   In this scenario, we assume a small IP network viewed as a single administrative domain, for example a small corporate office. This single network is connected to the rest of the world, the "Internet," by a single link passing through a middlebox.

   On this network are IP telephony devices (which we may freely interpret as simply telephones) and, for this scenario, a combined MIDCOM Agent and IP Telephony registration/routing device (for example, a local H.323 Gatekeeper, or a SIP proxy including a local registrar function). Call this central device the "signaling switch" for lack of a better term.

   All the telephones on the network, when they wish to originate a call, communicate first with this signaling switch. Any incoming call is handled by sending signaling to the signaling switch. Calls from one phone to another within the network are also handled by sending signaling traffic through the signaling switch.

   There are three basic call types:

   1) Outbound calls, originated by a telephone inside the network, terminating somewhere outside it.

   2) Inbound calls, originating somewhere outside the network and terminating inside it.

   3) Local calls, originating and terminating on the network.

   In the first two cases, the signaling switch, in its other role as a MIDCOM Agent, must interact with the middlebox to open pinholes for media traffic and/or to obtain NAT bindings, which it may use to provide the external endpoint with a public IP address bound to the internal endpoint's actual IP address. In the third case, it is extremely inconvenient and unnecessary for the middlebox to be involved.
   If NAT bindings were used for a local call, each of the two internal telephones would be given an external ("public") IP address as the apparent address of the other telephone. This can be made to work by routing media out through the middlebox and back in, but doing so is complicated, error-prone, and unnecessary.

   The signaling switch, for all practical purposes, must be able to distinguish case 3 from the others. If it can reliably do so, it can further distinguish the other two cases, since it evidently knows which telephones are inside the network. Therefore, it is safe to assume that the signaling switch can determine from signaling traffic which of the three cases applies to any individual call. It can therefore provide directionality information to the middlebox, in the sense of "this media should be coming from the outside inwards -- not vice versa", and of course the reverse for the other media stream.

   Furthermore, if the middlebox does NOT have this information available, the signaling switch may be fooled into letting traffic from the outside into the network that should not be admitted, as follows: The attacker forges a call origination message specifying a vulnerable internal IP address and port as the supposed "outside" telephone's media port. The signaling switch, if unable to distinguish inside from outside, will cause a pinhole (and/or NAT binding) to be placed in the middlebox allowing traffic from the outside inwards to attack the vulnerable device. If the middlebox had, as proposed, the additional information that the pinhole was intended to allow traffic outwards to an "outside" endpoint only, this could not happen.

   The salient points here are:

   1) The Agent must be able to distinguish between telephones inside the network and those outside, using data carried in the signaling traffic together with its own configuration and other data (e.g., registrations).
   2) Specifying pinholes without an "inside to outside" or "outside to inside" attribute opens the network up to a variety of fairly simple attacks.

   3) The Agent, by virtue of the first point, can easily supply the information needed to solve the problem raised in the second.

4.2. Multi-Site Enterprise

   In this context, we assume that there are two or more networks, all falling into the same administrative domain, each connected to the world, the "Internet," via a single middlebox. For simplicity, we assume that the "Internet" provides the only connectivity between any two of the networks within the administrative domain. We assume further that there are telephones or the equivalent located on each network, and that a single signaling switch identical to the one described in the earlier scenario is present on one of the networks.

   There are now four types of calls:

   1) Outbound calls, originated by a telephone inside one of the networks, terminating somewhere outside all of them.

   2) Inbound calls, originating somewhere outside all of the networks and terminating inside one of them.

   3) Site-to-Site calls, originating on one of the networks and terminating on a different one.

   4) Local calls, originating and terminating within the same network.

   In this context, there are three variations on Agent/middlebox interactions. The signaling switch may need to interact with a single middlebox (cases 1 and 2), which it must identify from the location of the internal endpoint. It may need to interact with two middleboxes in the case of a Site-to-Site call. It may not need to interact with any middlebox at all in the case of a Local call.
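   The three variations above can be sketched as follows. This is a minimal Python illustration, not a proposed mechanism, and it rests on hypothetical assumptions: each site is described by a single address block and a single middlebox (the SITES table and the names site_of and plan_call are inventions for this example, using documentation address ranges), endpoints are identified by the media IP addresses announced in signaling, and no NAT is involved. Because signaling carries no media source addresses, each pinhole is described only by its middlebox, its direction relative to that middlebox's site, and its media destination.

```python
import ipaddress
from typing import Dict, List, Tuple

WORLD = "world"

# Hypothetical configuration: each site's address block and the middlebox
# connecting that site to the Internet.
SITES: Dict[str, Tuple[str, str]] = {
    "site-1": ("192.0.2.0/25", "mbox-1"),
    "site-2": ("192.0.2.128/25", "mbox-2"),
}

# (middlebox, direction relative to its site, media destination)
Action = Tuple[str, str, str]


def site_of(addr: str) -> str:
    """Return the site an address belongs to, or WORLD if none."""
    ip = ipaddress.ip_address(addr)
    for name, (prefix, _mbox) in SITES.items():
        if ip in ipaddress.ip_network(prefix):
            return name
    return WORLD


def plan_call(orig: str, term: str) -> Tuple[str, List[Action]]:
    """Classify a call and list the directed pinholes it needs."""
    o, t = site_of(orig), site_of(term)
    if o == WORLD and t == WORLD:
        raise ValueError("neither endpoint is inside the domain")
    if o == t:
        return "local", []  # no middlebox involved
    if t == WORLD:
        kind = "outbound"
    elif o == WORLD:
        kind = "inbound"
    else:
        kind = "site-to-site"
    actions: List[Action] = []
    for site, here, there in ((o, orig, term), (t, term, orig)):
        if site == WORLD:
            continue
        mbox = SITES[site][1]
        # Media sent by the internal endpoint leaves its site...
        actions.append((mbox, "outward", there))
        # ...and media addressed to the internal endpoint enters it.
        actions.append((mbox, "inward", here))
    return kind, actions


kind, actions = plan_call("192.0.2.10", "192.0.2.200")
print(kind)                              # site-to-site
print(sorted({a[0] for a in actions}))   # ['mbox-1', 'mbox-2']
```

   Every directed pinhole falls out of the single site_of lookup; if that lookup cannot be made, neither the choice of middlebox nor the direction can be.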
   The Agent (in its role as a signaling switch managing IP Telephony signaling traffic) must be able to reliably determine:

   1) When a call endpoint (either origination or termination) is located within the Agent's administrative domain (that is, on one of the several networks).

   2) Which one of those networks such an endpoint resides on.

   The alternative is to do something like opening pinholes on all the known middleboxes (potentially exposing vulnerable hosts as well as telephones). NAT simply will not work at all: if the Agent cannot determine which of the several middleboxes an endpoint is behind, it cannot know which middlebox to query. A NAT binding from the wrong middlebox will yield a public address routed to the site behind that middlebox, causing the media to flow to the wrong site. This author cannot construct any working gedankenmekanism which will direct the traffic to the right place.

   Therefore, we must assume, if this scenario is to work at all, that the Agent can determine which telephones are where, in the sense of which of the several networks, if any, an endpoint resides upon. With this information in hand, of course, the Agent can supply inbound versus outbound directionality to the relevant middleboxes. Not surprisingly, this again prevents the attack described above.

4.3 Service Provider

   In this model, we assume that a single entity operates a provider network upstream from several customers. On the provider network, there is at least one signaling switch similar to the ones described above, and possibly some IP Telephony endpoints (gateways, IVR systems, voicemail systems, and so on). On the customer networks, there are IP telephones and other devices which can originate and terminate IP telephone calls.

   We assume further that there is, for some customers, a middlebox mediating traffic between the provider network and the customer network.
   Of course, the customer may also own and operate a middlebox, but for the purposes of this scenario, we assume that the middleboxes we care about are operated by the service provider. In addition, the service provider may operate middleboxes between its own network and "upstream" networks such as carriers.

   Finally, we assume that any IP telephone calls for which media must pass through any portion of the provider's network will be set up by signaling passing through the provider's signaling equipment. In particular, a provider-owned MIDCOM Agent will have an opportunity to manipulate middleboxes on behalf of each telephone call.

   In general, we have a situation in which a provider network is connected to other networks. Each inter-network link may or may not be mediated by a middlebox operated by the service provider. IP telephone calls may originate either inside the service provider network or on any attached network, and may terminate either inside the provider network or outside it. We have these cases:

   1) Telephone call originates on an attached network and terminates on the provider network.
      1a) The attached network is attached through a middlebox.
      1b) The attached network is not.

   2) Telephone call originates on the provider network and terminates on an attached network.
      2a) The attached network is attached through a middlebox.
      2b) The attached network is not.

   3) Telephone call originates on an attached network and terminates on another attached network.
      3a) The origination network is attached through a middlebox.
      3b) The origination network is not.
      3c) The termination network is attached through a middlebox.
      3d) The termination network is not.

   4) Telephone call originates and terminates on the same attached network.

   5) Telephone call originates and terminates within the provider network.

   This model has essentially the same problem as a multi-site enterprise network, but with more variations possible.
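   Sorting a call into the cases above can be sketched in a few lines of Python. The sketch is illustrative only and rests on hypothetical choices: the network names, the mediated table (attached network to provider-operated middlebox), and the functions classify and middleboxes are inventions for this example; the termination-network variants of case 3 are labeled 3c (mediated) and 3d (not mediated); and it assumes that in cases 4 and 5 no provider middlebox needs to be touched.

```python
from typing import Dict, List

PROVIDER = "provider"


def classify(orig_net: str, term_net: str, mediated: Dict[str, str]) -> str:
    """Return the case label for a call, following the enumeration above.

    mediated maps each attached network reached through a provider-operated
    middlebox to that middlebox's (hypothetical) name.
    """
    def via(net: str, yes: str, no: str) -> str:
        return yes if net in mediated else no

    if orig_net == term_net:
        return "5" if orig_net == PROVIDER else "4"
    if term_net == PROVIDER:
        return "1" + via(orig_net, "a", "b")
    if orig_net == PROVIDER:
        return "2" + via(term_net, "a", "b")
    return "3" + via(orig_net, "a", "b") + "/3" + via(term_net, "c", "d")


def middleboxes(orig_net: str, term_net: str,
                mediated: Dict[str, str]) -> List[str]:
    """Middleboxes to manipulate: one per mediated endpoint network."""
    if orig_net == term_net:
        return []  # cases 4 and 5 (assumed): no provider middlebox crossed
    return [mediated[n] for n in (orig_net, term_net) if n in mediated]


mediated = {"cust-a": "mbox-a", "cust-b": "mbox-b"}  # hypothetical
print(classify("cust-a", "cust-c", mediated))     # 3a/3d
print(middleboxes("cust-a", "cust-c", mediated))  # ['mbox-a']
```

   Both functions take the endpoint networks as given; a real Agent would first have to map each endpoint named in signaling to one of these networks, which is exactly the topology knowledge under discussion.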
   In general, problems are worsened by the fact that various customers with various needs are connected, and the service provider is less able to simply define problems away with standardized configurations.

   In all cases, the signaling switch(es) owned by the service provider must solve the problem of which middleboxes -- if any -- need to be manipulated, and in which ways. It must be able to sort the call into the correct one of the 13 (or possibly more) cases above, identify the relevant attached networks and middleboxes, and thereby deduce the relevant direction information through the middleboxes.

5. Summary and Conclusions

   In all of the above scenarios, we see that the IP Telephony device _cum_ MIDCOM Agent must know enough about network topology to discover which middleboxes, if any, to manipulate. Since it must know this in order for the MIDCOM model to work at all, and since these devices can therefore supply a "direction" of sorts to the middlebox, there seems to be no harm in requiring a direction in pinhole specifications -- for IP telephony applications, at any rate.

   In addition, we see that failing to supply a direction in a pinhole specification allows gross vulnerabilities, exploitable by specifying an "internal" address where an "external" one is expected. Without our notion of directionality applied to pinholes, the resulting pinhole will permit traffic inwards as well as outwards.

   Finally, all of the scenarios indicate that it is sufficient to model the network as a collection of one or more realms (roughly: sets of IP addresses), together with a single "world" realm into which everything not in the other realms falls. These realms are interconnected by middleboxes.

   More precisely, an Agent's view of the network topology might be described handily as a graph in which nodes (i.e., realms) are either:

   1) A set of specified IP addresses.
   2) A distinguished and unique node named "world", into which all IP addresses not explicitly called out in another node are considered to fall.

   The edges of the graph are middleboxes.

   A moment's consideration will show that two realms connected, but not via a middlebox known to the Agent, may as well be considered a single realm.

   Of course, in some cases involving NAT, some additional information may be required, since two or more realms may use overlapping ranges of private IP addresses. In this case, a realm is a set of IP addresses together with some additional information (e.g., a domain name) which can be used to identify which realm a given IP telephony endpoint specified in a signaling message resides in.

   In general, the "set of IP addresses" is a little crude, since an Agent may not have an IP address a priori; it may have only, say, a SIP identifier. The Agent has to map, for example, sip:amolitor@aravox.com into some realm. Note that it doesn't need to know the IP address of amolitor's telephone, or even of aravox.com. In many cases it may suffice to know that nobody with that identity is registered here, and therefore that amolitor's phone is in the "world" realm.

   The details of how realms are specified are, in the opinion of this author, outside the scope of the working group. The entire thrust of the above is to show that some notion of realms, at a fairly coarse level, is necessary for the operation of IP telephony networks with middleboxes.

   From this discussion, we derive that adding this memo's notion of direction to a pinhole is more or less free. Further, the discussion demonstrates that omitting the notion of direction from a pinhole is catastrophic and unworkable.

Security Considerations

   Security considerations are more or less the entire point of this memo.

Author's Address

   Andrew Molitor
   Aravox Technologies, Inc.
   4201 Lexington Ave. North
   Suite 1105
   Arden Hills, MN 55126

   Email: amolitor@visi.com