MIDCOM Working Group                                        A. Molitor
Internet Draft: Topology Considerations      Aravox Technologies, Inc.
                                                           August 2001


         Topology Considerations for IP Telephony MIDCOM Agents
               draft-molitor-midbox-telephony-topology-00.txt 


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


1. Abstract

   This document describes several operational scenarios for IP
   Telephony and some ways in which a suitable MIDCOM Agent might
   interwork with one or more middleboxes to facilitate those scenarios.
   From these scenarios, certain minimum requirements for Agent
   knowledge of network topology will be derived.

   The object is to demonstrate that:

        1) A "pinhole" of policy in a middlebox is not a useful
           policy device unless some notion of the permitted physical
           interfaces is attached to it.

        2) In IP Telephony applications, it is always possible to
           provide this notion of permitted physical interfaces.


Molitor                                                         [Page 1]

Internet Draft                                               August 2001


2. Introduction

2.1 IP Telephony

   The operations involved in setting up an IP Telephone Call are,
   regardless of protocol, more or less as follows:

        1) The originating device sends a message to a target device,
           routing the message through some intermediate devices. This
           message contains, among other things, an IP address and port
           number at which the originating device will receive RTP
           traffic carrying media from the target.

        2) The target device responds with an acknowledgement message
           containing, among other things, an IP address and port
           number at which the target device will receive RTP traffic
           carrying media from the originating device.

        3) RTP traffic flows between the two devices for the duration of
           the telephone call.

   It is important, even vital, to note that none of the messages carry
   any information whatsoever about the source IP addresses or ports for
   the media streams. It is, in general, perfectly legal for the media
   streams to originate from non-obvious IP addresses and ports (for
   example, an IP address different from any of the message source or
   destination IP addresses).
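   To make the consequence concrete, here is a minimal Python sketch in
   which a pinhole is derived purely from a call-setup message. The
   names MediaOffer, PinholeTuple, pinhole_from_offer() and matches()
   are hypothetical, and None stands for a wildcard field. Because the
   message carries only a receive address and port, the source side of
   the resulting 5-tuple can only be left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MediaOffer:
    """Media information in a call-setup message (hypothetical model).

    Only the address/port at which the sender will RECEIVE media is
    present; nothing says where its own outgoing RTP will come from.
    """
    recv_ip: str
    recv_port: int


@dataclass(frozen=True)
class PinholeTuple:
    """A 5-tuple in which None means 'any' (a wildcard)."""
    src_ip: Optional[str]
    src_port: Optional[int]
    dst_ip: str
    dst_port: int
    protocol: str = "UDP"


def pinhole_from_offer(offer: MediaOffer) -> PinholeTuple:
    # The destination is known from signaling; the source is not,
    # so it has to be wildcarded.
    return PinholeTuple(src_ip=None, src_port=None,
                        dst_ip=offer.recv_ip, dst_port=offer.recv_port)


def matches(hole: PinholeTuple, src_ip: str, src_port: int,
            dst_ip: str, dst_port: int, protocol: str = "UDP") -> bool:
    """True if a packet with these header fields fits the pinhole."""
    return ((hole.src_ip is None or hole.src_ip == src_ip)
            and (hole.src_port is None or hole.src_port == src_port)
            and hole.dst_ip == dst_ip and hole.dst_port == dst_port
            and hole.protocol == protocol)
```

   Any sender, at any address and port, matches such a pinhole as long
   as it targets the advertised receive address and port.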

   Worth noting: RTP has the rather unusual property that it is truly a
   unidirectional stream of packets. There are no ACKs or replies.  This
   means that it has somewhat special properties from a firewalling and
   packet filtering point of view.

2.2 Some remarks on "direction"

   The word "direction" has several possible meanings in the context of
   a middlebox discussion.


        1) The "direction" of an individual packet, that is, the
           source and destination IP addresses contained in the
           packet.

        2) The "direction" in which a channel of communication
           between two IP endpoints was created. For example,
           one could reasonably consider a TCP connection as
           "from" a client, "to" the server even though IP packets
           actually flow both ways. This definition could be
           considered equivalent to the "direction" (in the sense
           given in item 1) of the first packet sent.

        3) The path a packet takes through some sort of device,
           for example a middlebox. An individual packet might
           reasonably be considered as going in an "inwards"
           direction, for example, if it arrives on an interface
           connected to the global Internet and leaves on an
           interface connected to a private network.

   This document will use the 3rd definition of those given here. In
   particular, the first two, both of which are perfectly sensible and
   reasonable definitions, will not be used herein.
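   The third definition can be sketched as a function of the interfaces
   a packet arrives on and leaves by, ignoring the addresses inside the
   packet. This assumes, for illustration only, a middlebox whose
   interfaces (hypothetically named "eth0", "eth1" and so on) are each
   labelled as facing either the private network or the Internet.

```python
from enum import Enum


class Side(Enum):
    INSIDE = "inside"    # interface attached to a private network
    OUTSIDE = "outside"  # interface attached to the global Internet


def path_direction(interfaces: dict, ingress: str, egress: str) -> str:
    """Classify a packet by the interfaces it arrives on and leaves by.

    Only the path through the middlebox is consulted; the packet's
    source and destination addresses play no part.
    """
    arrive, leave = interfaces[ingress], interfaces[egress]
    if arrive is Side.OUTSIDE and leave is Side.INSIDE:
        return "inwards"
    if arrive is Side.INSIDE and leave is Side.OUTSIDE:
        return "outwards"
    return "same-side"
```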


3. Terminology


   The non-RTP messages passing between IP telephony devices (both
   endpoints and intermediate "IP telephony aware" devices such as SIP
   proxies or H.323 gatekeepers) will be referred to collectively as
   "signaling" traffic.

   The RTP messages carrying media between two endpoints will be
   referred to collectively as "media traffic."

   The notion of "direction" when applied to a pinhole installed in a
   middlebox should be taken to mean a specification of which two
   middlebox ports (that is, physical interfaces) are involved, and
   which side of the pinhole's 5-tuple goes where. In the simple case of
   a two-port middlebox, this may be reduced to specifying a pinhole in
   terms of a 5-tuple with the two IP addresses and two transport ports
   designated "inside" or "outside". In larger middleboxes, the 5-tuple
   needs more annotation, for example:

        PortA: 192.168.3.1 port 1234
        PortB: 209.46.41.66 port 57871
        Protocol: UDP

   constitutes a 5-tuple annotated with a direction in this sense. An
   optional addition specifies whether traffic can flow in and out
   between the two endpoints, or only from one to the other. This
   optional addition is orthogonal to the core notion of "direction" and
   not discussed herein.
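   As one possible representation, the annotated 5-tuple above might be
   held in a structure like the one below. Endpoint, DirectedPinhole and
   permits() are hypothetical names. The sketch allows traffic both
   ways, since the optional one-way restriction is left out, but insists
   that each endpoint's packets arrive on the middlebox port to which
   that endpoint was bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    interface: str  # middlebox port ("PortA", "PortB", ...) behind which it lives
    ip: str
    port: int       # transport-layer port


@dataclass(frozen=True)
class DirectedPinhole:
    a: Endpoint
    b: Endpoint
    protocol: str

    def permits(self, ingress: str, egress: str, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int, protocol: str) -> bool:
        """True if a packet entering on `ingress` and leaving by `egress` fits.

        Either endpoint may send to the other, but only from its own side.
        """
        if protocol != self.protocol:
            return False
        for src, dst in ((self.a, self.b), (self.b, self.a)):
            if ((ingress, src_ip, src_port) == (src.interface, src.ip, src.port)
                    and (egress, dst_ip, dst_port) == (dst.interface, dst.ip, dst.port)):
                return True
        return False
```

   A packet carrying exactly the right addresses and ports is still
   refused if it enters through the wrong middlebox port.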

   It should be noted that this is by no means the only way to notate
   this. There are many other schemes, equally or more semantically
   powerful, that cover the same requirements.

        The author apologizes for this usage of "direction"; it would
        more appropriately be called something with connotations of
        "path", but the author was unable to invent a suitable word.

   In general, other terminology will follow the ad hoc standards used
   by the MIDCOM working group and IP telephony practitioners, to the
   best of the author's ability.


4. Scenarios

   In this section some simple usage scenarios will be laid out, with
   some discussion of the various issues of topology that arise in them.


4.1. Small Enterprise

   In this scenario, we assume a small IP network viewed as a single
   administrative domain, for example a small corporate office. This
   single network is connected to the rest of the world, the "Internet",
   by a single link passing through a middlebox. On this network are IP
   telephony devices (which we may freely interpret as simply
   telephones) and, for this scenario, a combined MIDCOM Agent and IP
   Telephony registration/routing device (for example a local H.323
   Gatekeeper, or a SIP proxy including a local registrar function).
   Call this central device the "signaling switch" for lack of a better
   term.

   All the telephones on the network, when they wish to originate a
   call, communicate first with this signaling switch. Any incoming call
   is handled by sending signaling to the signaling switch.  Calls from
   one phone to another within the network are also handled by sending
   signaling traffic through the signaling switch.

   There are three basic call types:

        1) Outbound calls, originated by a telephone inside the network,
           terminating somewhere outside it.

        2) Inbound calls, originating somewhere outside the network and
           terminating inside it.

        3) Local calls, originating and terminating on the network.

   In the first two cases, the signaling switch, in its other role as a
   MIDCOM Agent, must interact with the middlebox to open pinholes for
   media traffic and/or to obtain NAT bindings which it may use to
   provide the external endpoint with a public IP address bound to the
   internal endpoint's actual IP address.

   In the third case, it is extremely inconvenient and unnecessary for
   the middlebox to be involved. In the case of NAT bindings, each
   of the two internal telephones would be given an external ("public")
   IP address as the apparent address of the other telephone. This can
   be made to work by routing media out through the middlebox and back
   in, but this is complicated, error prone, and unnecessary.

   The signaling switch, for all practical purposes, must be able to
   distinguish case 3 from the others. If it can reliably do so, it can
   further distinguish the other two cases, since it evidently knows
   which telephones are inside the network.

   Therefore, it is safe to assume that the signaling switch can
   determine from signaling traffic which of the 3 cases obtains for any
   individual call. It can therefore provide directionality information
   to the middlebox, in the sense of 'this media should be coming from
   the outside inwards -- not vice versa' and of course the reverse for
   the other media stream.
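   The classification step can be sketched in a few lines. This assumes,
   purely for illustration, that the signaling switch identifies
   telephones by opaque strings and knows the set of telephones
   registered inside the network; classify_call() and media_directions()
   are hypothetical names, not part of any MIDCOM interface.

```python
def classify_call(originator: str, terminator: str, inside_phones: set) -> str:
    """Sort a call into one of the three call types of this scenario."""
    orig_in = originator in inside_phones
    term_in = terminator in inside_phones
    if orig_in and term_in:
        return "local"
    if orig_in:
        return "outbound"
    if term_in:
        return "inbound"
    raise ValueError("neither endpoint is inside; not a call this switch serves")


def media_directions(call_type: str) -> list:
    """Directed pinholes needed for the call's two media streams.

    Each entry is (stream, direction through the middlebox). A local
    call needs no middlebox involvement at all.
    """
    if call_type == "local":
        return []
    return [("to outside phone", "outwards"), ("to inside phone", "inwards")]
```

   Inbound and outbound calls need the same pair of directed pinholes,
   one per media stream; only the local case differs, by needing none.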

   Furthermore, if the middlebox does NOT have this information
   available, the signaling switch may be fooled into allowing traffic
   into the network from the outside which it should not, as follows:

        The attacker forges a call origination message specifying a
        vulnerable internal IP address and port as the supposed
        "outside" telephone's media port. The signaling switch, if
        unable to distinguish inside from outside, will cause a pinhole
        (and/or NAT binding) to be placed in the middlebox allowing
        traffic from the outside inwards to attack the vulnerable
        device.

        If the middlebox had, as proposed, the additional information
        that the pinhole was intended to allow traffic outwards to an
        "outside" endpoint only, this could not happen.

   The salient points here are:

        1) The Agent must be able to distinguish between telephones
           inside the network, and those outside, using data carried
           in the signaling traffic together with its own configuration
           and other data (e.g. registrations).

        2) Specifying pinholes without an 'inside to outside' or
           'outside to inside' attribute opens the network up to
           a variety of fairly simple attacks.

        3) The Agent, by virtue of the first point, can easily
           supply the information needed to solve the problem raised
           in the second.
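   The attack described above, and why a direction defeats it, can be
   reproduced with a toy two-interface middlebox. The addresses are made
   up, port 139 merely stands in for some vulnerable internal service,
   and Packet and Pinhole are hypothetical types; a pinhole whose path
   is None models one specified without any direction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    ingress: str   # "inside" or "outside": interface the packet arrives on
    egress: str    # interface the packet would leave by
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Pinhole:
    dst_ip: str
    dst_port: int
    # None: no direction given. Otherwise the required (ingress, egress)
    # path, e.g. ("inside", "outside") for media flowing outwards.
    path: Optional[tuple] = None

    def admits(self, pkt: Packet) -> bool:
        if (pkt.dst_ip, pkt.dst_port) != (self.dst_ip, self.dst_port):
            return False
        return self.path is None or (pkt.ingress, pkt.egress) == self.path


# The forged "outside" media address is really an internal host.
forged = ("192.168.3.50", 139)
attack = Packet(ingress="outside", egress="inside",
                dst_ip=forged[0], dst_port=forged[1])
undirected = Pinhole(*forged)
directed = Pinhole(*forged, path=("inside", "outside"))
# undirected.admits(attack) is True; directed.admits(attack) is False.
```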


4.2. Multi-Site Enterprise

   In this context, we assume that there are two or more networks, all
   falling into the same administrative domain, each connected to the
   world, the "Internet," via a single middlebox. For simplicity, we
   assume that the "Internet" provides the only connectivity between any
   two of the networks within the administrative domain.

   We assume further that there are telephones or the equivalent located
   on each network, and that a single signaling switch device identical
   to the one described in the earlier scenario is present on one of the
   networks.

   There are now 4 types of calls:

        1) Outbound calls, originated by a telephone inside one of
           the networks, terminating somewhere outside all of them.

        2) Inbound calls, originating somewhere outside all of the
           networks and terminating inside one of them.

        3) Site-to-Site calls, originating on one of the networks
           and terminating on a different one.

        4) Local calls, originating and terminating within the same
           network.

   In this context, there are 3 variations on Agent/middlebox
   interactions. The signaling switch may need to interact with a
   single middlebox (cases 1 and 2), which it must select based on the
   location of the internal endpoint. The signaling switch may need to
   interact with 2 middleboxes in the case of a Site-to-Site call. The
   signaling switch may not need to interact with any middleboxes at
   all, in the case of a Local call.

   The Agent (in its role as a signaling switch managing IP Telephony
   signaling traffic) must be able to reliably determine:

        1) When a call endpoint (either origination or termination)
           is located within the Agent's administrative domain (that
           is, on one of the several networks).

        2) Which one of those networks such endpoints reside on.
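   For the case where the Agent has endpoint IP addresses, and assuming
   each site is described by one non-overlapping address prefix and sits
   behind exactly one middlebox, both determinations reduce to a prefix
   lookup. The prefixes, middlebox names and function names below are
   hypothetical; the sketch uses Python's standard ipaddress module.

```python
import ipaddress
from typing import Optional

# Hypothetical configuration: each site's address prefix and its middlebox.
SITES = {
    "hq":     (ipaddress.ip_network("10.1.0.0/16"), "mbox-hq"),
    "branch": (ipaddress.ip_network("10.2.0.0/16"), "mbox-branch"),
}


def site_of(addr: str) -> Optional[str]:
    """Return the site an endpoint address belongs to, or None if outside."""
    ip = ipaddress.ip_address(addr)
    for name, (net, _mbox) in SITES.items():
        if ip in net:
            return name
    return None


def middleboxes_for_call(orig: str, term: str) -> list:
    """Middleboxes the Agent must manipulate, following the four call types."""
    s_orig, s_term = site_of(orig), site_of(term)
    if s_orig is None and s_term is None:
        raise ValueError("neither endpoint is on a known site")
    if s_orig == s_term:
        return []                                   # local call
    # Outbound, inbound, or site-to-site: one box per internal endpoint.
    return [SITES[s][1] for s in (s_orig, s_term) if s is not None]
```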


   The alternative is to do something like open pinholes on all the
   known middleboxes (potentially exposing vulnerable hosts as well as
   telephones). NAT simply will not work at all: if the Agent cannot
   determine which of the several middleboxes an endpoint is behind, it
   cannot know which middlebox to query for a binding.

   A NAT binding from the wrong middlebox will yield a public address
   routed to the site behind the wrong middlebox, causing the
   media to flow to the wrong site. This author cannot construct
   any working gedankenmekanism which will direct the traffic to
   the right place.

   Therefore, we must assume, if this scenario is to work at all, that
   the Agent can determine which telephones are where, in the sense of
   which of the several networks, if any, an endpoint resides on. With
   this information in hand, of course, the Agent can supply inbound
   versus outbound directionality to the relevant middleboxes. Not
   surprisingly, this again prevents the attack described in Section 4.1.


4.3 Service Provider

   In this model, we assume that a single entity operates a provider
   network upstream from several customers. On the provider network,
   there is at least one signaling switch entity similar to the
   ones described above, and possibly some IP Telephony endpoints
   (gateways, IVR systems, voicemail systems and so on). On the
   customer networks, there are IP telephones and other devices which
   can originate and terminate IP telephone calls.

   We assume further that there is, for some customers, a middlebox
   mediating traffic between the provider network and the customer
   network. Of course the customer may own and operate one, but for
   the purposes of this scenario, we assume that the middleboxes
   we care about are operated by the service provider.

   In addition, the service provider may operate middleboxes between
   its own network and "upstream" networks such as carriers.

   Finally, we assume that any IP telephone calls for which media
   must pass through any portion of the provider's network will
   be set up by signaling passing through the provider's signaling
   equipment. In particular, a provider-owned MIDCOM Agent will
   have an opportunity to manipulate middleboxes on behalf of
   each telephone call.

   In general, we have a situation in which a provider network
   is connected to other networks. Each inter-network link may or may not
   be mediated by a middlebox operated by the service provider. IP
   telephone calls may originate either inside the service provider
   network, or on any attached network, and may terminate either inside
   the provider network, or outside it.

   We have these cases:

        1) Telephone call originates on an attached network and
           terminates on the provider network.
        1a) the attached network is attached through a middlebox.
        1b) the attached network is not.

        2) Telephone call originates on the provider network, and
           terminates on an attached network.
        2a) the attached network is attached through a middlebox.
        2b) the attached network is not.

        3) Telephone call originates on an attached network, and
           terminates on another attached network.
        3a) the origination network is attached through a middlebox.
        3b) the origination network is not.
        3c) the termination network is attached through a middlebox.
        3d) the termination network is not.

        4) Telephone call originates and terminates on the same
           attached network.

        5) Telephone call originates and terminates within the
           provider network.


   This model has essentially the same problem as a multi-site
   enterprise network, but with more variations possible. In general,
   problems are worsened by the fact that various customers with various
   needs are connected, and the service provider is less able to simply
   define problems away with standardized configurations.

   In all cases, the signaling switch(es) owned by the service provider
   must solve the problem of which middleboxes -- if any -- need to be
   manipulated, and in which ways. It must be able to sort the call into
   the correct one of the 13 (or possibly more) cases above, and
   identify the relevant attached networks, and the relevant
   middleboxes, and thereby deduce the relevant direction information
   through the middleboxes.
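   The sorting step can be sketched as follows, assuming the Agent has
   already resolved each endpoint to either the provider network or a
   named attached network, and knows which attached networks sit behind
   a provider-operated middlebox. For a call of type 3 it reports one
   origination sub-case and one termination sub-case, labelling the
   latter 3c (via a middlebox) or 3d (not). All names are hypothetical.

```python
from typing import Optional

PROVIDER = "provider"


def provider_case(orig: str, term: str, mbox: dict) -> tuple:
    """Sort a call into the cases above and list the middleboxes involved.

    `orig` and `term` are PROVIDER or the name of an attached network.
    `mbox` maps each attached network to its provider-operated
    middlebox, or to None when the network is attached directly.
    """
    def via(net: str) -> Optional[str]:
        return None if net == PROVIDER else mbox[net]

    boxes = [b for b in (via(orig), via(term)) if b is not None]
    if orig == term:
        # Cases 4 and 5: media never crosses a provider middlebox.
        return ("5" if orig == PROVIDER else "4"), []
    if term == PROVIDER:
        return ("1a" if via(orig) else "1b"), boxes
    if orig == PROVIDER:
        return ("2a" if via(term) else "2b"), boxes
    label = ("3a" if via(orig) else "3b") + "/" + ("3c" if via(term) else "3d")
    return label, boxes
```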


5. Summary and Conclusions

   In all of the above scenarios, we see that the IP Telephony device
   _cum_ MIDCOM Agent must know enough about network topology to
   discover which middleboxes, if any, to manipulate. Since it must know
   this, in order for the MIDCOM model to work at all, and since these
   devices therefore can supply a "direction" of sorts to the middlebox,
   there seems to be no harm in requiring a direction in pinhole
   specifications -- for IP telephony applications, at any rate.

   In addition, we see that failing to supply a direction in a pinhole
   specification allows gross vulnerabilities, exploitable by specifying
   an "internal" address where an "external" one is expected. Without
   our notion of directionality applied to pinholes, this pinhole will
   permit traffic inwards as well as outwards.

   Finally, all of the scenarios indicate that it is sufficient to model
   the network as a collection of one or more realms (roughly: sets of
   IP addresses), together with a single "world" realm into which
   everything not in the other realms falls. These realms are
   interconnected by middleboxes. More precisely:

   An Agent's view of the network topology might be described handily as
   a graph in which nodes (i.e. realms) are either:

        1) A set of specified IP addresses.

        2) A distinguished and unique node named "world" into which
           all IP addresses not explicitly called out in another
           node are considered to fall.

   and edges are middleboxes. A moment's consideration will show that
   two realms connected, but not via a middlebox known to the Agent, may
   as well be considered as a single realm.
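   The graph model lends itself to a short sketch. Topology, crossings()
   and the realm and middlebox names are hypothetical. Realms are given
   as address prefixes, everything else falls into "world", and a
   breadth-first search finds a shortest chain of middleboxes between
   two realms. Each step is reported as (middlebox, realm the traffic
   comes from, realm it goes to), which is a direction in the sense of
   Section 3.

```python
import ipaddress
from collections import deque

WORLD = "world"


class Topology:
    """Realms as nodes, middleboxes as edges (hypothetical model)."""

    def __init__(self):
        self.realms = {}           # realm name -> list of ip_network
        self.edges = {WORLD: []}   # realm name -> [(middlebox, neighbour realm)]

    def add_realm(self, name, *prefixes):
        self.realms[name] = [ipaddress.ip_network(p) for p in prefixes]
        self.edges.setdefault(name, [])

    def add_middlebox(self, mbox, realm_a, realm_b):
        self.edges[realm_a].append((mbox, realm_b))
        self.edges[realm_b].append((mbox, realm_a))

    def realm_of(self, addr):
        """Any address not claimed by an explicit realm falls into WORLD."""
        ip = ipaddress.ip_address(addr)
        for name, nets in self.realms.items():
            if any(ip in net for net in nets):
                return name
        return WORLD

    def crossings(self, src_realm, dst_realm):
        """Middleboxes on a shortest path, each with its from/to realm."""
        prev = {src_realm: None}
        queue = deque([src_realm])
        while queue:
            realm = queue.popleft()
            if realm == dst_realm:
                break
            for mbox, nxt in self.edges[realm]:
                if nxt not in prev:
                    prev[nxt] = (mbox, realm)
                    queue.append(nxt)
        if dst_realm not in prev:
            raise ValueError("realms are not connected")
        path, realm = [], dst_realm
        while prev[realm] is not None:
            mbox, came_from = prev[realm]
            path.append((mbox, came_from, realm))
            realm = came_from
        return list(reversed(path))
```

   For example, with an "office" realm behind "mbox-office" and a
   "provider" realm behind "mbox-edge", media from the office to the
   world crosses mbox-office outwards and then mbox-edge outwards.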

   Of course, in some cases involving NAT, some additional information
   may be required since two or more realms may use overlapping ranges
   of private IP addresses. In this case, a realm is a set of IP
   addresses together with some additional information (e.g. a domain
   name) which can be used to identify which realm a given IP telephony
   endpoint specified in a signaling message resides in.
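   A sketch of this tie-breaking follows, assuming two hypothetical
   customer realms that both use 10.0.0.0/8 and are distinguished by
   domain name (example.com and example.net serve as placeholders).
   Unambiguous addresses are resolved by address alone; an ambiguous one
   needs the additional information.

```python
import ipaddress
from typing import Optional

# Hypothetical realms: two customers re-use the same private range, so
# the address alone cannot identify the realm.
REALMS = [
    ("cust-a", "a.example.com", ipaddress.ip_network("10.0.0.0/8")),
    ("cust-b", "b.example.net", ipaddress.ip_network("10.0.0.0/8")),
]


def realm_of(addr: str, domain: Optional[str]) -> str:
    ip = ipaddress.ip_address(addr)
    candidates = [name for name, _dom, net in REALMS if ip in net]
    if len(candidates) <= 1:
        return candidates[0] if candidates else "world"
    # Ambiguous address: fall back on the additional information.
    for name, dom, _net in REALMS:
        if name in candidates and dom == domain:
            return name
    raise LookupError(f"{addr} is ambiguous and domain {domain!r} does not resolve it")
```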

   In general the "set of IP addresses" is a little crude, since an
   Agent may not know an IP address a priori; it may have only, say, a
   SIP identifier. The Agent has to map, for example,
   sip:amolitor@aravox.com, into some realm. Note that it doesn't need
   to know the IP address of amolitor's telephone, or even of
   aravox.com. In many cases it may suffice to know that nobody with
   that identity is registered here, therefore amolitor's phone is in
   the "world" realm.
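   A registration-based lookup of this kind might look like the sketch
   below. The registrar table and its contents are hypothetical, and the
   sketch assumes bare "sip:user@host" URIs without parameters. In line
   with the SIP URI comparison rules of RFC 3261, the user part is
   compared case-sensitively and the scheme and host case-insensitively.
   Any identity with no local registration lands in "world".

```python
# Hypothetical registrar contents: address-of-record -> realm in which
# the registered device currently resides.
REGISTRATIONS = {
    "sip:alice@office.example.com": "office",
    "sip:bob@office.example.com": "branch",
}


def normalize(uri: str) -> str:
    """Lower-case scheme and host; keep the user part exactly as given."""
    scheme, _, rest = uri.partition(":")
    user, at, host = rest.rpartition("@")
    return f"{scheme.lower()}:{user}{at}{host.lower()}"


def realm_of_identity(uri: str) -> str:
    """Map a SIP identity to a realm without resolving any IP address."""
    return REGISTRATIONS.get(normalize(uri), "world")
```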

   The details of how realms are specified are, in the opinion of this
   author, outside the scope of the working group. The entire thrust of
   the above is to show that some notion of realms, at a fairly coarse
   level, is necessary for the operation of IP telephony networks with
   middleboxes.

   From this discussion, we conclude that adding this memo's notion of
   direction to a pinhole costs more or less nothing.

   Further, the discussion demonstrates that omitting the notion of
   direction from a pinhole is catastrophic and unworkable.


Security Considerations

   Security considerations are more or less the entire point of this
   memo.

Author's Address:

   Andrew Molitor
   amolitor@visi.com
   Aravox Technologies, Inc.
   4201 Lexington Ave. North
   Suite 1105
   Arden Hills, MN 55126