Internet Draft                                        Yakov Rekhter
Expiration date:July 1997                               Bruce Davie
                                                          Dave Katz
                                                         Eric Rosen
                                                     George Swallow
                                                     Dino Farinacci
                                                      cisco Systems
                                                       January 1997


                 Tag Switching Architecture - Overview

                  draft-rekhter-tagswitch-arch-00.txt


1. Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net,
   nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the
   current status of any Internet Draft.


2. Abstract

   This document provides an overview of tag switching. Tag switching is
   a way to combine the label-swapping forwarding paradigm with network
   layer routing. This has several advantages. Tags can have a wide
   spectrum of forwarding granularities, so at one end of the spectrum a
   tag could be associated with a group of destinations, while at the
   other a tag could be associated with a single application flow. At
   the same time forwarding based on tag switching, due to its
   simplicity, is well suited to high performance forwarding. These
   factors facilitate the development of a routing system which is both
   functionally rich and scalable. Finally, tag switching simplifies
   integration of routers and ATM switches by employing common


                                                                [Page 1]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   addressing, routing, and management procedures.


3. Introduction

   Continuous growth of the Internet demands higher bandwidth within the
   Internet Service Providers (ISPs). However, growth of the Internet is
   not the only driving factor for higher bandwidth - demand for higher
   bandwidth also comes from emerging multimedia applications. Demand
   for higher bandwidth, in turn, requires higher forwarding performance
   for both multicast and unicast traffic.

   The growth of the Internet also demands improved scaling properties
   of the Internet routing system. The ability to contain the volume of
   routing information maintained by individual routers and the ability
   to build a hierarchy of routing knowledge are essential to support a
   high quality, scalable routing system.

   While the destination-based forwarding paradigm is adequate in many
   situations, we already see examples where it is no longer adequate.
   The ability to overcome the rigidity of destination-based forwarding
   and to have more flexible control over how traffic is routed is
   likely to become more and more important.

   We see the need to improve forwarding performance while at the same
   time adding routing functionality to support multicast, allowing more
   flexible control over how traffic is routed, and providing the
   ability to build a hierarchy of routing knowledge. Moreover, it
   becomes more and more crucial to have a routing system that can
   support graceful evolution to accommodate new and emerging
   requirements.

   Tag switching is a technology that provides an efficient solution to
   these challenges. Tag switching blends the flexibility and rich
   functionality provided by Network Layer routing with the simplicity
   provided by the label swapping forwarding paradigm. The simplicity of
   the tag switching forwarding paradigm (label swapping) enables
   improved forwarding performance, while maintaining competitive
   price/performance. By associating a wide range of forwarding
   granularities with a tag, the same forwarding paradigm can be used to
   support a wide variety of routing functions, such as destination-
   based routing, multicast, hierarchy of routing knowledge, and
   flexible routing control. Finally, a combination of simple
   forwarding, a wide range of forwarding granularities, and the ability
   to evolve routing functionality while preserving the same forwarding
   paradigm enables a routing system that can gracefully evolve to
   accommodate new and emerging requirements.


                                                                [Page 2]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


4. Tag Switching components

   Tag switching consists of two components: forwarding and control. The
   forwarding component uses the tag information (tags) carried by
   packets and the tag forwarding information maintained by a tag switch
   to perform packet forwarding. The control component is responsible
   for maintaining correct tag forwarding information among a group of
   inter- connected tag switches.

   Segregating control and forwarding into separate components promotes
   modularity, which in turn enables to build a system that can
   gracefully evolve to accommodate new and emerging requirements.


5. Forwarding component

   The fundamental forwarding paradigm employed by tag switching is
   based on the notion of label swapping. When a packet with a tag is
   received by a tag switch, the switch uses the tag as an index in its
   Tag Information Base (TIB). Each entry in the TIB consists of an
   incoming tag, and one or more sub-entries of the form <outgoing tag,
   outgoing interface, outgoing link level information>. If the switch
   finds an entry with the incoming tag equal to the tag carried in the
   packet, then for each <outgoing tag, outgoing interface, outgoing
   link level information> in the entry the switch replaces the tag in
   the packet with the outgoing tag, replaces the link level information
   (e.g MAC address) in the packet with the outgoing link level
   information, and forwards the packet over the outgoing interface.

   From the above description of the forwarding component we can make
   several observations. First, the forwarding decision is based on the
   exact match algorithm using a fixed length, fairly short tag as an
   index. This enables a simplified forwarding procedure, relative to
   longest match forwarding traditionally used at the network layer.
   This in turn enables higher forwarding performance (higher packets
   per second). The forwarding procedure is simple enough to allow a
   straightforward hardware implementation.

   A second observation is that the forwarding decision is independent
   of the tag's forwarding granularity. For example, the same forwarding
   algorithm applies to both unicast and multicast - a unicast entry
   would just have a single (outgoing tag, outgoing interface, outgoing
   link level information) sub-entry, while a multicast entry may have
   one or more (outgoing tag, outgoing interface, outgoing link level
   information) sub-entries. (For multi-access links, the outgoing link
   level information in this case would include a multicast MAC
   address.) This illustrates how with tag switching the same forwarding
   paradigm can be used to support different routing functions (e.g.,


                                                                [Page 3]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   unicast, multicast, etc...)

   The simple forwarding procedure is thus essentially decoupled from
   the control component of tag switching. New routing (control)
   functions can readily be deployed without disturbing the forwarding
   paradigm. This means that it is not necessary to re-optimize
   forwarding performance (by modifying either hardware or software) as
   new routing functionality is added.

   In the tag switching architecture, various implementation options are
   acceptable. For example, support for network layer forwarding by a
   tag switch (i.e., forwarding based on the network layer header as
   opposed to a tag) is optional. Moreover, use of network layer
   forwarding may be constrained to handling network layer control
   traffic only. (Note, however, that a tag switch must be able to
   source and sink network layer packets, e.g. to participate in network
   layer routing protocols)

   For the purpose of handling network layer hop count (time-to-live)
   the architecture allows two alternatives: network layer hops may
   correspond directly to hops formed by tag switches, or one network
   layer hop may correspond to several tag switched hops.

   When a switch receives a packet with a tag, and the TIB maintained by
   the switch has no entry with the incoming tag equal to the tag
   carried by the packet, or the entry exists, the outgoing tag entry is
   entry, and the entry does not indicate local delivery to the switch,
   the switch may either (a) discard the packet, or (b) strip the tag
   information, and submit the packet for network layer processing.
   Support for the latter is optional (as support for network layer
   forwarding is optional). Note that it may not always be possible to
   successfully forward a packet after stripping a tag even if a tag
   switch supports network layer forwarding.

   The architecture allows a tag switch to maintain either a single TIB
   per tag switch, or a TIB per interface. Moreover, a tag switch could
   mix both of these options - some tags could be maintained in a single
   TIB, while other tags could be maintained in a TIB associated with
   individual interfaces.


5.1. Tag encapsulation

   Tag switching clearly requires a tag to be carried in each packet.
   The tag information can be carried in a variety of ways:


      - as a small "shim" tag header inserted between the layer 2 and


                                                                [Page 4]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


      the Network Layer headers;

      - as part of the layer 2 header, if the layer 2 header provides
      adequate semantics (e.g., Frame Relay, or ATM);

      - as part of the Network Layer header (e.g., using the Flow Label
      field in IPv6 with appropriately modified semantics).


   It is therefore possible to implement tag switching over virtually
   any media type including point-to-point links, multi-access links,
   and ATM. At the same time the forwarding component allows specific
   optimizations for particular media (e.g., ATM).

   Observe also that the tag forwarding component is Network Layer
   independent. Use of control component(s) specific to a particular
   Network Layer protocol enables the use of tag switching with
   different Network Layer protocols.


6. Control component

   Essential to tag switching is the notion of binding between a tag and
   Network Layer routing (routes). The control component is responsible
   for creating tag bindings, and then distributing the tag binding
   information among tag switches. Creating a tag binding involves
   allocating a tag, and then binding a tag to a route. The distribution
   of tag binding information among tag switches could be accomplished
   via several options:

      - piggybacking on existing routing protocols

      - using a separate Tag Distribution Protocol (TDP)

   While the architecture supports distribution of tag binding
   information that is independent of the underlying routing protocols,
   the architecture acknowledges that considerable optimizations can be
   achieved in some cases by small enhancements of existing protocols to
   enable piggybacking tag binding information on these protocols.

   One important characteristic of the tag switching architecture is
   that creation of tag bindings is driven primarily by control traffic
   rather than by data traffic. Control traffic driven creation of tag
   bindings has several advantages, as compared to data traffic driven
   creation of tag bindings. For one thing, it minimizes the amount of
   additional control traffic needed to distribute tag binding
   information, as tag binding information is distributed only in
   response to control traffic, independent of data traffic. It also


                                                                [Page 5]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   makes the overall scheme independent of and insensitive to the data
   traffic profile/pattern. Control traffic driven creation of tag
   binding improves forwarding performance, as tags are precomputed
   (prebound) before data traffic arrives, rather than being created as
   data traffic arrives. It also simplifies the overall system behavior,
   as the control plane is controlled solely by control traffic, rather
   than by a mix of control and data traffic.

   Another important characteristic of the tag switching architecture is
   that distribution and maintenance of tag binding information is
   consistent with distribution and maintenance of the associated
   routing information. For example, distribution of tag binding
   information for tags associated with unicast routing is based on the
   technique of incremental updates with explicit acknowledgment. This
   is very similar to the way unicast routing information gets
   distributed by such protocols as OSPF and BGP. In contrast,
   distribution of tag binding information for tags associated with
   multicast routing is based on period updates/ refreshes, without any
   explicit acknowledgments. This is consistent with the way multicast
   routing information is distributed by such protocols as PIM.

   To provide good scaling characteristics, while also accommodating
   diverse routing functionality, tag switching supports a wide range of
   forwarding granularities. At one extreme a tag could be associated
   (bound) to a group of routes (more specifically to the Network Layer
   Reachability Information of the routes in the group). At the other
   extreme a tag could be bound to an individual application flow (e.g.,
   an RSVP flow). A tag could also be bound to a multicast tree. In
   addition, a tag may be bound to a path that has been selected for a
   certain set of packets based on some policy (e.g. an explicit route).

   The control component is organized as a collection of modules, each
   designed to support a particular routing function. To support new
   routing functions, new modules can be added. The architecture does
   not mandate a prescribed set of modules that have to be supported by
   every tag switch.

   The following describes some of the modules.


6.1. Destination-based routing

   In this section we describe how tag switching can support
   destination-based routing. Recall that with destination-based routing
   a router makes a forwarding decision based on the destination address
   carried in a packet and the information stored in the Forwarding
   Information Base (FIB) maintained by the router. A router constructs
   its FIB by using the information it receives from routing protocols


                                                                [Page 6]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   (e.g., OSPF, BGP).

   To support destination-based routing with tag switching, a tag
   switch, just like a router, participates in routing protocols (e.g.,
   OSPF, BGP), and constructs its FIB using the information it receives
   from these protocols.

   There are three permitted methods for tag allocation and Tag
   Information Base (TIB) management: (a) downstream tag allocation, (b)
   downstream tag allocation on demand, and (c) upstream tag allocation.
   In all cases, a switch allocates tags and binds them to address
   prefixes in its FIB. In downstream allocation, the tag that is
   carried in a packet is generated and bound to a prefix by the switch
   at the downstream end of the link (with respect to the direction of
   data flow). On demand allocation means that tags will only be
   allocated and distributed by the downstream switch when it is
   requested to do so by the upstream switch. Method (b) is most useful
   in ATM networks (see Section 8). In upstream allocation, tags are
   allocated and bound at the upstream end of the link. Note that in
   downstream allocation, a switch is responsible for creating tag
   bindings that apply to incoming data packets, and receives tag
   bindings for outgoing packets from its neighbors. In upstream
   allocation, a switch is responsible for creating tag bindings for
   outgoing tags, i.e. tags that are applied to data packets leaving the
   switch, and receives bindings for incoming tags from its neighbors.

   The downstream tag allocation scheme operates as follows: for each
   route in its FIB the switch allocates a tag, creates an entry in its
   Tag Information Base (TIB) with the incoming tag set to the allocated
   tag, and then advertises the binding between the (incoming) tag and
   the route to other adjacent tag switches. The advertisement could be
   accomplished by either piggybacking the binding on top of the
   existing routing protocols, or by using a separate Tag Distribution
   Protocol (TDP). When a tag switch receives tag binding information
   for a route, and that information was originated by the next hop for
   that route, the switch places the tag (carried as part of the binding
   information) into the outgoing tag of the TIB entry associated with
   the route. This creates the binding between the outgoing tag and the
   route.

   With the downstream on demand tag allocation scheme, operation is as
   follows. For each route in its FIB, the switch identifies the next
   hop for that route. It then issues a request (via TDP) to the next
   hop for a tag binding for that route. When the next hop receives the
   request, it allocates a tag, creates an entry in its TIB with the
   incoming tag set to the allocated tag, and then returns the binding
   between the (incoming) tag and the route to the switch that sent the
   original request. When the switch receives the binding information,


                                                                [Page 7]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   the switch creates an entry in its TIB, and sets the outgoing tag in
   the entry to the value received from the next hop. Handling of data
   packets is as for downstream allocation. The main application for
   this mode of operation is with ATM switches, as described in Section
   8.

   The upstream tag allocation scheme is used as follows. If a tag
   switch has one or more point-to-point interfaces, then for each route
   in its FIB whose next hop is reachable via one of these interfaces,
   the switch allocates a tag, creates an entry in its TIB with the
   outgoing tag set to the allocated tag, and then advertises to the
   next hop (via TDP) the binding between the (outgoing) tag and the
   route. When a tag switch that is the next hop receives the tag
   binding information, the switch places the tag (carried as part of
   the binding information) into the incoming tag of the TIB entry
   associated with the route.

   Note that, while we have described upstream allocation for the sake
   of completeness, we have found the two downstream allocation methods
   adequate for all practical purposes so far.

   Independent of which tag allocation method is used, once a TIB entry
   is populated with both incoming and outgoing tags, the tag switch can
   forward packets for routes bound to the tags by using the tag
   switching forwarding algorithm (as described in Section 5).

   When a tag switch creates a binding between an outgoing tag and a
   route, the switch, in addition to populating its TIB, also updates
   its FIB with the binding information. This enables the switch to add
   tags to previously untagged packets.

   So far we have described how a tag could be bound to a single route,
   creating a one-to-one mapping between routes and tags. However, under
   certain conditions it is possible to bind a tag not just to a single
   route, but to a group of routes, creating a many-to-one mapping
   between routes and tags. Consider a tag switch that is connected to a
   router.  It is quite possible that the switch uses the router as the
   next hop not just for one route, but for a group of routes. Under
   these conditions the switch does not have to allocate distinct tags
   to each of these routes - one tag would suffice. The distribution of
   tag binding information is unaffected by whether there is a one-to-
   one or one-to-many mapping between tags and routes. Now consider a
   tag switch that receives from one of its neighbors (tag switching
   peers) tag binding information for a set of routes, such that the set
   is bound to a single tag. If the switch decides to use some or all of
   the routes in the set, then for these routes the switch does not need
   to allocate individual tags - one tag would suffice. Such an approach
   may be valuable when tags are a precious resource. Note that the


                                                                [Page 8]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   ability to support many-to-one mapping makes no assumptions about the
   routing protocols being used.

   When a tag switch adds a tag to a previously untagged packet the tag
   could be either associated with the route to the destination address
   carried in the packet, or with the route to some other tag switch
   along the path to the destination (in some cases the address of that
   other tag switch could be gleaned from network layer routing
   protocols). The latter option provides yet another way of mapping
   multiple routes into a single tag. However, this option is either
   dependent on particular routing protocols, or would require a
   separate mechanism for discovering tag switches along a path.

   To understand the scaling properties of tag switching in conjunction
   with destination-based routing, observe that the total number of tags
   that a tag switch has to maintain can not be greater than the number
   of routes in the switch's FIB. Moreover, as we have just seen, the
   number of tags can be much less than the number of routes. Thus, much
   less state is required than would be the case if tags were allocated
   to individual flows.

   In general, a tag switch will try to populate its TIB with incoming
   and outgoing tags for all routes to which it has reachability, so
   that all packets can be forwarded by simple label swapping. Tag
   allocation is thus driven by topology (routing), not data traffic -
   it is the existence of a FIB entry that causes tag allocations, not
   the arrival of data packets.

   Use of tags associated with routes, rather than flows, also means
   that there is no need to perform flow classification procedures for
   all the flows to determine whether to assign a tag to a flow. That,
   in turn, simplifies the overall scheme, and makes it more robust and
   stable in the presence of changing traffic patterns.

   Note that when tag switching is used to support destination-based
   routing, tag switching does not completely eliminate the need to
   perform normal Network Layer forwarding at some network elements.
   First of all, to add a tag to a previously untagged packet requires
   normal Network Layer forwarding. This function could be performed by
   the first hop router, or by the first router on the path that is able
   to participate in tag switching. In addition, whenever a tag switch
   aggregates a set of routes (e.g., by using the technique of
   hierarchical routing), into a single tag, and the routes do not share
   a common next hop, the switch needs to perform Network Layer
   forwarding for packets carrying that tag. However, one could observe
   that the number of places where routes get aggregated is smaller than
   the total number of places where forwarding decisions have to be
   made. Moreover, quite often aggregation is applied to only a subset


                                                                [Page 9]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   of the routes maintained by a tag switch. As a result, on average a
   packet can be forwarded most of the time using the tag switching
   algorithm. Note that many tag switches may not need to perform any
   network layer forwarding.


6.2. Hierarchy of routing knowledge

   The IP routing architecture models a network as a collection of
   routing domains. Within a domain, routing is provided via interior
   routing (e.g., OSPF), while routing across domains is provided via
   exterior routing (e.g., BGP). However, all routers within domains
   that carry transit traffic (e.g., domains formed by Internet Service
   Providers) have to maintain information provided by not just interior
   routing, but exterior routing as well, even if only some of these
   routers participate in exterior routing. That creates certain
   problems. First of all, the amount of this information is not
   insignificant. Thus it places additional demand on the resources
   required by the routers.  Moreover, increase in the volume of routing
   information quite often increases routing convergence time. This, in
   turn, degrades the overall performance of the system.

   Tag switching allows complete decoupling of interior and exterior
   routing. With tag switching only tag switches at the border of a
   domain would be required to maintain routing information provided by
   exterior routing - all other switches within the domain would just
   maintain routing information provided by the domains interior routing
   (which is usually significantly smaller than the exterior routing
   information), with no "leaking" of exterior routing information into
   interior routing. This, in turn, reduces the routing load on non-
   border switches, and shortens routing convergence time.

   To support this functionality, tag switching allows a packet to carry
   not one but a set of tags, organized as a stack. A tag switch could
   either swap the tag at the top of the stack, or pop the stack, or
   swap the tag and push one or more tags into the stack.

   Consider a tag switch that is at the border of a routing domain. This
   switch maintains both exterior and interior routes. The interior
   routes provide routing information and tags to all the other tag
   switches within the domain. For each exterior route that the switch
   receives from some other border tag switch that is in the same domain
   as the local switch, the switch maintains not just a tag associated
   with the route, but also a tag associated with the route to that
   other border tag switch. Moreover, for inter-domain routing protocols
   that are capable of passing the "third-party" next hop information
   the switch would maintain a tag associated with the route to the next
   hop, rather than with the route to the border tag switch from whom


                                                               [Page 10]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   the local switch received the exterior route.

   When a packet is forwarded between two (border) tag switches in
   different domains, the tag stack in the packet contains just one tag
   (associated with an exterior route). However, when a packet is
   forwarded within a domain, the tag stack in the packet contains not
   one, but two tags (the second tag is pushed by the domain's ingress
   border tag switch). The tag at the top of the stack provides packet
   forwarding to an appropriate egress border tag switch (or the
   "third-party" next hop), while the next tag in the stack provides
   correct packet forwarding at the egress switch (or at the "third-
   party" next hop). The stack is popped by either the egress switch (or
   the "third-party" next hop) or by the penultimate (with respect to
   the egress switch/"third-party" next hop) switch.

   One could observe that when tag switching is confined to a single
   routing domain, the above still could be used to decouple interior
   from exterior routing, similar to what was described above. However,
   in this case a border tag switch wouldn't maintain tags associated
   with each exterior route, and forwarding between domains would be
   performed at the network layer.

   The control component used in this scenario is fairly similar to the
   one used with destination-based routing. In fact, the only essential
   difference is that in this scenario the tag binding information is
   distributed both among physically adjacent tag switches, and among
   border tag switches within a single domain. One could also observe
   that the latter (distribution among border switches) could be
   trivially accommodated by very minor extensions to BGP.

   The notion of supporting hierarchy of routing knowledge with tag
   switching is not limited to the case of exterior/interior routing,
   but could be applicable to other cases where the hierarchy of routing
   knowledge is possible. Moreover, while the above describes only a
   two-level hierarchy of routing knowledge, the tag switching
   architecture does not impose limits on the depth of the hierarchy.


6.3. Multicast

   Essential to multicast routing is the notion of spanning trees.
   Multicast routing procedures (e.g., PIM) are responsible for
   constructing such trees (with receivers as leafs), while multicast
   forwarding is responsible for forwarding multicast packets along such
   trees. Thus, to support a multicast forwarding function with tag
   switching we need to be able to associate a tag with a multicast
   tree.  The following describes the procedures for allocation and
   distribution of tags for multicast.


                                                               [Page 11]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   When tag switching is used for multicast, it is important that tag
   switching be able to utilize multicast capabilities provided by the
   Data Link layer (e.g., multicast capabilities provided by Ethernet).
   To be able to do this, an (upstream) tag switch connected to a given
   Data Link subnetwork should use the same tag when forwarding a
   multicast packet to all of the (downstream) switches on that
   subnetwork. This way the packet will be multicasted at the Data Link
   layer over the subnetwork. To support this, all tag switches that are
   part of a given multicast tree and are on a common subnetwork must
   agree on a common tag that would be used for forwarding multicast
   packets along the tree over the subnetwork. Moreover, since multicast
   forwarding is based on Reverse Path Forwarding (RPF), it is crucial
   that, when a tag switch receives a multicast packet, a tag carried in
   a packet must enable the switch to identify both (a) a particular
   multicast group, as well as (b) the previous hop (upstream) tag
   switch that sent the packet.

   To support the requirements outlined in the previous paragraph, the
   tag switching architecture assumes that (a) multicast tags are
   associated with interfaces on a tag switch (rather than with a tag
   switch as a whole), (b) the tag space that a tag switch could use for
   allocating tags for multicast is partitioned into non-overlapping
   regions among all the tag switches connected to a common Data Link
   subnetwork, and (c) there are procedures by which tag switches that
   belong to a common multicast tree and are on a common Data Link
   subnetwork agree on the tag switch that is responsible for allocating
   a tag for the tree.

   One possible way of partitioning tag space into non-overlapping
   regions among tag switches connected to a common subnetwork is for
   each tag switch to claim a region of the space and announce this
   region to its neighbors. Conflicts are resolved based on the IP
   address of the contending switches (the higher address wins, the
   lower retries). Once the tag space is partitioned among tag switches,
   the switches may create bindings between tags and multicast trees
   (routes).

   At least in principle there are two possible ways to create bindings
   between tags and multicast trees (routes). With the first alternative
   for a set of tag switches that share a common Data Link subnetwork,
   the tag switch that is upstream with respect to a particular
   multicast tree allocates a tag (out of its own region that does not
   overlap with the regions of other switches on the subnetwork), binds
   the tag to a multicast route, and then advertises the binding to all
   the (downstream) switches on the subnetwork. With the second
   alternative, one of the tag switches that is downstream with respect
   to a particular multicast tree allocates a tag (out of its own region
   that does not overlap with the regions of other switches on the


                                                               [Page 12]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   subnetwork), binds the tag to a multicast route, and then advertises
   the binding to all the switches (both downstream and upstream) on the
   subnetwork. Usually the first tag switch to join the group is the one
   that performs the allocation.

   Each of the above alternatives has its own trade-offs. The first
   alternative is fairly simple - one upstream router does the tag
   binding and multicasts the binding downstream. However, the first
   alternative may create uneven distribution of allocated tags, as some
   tag switches on a common subnetwork may have more upstream multicast
   sources than the others. Also, changes in topology could result in
   upstream neighbor changes, which in turn would require tag re-
   binding. Finally, one could observe that distributing tag binding
   from upstream towards downstream is inconsistent with the direction
   of multicast routing information distribution (from downstream
   towards upstream).

   The second alternative, even if more complex that the first one, has
   its own advantages. For one thing, it makes distribution of multicast
   tag binding consistent with the distribution of unicast tag binding.
   It also makes distribution of multicast tag binding consistent with
   the distribution of multicast routing information. This, in turn,
   allows the piggybacking of tag binding information on existing
   multicast routing protocols (PIM). This alternative also avoids the
   need for tag re-binding when there are changes in upstream neighbor.
   Finally it is more likely to provide more even distribution of
   allocated tags, as compared to the first alternative. Note that this
   approach does require a mechanism to choose the tag allocator from
   among the downstream tag switches on the subnetwork.


6.4. Quality of service

   Two mechanisms are needed for providing a range of qualities of
   service to packets passing through a router or a tag switch. First,
   we need to classify packets into different classes. Second, we need
   to ensure that the handling of packets is such that the appropriate
   QOS characteristics (bandwidth, loss, etc.) are provided to each
   class.

   Tag switching provides an easy way to mark packets as belonging to a
   particular class after they have been classified the first time.
   Initial classification could be done using configuration information
   (e.g., all traffic from a certain interface) or using information
   carried in the network layer or higher layer headers (e.g., all
   packets between a certain pair of hosts). A tag corresponding to the
   resultant class would then be applied to the packet. Tagged packets
   can then be efficiently handled by the tag switching routers in their


                                                               [Page 13]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   path without needing to be reclassified. The actual scheduling and
   queueing of packets is largely orthogonal - the key point here is
   that tag switching enables simple logic to be used to find the state
   that identifies how the packet should be scheduled.

   Tag switching can, for example, be used to support a small number of
   classes of service in a service provider network (e.g. premium and
   standard). On frame-based media, the class can be encoded by a field
   in the tag header. On ATM tag switches, additional tags can be
   allocated to differentiate the different classes. For example, rather
   than having one tag for each destination prefix in the FIB, an ATM
   tag switch could have two tags per prefix, one to be used by premium
   traffic and one by standard. Thus a tag binding in this case is a
   triple consisting of <prefix, QOS class, tag>. Such a tag would be
   used both to make a forwarding decision and to make a scheduling
   decision, e.g., by selecting the appropriate queue in a weighted fair
   queueing (WFQ) scheduler.

   To provide a finer granularity of QOS, tag switching can be used with
   RSVP. We propose a simple extension to RSVP in which a tag object is
   defined. Such an object can be carried in an RSVP reservation message
   and thus associated with a session. Each tag capable router assigns a
   tag to the session and passes it upstream with the reservation
   message. Thus the association of tags with RSVP sessions works very
   much like the binding of tags to routes with downstream allocation.
   Note, however, that binding is accomplished using RSVP rather than
   TDP. (It would be possible to use TDP, but it is simpler to extend
   RSVP to carry tags and this ensures that tags and reservation
   information are communicated in a similar manner.)

   When data packets are transmitted, the first router in the path that
   is tag-capable applies the tag that it received from its downstream
   neighbor. This tag can be used at the next hop to find the
   corresponding reservation state, to forward and schedule the packet
   appropriately, and to find the suitable outgoing tag value provided
   by the next hop.  Note that tag imposition could also be performed at
   the sending host.


6.5. Flexible routing (explicit routes)

   One of the fundamental properties of destination-based routing is
   that the only information from a packet that is used to forward the
   packet is the destination address. While this property enables highly
   scalable routing, it also limits the ability to influence the actual
   paths taken by packets. This, in turn, limits the ability to evenly
   distribute traffic among multiple links, taking the load off highly
   utilized links, and shifting it towards less utilized links. For


                                                               [Page 14]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   Internet Service Providers (ISPs) who support different classes of
   service, destination-based routing also limits their ability to
   segregate different classes with respect to the links used by these
   classes. Some of the ISPs today use Frame Relay or ATM to overcome
   the limitations imposed by destination-based routing. Tag switching,
   because of the flexible granularity of tags, is able to overcome
   these limitations without using either Frame Relay or ATM.

   Another application where destination-based routing is no longer
   adequate is routing with resource reservations (QOS routing).
   Increasing the number of ways by which a particular reservation could
   traverse a network may improve the success of the reservation.
   Increasing the number of ways, in turn, requires the ability to
   explore paths that are not constrained to the ones constructed solely
   based on destination.

   To provide forwarding along paths that are different from the paths
   determined by destination-based routing, the control component of tag
   switching allows installation of tag bindings in tag switches that do
   not correspond to the destination-based routing paths.

   One possible alternative for supporting explicit routes is to allow
   TDP to carry information about an explicit route, where such a route
   could be expressed as a sequence of tag switches. Another alternative
   is to use tag-capable RSVP (see Section 6.4) as a mechanism to
   distribute tag bindings, and to augment RSVP with the ability to
   steer the PATH message along a particular (explicit) route. Finally,
   it is also possible in principle to use some form of source route
   (e.g., SDRP, GRE) to steer RSVP PATH messages carrying tag bindings
   along a particular path. Note, however, that this would require a
   change to the way in which RSVP handles PATH messages, as it would be
   necessary to store the source route as part of the PATH state.


7. Tag Forwarding Granularities and Forwarding Equivalence Classes

   A conventional router has some sort of structure or set of structures
   which may be called a "forwarding table", which has a finite number
   of entries. Whenever a packet is received, the router applies a
   classification algorithm which maps the packet to one of the
   forwarding table entries. This entry specifies how to forward the
   packet.

   We can think of this classification algorithm as a means of
   partitioning the universe of possible packets into a finite set of
   "Forwarding Equivalence Classes" (FECs).

   Each router along a path must have some way of determining the next


                                                               [Page 15]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   hop for that FEC. For a given FEC, the corresponding entry in the
   forwarding table may be created dynamically, by operation of the
   routing protocols (unicast or multicast), or it might be created by
   configuration, or it might be created by some combination of
   configuration and protocol.

   In tag switching, if a pair of tag switches are adjacent along a tag
   switched path, they must agree on an assignment of tags to FECs. Once
   this agreement is made, all tag switches on the tag switched path
   other than the first are spared the work of actually executing the
   classification algorithm. In fact, subsequent tag switches need not
   even have the code which would be necessary to do this.

   There are a large number of different ways in which one may choose to
   partition a set of packets into FECs. Some examples:

      1. Consider two packets to be in the same FEC if there is a single
      address prefix in the routing table which is the longest match for
      the destination address of each packet;

      2. Consider two packets to be in the same FEC if these packets
      have to traverse through a common router/tag switch;

      3. Consider two packets to be in the same FEC if they have the
      same source address and the same destination address;

      4. Consider two packets to be in the same FEC if they have the
      same source address, the same destination address, the same
      transport protocol, the same source port, and the same destination
      port.

      5. Consider two packets to be in the same FEC if they are alike in
      some arbitrary manner determined by policy. Note that the
      assignment of a packet to a FEC by policy need not be done solely
      by examining the network layer header. One might want, for
      example, all packets arriving over a certain interface to be
      classified into a single FEC, so that those packets all get
      tunnelled through the network to a particular exit point.


   Other examples can easily be thought of.

   In case 1, the FEC can be identified by an address prefix (as
   described in Section 6.1). In case 2, the FEC can be identified by
   the address of a tag switch (as described in Section 6.1). Both 1 and
   2 are useful for binding tags to unicast routes - tags are bound to
   FECs, and an address prefix, or an address identifies a particular
   FEC. Case 3 is useful for binding tags to multicast trees that are


                                                               [Page 16]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   constructed by protocols such as PIM (as described in Section 6.3).
   Case 4 is useful for binding tags to individual flows, using, say,
   RSVP (as described in Section 6.4). Case 5 is useful as a way of
   connecting two pieces of a private network across a public backbone
   (without even assuming that the private network is an IP network) (as
   described in Section 6.5).

   Any number of different kinds of FEC can co-exist in a single tag
   switch, as long as the result is to partition the universe of packets
   seen by that tag switch. Likewise, the procedures which different tag
   switches use to classify (hitherto untagged) packets into FECs need
   not be identical.

   Networks could be organized around a hierarchy of FECs. For example,
   (non-adjacent) tag switches TSa and TSb may classify packets into
   some set of FECs FEC1,...,FECn.  However from the point of view of
   the intermediate tag switches between TSa and TSb, all of these FECs
   may be treated indistinguishably. That is, as far as the intermediate
   tag switches are concerned, the union of the FEC1,...,FECn is a
   single FEC.  Each intermediate tag switch may then prefer to use a
   single tag for this union (rather than maintaining individual tags
   for each member of this union). Tag switching accommodates this by
   providing a hierarchy of tags, organized in a stack.

   Much of the power of tag switching arises from the facts that:

      - there are so many different ways to partition the packets into
      FECs,

      - different tag switches can partition the hitherto untagged
      packets in different ways,

      - the route to be used for a particular FEC can be chosen in
      different ways,

      - a hierarchy of tags, organized as a stack, can be used to
      represent the network's hierarchy of FECs.

   Note that tag switching does not specify, as an element of any
   particular protocol, a general notion of "FEC identifier". Even if it
   were possible to have such a thing, there is no need for it, since
   there is no "one size fits all" setup protocol which works for any
   arbitrary combination of packet classifier and routing protocol.
   That's why tag distribution is sometimes done with TDP, sometimes
   with BGP, sometimes with PIM, sometimes with RSVP.


                                                               [Page 17]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


8. Tag switching with ATM

   Since the tag switching forwarding paradigm is based on label
   swapping, and since ATM forwarding is also based on label swapping,
   tag switching technology can readily be applied to ATM switches by
   implementing the control component of tag switching.

   The tag information needed for tag switching can be carried in the
   VCI field. If two levels of tagging are needed, then the VPI field
   could be used as well, although the size of the VPI field limits the
   size of networks in which this would be practical. However, for most
   applications of one level of tagging the VCI field is adequate.

   To obtain the necessary control information, the switch should be
   able to support the tag switching control component. Moreover, if the
   switch has to perform routing information aggregation, then to
   support destination-based unicast routing the switch should be able
   to perform Network Layer forwarding for some fraction of the traffic
   as well.

   Supporting the destination-based routing function with tag switching
   on an ATM switch may require the switch to maintain not one, but
   several tags associated with a route (or a group of routes with the
   same next hop). This is necessary to avoid the interleaving of
   packets which arrive from different upstream tag switches, but are
   sent concurrently to the same next hop.

   If an ATM switch has built-in mechanism(s) to suppress cell
   interleave, then the switch could implement the destination-based
   routing function precisely the way it was described in Section 6.1.
   This would eliminate the need to maintain several tags per route.
   Note, however, that suppressing cell interleave is not part of the
   ATM User Plane, as defined by the ATM Forum.

   Yet another alternative that eliminates the need to maintain several
   tags per route is to carry the tag information in the VPI field, and
   use the VCI field for identifying cells that were sent by different
   tag switches. Note, however, that the scalability of this alternative
   is constrained by the size of the VPI space (4096 tags total).
   Moreover, this alternative assumes that for a set of ATM tag switches
   that form a contiguous segment of a network topology there exists a
   mechanism to assign to each ATM tag switch around the edge of the
   segment a set of unique VCIs that would be used by this switch alone.

   The downstream tag allocation on demand scheme is likely to be a
   preferred scheme for the tag allocation and TIB maintenance
   procedures with ATM switches, as this scheme allows efficient use of
   entries in the cross-connect tables maintained by ATM switches.


                                                               [Page 18]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


   Implementing tag switching on an ATM switch simplifies integration of
   ATM switches and routers. From a routing peering point of view an ATM
   switch capable of tag switching would appear as a router to an
   adjacent router; this reduces the number of routing peers a router
   would have to maintain (relative to the common arrangement where a
   large number of routers are fully meshed over an ATM cloud). Tag
   switching enables better routing, as it exposes the underlying
   physical topology to the Network Layer routing. Finally tag switching
   simplifies overall operations by employing common addressing,
   routing, and management procedures among both routers and ATM
   switches. That could provide a viable, more scalable alternative to
   the overlay model. Because creation of tag binding is driven by
   control traffic, rather than data traffic, application of this
   approach to ATM switches does not produce high call setup rates, nor
   does it depend on the longevity of flows.

   Implementing tag switching on an ATM switch does not preclude the
   ability to support a traditional ATM control plane (e.g., PNNI) on
   the same switch. The two components, tag switching and the ATM
   control plane, would operate in a Ships In the Night mode (with
   VPI/VCI space and other resources partitioned so that the components
   do not interact).


9. Tag switching migration strategies

   Since tag switching is performed between a pair of adjacent tag
   switches, and since the tag binding information can be distributed on
   a pairwise basis, tag switching could be introduced in a fairly
   simple, incremental fashion. For example, once a pair of adjacent
   routers are converted into tag switches, each of the switches would
   tag packets destined to the other, thus enabling the other switch to
   use tag switching. Since tag switches use the same routing protocols
   as routers, the introduction of tag switches has no impact on
   routers. In fact, a tag switch connected to a router acts just as a
   router from the router's perspective.

   As more and more routers are upgraded to enable tag switching, the
   scope of functionality provided by tag switching widens. For example,
   once all the routers within a domain are upgraded to support tag
   switching, in becomes possible to start using the hierarchy of
   routing knowledge function.


                                                               [Page 19]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


10. Summary

   In this paper we described the tag switching technology. Tag
   switching is not constrained to a particular Network Layer protocol -
   it is a multiprotocol solution. The forwarding component of tag
   switching is simple enough to facilitate high performance forwarding,
   and may be implemented on high performance forwarding hardware such
   as ATM switches. The control component is flexible enough to support
   a wide variety of routing functions, such as destination-based
   routing, multicast routing, hierarchy of routing knowledge, and
   explicitly defined routes. By allowing a wide range of forwarding
   granularities that could be associated with a tag, we provide both
   scalable and functionally rich routing. A combination of a wide range
   of forwarding granularities and the ability to evolve the control
   component fairly independently from the forwarding component results
   in a solution that enables graceful introduction of new routing
   functionality to meet the demands of a rapidly evolving computer
   networking environment.


11. Security Considerations

   Security considerations are not addressed in this document.


12. Intellectual Property Considerations

   Cisco Systems may seek patent or other intellectual property
   protection for some or all of the technologies disclosed in this
   document. If any standards arising from this document are or become
   protected by one or more patents assigned to Cisco Systems, Cisco
   intends to disclose those patents and license them under openly
   specified and non-discriminatory terms, for no fee.


13. Acknowledgments

   Significant contributions to this work have been made by Anthony
   Alles, Fred Baker, Paul Doolan, Guy Fedorkow, Jeremy Lawrence, Arthur
   Lin, Morgan Littlewood, Keith McCloghrie, and Dan Tappan.


                                                               [Page 20]


Internet Draft    draft-rekhter-tagswitch-arch-00.txt       January 1997


14. References


15. Authors' Addresses


      Yakov Rekhter
      Cisco Systems, Inc.
      170 Tasman Drive
      San Jose, CA, 95134
      E-mail: yakov@cisco.com

      Bruce Davie
      Cisco Systems, Inc.
      250 Apollo Drive
      Chelmsford, MA, 01824
      E-mail: bsd@cisco.com

      Dave Katz
      Cisco Systems, Inc.
      170 Tasman Drive
      San Jose, CA, 95134
      E-mail: dkatz@cisco.com

      Eric Rosen
      Cisco Systems, Inc.
      250 Apollo Drive
      Chelmsford, MA, 01824
      E-mail: erosen@cisco.com

      George Swallow
      Cisco Systems, Inc.
      250 Apollo Drive
      Chelmsford, MA, 01824
      E-mail: swallow@cisco.com

      Dino Farinacci
      Cisco Systems, Inc.
      170 West Tasman Drive
      San Jose, CA 95134
      E-mail: dino@cisco.com


                                                               [Page 21]