Network Working Group                                                
   Internet Draft                                             C. Proust 
   Document: draft-proust-flow-routing-00.txt            France Telecom 
   Expires: August 2004                                   February 2004 
    
    
    
    
                   An approach for Routing at Flow level 
    
                     draft-proust-flow-routing-00.txt 
    
    
    
    
    
Status of this Memo 
    
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026 [RFC2026]. 
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that      
   other groups may also distribute working documents as Internet-
   Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
        http://www.ietf.org/ietf/1id-abstracts.txt 
   The list of Internet-Draft Shadow Directories can be accessed at 
        http://www.ietf.org/ietf/1shadow-sites.txt 
    
    
Abstract 
    
   This document proposes an approach that aims at load balancing flows 
   among multiple routes that are estimated to be slightly loaded.  
   Mainly, it designs a router-based, end-to-end, congestion control 
   mechanism by means of a piggybacking procedure which is distributed 
   both in routers and terminals.  This document proposes a set of 
   functional extensions that might be applied to the IPv4 protocol and 
   to the ISIS routing protocol.   
     



 
 
   Proust               Expires - August 2004                [Page 1] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
Table of Contents 
    
   1. Introduction...................................................3 
      1.1 A Brief History............................................3 
      1.2 Common Understanding of the Requirements...................4 
      1.3 Some Strong Architectural Beliefs..........................4 
      1.4 Design Philosophy..........................................5 
      1.5 Overview of the Architecture...............................6 
      1.6 Review of an Example......................................10 
   2. Identification of IP flow forwarding states...................16 
      2.1 Overview..................................................16 
      2.2 Format of the IP option header............................18 
      2.3 Role of the Terminal......................................20 
      2.4 Role of the Router........................................23 
      2.5 Review of an Example......................................26 
   3. Extension of the Forwarding plane.............................31 
      3.1 Overview..................................................31 
      3.2 FIB structure.............................................33 
      3.3 FIB update................................................33 
      3.4 Forwarding algorithm......................................36 
   4. Extension of the Routing Control plane........................38 
      4.1 Overview..................................................38 
      4.2 Statistic module..........................................41 
      4.3 RIB structure.............................................42 
      4.4 LLP control messages......................................43 
      4.5 Synchronization of the LLDB...............................47 
      4.6 RIB update................................................50 
      4.7 Route calculation algorithm...............................53 
   5. Ongoing Experiment............................................53 
   6. Conclusion....................................................54 
   IANA Considerations..............................................55 
      New IP Option.................................................55 
      New PDU Types.................................................55 
      New TLV Code..................................................55 
      New sub-TLV Codes.............................................56 
   Security Considerations..........................................56 
   Normative References.............................................56 
   Informative References...........................................57 
   Acknowledgments..................................................58 
   Author's Address.................................................58 
   IPR Notice.......................................................58 
   Full Copyright Statement.........................................58 
    





 
 
   Proust               Expires - August 2004                [Page 2] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
1. 
  Introduction 
    
1.1 
    A Brief History 
    
   While the Internet has dramatically evolved during the last decade in 
   terms of services, number of users, network sizes, or bandwidth 
   rates, there have not been any fundamental changes in the field of 
   routing apart from MPLS developments.  To quote from [IAB-FUTURE], 
   "The deployed Internet routing system assumes that the shortest path 
   is always the best path.  This is probably false, however it is a 
   reasonable compromise given the routing protocols currently 
   available.  The Internet lacks deployable approaches for policy-based 
   routing or routing with alternative metrics."   
    
   Hence today, it is envisioned that there is still an important need 
   for improvement in this area.   
   As an indication, one can underscore the past activity that has been 
   done in the working group QOSR (QOS Routing).  It delivered a 
   framework document [QOSRFRWK] that still serves as a reference in 
   this field.  Moreover, during this period (1996-1999) of focused 
   research two protocols, [QOSPF] and [OMP] that address part of the 
   requirements defined in the QOSR working group were designed.  
   Despite the interest of the IETF in these approaches, they were not 
   brought into alignment with the standard tracks mainly because they 
   were not appropriate for a large-scale deployment.  From the point of 
   view of the author of this draft, the experimental protocol [QOSPF] 
   seems not to be scalable enough, since it relies on a resource 
   signaling protocol that operates at flow level, such as, the Resource 
   reSerVation Protocol (RSVP), while the approach known as OMP [OMP] 
   seems to lack the required level of granularity and to be prone to 
   some instability.   
    
   As another argument, one could also cite the long term ongoing work 
   being conducted within the Routing Research Group of the IRTF [IRTF-
   RR], which is on the verge of publishing a set of routing 
   requirements for next generation Internet [IRTF-ROUTREQ]. 
    
   Moreover, a recent framework document [QOSARCFRWK] which is supported 
   by the IAB and that deals with QOS architectural considerations 
   applied to IP networks emphasizes the need for enhanced routing 
   capabilities.  On page 11 of this document, G. Huston states: "There 
   is a more fundamental issue here concerning resource management and 
   traffic engineering.  The approach of single path selection with 
   static load characteristics does not match a networked environment 
   which contains a richer mesh of connectivity and dynamic load 
   characteristics.  In order to make efficient use of a rich 
   connectivity mesh, it is necessary to be able to direct traffic with 
   a common ingress and egress point across a set of available network 
 
 
   Proust               Expires - August 2004                [Page 3] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   paths, spreading the load across a broader collection of network 
   links."  
    
   All the remainder of this document addresses this problem statement.   
    
    
1.2 
    Common Understanding of the Requirements 
    
   Several documents served as the foundations to derive the 
   requirements that apply to this document. First of all, there is the 
   already mentioned framework document about QOS routing [QOSRFRWK].   
   Second, there is the more recent framework document about QOS 
   architecture [QOSARCFRWK].  Third, there are the specifications 
   documents of two experimental protocols that broaden the 
   understanding of the above problem statement, namely, protocols known 
   as the QOS Routing Mechanisms and OSPF Extensions (QOSPF), and the 
   Optimized Multi Path (OMP) approach.  Finally, the BCP 41 [BCP41] 
   about congestion control principles serves to check that the solution 
   is coherent with TCP and other elaborated end-to-end congestion 
   control mechanisms.   
    
   The remainder of this paragraph serves as a reminder of the 
   fundamental requirements that this solution needs to meet. 
    
      - the solution must provide for multiple paths between any node 
        pair 
       
      - the solution must optimize the use of the network, based on 
        knowledge of available resources (mainly link bandwidth) 
       
      - the solution must exhibit good properties in terms of 
        scalability, stability and performance 
       
      - the solution must not place a new security breach upon the 
        Internet architecture 
       
      - the solution must allow for a fine granularity of routing 
       
      - the solution may provide support for policy routing and may 
        provide management mechanisms to ease its deployment and 
        supervision 
    
    
1.3 
    Some Strong Architectural Beliefs 
    
   The solution described hereafter is based on several strong 
   assumptions.  


 
 
   Proust               Expires - August 2004                [Page 4] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   First, "stateless" switching technology is considered necessary to 
   build an Internet architecture that exhibits good properties, such 
   as, scalability and robustness. 
   Second, the finer the level of granularity of network services, the 
   more value added the resulting network. 
   Third, resource access control mechanisms are keys to avoid 
   contention of resources.  Besides, over-provisioning technique is a 
   common practice among network operators.  This technique is 
   considered adapted to IP networks where IP traffic is unpredictable.   
   Furthermore, there is a fundamental lack of router-based end-to-end 
   congestion control mechanism [BCP41] in the Internet. 
   Finally, it is generally admitted that stability and simplicity [INT-
   ARCH2] are key concerns when promoting an architecture whose scope 
   challenges that of the Internet.   
    
    
1.4 
    Design Philosophy 
    
   As a consequence of the stateless argument, every datagram conveys 
   some form of signaling information about its ongoing IP 
   communication.  This is a characteristic of the IP protocol where 
   every datagram contains an information header enabling routers to 
   forward independently every datagram towards its final destination. 
   In order to cope with this characteristic, the solution that is 
   depicted, hereafter, relies on this stateless paradigm.  It defines a 
   new option for the IP protocol that allows dynamic flow forwarding 
   states information to be conveyed within every datagram.   
    
   With regard to the level of granularity to which this network service 
   applies, the solution examines routing at flow level in order to 
   increase the added value of those services.  Hence the definition of 
   a flow that applies to this architecture is that of an Internet 
   socket. 
    
   With regard to the third assumption, the architecture relies on a 
   resource access control mechanism that operates at flow level and 
   takes into account the availability level of network resources, 
   mainly link bandwidth.  However, over a long period of time, the 
   network resources would remain over-provisioned.   
    
   There are numerous RFCs [BCP41], [QOSRFRWK] that stress the need for 
   a router-based end-to-end congestion control mechanism within the 
   Internet.  As a result, it would be opportunistic to design a routing 
   solution that not only provides for IP connectivity but also that 
   controls congestion at the scale of the network.   
    
   With regard to the stability criteria, the routing solution must 
   first include mechanisms that ensure convergence within a reasonable 
   timeframe given the dynamic of the flows. 
 
 
   Proust               Expires - August 2004                [Page 5] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   Second, the routing system must not generate an amount of information 
   about resource availability that is unmanageable. 
   Third, the routing system must guarantee that the flows are routed in 
   a stable way.  That is to say, that the system must not allow for 
   oscillations of traffic between different routes except when topology 
   changes.   
    
   To comply with the simplicity criteria the solution must keep the 
   overall architecture on a rather uncomplicated level.  Hence their 
   underlying mechanisms must be kept as simple as possible in order to 
   be implemented and operated on large scale.   
    
    
1.5 
   Overview of the Architecture 
    
   This architecture allows any IP datagram to be routed at flow level. 
   With that regard a flow is defined as an Internet socket. Moreover, 
   this routing model is able to load balance traffic efficiently, 
   taking into account real availability of network resources.   
   For this purpose, the architecture is composed of three complementary 
   functional building blocks. The first one is in charge of the routing 
   control. The second one takes care of the forwarding data plane. 
   Finally the third one is responsible for identifying flow forwarding 
   states and makes use of a marking function that affects the IP option 
   header.   
    
   The routing control part is based on any standard Interior Gateway 
   Protocol (IGP) of type Link State such as OSPF [OSPF] or ISIS [ISIS].  
   Those protocols are enhanced in order to take into account 
   information about network resource availability in their route 
   calculation algorithm.  For this purpose, their routing process 
   embeds a statistical function that periodically computes the level of 
   incoming and outgoing traffic circulating over their active 
   transmission links.  This aggregated information is generated by 
   every router and distributed among all routers, using specific Link 
   State Packets (LSP) and databases synchronization procedures.  For 
   convenience, these specific LSPs are called Link Load Packets (LLP).  
   As a consequence, this augmented Link State DataBase (LSDB) enables 
   every router to figure out the topology of the network, as well as, 
   the distribution of its traffic load.  Given that information, every 
   router is able to determine a set of routes that comply with multiple 
   criteria, such as, connectivity and availability, i.e. short routes 
   that are estimated to be free of congestion.  Moreover, when a 
   primary route seems to be overloaded, the routing system is able to 
   determine and activate an alternative route in order to capture some 
   of the overloading traffic. 
   To this end, the route calculation algorithm is based on an ordered 
   multi-criteria search algorithm of the "widest-enough shortest" type.  
   Furthermore, this path calculation algorithm may also take into 
 
 
   Proust               Expires - August 2004                [Page 6] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   account some routing policy constraints.  For instance, one could set 
   up constraints such as: the cost of a route should be inferior to 
   some maximum threshold, a route should not be longer than 32 hops, or 
   no more than 15 different routes associated to a given destination 
   prefix should be allowed, etc).   
    
   To reflect this ability, the structure of the Routing Information 
   dataBase (RIB) associated to an IGP would be extended to include 
   three new attributes.  The first one called load, defines the level 
   of traffic load along the corresponding route for a given period.  
   The second one called index, allows multiple routes associated to a 
   destination prefix to be identified. The third one called path, 
   allows the routing process to efficiently refresh the load attribute, 
   associated to its routes when a set of new LLP is received.   
    
   Hence, given that sort of enhancement, the role of the routing 
   process becomes threefold: first, to calculate a set of short routes 
   free of congestion, second, to refresh the load attribute, associated 
   with the set of active routes, when a new set of LLP is received, and 
   third, to trigger an alternate route calculation algorithm when a 
   maximum load threshold is reached.   
    
   By means of symmetry in relation to the control plane, this 
   architecture enhances the data forwarding plane model in various 
   directions.   
   First, the structure of the Forwarding Information dataBase (FIB) is 
   extended in order to reflect the additional information provided by 
   the routing control plane.  Hence, every route of the FIB possesses 
   two more attributes: an indication of traffic load and an index 
   parameter.   
   Second, the forwarding algorithm is modified in order to take into 
   account several criteria, when transmitting any datagram towards its 
   final destination.  This route lookup algorithm is based on an 
   ordered list of criteria that, first of all, examines the destination 
   prefix, then the index parameter.  Therefore, this two-stage 
   procedure performs, first, a longest prefix matching operation 
   followed by an exact matching operation.   
   When the index parameter is invalid or equal to zero, the forwarding 
   process completes its route lookup operation using a selection 
   algorithm of the "widest-enough shortest" type.  This optional 
   procedure is applied to the subset of the FIB routes that were 
   identified at the preliminary stage of the longest match search.  
   This algorithm takes into account both the load attribute and the 
   cost attribute.  Besides, it is designed to be applied when 
   transmitting the first datagram of a bi-directional IP communication 
   or when some major topology changes affect the route followed by the 
   flow.  The role of this optional stage is to load balance new flows 
   among multiple routes that exhibit some good properties: being free 
   of congestion (e.g. traffic load attribute is under a maximum 
 
 
   Proust               Expires - August 2004                [Page 7] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   threshold that could correspond to some over-provisioning 
   limitations), being the shortest (e.g. in relation to the cost 
   metric) among active under-loaded routes and satisfying some routing 
   policy constraints.   
    
   Hence, it is a route lookup processing that adapts to the type of the 
   datagram being forwarded.  
    
   When the first datagram of a new flow is transmitted through a 
   network, every router carries out the load-balancing procedure and 
   selects a specific index value that, from its viewpoint, optimizes 
   the use of the network at the time of the datagram is being 
   processed.   
    
   In order to cope with stability and performance criteria, once a 
   choice of load balancing is made, it should be maintained for all 
   subsequent datagrams belonging to the flow.  This function is known 
   in the literature as route pinning.  In this architecture it is 
   intimately intertwined with the forwarding processing.  To cope with 
   the strong assumption that scaling IP networks are stateless, the 
   route pinning function is carried out through a marking process that 
   is distributed both in routers and in terminals.  In brief, in this 
   functional decomposition, the role of the router is to determine 
   load-balancing index values (i.e., to calculate adequate flow 
   forwarding states), while the role of the terminal is to store and 
   remind the network of the previous sequence of index values (i.e., to 
   store the flow forwarding state information generated by every flow).  
   The desired result is obtained by means of a piggybacking procedure 
   that operates in a closed loop.  Every datagram that belongs to a bi-
   directional IP communication conveys within its header, both the 
   previous valid sequence of flow forwarding states, associated with 
   the forward flow being transmitted and the previous valid sequence of 
   flow forwarding states associated with the backward flow that would 
   be sent in response.  When the network is running smoothly, i.e. no 
   major outage   is affecting its topology, both sequences of flow 
   forwarding states relating to the forward and backward flows of a bi-
   directional IP communication, should not experience any change over 
   the duration of the communication, once they are calculated.   
    
   With regard to security considerations, the knowledge of both 
   sequences of flow forwarding states, relating to the forward and 
   backward flows of a bidirectional IP communication, is of no use from 
   the perspective of an end user.  In effect, since those values have 
   only some local meaning in relation to the FIB of the routers, 
   without the knowledge of all the FIBs of the routers that take place 
   along the route used by a flow, an end user cannot take advantage of 
   this information to supersede the routing decision enforced by the 
   network.  At worst, an end user that would inadvertently change any 
   value of one sequence of flow forwarding states, enforced by the 
 
 
   Proust               Expires - August 2004                [Page 8] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   network, would lose the benefit of being transmitted efficiently 
   along the same route.  In terms of performance, that change could 
   affect the delivery timeframe, due to the reprocessing of a valid 
   sequence of flow forwarding state values, as well as, the reordering 
   of some datagrams at the receiving endpoint.   












































 
 
   Proust               Expires - August 2004                [Page 9] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
1.6 
   Review of an Example 
    
   In this example the topology of the network is represented in the 
   figure below.  It is composed of two IP terminals whose IP addresses 
   are S1 and D1.  Moreover, four interconnected routers named R1, R2, 
   R3 and R4 compose the IP network.  
    
            subnet A                                      subnet B 
    
     +----+    |                                             |   +----+ 
     | S1 |    |      ,---.          ,---.         ,---.     |   | D1 | 
     +----+____|     /     \        /     \       /     \    |___+----+ 
     +----+    |____(  R1   )......(   R2  ).....(   R3  )___|   +----+ 
               |     \     /        \     /       \     /    | 
                      `---'.         `---'       .'`---' 
                            `.                 .' 
                              `.            .-' 
                                `.        .' 
                                  `,---..' 
                                  /     \ 
                                 (   R4  ) 
                                  \     / 
                                   `---' 
    
                  Figure 1: Topology of the Example Network 
    
   In this scenario, the terminal S1 is first initializing a bi-
   directional IP communication with the terminal D1.   
    
   As this time, the distribution of the network traffic load is within 
   well acceptable boundaries.  In other words, there is no transmission 
   link in the network that experiences a traffic overload.   
   Therefore, in this context, the routing control plane of the network 
   has converged to form the simplest routing state where every router 
   has determined a unique route per destination prefix.   
    
   In a first approximation, a simplified version of the four RIBs and 
   therefore FIBs of the related network are represented by the four 
   figures below.  This approximation assimilates the router designation 
   names as the destination prefixes of the network.  It does not change 
   the logic presented hereafter.  Furthermore, the RIB tables include 
   two additional columns compared to a more conventional RIB table.  
   The first one called index, designates the rank of the route in 
   relation to its uniqueness to a given destination prefix, while the 
   second one called load, designates the short term traffic load 
   observed over the related path.  The maximum traffic load threshold 
   is arbitrarily fixed to 30 percent of the absolute available 
   bandwidth calculated over a route.  One can calculate such bandwidth 
 
 
   Proust               Expires - August 2004               [Page 10] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   by considering the maximum of the elementary absolute bandwidths 
   associated with the transmission links that compose a given route.   
   The maximum threshold that identifies an acceptable level of network 
   congestion might be determined by following some over-provisioning 
   engineering rules. 
    
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |dst pref|index|nexthop| load |     |dst pref|index|nexthop| load | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R2   |  1  |  R2   | <30% |     |   R1   |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R3   |  1  |  R2   | <30% |     |   R3   |  1  |  R3   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R4   |  1  |  R4   | <30% |     |   R4   |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   B    |  1  |  R2   | <30% |     |   B    |  1  |  R3   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   A    |  1  |  A    | <30% |     |   A    |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    
      Figure 2: RIB of router R1          Figure 3: RIB of router R2 
    
    
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |dst pref|index|nexthop| load |     |dst pref|index|nexthop| load | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R2   |  1  |  R2   | <30% |     |   R1   |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R1   |  1  |  R2   | <30% |     |   R2   |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R4   |  1  |  R4   | <30% |     |   R3   |  1  |  R3   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   B    |  1  |  B    | <30% |     |   B    |  1  |  R3   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   A    |  1  |  R2   | <30% |     |   A    |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    
      Figure 4: RIB of router R3          Figure 5: RIB of router R 
    
   Hence at first, as the network is lightly loaded, the bidirectional 
   IP communication that takes place between terminal S1 and D1, is 
   routed along the primary route composed of router R1, R2 and R3. 
   The figure below represents the two flows f1 and f1' of that bi-
   directional IP communication. 
    
   For convenience, flow f1 is characterized by the quintuple S1, D1, p, 
   p' and protonum, where proto designates the number of higher end-to-
   end transport protocols, and p and p' designate the port numbers 
   identifying the communication points of terminals S1 and D1, 
 
 
   Proust               Expires - August 2004               [Page 11] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   respectively.  By means of symmetry, the flow f1' is characterized by 
   the quintuple D1, S1, p', p and protonum. 
    
                                          flow f1' 
            subnet A    _.-----------------------.       subnet B 
                    ,--''   flow f1                `--. 
               | ,-'  ,----------------------------.   `-----| 
     +----+ <--+'  ,-'                              `------. |`- +----+ 
     | S1 | ---+--'   ,---.          ,---.         ,---.    `+-->| D1 | 
     +----+____|     /     \        /     \       /     \    |___+----+ 
     +-+-++    |____(  R1   )......(   R2  ).....(   R3  )___|   ++-+-+ 
               |     \     /        \     /       \     /    | 
                      `---'.         `---'       .'`---' 
                            `.                 .' 
                              `.            .-' 
                                `.        .' 
                                  `,---..' 
                                  /     \ 
                                 (   R4  ) 
                                  \     / 
                                   `---' 
    
      Figure 6: mapping of flows f1 and f1' on the primary route 
    
    
   A moment later, the network starts to experiment some traffic 
   overload at the transmission link located between router R2 and R3.  
   The figure below depicts this event. 
    
                                          flow f1' 
                         _.-----------------------.       subnet B 
                    ,--''   flow f1                `--.    
               | ,-'  ,----------------------------.   `-----| 
     +----+ <--+'  ,-'                              `------. |`- +----+ 
     | S1 | ---+--'   ,---.          ,---.         ,---.    `+-->| D1 | 
     +----+____|     /     \        /     \       /     \    |___+----+ 
     +-+-++    |____(  R1   )......(   R2  .XXXXX.   R3  )___|   ++-+-+ 
               |     \     /        \     /  |\`  \     /    | 
                      `---'.         `---'     \ .'`---' 
          subnet A          `.                 .\ 
                              `.            .-'  \ 
                                `.        .'      \ 
                                  `,---..'         \______________ 
                                  /     \            overloaded link 
                                 (   R4  ) 
                                  \     / 
                                   `---' 
    
      Figure 7: network experiencing a light traffic overload 
 
 
   Proust               Expires - August 2004               [Page 12] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
   At that point, the routing control plane detects such an event by 
   means of a newly received set of LLP messages.  As defined before, 
   LLP messages characterize the distribution of a network traffic load 
   in close relationship with its topology. 
   Hence this event triggers the calculation of secondary routes. 
    
   From the viewpoint of each router, only the routes that are affected 
   by the temporary unavailability of some transmission links are 
   susceptible to be reinforced by means of an alternative route.   
    
   In the present case, the overload of the transmission link located 
   between routers R2 and R3 has a different impact depending on the 
   considered router.  For instance from the viewpoint of router R3, 
   this occurrence affects all routes that rely on this specific 
   transmission link. Therefore, the router R3 calculates and activates 
   three secondary routes in its updated RIB table of router R3, which 
   is represented below on figure 12.  Meanwhile, from the viewpoint of 
   router R4, none of its routes are affected by this minor outage. 
   The figure 13 represents the RIB table of router R4 at this moment. 
    
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |dst pref|index|nexthop| load |     |dst pref|index|nexthop| load | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R2   |  1  |  R2   | <30% |     |   R1   |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |        |  1  |  R2   | >30% |     |        |  1  |  R3   | >30% | 
    |   R3   |-----+-------+------+     +   R3   +-----+-------+------| 
    |        |  2  |  R4   | <30% |     |        |  2  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R4   |  1  |  R4   | <30% |     |   R4   |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |        |  1  |  R2   | >30% |     |        |  1  |  R3   | >30% | 
    |   B    |-----+-------+------+     +   B    +-----+-------+------| 
    |        |  2  |  R4   | <30% |     |        |  2  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   A    |  1  |  A    | <30% |     |   A    |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    
      Figure 10: RIB of router R1         Figure 11: RIB of router R2 
    
    
    
    
    
    
    
    
    
 
 
   Proust               Expires - August 2004               [Page 13] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
    +--------+-----+-------+------+ 
    |dst pref|index|nexthop| load | 
    +--------+-----+-------+------+ 
    |        |  1  |  R2   | >30% | 
    |   R2   |-----+-------+------+ 
    |        |  2  |  R4   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |        |  1  |  R2   | >30% |     |dst pref|index|nexthop| load | 
    |   R1   |-----+-------+------+     +--------+-----+-------+------+ 
    |        |  2  |  R4   | <30% |     |   R1   |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   R4   |  1  |  R4   | <30% |     |   R2   |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |   B    |  1  |  B    | <30% |     |   R3   |  1  |  R3   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    |        |  1  |  R2   | >30% |     |   B    |  1  |  R3   | <30% | 
    |   A    |-----+-------+------+     +--------+-----+-------+------+ 
    |        |  2  |  R4   | <30% |     |   A    |  1  |  R1   | <30% | 
    +--------+-----+-------+------+     +--------+-----+-------+------+ 
    
      Figure 12: RIB of router R3         Figure 13: RIB of router R 
    
   Finally, later on, terminal S1 and D1 are engaged in a secondary 
   bidirectional IP communication characterized by two flows f2 and f2'.  
   This time, as the primary route is already slightly loaded, it is 
   expected that both routers R3 and R1 will route the datagrams of 
   flows f2 and f2' along the secondary route.  This secondary route is 
   composed of the sequence of routers R1, R4 and R3. 
    
   The main advantage of the design of this architecture is that it 
   allows flows f1 and f1' to still be forwarded along the primary route 
   which is estimated to be lightly overloaded, while allowing more 
   recent flows f2 and f2' to be forwarded along a secondary route which 
   is believed to be under loaded.   
    
   This capability is enabled through the enhancement of the forwarding 
   plane of all routers.   
   First, their RIB tables are extended to include two new attributes 
   per route: an index parameter and the traffic load attribute.   
   Second, their forwarding algorithm is extended to take into account 
   an index value, in addition to, the destination address of the IP 
   header when processing an IP datagram, received on one of its 
   interfaces.   
   Moreover, the route lookup processing is extended to operate 
   differently according to the type of request.   
   When the request concerns the forwarding of a new flow (e.g. one that 
   concerns a datagram that has not been marked as such by routers), the 
   procedure is carried out entirely.  That means that the router 
 
 
   Proust               Expires - August 2004               [Page 14] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   selects the route, whose prefix destination matches the address 
   destination contained in the IP header, and for which the traffic 
   load value is under the maximum threshold and exhibits the lowest 
   cost (in relation to the metric being considered by the routing 
   protocol).   
   When the request concerns the forwarding of a flow that belongs to an 
   already ongoing bi-directional IP communication (e.g. one that 
   concerns a datagram that has already been marked as such by routers) 
   the procedure is shortcut.  It is only based on the destination 
   address and the index values read in the IP header of the 
   datagram...The index value serves as a refinement of the lookup 
   processing.  It indicates which load-balancing choice was assigned to 
   the forwarding of this flow.   
   The way the terminal can store such information is through the use of 
   a piggybacking procedure that operates in a closed loop.  Every IP 
   header conveys both the sequence of choices made by the routers 
   located on the route used by a forwarded datagram, therefore, 
   relating to a forward flow, as well as, the sequence of choices made 
   by the routers located on the route used by a prior datagram that 
   relates to the backward flow.  Since by design, this architecture 
   operates in a closed loop, it applies first to bi-directional IP 
   communication. 
    
   The figure 14 below represents the routing of all flows depicted in 
   this scenario among the available resources. 
    
                                          flow f1' 
          subnet A       _.-----------------------.       subnet B 
                    ,--''   flow f1                `--. 
               | ,-'  ,----------------------------.   `-----| 
     +----+ <--+'  ,-'                   overloaded `------. |`- +----+ 
     | S1 | ---+--'   ,---.          ,---.   link  ,---.    `+-->| D1 | 
     +----+____|     /     \        /     \       /     \    |___+----+ 
     +-+-++    |____(  R1   )......(   R2  .XXXXX.   R3  )__,+-- ++-+-+ 
            <--+---. \     /        \     /       \     /  / |--> 
            ---|--. `.`---'.         `---'       .'`-,----',' 
                   `-.`----.`.                 .  ,-' ,---' 
              flow f2 `--.  \ `. flow f2Æ   .-'  / ,-' 
                          `. `---.        -' _.-' / 
                            `--.  `,---.'.-''_.--' 
                                ` /-----\--'' 
                                 (  R4   ) 
                                  \     / 
                                   `---' 
    
   Figure 14: mapping of all flows f1, f1', f2 and f2' on the routes 
    
    
    
 
 
   Proust               Expires - August 2004               [Page 15] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
    
2. 
  Identification of IP flow forwarding states 
    
   The overall architecture aims at load balancing efficiently IP flows 
   among multiple routes that are believed to be free of congestion.   
   This architecture supports such a service due to a combination of 
   three functional building blocks that operate in a complementary way. 
    
   This chapter describes the low level one entitled "identification of 
   IP flow forwarding states" in comparison with the control plane which 
   is said to be on a higher level in the load-balancing processing 
   chain. 
    
   This low level functional building block first serves to pin the 
   flows on the selected routes. 
    
2.1 
   Overview 
    
   This functional building block is intimately intertwined with the 
   forwarding data plane, since it identifies the index attribute of a 
   route of the FIB as a flow forwarding state.  The assumption, here, 
   is that the combination of a destination prefix and of a sequence of 
   index attributes might be used to uniquely characterize a route.   
   This assumption is reasonably acceptable on condition that the 
   routing control plane of the network has converged.   
    
   Every router that contributes to the forwarding of a datagram has to 
   determine its next-hop router according to the load-balancing 
   service.  Hence, as the route lookup processing is applied to the 
   FIB, an index value is identified at each hop during the journey of a 
   datagram that crosses the network.   
    
   By analogy with record routing, the basic idea consists in writing 
   such a sequence of flow forwarding states in the IP option header of 
   a datagram, while it is transmitted through the network.   
    
   As mentioned before, the support of such load-balancing service 
   requires IP flows to relate to some bi-directional IP communications. 
   Hence, this supplementary assumption is used to design a piggybacking 
   procedure that would make both terminal endpoints of an IP 
   communication aware of the recorded sequence of flow forwarding 
   states which would subsequently contribute to the efficient 
   forwarding of their own flow.   
    
   To this end, a new option for the IP protocol is provided.   
   It is intentionally called "Identification of IP Flow Forwarding 
   States". 

 
 
   Proust               Expires - August 2004               [Page 16] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   The data structure of this option header is designed to contain two 
   vectors of flow forwarding state values (e.g. index values that 
   identify routes in the FIB) and two index fields that allow both 
   vectors to be addressed.   
    
   Furthermore, this IP option is designed to operate differently 
   according to the type of system being addressed.   
    
   In short, the role of the routers is to identify, mark and validate 
   the sequence of flow forwarding states, included in the IP option 
   header.  To this effect, each router makes use of the first vector 
   and of the related index field, included in the IP option header. 
    
   To the contrary, the role of the terminals is, on one hand, to record 
   the complete sequence of flow forwarding states extracted from 
   received datagrams, and, on the other hand, to initialize the various 
   fields of the IP option header when sending new data.  To this end, 
   every terminal that engages in a bi-directional IP communication 
   creates a context of communication session that enables it to store 
   all needed information of interest.  Anytime a terminal receives a 
   new datagram, it might update this IP option-based information (the 
   two vectors and the two index values), stored in the related context 
   of communication that, in particular, characterizes the route 
   followed by the flows.   
    
   Due to this functional decomposition, the number of states created in 
   the routers remains independent of the number of flows.  Therefore, 
   this solution addresses some scalability concerns.   
    
   Therefore, the core processing of this IP option articulates around 
   two simple sets of functions. 
    
   When processing this IP option, a router needs to: 
      - identify a flow forwarding state 
      - mark a flow forwarding state in the first vector of the IP 
        option header 
      - validate a flow forwarding state  
    
   When processing this IP option a terminal needs to: 
      - record both sequences of flow forwarding states from received 
        datagrams 
      - manage a context of an IP communication session 
      - initialize both sequences of flow forwarding states in sent 
        datagrams 
    
   The subsequent paragraphs gives details of the processing carried out 
   by each system.   
    
    
 
 
   Proust               Expires - August 2004               [Page 17] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
    
2.2 
   Format of the IP option header 
    
   This paragraph describes the format of this IP option header that 
   complies with version 4 of the Internet Protocol. 
    
    0                   1                   2                   3    
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1  
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |Version|  IHL  |Type of Service|          Total Length         | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |         Identification        |Flags|      Fragment Offset    | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |  Time to Live |    Protocol   |         Header Checksum       | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                       Source Address                          | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                    Destination Address                        | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |  Option Type |  Option Length |      FVI      |      BVL      | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                                                               | 
   +                                                               + 
   |                                                               | 
   +                                                               + 
   |                          FFIV                                 | 
   +                                                               + 
   |                                                               | 
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   |                               |                               | 
   |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|                               | 
   |                                                               | 
   +                                                               + 
   |                                                               | 
   +                          BFIV                                 + 
   |                                                               | 
   +                                                               + 
   |                                                               | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
 
               Figure 15: Format of the IP option Header 
    

 
 
   Proust               Expires - August 2004               [Page 18] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   The Internet Protocol specification allows for the definition of any 
   optional IP processing.  To this end, it provides the definition of a 
   generic optional header in its header format.  The former is defined 
   using TLV style coding.  Hence, the first two bytes identify the 
   option type and the option value.  Moreover, the maximum size of an 
   optional header is limited to 40 bytes.  
    
   Given that information, the IP option header of type ôIdentification 
   of Flow Forwarding Statesö is designed to use the maximum size 
   allowed.  It contains two vectors and two index fields.   
    
   Thus, the IHL field has a value of 15 when the Option Length field 
   has a value of 40.   
    
   The data field called Forwarding Vector Index (FVI) is one byte long.  
   It represents the index of the first following vector.  The routers 
   use this index value to point to the flow forwarding state of the 
   first vector that is to be associated with them when processing this 
   IP option.  It is incremented at each hop.  When the datagram reaches 
   its final destination, this field represents the length of the route 
   followed by the present datagram.   
    
   The data field called Backward Vector Length (BVL) is one byte long.  
   It represents the length of the route followed by the prior datagram 
   of the backward flow in relation to the present datagram.   
    
   The data field called Forward Flow Information Vector (FFIV) is 18 
   bytes long.  The routers use it to characterize/refine the route 
   followed by the present datagram that relates to the forward flow.   
   The characterization of the route is based on a combination of a 
   destination prefix and a sequence of route indexes.  In the context 
   of this functional building block, the route index is called ôflow 
   forwarding stateö.  Each value is encoded using 4 bits.  Hence the 
   FFIV vector might record the characterization of a route of 36 hops 
   maximum length.  Due to the 4 bits provision, a route index value 
   (i.e. a flow forwarding state) might vary between 0 and 15 integer 
   values.  The first element of the vector would reference the index 
   value of the route selected by the first router when forwarding a 
   datagram.  The second element of the vector would reference the index 
   value of the route selected by the second router when forwarding a 
   datagram and so forth.   
    
   The data field called Backward Flow Information Vector (BFIV)is 18 
   bytes long.  It serves the purpose of the control closed loop that 
   operates on the highest level of the routing control plane.  It is 
   piggybacked on every forwarded datagram to convey the 
   characterization of the route followed by a recent datagram which 
   relates to the backward flow.  It uses the same encoding convention 
   as that described for the FFIV vector.  In accordance with the inner 
 
 
   Proust               Expires - August 2004               [Page 19] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   logical reasoning of this control closed loop, each time a terminal 
   sends a datagram, it should initialize the BFIV vector and BVL index 
   field of its optional header with the related information contained 
   in its created context of the communication session.  This context of 
   the communication session allows previous FFIV vector and FVI index 
   value, consigned to the optional header of a received datagram in 
   relation to the present communication session, to be stored as BFIV 
   vector and BVL index value for future datagrams sending. 
    
2.3 
   Role of the Terminal 
 
2.3.1 Creation of a context of a communication session 
    
   When a terminal engages in a bi-directional IP communication it 
   should create an associated context of a communication session for 
   the purpose of managing the data of the routing control loop that are 
   consigned in the IP option header of all datagrams.  Namely, FFIV and 
   BFIV vectors as well as the FVI index and BVL length values. 
   Such a context of a communication session might be as simple as a 
   combination of information that defines an Internet socket along with 
   the content of the four fields defined for this IP option.   
    
   To give an example, the context of a communication session, in 
   relation to a bi-directional IP communication between two endpoint 
   terminals whose IP addresses are S and D, which, furthermore, relies 
   on the transport protocol TCP and that uses port numbers p and pÆ, 
   might be defined from the viewpoint of terminal S by the following 
   combination of information fields. 
      - forward flow 
           o source address: S 
                . should be 4 bytes long 
                . its initial value is set to the IP address of the 
                  local host 
           o destination address: D 
                . should be 4 bytes long 
                . its initial value is set to the IP address of the 
                  remote host 
           o source port number: p 
                . should be 2 bytes long 
                . its initial value is set by the application protocol 
           o destination port number: pÆ 
                . should be 2 bytes long 
                . its initial value is set by the application protocol 
           o transport protocol number: TCP 
                . should be 1 byte long 
                . its initial value is set by the transport protocol 
                  over which runs the application protocol 
           o forward flow information vector: FFIV 
                . should be 18 bytes long 
 
 
   Proust               Expires - August 2004               [Page 20] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
                . its initial value is set to the null vector, that is a 
                  vector full of zero integer values 
           o forward vector index: FVI 
                . should be 1 byte long 
                . it is a constant whose initial value is set to one 
                  integer value 
           o forward vector length: FVL 
                . should be 1 byte long 
                . its initial value is set to zero integer value 
                 
      - backward flow 
           o source address: D 
                . should be 4 bytes long 
                . its initial value is set to the IP address of the 
                  local host 
           o destination address: S 
                . should be 4 bytes long 
                . its initial value is set to the IP address of the 
                  remote host 
           o source port number: pÆ 
                . should be 2 bytes long 
                . its initial value is set by the application protocol 
           o destination port number: p 
                . should be 2 bytes long 
                . its initial value is set by the application protocol 
           o transport protocol number: TCP 
                . should be 1 byte long 
                . its initial value is set by the transport protocol 
                  over which runs the application protocol 
           o backward flow information vector: BFIV 
                . should be 18 bytes long 
                . its initial value is set to the null vector, that is a 
                  vector full of zero integer values 
           o backward vector length: BVL 
                . should be 1 byte long 
                . its initial value is set to zero integer value 
    
   From the viewpoint of terminal D, the context of communication 
   session is obtained by inversion of the roles of receiver and sender.  
    
    
2.3.2 Receiving a datagram 
    
   Whenever a terminal receives a datagram with this IP option enabled 
   it should copy the content of the optional header in the related 
   context of the communication session.  If such a context does not 
   exist at the time the datagram is received, the terminal should 
   create one prior to the copying operation.   
    
 
 
   Proust               Expires - August 2004               [Page 21] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   Whatever the case, the copy of the content of the optional header 
   should be done as depicted hereafter.  
    
   The FFIV vector consigned to the optional header of the received 
   datagram should be copied in the BFIV vector data element of the 
   related context of communication session.   
    
   The FVI index value consigned to the optional header of the received 
   datagram should be copied in the BVL integer data element of the 
   related context of communication session.   
    
   The BFIV vector consigned to the optional header of the received 
   datagram should be copied in the FFIV vector data element of the 
   related context of communication session.   
    
   The BVL length value consigned to the optional header of the received 
   datagram should be copied in the FVL integer data element of the 
   related context of communication session.   
    
   The FVI integer data element of the related context of a 
   communication session is a constant equal to the one integer value.   
    
    
2.3.3 Sending a datagram 
    
   Whenever a terminal sends a datagram with this IP option enabled, it 
   should initialize the four fields of the optional header with the 
   more related up-to-date data elements recorded in the related context 
   of communication session.   
   If such a context does not exist at the time a datagram is sent, the 
   terminal should create one prior to this initializing operation. 
    
   Whatever the case, the initialization of the four fields of the 
   optional header should be done as described hereafter. 
    
   The FFIV vector consigned to the optional header of the datagram 
   about to be sent, should be initialized with the most up-to-date FFIV 
   vector data element of the related context of the communication 
   session.   
    
   The FVI index value consigned to the optional header of the datagram 
   about to be sent should be initialized with the 1 integer value.  
    
   The BFIV vector consigned to the optional header of the datagram 
   about to be sent should be initialized with the most up-to-date BFIV 
   vector data element of the related context of the communication 
   session.   
    

 
 
   Proust               Expires - August 2004               [Page 22] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   The BVL length value consigned to the optional header of the datagram 
   about to be sent should be initialized with the most up-to-date BVL 
   integer data element of the related context of the communication 
   session.   
    
    
2.3.4 Managing a context of a communication session 
    
   For statistical purposes, it might be interesting to integrate, in 
   the processing of this IP option, a session manager that would record 
   all past values of the FFIV, FVL, BFIV and BVL data elements of the 
   related context of the session.   
   For instance, the evolution of the FFIV vector might be an indication 
   with regard to the stability of the network topology being used.   
   Also, the evolution of the FVL fields might indicate the variation of 
   the length of the route being used. 
    
    
2.4 
   Role of the Router 
    
   From the viewpoint of a router, this IP option specifies a forwarding 
   mechanism for network resource optimization purposes.   
   Its algorithm proceeds as described hereafter. 
   Upon receiving a datagram on one of its interfaces, a router first 
   checks, as usual, the validity of the IP header.  Then it proceeds to 
   the read the destination address consigned to the IP header.. 
    
   In case the datagram is intended for the router itself, its 
   forwarding algorithm should deliver it to the related endpoint 
   process.   
    
   Conversely, the router should proceed to a modified route lookup 
   operation.   
   In a first stage, given the destination address of the datagram, the 
   router proceeds to the longest match search of the destination prefix 
   in its FIB.  Having said this, the FIB may contain several routes for 
   a given destination prefix.  For this reason, the router proceeds to 
   the second stage of the route lookup procedure to refine the routing 
   selection process.   
   Hence, the router should identify the flow forwarding state of this 
   datagram that relates to the state of the forwarding plane of this 
   router.  To do so, it reads the FVI index value consigned to the IP 
   option header.  This value indicates the row of the FFIV vector, 
   where the flow forwarding state that relates to this router, should 
   be.  Therefore, the router identifies this flow forwarding state 
   (FFS) by reading the FFIV[FVI] indexed value which is coded using 4 
   bits. 
    

 
 
   Proust               Expires - August 2004               [Page 23] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   In case the FFS read value is zero, this means the datagram belongs 
   to a new flow for which no previous FFS value has been set.   
   Therefore the router selects the best route among those that were 
   pre-selected at the previous longest match stage.  The multi-criteria 
   selection algorithm is of type widest-enough shortest.. In other 
   words, as the routes possess two metrics, cost and traffic load, the 
   selection process should consider the shortest route in relation to 
   the cost metric, which has enough bandwidth.   
   Finally, the index value of the selected route defines the new value 
   of the flow forwarding state for that datagram.  Thus, the forwarding 
   process marks this index value in the FFIV vector of the optional 
   header at the row indicated by the FVI index value.  Then the FVI 
   index value is incremented by one unity in the optional header and 
   the datagram is forwarded along the selected route. 
    
   In case the FFS read value is different from zero, this means the 
   datagram belongs to a flow for which a previous FFS value has been 
   set.  Therefore, the router refines the routing selection process.  
   First, it checks the validity of the FFS read value by verifying that 
   an associated route does still exist.  In relation to a network 
   topology change, it may happen that an index value is not valid 
   anymore.   
   If this is the case, the datagram should be directed along this 
   route.   
   In doing so, the flow remains stuck to the route that was selected at 
   the beginning of the communication.  Then the FVI index value is 
   incremented by one unity in the optional header and the datagram is 
   forwarded along the selected route. 
    
   If this is not the case, because of some network topology change or 
   because of some misbehaving or malfunctioning terminal, the router 
   should apply the selection procedure described above which relates to 
   the forwarding of the initial datagram of a flow.   
















 
 
   Proust               Expires - August 2004               [Page 24] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
   The second stage of this forwarding algorithm is depicted in figure 
   16. 
    
      +---------+ 
      |read FVI | 
      +----+----+ 
           | 
    +------v-------+ 
    |read FFIV[FIV]| 
    +------+-------+ 
           | 
         ,-v---. 
        /  is   \ 
       /FFIV[FVI]\  yes 
      (   equal   )-----------------------+ 
       \  to 0 ? /                        | 
        \       /                         | 
         `--+--'                          | 
            |no                           | 
            |                             | 
         ,--v--.                          | 
        /  does \             +-----------v-----------+ 
       /FFIV[FVI]\ no         | widest-enough shortest| 
      ( belongs to)---------->| selection of a route  | 
       \ the sub /            |    in the sub FIB     | 
        \ FIB ? /             +-----------+-----------+ 
         `--+--'                          | 
            |yes                          | 
            |                 +-----------v------------+ 
            |                 | write the index of the | 
            |                 | selected route in the  | 
            |                 | FFIV[FIV] vector field | 
            |                 | of the IP option header| 
            |                 +-----------+------------+ 
            |                             | 
            |                             | 
            |                  +----------v---------+ 
            +----------------->|increment FVI in the| 
                               |  IP option header  | 
                               +----------+---------+ 
                                          | 
                                 +--------v--------+ 
                                 |transmit datagram| 
                                 +-----------------+ 
    
      Figure 16: refinement step of the routing selection process 
    
    
 
 
   Proust               Expires - August 2004               [Page 25] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
2.5 
   Review of an Example 
    
   The network topology, to which this example applies, is the one 
   described in the introductory chapter. It is represented in figure 1. 
    
   As mentioned earlier, the design of this architecture only applies to 
   bi-directional IP communications.  In that respect, we consider in 
   this example a bi-directional IP communication that takes place 
   between terminals S1 and D1.  As mentioned before, the network is 
   globally under loaded. Hence, the routing control plane has converged 
   around a simple network routing state, where there exists only one 
   available route at that time.  The primary route calculated to 
   transport both flows of the newly created IP communication is built 
   on the sequence of routers R1, R2 and R3. 
    
   The figure 17 below details the processing stages of the IP option, 
   as and when the datagram crosses the network.  The higher part of the 
   diagram represents the network topology, whereas, the lower part 
   depicts the evolution of the content of the optional header while a 
   datagram is, little by little, forwarded through the network.   
    
         +----+        ,---.         ,---.        ,---.         +----+ 
         | S1 |-------(  R1 )-------(  R2 )------(  R3 )--------| D1 | 
         +----+        `---'         `---'        `---'         +----+ 
       ^ +----+    ^             ^            ^            ^    +----+ ^ 
       |           |             |            |            |           | 
       |           |             |            |            |           | 
       v           v             v            v            v           v 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
       |IP header|   |IP header|  |IP header|  |IP header|   |IP header| 
   FVI |   1     |   |    2    |  |    3    |  |   4     |   |   4     | 
   FFIV| 0,...,0 |-->| 1,0,..,0|->| 1,1,0,..|->|1,1,1,0,.|-->| 1,1,1,0,| 
   BVL |   0     |   |    0    |  |    0    |  |   0     |   |   0     | 
   BFIV| 0,...,0 |   | 0,...,0 |  | 0,...,0 |  | 0,...,0 |   | 0,...,0 | 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
       |         |   |         |  |         |  |         |   |         | 
       | payload |   | payload |  | payload |  | payload |   | payload | 
       |         |   |         |  |         |  |         |   |         | 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
          (A)            (B)          (C)          (D)           (E) 
    
   Figure 17: forwarding of the first datagram of flow f1 along the 
   primary route built on the sequence of R1, R2 and R3 routers 
    
   At stage labeled (A), the terminal S1 is initiating the bi-
   directional IP communication with terminal D1.  As such, the terminal 
   creates a context of the communication session, which is 
   characterized by the set of values given below.   
 
 
   Proust               Expires - August 2004               [Page 26] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
      - forward flow 
           o source address: S1 
           o destination address: D1 
           o source port number: p 
           o destination port number: pÆ 
           o transport protocol number: TCP 
           o forward flow information vector: FFIV = (0, 0,à, 0) 
           o forward vector index: FVI = 1 
           o forward vector length: FVL = 0 
            
      - backward flow 
           o source address: D1 
           o destination address: S1 
           o source port number: pÆ 
           o destination port number: p 
           o transport protocol number: TCP 
           o forward flow information vector: BFIV = (0, 0,à, 0) 
           o forward vector index: BVL = 0 
    
   Given that initialization step, the terminal packages the datagram to 
   be sent with the initial values defined for that context and 
   transmits it towards its access router, namely R1.  The figure above 
   shows at that time the content of the four fields that relates to 
   this IP option. 
    
   At stage (B), the router R1 identifies that the datagram relates to a 
   new flow since the FFVI[1] value is equal to zero.  Thus, R1 carries 
   out a full route lookup operation, independently.  As its FIB 
   contains only one route that matches the destination address of the 
   datagram, it is unambiguously selected.  Its index value is one.  
   According to the IP option processing, this index value is written in 
   the FFIV vector at a place indicated by the FVI index value.  Hence, 
   at that time FFIV[FVI] is set to one in the IP option header.  
   Moreover, the FVI index value is incremented by one unity and the 
   datagram is transmitted towards the next-hop router R2.   
    
   At stages (C) and (D), router R2 and R3 successively perform the same 
   processing chain as depicted in stage (B) on that datagram.   
    
   At stage (E), the terminal D1 that receives this first datagram 
   creates its own context of communication session.  This time it is 
   characterized by the set of values given below.   
    
      - forward flow 
           o source address: D1 
           o destination address: S1 
           o source port number: pÆ 
           o destination port number: p 
 
 
   Proust               Expires - August 2004               [Page 27] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
           o transport protocol number: TCP 
           o forward flow information vector: FFIV = (0, 0,à, 0) 
           o forward vector index: FVI = 1 
           o forward vector length: FVL = 0 
            
      - backward flow 
           o source address: S1 
           o destination address: D1 
           o source port number: p 
           o destination port number: pÆ 
           o transport protocol number: TCP 
           o forward flow information vector: BFIV = (1, 1, 1, 0,à, 0) 
           o forward vector index: BVL = 4 
    
   Doing so, it has first copied the FFIV vector and FVI index value, 
   contained in the option header of the received datagram, to 
   initialize the BFIV and BVL data elements that relate to the backward 
   flow of the newly created context of communication session.  Then, it 
   has copied the BFIV vector and BVL length value, contained in the 
   option header of the received datagram, to initialize the FFIV and 
   FVL data elements that relate to the forward flow of the context of 
   communication session.  Moreover, the constant FVI is set to the one 
   integer value.   
    
   At stage (F), the terminal D1 sends a datagram in response to 
   terminal S1.  In turn, the terminal D1 packages the datagram to be 
   sent with the initial values defined for that context and transmits 
   it towards its access router, namely R3. 
    
   At stages (G), (H) and (I), router R3, R2 and R1 successively perform 
   the same processing chain as depicted in stage (B) on that datagram. 
    
   At stage (J), the terminal S1 that receives this first datagram 
   updates its related context of communication session.  Thus, at that 
   time it is characterized by the set of values given below. 
    
      - forward flow 
           o source address: S1 
           o destination address: D1 
           o source port number: p 
           o destination port number: pÆ 
           o transport protocol number: TCP 
           o forward flow information vector: FFIV = (1, 1, 1, 0,à, 0) 
           o forward vector index: FVI = 1 
           o forward vector length: FVL = 4 
            
      - backward flow 
           o source address: D1 
           o destination address: S1 
 
 
   Proust               Expires - August 2004               [Page 28] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
           o source port number: pÆ 
           o destination port number: p 
           o transport protocol number: TCP 
           o forward flow information vector: BFIV = (1, 1, 1, 0,à, 0) 
           o forward vector index: BVL = 4 
    
   Doing so, it has first copied the FFIV vector and FVI index value 
   contained in the option header of the received datagram to update the 
   BFIV and BVL data elements that relate to the backward flow of the 
   context of communication session.  Then, it has copied the BFIV 
   vector and BVL length value, contained in the option header of the 
   received datagram, to update the FFIV and FVL data elements that 
   relate to the forward flow of the context of communication session.   
    
         +----+        ,---.         ,---.        ,---.         +----+ 
         | S1 |-------(  R1 )-------(  R2 )------(  R3 )--------| D1 | 
         +----+        `---'         `---'        `---'         +----+ 
       ^ +----+    ^             ^            ^            ^    +----+ ^ 
       |           |             |            |            |           | 
       |           |             |            |            |           | 
       v           v             v            v            v           v 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
       |IP header|   |IP header|  |IP header|  |IP header|   |IP header| 
   FVI |   4     |   |    4    |  |    3    |  |   2     |   |   1     | 
   FFIV|1,1,1,0,.|<--|1,1,1,0,.|<-|1,1,0,.. |<-|1,0,..,0 |<--| 0,...,0 | 
   BVL |   4     |   |    4    |  |    4    |  |   4     |   |   4     | 
   BFIV|1,1,1,0,.|   |1,1,1,0,.|  |1,1,1,0,.|  |1,1,1,0,.|   |1,1,1,0,.| 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
       |         |   |         |  |         |  |         |   |         | 
       | payload |   | payload |  | payload |  | payload |   | payload | 
       |         |   |         |  |         |  |         |   |         | 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
          (J)            (I)          (H)          (G)           (F) 
    
   Figure 18: forwarding of the first datagram of flow f1Æ along the 
   primary route built on the sequence of R3, R2 and R1 routers. 
    
   At stage (K), the terminal S1 is sending a second datagram to the 
   terminal D1.  Thus, it packages the datagram to be sent with the 
   initial values defined for that context at that time and transmits it 
   towards its access router, namely R1. 
    
   At stage (L), the router R1 identifies that the datagram may relate 
   to an already routed flow, since the FFVI[1] value is different from 
   zero.  Thus, R1 carries out a validation of this index value by 
   performing the two stage route lookup operation.  As its FIB still 
   contains a valid route that, first, matches the destination address 
   of the datagram and that second, possesses an index value, that 

 
 
   Proust               Expires - August 2004               [Page 29] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   equals to the FFIV[1] value of that datagram, it is unambiguously 
   selected as the route on which the flow is pinned.   
   Once this is done, the router R1 only has to increment, by one unity, 
   the FVI index value of the IP option header and transmit the datagram 
   towards the next-hop router R2. 
    
   At stages (M) and (N), router R2 and R3 successively perform the same 
   processing chain as depicted in stage (I) on that datagram.   
    
   At stage (O), the terminal D1 that receives this second datagram 
   updates its related context of communication session.  Thus, at that 
   time it is characterized by the set of values given below. 
    
      - forward flow 
           o source address: D1 
           o destination address: S1 
           o source port number: pÆ 
           o destination port number: p 
           o transport protocol number: TCP 
           o forward flow information vector: FFIV = (1, 1, 1, 0,à, 0) 
           o forward vector index: FVI = 1 
           o forward vector length: FVL = 4 
            
      - backward flow 
           o source address: S1 
           o destination address: D1 
           o source port number: p 
           o destination port number: pÆ 
           o transport protocol number: TCP 
           o forward flow information vector: BFIV = (1, 1, 1, 0,à, 0) 
           o forward vector index: BVL = 4 
    
   Doing so, it has first copied the FFIV vector and FVI index value, 
   contained in the option header of the received datagram, to update 
   the BFIV and BVL data elements that relate to the backward flow of 
   the context of communication session.  Then, it has copied the BFIV 
   vector and BVL length value contained in the option header of the 
   received datagram to update the FFIV and FVL data elements that 
   relate to the forward flow of the context of communication session.   
    
    
    
    
    
    
    
    
    
    
 
 
   Proust               Expires - August 2004               [Page 30] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
         +----+        ,---.         ,---.        ,---.         +----+ 
         | S1 |-------(  R1 )-------(  R2 )------(  R3 )--------| D1 | 
         +----+        `---'         `---'        `---'         +----+ 
       ^ +----+    ^             ^            ^            ^    +----+ ^ 
       |           |             |            |            |           | 
       |           |             |            |            |           | 
       v           v             v            v            v           v 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
       |IP header|   |IP header|  |IP header|  |IP header|   |IP header| 
   FVI |   1     |   |    2    |  |    3    |  |   4     |   |   4     | 
   FFIV|1,1,1,0,.|-->|1,1,1,0,.|->|1,1,1,0,.|->|1,1,1,0,.|-->|1,1,1,0,.| 
   BVL |   4     |   |    4    |  |    4    |  |   4     |   |   4     | 
   BFIV|1,1,1,0,.|   |1,1,1,0,.|  |1,1,1,0,.|  |1,1,1,0,.|   |1,1,1,0,.| 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
       |         |   |         |  |         |  |         |   |         | 
       | payload |   | payload |  | payload |  | payload |   | payload | 
       |         |   |         |  |         |  |         |   |         | 
       +---------+   +---------+  +---------+  +---------+   +---------+ 
          (K)            (L)          (M)          (N)           (O) 
    
   Figure 19: forwarding of the subsequent datagrams of flow f1 along 
   the primary route built on the sequence of R1, R2 and R3 routers 
    
    
3. 
  Extension of the Forwarding plane 
    
3.1 
   Overview 
    
   The very schematic representation of a router might be depicted by 
   the figure 20 below.  The functional role of a router aims at 
   forwarding datagrams towards their final destinations via the 
   transmission interfaces that interconnect it with other systems.   
   For that purpose, any router operates at the same time on two 
   functional levels: the control plane and the transmission plane.   
   Usually, the control plane operates in a shared time processing 
   environment while the transmission plane operates in near real time.   
   In short, the control plane allows a router to adapt to network 
   changes, in due time, with minimum impact on the transport of network 
   traffic.   
   The transmission plane of a router is in charge of applying all 
   needed processing, alongside the forwarding function.  For instance, 
   it may include some network filtering mechanisms or some quality of 
   service (QOS) management functions.   
    
   During the last decade, the core technology of routers has been 
   improved by a factor of several orders of magnitude.  Today, routers 
   are flexible enough to apply all sorts of elementary processing while 
   still forwarding datagrams at transmission line rates.  However, with 
 
 
   Proust               Expires - August 2004               [Page 31] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   regard to the principle of forwarding, there has been little change 
   in the past.  Most of the time the routing decision relies on a 
   longest match search operation, which is based on the destination 
   address consigned in the header of a datagram.  While this forwarding 
   mechanism has proven to be robust and scalable, it lacks some 
   flexibility with regard to policy routing and load balancing.   
    
                 +---------------------------------------+ 
                 |                                       | 
                 |   +-------------------+               | 
                 |   |                   |               | 
                 |   |  routing protocol |               | 
       control   |   |                   |----+          | 
        plane    |   |       (RIB)       |    |          | 
                 |   |                   |    |          | 
                 |   +-------------------+    |          | 
                 |         ^                  |          | 
                 |---------|------------------|----------| 
                 |         |                  |          | 
    transmission |         | yes              v          | 
       plane     |      ,-----.       +--------------+   |   FORWARDED 
                 |     /is the \      |              |   |   DATAGRAM 
   +------+--+   |    /datagram \ no  |  forwarding  |   |   +------+--+ 
   |      |  |---+-->(intended to)--->|    module    |---+-->|      |  | 
   |      |  |   |    \ a local /     |              |   |   |      |  | 
   +------+--+   |     \process/      |    (FIB)     |   |   +------+--+ 
      RECEIVED   |      \  ?  /       |              |   | 
      DATAGRAM   |       `---'        +--------------+   | 
                 |                                       | 
                 +---------------------------------------+ 
    
      Figure 20: schematic functional representation of a router 
    
   As described in the introductory chapter, the support of a fair load 
   balancing network service, implies a modification of the forwarding 
   algorithm, the forwarding information database (FIB), and the 
   updating mechanism of the FIB.   
    
    
    
    
    
    
    
    
    
    


 
 
   Proust               Expires - August 2004               [Page 32] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
3.2 
   FIB structure 
    
   The support of the IP option described in chapter 2.4 requires a new 
   structure of the FIB.  It is represented below on figure 21. 
    
    +--------------------+-------+----------+-----------+------+------+ 
    | Destination Prefix | Index | Next-Hop | Interface | Load | Cost | 
    +--------------------+-------+----------+-----------+------+------+ 
    |  192.168.1.0/24    |  1    | 10.1.1.1 |  GE1/0    | 42%  | 100  | 
    +--------------------+-------+----------+-----------+------+------+ 
    |  192.168.1.0.24    |  2    | 10.2.1.1 |  GE1/1    | 32%  | 150  | 
    +--------------------+-------+----------+-----------+------+------+ 
    |  192.168.1.0/24    |  3    | 10.3.1.1 |  GE1/2    |  8%  | 200  | 
    +--------------------+-------+----------+-----------+------+------+ 
    |  192.168.2.0/24    |  1    | 10.3.1.1 |  GE1/2    | 20%  | 100  | 
    +--------------------+-------+----------+-----------+------+------+ 
    
               Figure 21: extended structure of the FIB table 
    
   The "Destination Prefix" field represents an IP prefix associated 
   with a reachable underlying sub-network known within a routing 
   domain.  
   The "Index" field represents an identifier that unambiguously 
   identifies a route, among the subset of FIB routes that share the 
   same destination prefix. 
   The "Next-Hop" field represents the IP address of the next-hop router 
   to be used along the route. 
   The "Interface" field identifies the output transmission link of the 
   router to be used along the route. 
   The "Load" field represents the maximum of the elementary traffic 
   load that has been measured for each transmission link along the 
   calculated route.  This quantity is expressed as a percentage of an 
   absolute bandwidth.   
   The "Cost" field represents the total usage cost of resources along 
   the route.  It is calculated for a given topology by means of the 
   Dijkstra's shortest path first (SPF) algorithm.   
    
    
3.3 
   FIB update 
    
3.3.1 RIB/FIB relationship 
    
   At the control plane level, in addition to static routing a router 
   can activate several dynamic routing processes like ISIS or BGP.  
   According to the definition of a specific metric each routing process 
   calculates independently the best routes that are stored in separate 
   RIB tables.  Due to the diversity of routing information sources, two 

 
 
   Proust               Expires - August 2004               [Page 33] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   routes in different RIB tables may exist at the same time for a given 
   destination prefix.   
   In consequence of such an eventuality, the control plane of a router 
   also possesses a central routing process that manages a central RIB.  
   The latter contains the final set of valid routes that are used in 
   the transmission plane.  The central routing process selects the best 
   routes by comparing their relative degrees of trustworthiness.  This 
   static criterion is defined according to some conventional 
   engineering rules.  It is assigned to the different routing processes 
   by means of a manual configuration.   
    
   The routes contained in the FIB table are a reflection of the set of 
   the routes contained in the central RIB.   
   The figure 22 below depicts the relationships between processes that 
   control the updating of the different routing tables. 
    
      +-----------------------------------------------------------+ 
      | control plane                                             | 
      |                                                           | 
      |           +---------+  +--------+  +---------+            | 
      | ,---.     | dynamic |  | static |  | dynamic |     ,---.  | 
      |( RIB )<-->| routing |  |routing |  | routing |<-->( RIB ) | 
      | `---'     | process |  |process |  | process |     `---'  | 
      |           +----.----+  +---+----+  +----.----+            | 
      |                 `-._       |        _.-'                  | 
      |                     `-._   |    _.-'                      | 
      |                     +---v--v---v---+                      | 
      |                     |   central    |     ,---.            | 
      |                     |   routing    |<-->( RIB )           | 
      |                     |   process    |     `---'            | 
      |                     +------^-------+                      | 
      |                            |                              | 
      +----------------------------+------------------------------+ 
      | transmission plane         |                              | 
      |                     +------v-------+                      | 
      |                     |  forwarding  |     ,---.            | 
      |                     |   module     |<-->( FIB )           | 
      |                     |              |     `---'            | 
      |                     +--------------+                      | 
      |                                                           | 
      +-----------------------------------------------------------+ 
    
   Figure 22: relations between processes that control routing tables 
    
   As a result of this functional organization, each time a route is 
   added, modified or deleted from the central RIB, a similar updating 
   action should be applied to the FIB.   
    

 
 
   Proust               Expires - August 2004               [Page 34] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   Furthermore, according to the proposed architecture, it is envisioned 
   that the load attribute of the routes must be updated on a periodic 
   basis.   
   The regular updating of this attribute is fundamental to enable the 
   whole system to keep the global network traffic load under control.  
   In this way, some feedback is introduced into this closed control 
   loop.  In a first step, the dynamic routing protocol measures the 
   distribution of network traffic load and controls the state of the 
   topology.  In a second step, it updates its routing table according 
   to the current state of both the distribution of traffic and the 
   topology.  In a third step, the central routing process updates the 
   central RIB then the FIB table.  In a forth step, the forwarding 
   module transmits the traffic according to the updated routes of the 
   FIB.   
   This cycle should occur on a regular basis.  How often depends on the 
   dynamic level of the traffic and of the ratio of the bandwidth pipes 
   in relation to the average bandwidth of flows.   
   It is assumed that the bandwidth of individual flows should be very 
   small compared to the available bandwidth provisioned in the core of 
   the network.   
    
   Thus, the routes of the FIB might be updated by means of two types of 
   mechanism.  The first one is invoked on a periodic basis.  The second 
   one is triggered when a topology change is detected or when the 
   measured distribution of network traffic load requires a modification 
   of the routing table.   
    
    
3.3.2 Periodic updates 
    
   One can distinguish two types of periodic updates of the FIB routing 
   table.  
   The first one occurs over long periods of time and is mandated by the 
   standard operating of any link state dynamic routing protocol.   
    
   The second one occurs over short periods of time (lasting only a few 
   minutes), that is to say, when a new measure of some traffic load is 
   received from a part of the network.  It may modify the load 
   attributes that relate to an existing set of calculated routes.   
   To do so, the combination of the destination prefix and of the index 
   attribute is used as a key identifier to unambiguously address a 
   specific route. 
    
3.3.3 Triggered updates 
    
   As it is already the case with the standard specifications, a 
   triggered update of the FIB may occur when some topology change is 
   detected.  

 
 
   Proust               Expires - August 2004               [Page 35] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   Besides, as a result of a periodic update of the load attribute in 
   the FIB, some alternative routes might be added or suppressed.  
   Therefore, it is another case of triggered FIB update.   
    
    
3.4 
   Forwarding algorithm 
    
   With regard to the forwarding mechanism needed, it is fully explained 
   in paragraph 2.4, which defines a new option for the IP protocol that 
   extends the traditional forwarding algorithm by adding a refinement 
   stage to the route lookup procedure.   
   In addition to the longest match search (LMS), this optional IP 
   processing examines the value of a flow forwarding state that should 
   have been marked in the optional header by every processing router 
   during the forwarding of a prior datagram of that flow.  
    
   When the network topology remains stable, this refinement stage is an 
   exact match search (EMS) applied to a short list of pre-selected 
   routes. 
    
   When the network topology changes or at the beginning of a bi-
   directional IP communication, this refinement stage is an ordered 
   multi-criteria search of the "widest-enough longest" type, which is 
   applied to a short list of pre-selected routes.  In a first step, 
   this procedure selects the routes from the pre-selected subset, whose 
   traffic load attribute still remains below a maximum threshold.  In a 
   second step, it selects the shortest one in relation to the cost 
   attribute. 
    
   The complete forwarding algorithm is depicted below in the flow chart 
   of figure 23. 


















 
 
   Proust               Expires - August 2004               [Page 36] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
      +----------------------+ 
      |read the IP header of | 
      |the received datagram | 
      +----------+-----------+ 
                 | 
               ,-v---. 
              /  is   \        +---------------------+ 
             / the IP  \ no    |transmit the datagram| 
            (  option   )----->|along the SPF route  | 
             \present ?/       |according to the     | 
              \       /        |standard  procedure  | 
               `--+--'         +---------------------+ 
                  | yes 
          +-------v--------+ 
          |LMS route lookup| 
          +-------+--------+ 
                  | 
                ,-v---. 
               /       \ 
              /  does   \ no     +----------+ 
             ( any route )------>| drop the | 
              \ exist ? /        | datagram | 
               \       /         +----------+ 
                `-+---' 
                  | yes 
          +-------v--------+ 
          |EMS route lookup| 
          +-------+--------+ 
                  | 
                ,-v---. 
               /is the \ 
              / pointed \ no          +---------------+ 
             (   route   )----------->|widest-enough  | 
              \ valid ? /             |shortest search| 
               \       /              +-------+-------+ 
                `-+---'                       | 
                  | yes         +-------------v--------------+ 
                  |             |write the index of the      | 
                  |             |selected route in FFIV[FIV] | 
                  |             +-------------+--------------+ 
           +------v--------+                  | 
           | increment FVI |<-----------------+ 
           +---------------+ 
                  | 
       +----------v------------+ 
       | transmit the datagram | 
       +-----------------------+ 
    
         Figure 23: flow chart of the forwarding algorithm 
 
 
   Proust               Expires - August 2004               [Page 37] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
4. 
  Extension of the Routing Control plane 
    
4.1 
   Overview 
    
   As depicted above, the support of an adaptive load balancing service 
   relies on a periodic update of the dynamic load attribute of the 
   routes.  Moreover, when the topology changes or when the distribution 
   of the network traffic sustains a significant modification, it is 
   assumed that the dynamic routing protocol would update the content of 
   its RIB, which in turn may lead to the updating of the FIB.   
    
   For this purpose, the dynamic routing protocol must be extended to 
   support such functional requirements.   
   In this approach, the ISIS dynamic routing protocol is chosen for 
   extension.  However, the approach should be easily transposable to 
   the other well-known link state dynamic routing protocol, namely 
   OSPF.   
    
   The standard specification of the ISIS routing protocol requires that 
   all information about the topology of a given network is conveyed in 
   only one control message called a Link State PDU (LSP).  In this 
   respect, each router generates only one valid LSP at a time that 
   should contain complete up-to-date information about its local 
   topology in relation to the global one.  Moreover, the specification 
   of the protocol requires that each router must regenerate its LSP on 
   a periodic basis.  Besides, whenever a topology change is detected, 
   the routers of the network should propagate such an event in order to 
   inform all routers.  It is the router that detects the local topology 
   change that should initiate the updating process by regenerating its 
   LSP.  Whatever the reason, when a new valid LSP is received, it 
   should always take overall precedence in the Link State DataBase 
   (LSDB).  The convergence property of the routing protocol is ensured 
   as soon as the LSDBs of the routers are synchronized with one another 
   within a routing domain.  Thus, most of the inner mechanisms of the 
   ISIS routing protocol aim at synchronizing the LSDBs as quickly as 
   possible.   
    
   With regard to the first functional requirement, the ISIS protocol is 
   extended to integrate a statistic module which would synthesize the 
   traffic load measures of its transmission interfaces on a periodic 
   basis.  To this end, this module might rely on the SNMP protocol to 
   collect the counter values coming from its transmission interfaces 
   which are stored in the MIB-II database of the system.  Moreover, the 
   structure of the RIB managed by the ISIS routing protocol must be 
   extended to allow a traffic load parameter to be attributed to a 
   route.   
   Besides, as such synthesis information is by nature much more dynamic 
   than the usual link state information, it is envisioned that such 
 
 
   Proust               Expires - August 2004               [Page 38] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   information may not be conveyed within usual LSP control messages.  
   Therefore, the ISIS protocol is extended to support the propagation 
   and management of a new control message called Link Load PDU (LLP).  
   Each router would generate a single LLP on a periodic basis, which 
   would be propagated by neighboring systems within a routing domain.   
   For convenience, all valid LLPs would be stored in a separate 
   database called Link Load DataBase (LLDB).  As it is the case for the 
   topology-driven route calculation process, it is fundamental that the 
   LLDBs be synchronized as quickly as possible to ensure the coherence 
   of the load balancing mechanism on the scale of a routing domain.  To 
   this end, the mechanisms defined for the synchronization of the LSDBs 
   serve as a canvas to design those used to ensure an efficient 
   synchronization of the contents of the LLDBs.  Thus, two additional 
   ISIS control messages are also defined that would enforce the 
   synchronization of the LLDBs within a routing domain, namely the 
   Complete Sequence Number Link Load PDUs (CSNLLP) and the Partial 
   Sequence Number Link Load PDUs (PSNLLP).   
   Besides, a new TLV is defined that allows any LLP to be referenced in 
   both CSNLLP and PSNLLP control messages.   
   Hence, whatever the type of the data link interface, these messages 
   would allow ISIS neighbor routers to synchronize both their LLDBs and 
   their LSDBs in due time.   
    
   The synchronization of the LLDBs should occur whenever an LLP control 
   message is generated.  Therefore, the reception of any valid LLP 
   control message would immediately trigger the update of the load 
   attribute of the affected routes contained in the RIB table.   
   For this purpose, the structure of the RIB table under control of the 
   ISIS routing protocol must be extended.  A new attribute called 
   "path" is defined.  It represents the sequence of next-hop routers 
   along a given route from the viewpoint of a router.  Therefore, the 
   update of the traffic load attribute of any route calculated by the 
   ISIS routing protocol can be achieved.  It is the maximum of the 
   traffic load of the transmission links that relate to the considered 
   route, expressed as a percentage of an absolute bandwidth. 
    
   With regard to the second functional requirement, the ISIS routing 
   protocol should be extended to allow routes from the RIB to be added 
   or removed due to some significant changes in the distribution of the 
   traffic.   
   Typically, following the processing of a newly received LLP control 
   message as described above, such a procedure would look through the 
   RIB table, and would determine on the basis of the load attribute, if 
   some routes requires some update action in response.   
   For instance, such adaptive action might consist in adding an 
   alternative route to the one being temporarily overwhelmed by 
   traffic.  Thus, this triggered update mechanism requires a route 
   calculation procedure that takes into account several criteria.  This 
   path selection procedure should consider, at least, both the cost and 
 
 
   Proust               Expires - August 2004               [Page 39] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   the load attributes, and may also consider some routing policy 
   constraints.  However, this optimization issue is known as being non 
   tractable and synonymous with a NP-complete issue.  For this reason, 
   the adopted approach relies on a simple heuristic procedure, which 
   optimizes all criteria but according to a specific order, called 
   "widest-enough shortest", since it first considers the bandwidth 
   requirement and then the total cost of the path. 
    
   Once this operation is completed, the ISIS routing protocol might 
   proceed immediately to the update of the route contained in the FIB.   
   For this purpose, the structure of the RIB table, under control of 
   the ISIS routing protocol must be extended.  A new attribute called 
   "index" is defined.  Combined with the destination prefix, it 
   identifies, unambiguously, a route within the FIB.   



































 
 
   Proust               Expires - August 2004               [Page 40] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
4.2 
   Statistic module 
    
   As shown in the figure 24 below, the statistic module might be 
   integrated inside the ISIS routing process.   
    
      +-------------------------------------------------------------+ 
      |                                                             | 
      |            +-------------------+                            | 
      | ,----.     |   ISIS dynamic    |     ,----.                 | 
      |( LSDB )<-->| routing protocol  |<-->( LLDB )                | 
      | `----'     |                   |     `----'                 | 
      |            |    +----------+   |                            | 
      |  ,---.     |    |statistic |<--+--------+                   | 
      | ( RIB )<-->|    |  module  |   |        |                   | 
      |  `---'     |    +----------+   |        |                   | 
      |            +-------------------+        |                   | 
      |                      ^                  |                   | 
      |                      |                  |                   | 
      |                      |                  |                   | 
      |  ,---.     +---------v---------+   +----v------+    ,----.  | 
      | ( RIB )<-->|  central routing  |   | SNMP agent|<->( MIB2 ) | 
      |  `---'     |     protocol      |   |           |    `----'  | 
      |            +---------^---------+   +-----------+            | 
      +----------------------+--------------------------------------+ 
      |                      |                                      | 
      |            +---------v----------+                           | 
      |            |                    |     ,---.                 | 
      |            | forwarding module  |<-->( FIB )                | 
      |            |                    |     `---'                 | 
      |            |                    |                           | 
      |            +--------------------+                           | 
      |                                                             | 
      +-------------------------------------------------------------+ 
    
       Figure 24: ISIS routing process in relation to an SNMP agent 
    
   Moreover, for convenience, it may use the information stored in the 
   MIB-II database as it is a standard that should be included in a 
   router.  The MIB-II database specification defines a set of objects 
   that provides information about the operating state of a network 
   device such as a router.   
   These objects are gathered into several groups.  With regard to the 
   functional purpose of the statistic module, the group called 
   "interface" is of particular interest.  It defines several objects 
   that supply raw statistical information about the operating state of 
   each transmission interface of a router.  For instance, the "ifTable" 
   sub-group contains valuable counters that record statistical 
   information about the input and output traffic observed on each 
 
 
   Proust               Expires - August 2004               [Page 41] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   transmission interface as time goes by.  Moreover, these absolute 
   values are expressed either in term of number of datagrams 
   (ifInUcastPkts, ifOutUcastPkts, etc) or in bytes (ifInOctets, 
   ifOutOctets). 
    
   Thus, the statistic module could request the values of these counters 
   on a regular basis.  Given the maximum bandwidth resource of each 
   transmission interface, it could calculate the resulting synthesis of 
   the traffic load on its interfaces, experimented over a fixed period 
   of time.  Such a synthesis of results could be expressed as a 
   percentage of the absolute bandwidth.  Besides, these syntheses 
   results could be calculated over a well-chosen period of time.   
   As an example, three periods of time are proposed for the experiment. 
   A short time period might be considered as about 2 minutes.   
   A medium time period might be considered as about 5 minutes.   
   A long time period might be considered as about 30 minutes.   
    
   At best, the duration of the period of time should be a tunable 
   parameter.  In some way, this parameter controls the efficiency of 
   the router-based closed control loop, therefore, it might enable a 
   network operator to keep some control over the dynamic of its network 
   system.   
   Among the factors that influence the dynamic state of a network 
   system are the distribution of the traffic or the ratio between the 
   average flow bandwidth and the average available bandwidth 
   provisioned in the core network.   
   Besides, when the statistic module starts collecting measures about 
   the traffic load, it should initialize all its synthesis results with 
   zero.   
    
4.3 
   RIB structure 
    
   As mentioned before, the structure of the RIB table under control of 
   the ISIS routing process should be extended along three axes.   
   First, the load attribute is added to the RIB table to reflect the 
   level of traffic load that is experimented along a given route.  
   Second, the path attribute allows the load attribute of any given 
   route, to be updated efficiently and unambiguously.  Third, the index 
   attribute serves as a key identifier, when combined with a 
   destination prefix.  Therefore, it allows the FIB table to be updated 
   reliably and can be used as a routing information indication. 
    
    
    
    
    
    
    
    
 
 
   Proust               Expires - August 2004               [Page 42] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
    
   +---------------+-----+--------+---------+------------+-----+------+ 
   |  Destination  |Index|Next-Hop|Interface|   Path     |Load | Cost | 
   |   Prefix      |     |        |         |            |     |      | 
   +---------------+-----+--------+---------+------------+-----+------+ 
   |192.168.1.0/24 |  1  |10.1.1.1|  GE1/0  | nh1,nh2,.. | 42% | 100  | 
   +---------------+-----+--------+---------+------------+-----+------+ 
   |192.168.1.0.24 |  2  |10.2.1.1|  GE1/1  | nh3,nh4,.. | 32% | 150  | 
   +---------------+-----+--------+---------+------------+-----+------+ 
   |192.168.1.0/24 |  3  |10.3.1.1|  GE1/2  | nh5,nh6,.. |  8% | 200  | 
   +---------------+-----+--------+---------+------------+-----+------+ 
   |192.168.2.0/24 |  1  |10.3.1.1|  GE1/2  | nh1,nh2,.. | 20% | 100  | 
   +---------------+-----+--------+---------+------------+-----+------+ 
    
               Figure 25: extended structure of the RIB table 
    
   The "Destination Prefix" field represents an IP prefix linked to a 
   reachable underlying sub-network known within a routing domain.  
   The "Index" field represents an identifier that unambiguously 
   identifies a route among the subset of routes that share the same 
   destination prefix. 
   The "Next-Hop" field represents the IP address of the next-hop router 
   to be used along the route. 
   The "Interface" field identifies the output transmission link of the 
   router to be used along the route. 
   The "Path" field represents the sequence of next-hop routers that 
   identify, unambiguously, the set of transmission and switching 
   resources associated to a given route. 
   The "Load" field represents the maximum of the elementary traffic 
   load that has been measured for each transmission link along the 
   calculated route.  This quantity is expressed as a percentage of an 
   absolute bandwidth.   
   The "Cost" field represents the total usage cost of resources along 
   the route.  It is calculated for a given topology by means of the 
   Dijkstra's shortest path first (SPF) algorithm.   
    
    
4.4 
   LLP control messages 
    
   The need of an additional ISIS control message dedicated to the 
   transport of statistic traffic load information is first motivated by 
   stability reasons.  That way, the routing process clearly separates 
   the processing of the topology-related information from the 
   processing of the traffic load information.   
   The former type of information is assumed to be quite stable over 
   time.  It may be stored without any change in the LSDB database for 
   quite a long period of time (from 20 minutes up to 18 hours).   

 
 
   Proust               Expires - August 2004               [Page 43] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   To the contrary, the latter type of information is much more dynamic 
   by nature and requires to be regenerated by all routers at a more 
   important pace.  The order of magnitude of its period of refreshing 
   might be as little as a few minutes.   
    
   Moreover, from the network bandwidth optimization viewpoint, it might 
   be more efficient to use a separate control message dedicated to the 
   transport of the traffic load related information.  In compensation, 
   the routing process must be extended to deal with a new control 
   message, namely, the Link Load PDU (LLP).   
    
   With regard to the LAN environments, it is assumed that the 
   transformation of the topology, which is used to synchronize the 
   LSDBs efficiently, should also be applied to synchronize the LLDB.   
   That is to say, any fully meshed topology should be transformed into 
   a star topology by means of a pseudonode router located at the center 
   of the star.  Since this router is only an abstraction, the cost and 
   traffic load values associated with its transmission links, are equal 
   to zero.  
    
   In addition, as the standard specification of the ISIS routing 
   protocol defines two routing levels designated by the acronyms L1 and 
   L2, it should be necessary to define one LLP control message for each 
   routing level.   
   However, as this specification is still in a very early development 
   stage, it is designed for only one routing level.  Besides, it is a 
   common practice for network operators to deploy ISIS by means of only 
   one routing level, namely, the routing level 2 that is synonymous of 
   a backbone routing level.  Therefore, this Internet Draft focuses on 
   extending ISIS routing capabilities at routing level 2 only. 



















 
 
   Proust               Expires - August 2004               [Page 44] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
                                                 No. of Octets 
             +--------------------------------+ 
             |                                | 
             |    Common Fixed ISIS Header    |      8 
             |                                | 
             +--------------------------------+ 
             |          PDU Length            |      2 
             +--------------------------------+ 
             |        Remaining Lifetime      |      2 
             +--------------------------------+ 
             |             LLP ID             |   ID Length + 2 
             +--------------------------------+ 
             |         Sequence Number        |      4 
             +--------------------------------+ 
             |            Checksum            |      2 
             +--------------------------------+ 
             |RES|RES|RES|       IS Type      |      1 
             +================================+==================== 
             |                                | 
             |     Variable Length Fields     |   Variable Length 
             |                                | 
             +--------------------------------+ 
    
         Figure 26: format of the LLP control message 
    
   The field called "PDU Length" represents the total length of the Link 
   Load PDU including the header. 
    
   The field called "Remaining Lifetime" represents the validity delay 
   of the Link Load PDU expressed in seconds. 
    
   The field called "LLP ID" represents the identifier of the LLP; it is 
   made of the concatenation of three data fields, the "Source ID", the 
   "Pseudonode ID" and the "LLP number".  The "Source ID" field might be 
   six bytes long.  It is usually characterized by an IP loopback 
   address that may identify, unambiguously, a router within a routing 
   domain.  The "Pseudonode IP" field represents an integer value.  It 
   is one byte long.  A non zero value identifies a pseudonode router, 
   while a zero value identifies a real router.  The "LLP number" field 
   represents the fragment number of the LLP, in case fragmentation 
   processing is needed.  It is also one byte long.  When no 
   fragmentation occurs, this field is filled with zero.   
    
   The field called "Sequence Number" represents the sequence number of 
   the Link Load PDU. 
    


 
 
   Proust               Expires - August 2004               [Page 45] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   The field called "Checksum" represents a control sum that covers all 
   the bytes of the Link Load PDU included between the "Source ID" field 
   and the end the message. 
    
   The field called "IS Type" represents the level of routing (L1 only, 
   L2 only, or L1L2) which this LLP belongs to.   
    
   The variable length field represents all the elements of information 
   conveyed in this LLP.  They are represented using TLV coding format. 
    
   An LLP control message must include exactly one TLV code of type 1.  
   The TLV code of type 1 characterizes the "area address", to which the 
   router that originates this LLP belongs.   
    
   Moreover, an LLP control message may include exactly one TLV code of 
   type 134 and one TLV code of type 133.  The TLV code of type 134 
   characterizes the "router ID", to which the router that originates 
   this LLP refers.  The TLV code of type 133 allows some authentication 
   information to be associated with an LLP control message. 
    
   Besides, an LLP control message may include as many TLV codes of type 
   22 as necessary.  The TLV code of type 22 allows a transmission link 
   to be defined extensively.  This code allows some sub-TLV codes to be 
   specified.  In particular, an LLP control message may encapsulate 
   some sub-TLV codes of types 6, 8 and 9 in its TLV codes of type 22.   
    
   The sub-TLV code of type 6 allows an IPv4 address associated with a 
   transmission interface to be defined. 
    
   The sub-TLV code of type 8 allows an IPv4 address associated with an 
   ISIS neighbor interface to be defined. 
    
   The sub-TLV code of type 9 allows a maximum link bandwidth associated 
   with a transmission interface to be defined. 
    
   Moreover, in order to convey the statistic information about the 
   traffic load of any transmission link it is necessary to define 
   additional sub-TLV codes.  Thus, it is proposed to define 3 
   additional sub-TLVs that would enable a network operator to finely 
   configure the granularity of information collected in the LLDBs.   
    
   A sub-TLV called "Short Term Link Load" could be used to represent 
   the level of traffic load measured over the last two minutes.   
    
   The sub-TLV called "Medium Term Link Load" could be used to represent 
   the level of traffic load measured over the last five minutes.   
    
   The sub-TLV called "Long Term Link Load" could be used to represent 
   the level of traffic load measured over the last 30 minutes.   
 
 
   Proust               Expires - August 2004               [Page 46] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
4.5 
   Synchronization of the LLDB 
    
   As mentioned before, the mechanisms used to synchronize the LLDBs are 
   identical to the ones used for the synchronization of the LSDBs.   
   For independence and flexibility reasons, those mechanisms are 
   duplicated.  Thus, two additional ISIS control messages should be 
   defined for a given routing level, a Complete Sequence Number Link 
   Load PDU (CSNLLP) and a Partial Sequence Number Link Load PDU 
   (PSNLLP). 
    
   As mentioned before, this Internet Draft focuses on extending ISIS 
   routing capabilities at routing level 2 only. 
    
   The processing of the CSNLLP and PSNLLP control messages should be 
   the same as the ones defined for the CSNP and PSNP control messages 
   respectively.   
   In brief, the procedure uses the same selective flooding mechanism to 
   quickly broadcast any new LLP message generated by a router within a 
   routing domain.  With the provision that the received LLP is valid 
   and includes more recent information than the one contained in its 
   LLDB, a router should retransmits this LLP through all its interfaces 
   except the one whose LLP message was received from.   
   Such condition is based both on the sequence number and the remaining 
   lifetime defined in the header of the LLP messages.  
    
   Besides, in order to guarantee the delivery of LLP messages to all 
   routers within a routing domain, the procedure checks periodically 
   that all neighboring LLDBs are identical.   
   Depending on the type of transmission link, this procedure is more or 
   less complex.   
   In LAN subnetwork environment, it is based on the concept of 
   pseudonode router which logically transforms a broadcast subnetwork 
   into a simple star of point to point interconnection links.  Thus, 
   all exchanges of PSNLLP and CSNLLP messages are conducted between the 
   pseudonode router and the other neighboring routers. The pseudonode 
   router is designated among all neighboring routers by means of a 
   simple election mechanism.  This artifice reduces the control traffic 
   load on the LAN and quickens the synchronization period of time. 
   Furthermore, to cope with the contention issue inherent in LAN 
   subnetworks accesses, the pseudonode router regularly broadcasts a 
   CSNLLP message on the LAN.  This message includes a summary of its 
   LLDB by listing all the LLP headers recorded in its database.  Hence, 
   each neighboring router can check the consistency of its own LLDB in 
   relation to the one of the pseudonode router. 
   In point to point subnetwork environment, it is based on a simple 
   exchange of CSNLLP and PSNLLP messages that serve as query and 
   response messages. Once the LLDB databases are synchronized, no more 
   exchange of message is necessary.  
    
 
 
   Proust               Expires - August 2004               [Page 47] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   However, while the body of CSNP and PSNP control messages refer to 
   LSP messages, the body of CSNLLP and PSNLLP control messages should 
   refer to LLP messages.   
    
   To this end, a new TLV code is needed to identify a list of LLP 
   identifier entries.  It is intentionally called "LLP Entries" by 
   analogy with the TLV code of type 9.  Each LLP identifier entry would 
   be composed of an LLP ID, a remaining lifetime value, a sequence 
   number value, and a checksum value.  The format of such TLV code is 
   represented in figure 27 below.   
    
                                                 No. of Octets 
          +--------------------------------+ 
          |              CODE              |      1 
          +--------------------------------+ 
          |             LENGTH             |      1 
          +--------------------------------+ 
          |             VALUE              |      LENGTH 
          +--------------------------------+ 
          |       REMAINING LIFETIME       |      2 
          +--------------------------------+ 
          |             LLP ID             |      8 
          +--------------------------------+ 
          |         LLP SEQ NUMBER         |      4 
          +--------------------------------+ 
          |            CHECKSUM            |      2 
          +--------------------------------+ 
          :                                : 
          :                                : 
          +--------------------------------+ 
          |       REMAINING LIFETIME       |      2 
          +--------------------------------+ 
          |             LLP ID             |      8 
          +--------------------------------+ 
          |         LLP SEQ NUMBER         |      4 
          +--------------------------------+ 
          |            CHECKSUM            |      2 
          +--------------------------------+ 
    
   Figure 27: new TLV code called "LLP entries" 
    




 
 
   Proust               Expires - August 2004               [Page 48] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
    
   The format of both CSNLLP and PSNLLP control messages are represented 
   in the figures hereafter. 
    
                                                 No. of Octets 
             +--------------------------------+ 
             |                                | 
             |    Common Fixed ISIS Header    |      8 
             |                                | 
             +--------------------------------+ 
             |          PDU Length            |      2 
             +--------------------------------+ 
             |           Source ID            |   ID length + 1 
             +--------------------------------+ 
             |          Start LLP ID          |   ID Length + 2 
             +--------------------------------+ 
             |          End LLP ID            |   ID Length + 2 
             +================================+==================== 
             |                                | 
             |     Variable Length Fields     |   Variable Length 
             |                                | 
             +--------------------------------+ 
    
         Figure 28: format of the CSNLLP control message 
    
   The field called "PDU Length" represents the total length of the 
   CSNLLP control message including its header. 
    
   The field called "Source ID" usually represents a system address that 
   unambiguously identifies the router that originates this control 
   message.  In most cases, the "ID length" might be six bytes long.  
   Usually, the "Source ID" value is obtained by concatenation of an IP 
   loopback address and an integer value named the "Pseudonode IP".  The 
   latter is one byte long.  A non zero value identifies a pseudonode 
   router, while a zero value identifies a real router. 
    
   The field called "Start LLP ID" represents the minimum value of the 
   LLP ID identifiers, included in the list of LLP identifier entries 
   that follows in the body of the CSNLLP control message.   
    
   The field called "End LLP ID" represents the maximum value of the LLP 
   ID identifiers, included in the list of LLP identifier entries that 
   follows in the body of the CSNLLP control message.   
    
   The variable length fields might contain one TLV code of type 133 and 
   one or several TLV codes of the new type called "LLP entries".   
    
    
    
 
 
   Proust               Expires - August 2004               [Page 49] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
                                                 No. of Octets 
             +--------------------------------+ 
             |                                | 
             |    Common Fixed ISIS Header    |      8 
             |                                | 
             +--------------------------------+ 
             |          PDU Length            |      2 
             +--------------------------------+ 
             |           Source ID            |   ID length + 1 
             +================================+==================== 
             |                                | 
             |     Variable Length Fields     |   Variable Length 
             |                                | 
             +--------------------------------+ 
    
      Figure 29: format of the PSNLLP control message 
    
   The field called "PDU Length" represents the total length of the 
   PSNLLP control message including its header. 
    
   The field called "Source ID" usually represents a system address that 
   unambiguously identifies the router that originates this control 
   message.  In most cases, the "ID length" might be six bytes long.  
   Usually, the "Source ID" value is obtained by concatenation of an IP 
   loopback address and an integer value named the "Pseudonode IP".  The 
   latter is one byte long.  A non zero value identifies a pseudonode 
   router while a zero value identifies a real router. 
    
   The variable length fields might contain one TLV code of type 133 and 
   one or several TLV codes of the new type called "LLP entries". 
    
    
4.6 
   RIB update 
    
4.6.1 Periodic updates 
    
   The standard specification of the ISIS routing protocol requires that 
   each router within a given routing domain should regenerate and 
   propagate its LSP on a periodic basis.  This period of time is called 
   the maximumLSPGenerationInterval and may vary between 20 minutes and 
   18 hours.  Therefore, on receipt of such LSP, a router should update 
   its LSDB which might, in turn, lead to the updating of its RIB.   
    
   The present specification of the extension capabilities of the ISIS 
   routing protocol requires that each router should regenerate and 
   propagate its LLP on a periodic basis.  This period of time is called 
   the maximumLLPGenerationInterval and may vary between 2 and 30 
   minutes.  Therefore, on receipt of such LLP, a router should update 
   its LLDB which might in turn lead to the updating of its RIB.   
 
 
   Proust               Expires - August 2004               [Page 50] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   This updating operation consists only in updating the load attribute 
   of the affected routes.  Nevertheless, though being an independent 
   operation, such periodic update might also be followed by a trigger 
   update that may add or suppress an alternative route due to a 
   significant change in the distribution of the network traffic load. 
    
    
4.6.2 Triggered updates 
    
   As said above, such an update of the load attribute may trigger the 
   withdrawal or the addition of an alternative route.  Anytime, the 
   load attribute update operation that corresponds to the processing of 
   a newly received LLP message, is carried out, the router should 
   launch a control process to check that no route in the RIB is 
   overloaded or under-loaded.   
    
   To this end, it uses two thresholds: a minimum and a maximum.   
   Anytime, the level of the load attribute of a route is below the 
   minimum or above the maximum threshold, the router should retrieves 
   all the routes that share the same destination prefix as the one 
   linked to this route from its FIB. 
    
   The decision of adding an alternative route into the RIB would be 
   based on the following set of conditions: 
      - if the load attribute of a route is greater than a maximum 
        threshold value, then, 
      - if the load attribute of all the other routes that share the 
        same destination prefix, shows that these routes are also in a 
        state of traffic overload, then, 
      - if the number of active routes (for instance, such a limit may 
        have a value of 15) that share this destination prefix is still 
        under a limit fixed by a routing policy constraint, then, 
           o a new route might be added into the RIB 
                 
    
   The decision of withdrawing an alternative route from the RIB would 
   be based on the following set of conditions:  
      - if the load attribute of a given route in the RIB is smaller 
        than a minimum threshold value, then,  
      - if that given route is not the primary route also called SPF 
        route (i.e. a route which has an index value of 1), then,  
      - if the load attribute of another non primary route among those 
        that share the same destination prefix, shows that it is also 
        in a state of traffic under load, then, 
      - if the total cost of this given route is higher than the total 
        cost of this other non primary route being in a state of 
        traffic under load, then, 
           o this given route might be withdrawn from the RIB 
    
 
 
   Proust               Expires - August 2004               [Page 51] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   In no case, should a valid primary route be suppressed due to a low 
   level of traffic load.  Moreover, in order to avoid both decisional 
   procedures to trigger each other, which may result in a sequence of 
   addition and withdrawal operations, once a shortest alternative route 
   is added it is no longer suppressed according to this mechanism.   
   On the contrary, a non shortest alternative route could be suppressed 
   due to a low level of its load attribute.   
    
   Anytime an LLP control message is correctly processed in the LLDB and 
   RIB databases, each router would walk through its RIB and look for 
   the applicability of the above set of conditions. 
    
   The figure 30 below shows the dynamics of this decision process in 
   relation to the load attribute. 
    
        at this time a new route is created 
                     | 
   route             | 
   traffic  ^        |     at this time a route might be suppressed 
   load     |        v                 | 
            |         ,-.              | 
            |        ;   \        _    | 
    MAX  30%|________;____:______/_\___|_____________ 
            |       /     :    ,'   \  | 
            |      /       `--'      : | 
            |    ,'                  | | 
            |   /                    | | 
            |  ;                     : v    / 
    MIN   3%|__|______________________\____/_________ 
            |  |                       `--' 
            |_______________________________________> 
                                                 time 
    
   Figure 30: dynamics of the process of creation/suppression of a route 
    
   Besides, according to the standard specification of the ISIS routing 
   protocol whenever a topology change is detected, it is advertised 
   within a given routing domain by means of the propagation of a new 
   LSP.   
   As a result, the LSDB table is updated which in turn might lead to 
   the updating of the RIB table.   
   Anytime a topology change affects the RIB table, the ISIS routing 
   protocol should also consider its impact on the alternative routes.  
   To this end, all routes that need to be recalculated should be 
   identified.  The ones that are not valid anymore should be suppressed 
   from the RIB as soon as possible.  
    
    

 
 
   Proust               Expires - August 2004               [Page 52] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
4.7 
   Route calculation algorithm 
    
   The route calculation procedure relies both on the LSDB and the LLDB 
   databases.   
   Whenever a route must be calculated, the route calculation procedure 
   is carried out either in one step or in two steps according to the 
   type of route.   
    
   If the RIB database no longer contains any more a valid primary route 
   (also named SPF route) associated with the given destination prefix, 
   then the calculation procedure must calculate it by means of a SPF 
   algorithm based on the overall LSDB database.  The index value 
   attributed to a primary route must always be set to one. 
    
   To the contrary, if the RIB database still contains a valid primary 
   route associated with the given destination prefix, then the route 
   calculation procedure must be carried out in two steps. 
    
   First, it selects, from the LSDB database, the link transmissions 
   that are able to sustain a certain amount of traffic load.  
   On the basis of the information stored in the LLDB, this procedure 
   logically suppresses from the LSDB all transmission links, whose 
   traffic loads are greater than a maximum threshold value.   
    
   Second, based on the resulting reduced topology, this route 
   calculation procedure would work out a shortest path according to the 
   second metric being considered, namely, the cost.   
   This metric is associated to any transmission link and represents the 
   cost of using a transmission link.  Usually, the value associated 
   with the cost attribute of a transmission link is inversely 
   proportionate to its maximum bandwidth.  Therefore, the more 
   bandwidth associated with a transmission link, the less it costs to 
   use it.  
   Finally, if the total cost of the calculated route is not exceeding a 
   certain limit value fixed to comply with a routing policy 
   consideration, then the route is activated in the RIB of the ISIS 
   routing protocol. 
    
    
5. 
  Ongoing Experiment 
    
   At the present time, an implementation of the overall architecture 
   has been started.  The testbed is composed of several PC-based 
   routers and terminals that rely on the Linux operating system.   
   Besides, all developments that relate to the ISIS routing protocol 
   are based on a commercial distribution of the Zebra protocol routing 
   stack.   
    
    
 
 
   Proust               Expires - August 2004               [Page 53] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
6. 
  Conclusion 
    
   The architecture presented in the present Internet Draft aims at load 
   balancing flows among multiple routes that are believed to be 
   slightly loaded.  Basically, the approach relies on several 
   functional extensions that affect both the IP protocol and an intra-
   domain link state dynamic routing protocol, such as, ISIS.  The 
   former extensions enable flow to be load balanced according to a 
   criterion synonymous with a level of resource availability.  Besides, 
   thanks to an end-to-end piggybacking mechanism, those extensions of 
   the IP protocol allow flows, in the course of time, to be pinned 
   along the routes.  The latter extensions enable multiple routes 
   associated with a given destination prefix to be calculated according 
   to multiple criteria, such as, the level of bandwidth availability 
   and the total cost.  In addition to the shortest path first routes, 
   the routing protocol is able to manage alternative routes that would 
   use available resources.  To this end, the ISIS routing protocol is 
   extended to maintain a representation of the current distribution of 
   the network traffic load on a short term periodic basis.   
    
   As the solution applies to flow level, in term of performance and 
   efficiency, it possesses a useful level of granularity.   
   Besides, the solution is stateless, hence, it is in some ways 
   scalable and secure friendly. 
   Moreover, thanks to the route pinning mechanism, the load-balancing 
   service should exhibit some good stability properties.  
   In terms of simplicity, the implementation of the end-to-end 
   piggybacking mechanism should be of low cost.  However, the support 
   of the designed forwarding mechanism in the routers might require 
   special care.  With regard to this concern, it is expected to get 
   some feedback from the experimental field in the future.   


















 
 
   Proust               Expires - August 2004               [Page 54] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
IANA Considerations 
    
   This draft requires the use of a new IP option and, therefore, a new 
   option number from the number space within the IP protocol.  
   Moreover, it also requires the use of new ISIS control messages and, 
   therefore, new PDU type values from the number spaces within the 
   specification of the ISIS routing protocol.   
   Besides, this draft also requires the use of new sub-TLVs and, hence, 
   new code values from the number spaces within the specification of 
   the ISIS TLV type number 22 named "Extended IS Reachability TLV". 
    
   This section explains the logic used by the author to choose the most 
   appropriate number space for each new entity, and is intended to 
   assist in the determination of any final values assigned by IANA. 
    
   New IP Option 
    
   The Internet Protocol has provisions for optional header fields 
   identified by an option type field.  As the IP option defined in this 
   draft must be copied in the fragments, the most significant bit of 
   the one byte length option field must be set to one.  Besides, as 
   this option belongs to the class of control messages, the next two 
   significant bits must be set to zero.  Finally, the option type 
   within the class 0 is coded using the 5 least significant bits of the 
   option field.  At the time of writing this draft, the number 25 is 
   the next available value.  Therefore, the proposed value for the IP 
   option defined in this draft is the decimal number 153. 
    
   New PDU Types 
    
   The type of any ISIS control message is defined in the fixed ISIS 
   common header of eight bytes long.  Besides, it is coded in the "PDU 
   type" field that is 5 bits long.  As the time of writing this 
   Internet Draft, it appears that values 30, 35 and 37 of the "PDU 
   type" field are available. 
   Thus, with regard to the code value needed to be associated with the 
   definition of the PDU type named LLP at routing level 2, the proposed 
   value is 30.  Besides, it is proposed to use the number 35 to be 
   associated with the definition of the PDU type named CSNLLP at 
   routing level 2.  Finally, it is proposed to use the number 37 to be 
   associated with the definition of the PDU type named PSNLLP at 
   routing level 2.  
    
   New TLV Code 
    
   The TLV codes are included in the variable length fields of the 
   different ISIS control messages. As its acronym implies, the type of 
   any TLV code is defined in the first field of this standard data 
   structure. It is coded by means of one byte.  
 
 
   Proust               Expires - August 2004               [Page 55] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   This draft requires the use of a new TLV code called "LLP Entries".  
   Thus, by analogy with the TLV code of type 9, it is proposed to 
   associate the value of 19 with this TLV code.  
    
   New sub-TLV Codes 
    
   The TLV code of type 22, called "Extended IS Reachability TLV", 
   enables the characterization information of a transmission link being 
   advertised in an LSP to be extended.  It is specified in [ISIS-TE] 
   and relies on the concept of sub-TLV for the purpose of traffic 
   engineering parameters definition. 
   This draft also requires the use of three new sub-TLVs and, hence, 
   three new code values from the number space within the specification 
   of the TLV type number 22. 
   At the time of writing this document, values 19, 20 and 21 are 
   available from this sub-TLV number space. 
   Hence, it is proposed to associate the values 19, 20 and 21 with the 
   sub-TLVs named "Short Term Link Load", "Medium Term Link Load" and 
   "Long Term Link Load" respectively. 
    
    
Security Considerations 
    
   As mentioned earlier, a terminal might misbehave when processing the 
   IP option described in this document.  For instance, a terminal could 
   send a datagram with an inappropriate sequence of index values.  In 
   that case, the routers along the route would detect such anomaly and, 
   therefore, would proceed to the remarking of the FFIV vector.   
   Therefore, from the viewpoint of a terminal, it is useless to spoof 
   the FFIV vector, since such practice would finally end up rerouting 
   the flow.  Besides, as the routes are calculated independently of the 
   flows by means of a shortest path algorithm, it should be impossible 
   to introduce a routing loop, by means of a manipulation of the FFIV 
   vector values.   
    
    
Normative References 
    
   [RFC2026]  Bradner, S., "The Internet Standards Process û Revision 
   3", BCP 9, RFC 2026, October 1996. 
    
   [BCP14]  Bradner, S., "Key words for use in RFCs to Indicate 
   Requirement Levels", BCP 14, RFC 2119, March 1997. 
    
   [FYI18]  Malkin, G., "Internet Users' Glossary", BCP 18, RFC 1983, 
   August 1996. 
    
   [ISIS]  ISO 10589, "Intermediate System to Intermediate System Intra-
   Domain Routeing Exchange Protocol for use in Conjunction with the 
 
 
   Proust               Expires - August 2004               [Page 56] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   Protocol for Providing the Connectionless-mode Network Service (ISO 
   8473)" [Also republished as RFC 1142] 
    
    
Informative References 
    
   [ISIS-IETF]  Callon, R.W., "Use of OSI IS-IS for routing in TCP/IP 
   and dual environments", RFC 1195, December 1990. 
    
   [INT-ARCH1]  Carpenter, B., "Architectural Principles of the 
   Internet", RFC 1958, June 1996. 
    
   [INT-ARCH2]  Bush, R., Meyer, D., "Some Internet Architectural 
   Guidelines and Philosophy", RFC 3439, December 2002. 
    
   [BCP41]  Floyd, S., "Congestion Control Principles", September 2000. 
    
   [QOSARCFRWK]  Huston, G., "Next steps for the IP QoS Architecture", 
   RFC 2990, November 2000. 
    
   [QOSRFRWK]  Crawley, E., Nair, R., Rajagopalan, B., Sandick, H., "A 
   framework for QoS-based Routing in the Internet", RFC 2386, August 
   1998. 
    
   [TE-REQ]  Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M., and 
   McManus, J., "Requirements for Traffic Engineering Over MPLS,", RFC 
   2702, September 1999. 
    
   [OMP]  Villamizar, C., Li, T., "IS-IS Optimized Multipath (ISIS-
   OMP)", Work in Progress, January 2002. 
    
   [QOSPF]  Apostolopoulos, G., Williams, D., Kamat, S., Guerin, R., 
   Orda, A., Przygienda, T., "QOS routing mechanims and OSPF extensions 
   ", RFC 2676, August 1999. 
    
   [OSPF]  Moy, J., "Open Shortest Path First version 2", STD 54, RFC 
   2328, April 1998. 
    
   [E2E]  Kempf, J., Austein, R., "The Rise of the Middle and the Future 
   of End to End: Reflections on the evolution of the Internet 
   Architecture", Work in Progress, October 2003. 
    
   [IAB-FUTURE]  Atkinson, R., Floyd, S., "IAB Concerns and 
   Recommendations Regarding Internet Research and Evolution", Work in 
   Progress, October 2003. 
    
   [IRTF-ROUTREQ]  Doria, A., Davies, E., Kastenholtz, F., "Requirements 
   for Internet Domain Routing", Work in Progress, December 2003.   
    
 
 
   Proust               Expires - August 2004               [Page 57] 
   Internet-Draft       Routing at Flow level                2/5/2004 
 
 
   [IRTF-RR]  http://www.irtf.org/rrg/ 
    
   [ISIS-TE]  Smit, H., Li, T., "IS-IS extensions for Traffic 
   Engineering", Work in Progress, August 2003.   
    
    
Acknowledgments 
    
   The author would like to thank No‹l Cantenot of France Telecom R&D 
   for his fruitful comments on this work. 
    
    
Author's Address 
    
   Christophe Proust 
   France Telecom R&D 
   42 rue des Coutures 
   14066 Caen cedex 4 - FRANCE 
   Phone: +33 2 31759308 
   Email: christophe.proust@francetelecom.com 
    
    
IPR Notice 
 
   The IETF has been notified of intellectual property rights claimed in 
   regard to some or all of the specification contained in this 
   document.  For more information consult the online list of claimed 
   rights.   
    
    
Full Copyright Statement 
 
   Copyright (C) The Internet Society (2003).  All Rights Reserved. 
    
   This document and translations of it may be copied and furnished to 
   others, and derivative works that comment on or otherwise explain it 
   or assist in its implementation may be prepared, copied, published 
   and distributed, in whole or in part, without restriction of any 
   kind, provided that the above copyright notice and this paragraph are 
   included on all such copies and derivative works.  However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other 
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be 
   followed, or as required to translate it into languages other than 
   English. 
    
    
 
 
   Proust               Expires - August 2004               [Page 58]