Internet DRAFT - draft-anderson-forces-model






   Internet Draft                                 T. Anderson 
   Expiration: April 2002                         Intel Labs 
   File: draft-anderson-forces-model-00.txt       November 2001 
   Working Group: ForCES                        
    
    
          ForCES Architectural Framework and FE Functional Model 
    
    
    
                 draft-anderson-forces-model-00.txt 
    
    
    
    
   Status of this Memo 
    
   This document is an Internet-Draft and is in full conformance with  
   all provisions of Section 10 of RFC2026.  Internet-Drafts are  
   working documents of the Internet Engineering Task Force (IETF),  
   its areas, and its working groups.  Note that other groups may  
   also distribute working documents as Internet-Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six  
   months and may be updated, replaced, or obsoleted by other  
   documents at any time.  It is inappropriate to use Internet-Drafts  
   as reference material or to cite them other than as ``work in  
   progress.'' 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt. 
    
   The list of Internet-Draft Shadow Directories can be accessed at  
   http://www.ietf.org/shadow.html. 
    
Conventions used in this document  
        
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",  
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in  
   this document are to be interpreted as described in [RFC-2119]. 
    
1. Abstract 

 

   This document defines an architecture for ForCES network elements 
   and a functional model for ForCES forwarding elements.  This model 
   is used to describe the capabilities of ForCES forwarding elements 
   within the context of the ForCES protocol.  The architecture and
   forwarding element model defined herein are intended to satisfy the
   requirements specified in the ForCES requirements draft
   [FORCES-REQ].
    



  
Anderson                                                      [Page 1] 



2. Definitions 
    
   Most of these definitions are copied from the ForCES requirements 
   document [FORCES-REQ]. 
    
   Addressable Entity (AE) - A physical device that is directly
   addressable given some interconnect technology.  For example, on
   Ethernet, an AE is a device with which we can communicate using an
   Ethernet MAC address; on IP networks, it is a device with which we
   can communicate using an IP address; and on a switch fabric, it is a
   device with which we can communicate using a switch fabric port
   number.
    
   Physical Forwarding Element (PFE) - An AE that includes hardware 
   used to provide per-packet processing and handling.  This hardware 
   may consist of (but is not limited to) network processors, ASICs,
   or general-purpose processors.  For example, line cards in a 
   forwarding backplane are PFEs. 
    
   PFE Partition - A logical partition of a PFE consisting of some 
   subset of each of the resources (e.g., ports, memory, forwarding 
   table entries) available on the PFE.  This concept is analogous to 
   that of the resources assigned to a virtual router [REQ-PART]. 
    
   Physical Control Element (PCE) - An AE that includes hardware used 
   to provide control functionality.  This hardware typically includes 
   a general-purpose processor. 
    
   PCE Partition - A logical partition of a PCE consisting of some 
   subset of each of the resources available on the PCE. 
    
   Forwarding Element (FE) - A logical entity that implements the 
   ForCES protocol.  FEs use the underlying hardware to provide per-
   packet processing and handling as directed by a CE via the ForCES 
   protocol.  FEs may use the hardware from PFE partitions, whole PFEs, 
   or multiple PFEs. 
    
   Proxy FE - A name for a type of FE that cannot directly modify its 
   underlying hardware but instead manipulates that hardware using some 
   intermediate form of communication (e.g., a non-ForCES protocol or 
   DMA).  A proxy FE will typically be used in the case where a PFE
   cannot implement the ForCES protocol directly (e.g., due to the lack
   of a general-purpose CPU).
    
   Control Element (CE) - A logical entity that implements the ForCES 
   protocol and uses it to instruct one or more FEs as to how they 
   should process packets.  CEs handle functionality such as the 
   execution of control and signaling protocols.  CEs may use the 
   hardware of PCE partitions or whole PCEs.  (The use of multiple PCEs 
   will usually be modeled as separate CEs.) 
    
   Pre-association Phase - The period of time during which a FE does 
   not know which CE is to control it and vice versa. 
    
   Post-association Phase - The period of time during which a FE does 
   know which CE is to control it and vice versa. 
    
   ForCES Protocol - While there may be multiple protocols used within 
   a device supporting ForCES, the term "ForCES protocol" refers only 
   to the ForCES post-association phase protocol (see below). 
    
   ForCES Post-Association Phase Protocol - The protocol used for post-
   association phase communication between CEs and FEs.  This protocol 
   does not apply to CE-to-CE communication, FE-to-FE communication, or 
   to communication between FE and CE managers.  The ForCES protocol is 
   a master-slave protocol in which FEs are slaves and CEs are masters. 
    
   FE Model - A model that describes the logical processing functions 
   of a FE. 
    
   FE Manager - A logical entity that operates only in the pre-
   association phase and is responsible for determining to which CE(s) 
   a FE should communicate.  This determination process is called CE 
   discovery and may involve the FE manager learning the capabilities 
   of available CEs.  A FE manager may use anything from a static 
   configuration to a pre-association phase protocol (see below) to 
   determine which CE to use.  Being a logical entity, a FE manager 
   might be physically combined with any of the other logical entities 
   mentioned in this section. 
    
   CE Manager - A logical entity that operates only in the pre-
   association phase and is responsible for determining to which FE(s) 
   a CE should communicate.  This determination process is called FE 
   discovery and may involve the CE manager learning the capabilities 
   of available FEs.  A CE manager may use anything from a static 
   configuration to a pre-association phase protocol (see below) to 
   determine which FE to use.  Being a logical entity, a CE manager 
   might be physically combined with any of the other logical entities 
   mentioned in this section. 
    
   Pre-association Phase Protocol - A protocol between FE managers and 
   CE managers that helps them determine which CEs or FEs to use.  A 
   pre-association phase protocol may include a CE and/or FE capability 
   discovery mechanism.  It is important to note that this capability 
   discovery process is wholly separate from (and does not replace) 
   that used within the ForCES protocol.  However, the two capability 
   discovery mechanisms may utilize the same FE model (see Section 5).  
   Pre-association phase protocols are not discussed further in this 
   document. 
    
   ForCES Network Element (NE) - An entity composed of one or more CEs 
   and one or more FEs.  To entities outside a NE, the NE represents a 
   single point of management.  Similarly, a NE usually hides its 
   internal organization from external entities.  However, one 
   exception to this rule is that CEs and FEs may be directly managed 
   to transition them from the pre-association phase to the post-
   association phase. 
    
   ForCES Protocol Element - A FE or CE. 
    
   High Touch Capability - This term refers to the capability of some
   forwarders to take action on the contents or headers of a packet
   based on information other than that found in the IP header.
   Examples of these capabilities include NAT-PT, firewall, and L7
   content recognition.
    
   Bootstrap CE - The first CE that a FE connects to in a ForCES NE. 
    
   CE set - One or more equivalently capable CEs designed to operate 
   concurrently (for load sharing) or in a 1+N failover mode (for 
   redundancy).   
    
3. Introduction 
 
  [TBD] 
 

4. Architecture 
    
   This section defines a ForCES architectural framework.  This ForCES
   framework consists primarily of ForCES NEs but also includes several
   ancillary components.  ForCES NEs appear to external entities as
   monolithic pieces of network equipment, e.g., routers, NATs,
   firewalls, or load balancers.  (See [FORCES-REQ], Section 5,
   Requirement 4.)  Internally, however, ForCES NEs are composed of
   several logical components.  By defining logical components and
   specifying the interactions between them, the ForCES architecture
   allows these components to be physically separated.  This physical
   separation brings several benefits.  For example, separate
   components allow vendors to specialize in one component without
   having to become experts in all components.  The architecture also
   provides scalability, in that additional forwarding or control
   capacity can be added to existing network elements without the need
   for forklift upgrades.  The components of the ForCES architecture
   and their relationships are pictured in the following diagram.  For
   convenience, the interactions between components are labeled by
   reference points Gp, Gc, Gf, Gr, Gl, and Gi.
    
                           --------------------------------------- 
                           | ForCES Network Element              | 
                           | -------------------                 | 
                           | |        CE Set 1 |                 | 
                           | |                 |                 | 
    --------------   Gc    | |-----------------| Gr ------------ | 
    | CE Manager |---------+-|  Head | CE 2..N |----| CE Set 2 | | 
    --------------         | |   CE  |         |    |          | | 
          |                | -------------------    ------------ | 
          | Gl             |         |\     ---------/      |    | 
          |                |    Gp   | \   /   Gp           | Gp | 
          |                |         |  --/----------\      |    | 
    --------------     Gf  | --------------      --------------  | 
    | FE Manager |---------+-|     FE     |  Gi  |     FE     |  | 
    --------------   \     | |            |------|            |  | 
                      \ Gf | --------------      --------------  | 
                       ----+------------------------/            | 
                           --------------------------------------- 
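   As a reading aid, the reference points in the diagram can be captured
   as a lookup table keyed by the pair of entities that each one
   connects, following Sections 4.5 through 4.10.  The sketch below is
   illustrative only; the Entity names and the reference_point() helper
   are hypothetical and are not defined by ForCES.

```python
from enum import Enum


class Entity(Enum):
    """Logical components shown in the ForCES architecture diagram."""
    CE = "CE"
    FE = "FE"
    CE_MANAGER = "CE Manager"
    FE_MANAGER = "FE Manager"


# Each reference point joins an unordered pair of entities.  A pair of
# identical entities collapses to a one-element frozenset (CE-CE, FE-FE).
REFERENCE_POINTS = {
    frozenset({Entity.CE, Entity.FE}): "Gp",                  # ForCES protocol (4.9)
    frozenset({Entity.CE_MANAGER, Entity.CE}): "Gc",          # 4.7
    frozenset({Entity.FE_MANAGER, Entity.FE}): "Gf",          # 4.6
    frozenset({Entity.CE_MANAGER, Entity.FE_MANAGER}): "Gl",  # optional (4.5)
    frozenset({Entity.CE}): "Gr",                             # CE to CE (4.10)
    frozenset({Entity.FE}): "Gi",                             # FE to FE (4.8)
}


def reference_point(a: Entity, b: Entity) -> str:
    """Return the label of the reference point between entities a and b."""
    try:
        return REFERENCE_POINTS[frozenset({a, b})]
    except KeyError:
        raise ValueError(f"no reference point between {a.value} and {b.value}") from None


print(reference_point(Entity.FE, Entity.CE))  # Gp
```

   Note that a CE manager and a FE have no direct reference point in
   this architecture, so the lookup raises an error for that pair.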
    
4.1. Control Elements 
    
   This architecture permits multiple CEs to be present in a network 
   element.  These CEs may be used for any combination of redundancy, 
   load sharing, or distributed control.  Redundancy is the case where 
   one or more CEs are prepared to take over should an active CE fail.  
   Load sharing is the case where two or more CEs are concurrently 
   active and where any request that can be serviced by one of the CEs 
   can also be serviced by any of the other CEs.  In both redundancy 
   and load sharing, the CEs involved are equivalently capable.  The 
   only difference between these two cases is in terms of how many 
   active CEs there are.  Distributed control is the case where two or 
   more CEs are concurrently active but where certain requests can only 
   be serviced by certain CEs. 
    
   To enable multiple CEs, control in a ForCES NE is handled by one or 
   more CE sets.  Each CE set can specialize in handling a particular 
   subset of the control functions of a NE.  For example, one CE set 
   may handle routing functions while another may handle firewall or 
   QoS functions.  Each CE set is itself composed of one or more CEs.
   All of the CEs in a CE set are equivalently capable, meaning that
   each is capable of performing the same set of functions, albeit with
   possibly different performance.  When a CE set has more than one
   member, the additional members may be used for load sharing or
   redundancy purposes.  Communication between members of a CE set or
   between CE sets is discussed in Section 4.10.  CEs are wholly
   responsible for coordinating amongst themselves to provide
   redundancy, load sharing, or distributed control, if desired.
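   The distinction drawn above between redundancy and load sharing
   depends only on how many members of a CE set are active, while
   distributed control spreads functions across CE sets.  The minimal
   sketch below encodes that reading.  The CESet class, its fields, and
   the mode labels are hypothetical, and the coordination among CEs
   (which this document leaves out of scope) is not modeled.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class CESet:
    """Equivalently capable CEs that handle one subset of NE control functions."""
    functions: FrozenSet[str]            # e.g. frozenset({"routing"})
    members: List[str]                   # CE identifiers
    active: Set[str] = field(default_factory=set)

    def mode(self) -> str:
        """Classify how this CE set is currently being used (Section 4.1)."""
        if not self.active <= set(self.members):
            raise ValueError("active CEs must be members of the set")
        if not self.active:
            return "inactive"
        if len(self.active) == 1:
            # One active CE; any other members stand by to take over.
            return "redundancy" if len(self.members) > 1 else "single"
        # Several equivalently capable CEs serve requests concurrently.
        return "load sharing"


def distributed_control(ce_sets: List[CESet]) -> bool:
    """True when two or more CE sets are active, each serving only its own functions."""
    return sum(1 for s in ce_sets if s.active) >= 2


routing = CESet(frozenset({"routing"}), ["ce1", "ce2"], active={"ce1"})
print(routing.mode())  # redundancy
```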
    
   CEs are concerned with controlling the layer-3 and above 
   capabilities of FEs.  CEs are not concerned with controlling the 
   layer-2 and below communication aspects of the FE.   
    
   While the ForCES model allows for multiple CEs, the coordination of
   those CEs is beyond the current scope of ForCES.  In cases where an
   implementation uses multiple CEs or CE sets, it MUST still maintain
   the invariant that a single NE never appears as multiple NEs, even
   in the presence of link failures between FEs and/or CEs.
    
4.2. Forwarding Elements 
    
   FEs are responsible for per-packet processing and handling as 
   directed by their CEs.  FEs have no initiative of their own.  Instead, 
   FEs are slaves to their CEs and only do as they are told (Section 
   4.9).  FEs may communicate with one or more CEs, either from the 
   same or different CE sets concurrently.  However, FEs have no notion 
   of CE redundancy, load sharing, or distributed control.  Instead, 
   FEs accept commands from any CE authorized to control them.  This 
   architecture mandates that a coarse grain mapping of requests to CE 
   sets be possible but also allows finer grain mappings.  For example, 
   at a minimum, a CE must be able to specify a single CE set to which 
   all requests generated by the FE should be sent.  However, the 
   architecture also allows different CE sets to be mapped to different 
   types of requests if the FE is capable of differentiating between 
   request types.   
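   The coarse-grain and finer-grain mappings described above can be
   pictured as a default CE set plus optional per-request-type
   overrides.  This is a hypothetical sketch; the RequestMap class and
   the request type strings are illustrative names, not part of the
   ForCES protocol.

```python
from typing import Dict, Optional


class RequestMap:
    """Which CE set receives each request the FE generates (hypothetical sketch).

    The default CE set provides the mandatory coarse-grain mapping;
    per-type overrides give the optional finer-grain mapping and are
    accepted only if the FE can tell request types apart.
    """

    def __init__(self, default_ce_set: str, fe_distinguishes_types: bool = False):
        self.default_ce_set = default_ce_set
        self.fe_distinguishes_types = fe_distinguishes_types
        self.overrides: Dict[str, str] = {}

    def map_type(self, request_type: str, ce_set: str) -> None:
        if not self.fe_distinguishes_types:
            raise ValueError("this FE cannot differentiate request types")
        self.overrides[request_type] = ce_set

    def destination(self, request_type: Optional[str] = None) -> str:
        """Return the CE set that should receive a request of this type."""
        return self.overrides.get(request_type, self.default_ce_set)


m = RequestMap("routing-ce-set", fe_distinguishes_types=True)
m.map_type("firewall-event", "firewall-ce-set")
print(m.destination("firewall-event"))  # firewall-ce-set
print(m.destination("route-miss"))      # routing-ce-set
```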
    
   This architecture permits multiple FEs to be present in a NE.  Each 
   of these FEs may potentially have a different set of capabilities.  
   FEs express these capabilities using the ForCES FE model described 
   in Section 5.  FEs are responsible for establishing and maintaining 
   layer-2 connectivity with other FEs or with entities external to the 
   NE.  Thus, FEs are also responsible for any signaling required at 
   layer-2. 
    
4.3. CE Managers 
    
   CE managers are responsible for determining which FEs a CE should 
   control.  CE managers may legitimately be hard-coded with knowledge
   of which FEs their CEs should communicate with.  Alternatively, CE
   managers can communicate with any other entity or perform any kind
   of computation to make that determination.
    
4.4. FE Managers 
    
   FE managers are responsible for determining to which CE any 
   particular FE should initially communicate.  As with CE managers, no 
   restrictions are placed on how a FE manager decides to which CEs its 
   FEs should communicate.  The FE manager can be hard-coded with this 
   information or communicate with any other entity to make that 
   determination. 
    
4.5. Gl Reference Point 
    
   CE managers and FE managers may communicate with each other across 
   the Gl reference point in order to help them decide which CEs and 
   FEs should communicate with each other.  Communication across the Gl 
   reference point is entirely optional in this architecture.  No 
   requirements are placed on this reference point. 
    
   CE managers and FE managers may be operated by different entities.  
   The operator of the CE manager may not want to divulge, except to 
   specified FE managers, any characteristics of the CEs it manages.  
   Similarly, the operator of the FE manager may not want to divulge FE 
   characteristics, except to authorized entities.  As such, CE 
   managers and FE managers may need to authenticate one another.  
   Subsequent communication between CE managers and FE managers may 
   require other security functions such as privacy, non-repudiation, 
   freshness, and integrity. 
    
   Once the necessary security functions have been performed, the CE 
   and FE managers MAY communicate to determine which CEs and FEs 
   should communicate with each other.  In this process, the CE and FE 
   managers will likely learn of the existence of available FEs and CEs 
   respectively.  This process is called discovery and will likely 
   entail one or both managers learning the capabilities of the 
   discovered ForCES protocol elements. 
    
4.6. Gf Reference Point 
    
   The Gf reference point is used to inform forwarding elements of the 
   decisions made by FE managers.  Only authorized entities may 
   instruct a FE with respect to which CE should control it.  
   Therefore, authentication is necessary between FE managers and 
   FEs.  Privacy, integrity, and freshness are also required.  Once the 
   appropriate security has been established, FE managers may instruct 
   FEs across this reference point to join a new NE or to disconnect 
   from an existing NE. 
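   The effect of Gf instructions on a FE can be pictured as a small
   state machine that moves the FE between the pre-association and
   post-association phases defined in Section 2.  In this hypothetical
   sketch, a set-membership check stands in for real authentication;
   the privacy, integrity, and freshness protections required above are
   not modeled, and all class and method names are illustrative.

```python
from enum import Enum, auto
from typing import Iterable, Optional


class Phase(Enum):
    PRE_ASSOCIATION = auto()   # FE does not yet know which CE controls it
    POST_ASSOCIATION = auto()  # FE knows its controlling CE


class ForwardingElement:
    """Association state of a FE as driven across Gf (hypothetical API)."""

    def __init__(self, authorized_managers: Iterable[str]):
        self.authorized_managers = set(authorized_managers)
        self.phase = Phase.PRE_ASSOCIATION
        self.bootstrap_ce: Optional[str] = None

    def _authorize(self, manager_id: str) -> None:
        # Placeholder for real authentication of the FE manager.
        if manager_id not in self.authorized_managers:
            raise PermissionError(f"{manager_id} may not instruct this FE")

    def join(self, manager_id: str, ce: str) -> None:
        """FE manager instructs the FE to join a NE, starting with `ce`."""
        self._authorize(manager_id)
        self.bootstrap_ce = ce
        self.phase = Phase.POST_ASSOCIATION

    def disconnect(self, manager_id: str) -> None:
        """FE manager instructs the FE to leave its current NE."""
        self._authorize(manager_id)
        self.bootstrap_ce = None
        self.phase = Phase.PRE_ASSOCIATION


fe = ForwardingElement({"fe-manager-1"})
fe.join("fe-manager-1", "ce-a")
print(fe.phase, fe.bootstrap_ce)  # Phase.POST_ASSOCIATION ce-a
```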
    
4.7. Gc Reference Point 
    
   The Gc reference point is used to inform control elements of the
   decisions made by CE managers.  Only authorized entities may
   instruct a CE to control certain FEs.  Therefore, authentication is
   necessary between CE managers and CEs.  Privacy, integrity, and
   freshness are also required across this reference point.  Once
   appropriate security has been established, the CE manager may
   instruct CEs as to which FEs they should control and how they should
   control them.
    
4.8. Gi Reference Point 
    
   Packets that enter the NE via one FE and leave the NE via a 
   different FE are transferred between FEs across the Gi reference 
   point.  (See [FORCES-REQ], Section 5, Requirement 3.) 
    
4.9. Gp Reference Point 
    
   Based on the information acquired through CEs' control processing, 
   CEs will frequently need to manipulate the packet-forwarding 
   behaviors of their FE(s).  This manipulation of the forwarding plane 
   is performed across the Gp ("p" meaning protocol) reference point.  
   In this architecture, the ForCES protocol is exclusively used for 
   all communication across the Gp reference point. 
    
4.10. Gr Reference Point 
    
   Varying degrees of synchronization are necessary to provide 
   redundancy, load sharing or distributed control.  However, in all 
   cases, consistency protocols between CEs take place across the Gr 
   reference point and are out of the scope of this document.  
   Likewise, detecting the inability to synchronize due to a loss of 
   connectivity between CEs is out of the scope of this document. 
    
   It is not necessary to define any protocols across the Gr reference 
   point to enable simple control/forwarding separation (i.e., single 
   CE and multiple FEs).  However, to make it possible to define Gr at 
   a later time, the concept of CE sets and the associated CE/FE 
   behavior should be included in the first versions of the ForCES 
   protocol.  From the basic CE set building block concept, protocols 
   across the Gr reference point can be defined to provide the desired 
   effect. 
    
5. FE Model 
    
   This section describes a model that can be used to express the 
   capabilities of a ForCES FE.  (As we will see, this model can also 
   be used as the basis to control a FE's capabilities.)  This model 
   satisfies the requirements set forth in the ForCES requirements
   document [FORCES-REQ] with respect to FE modeling.  Our model is
   composed of a two-level hierarchy of detail.  The higher level of
   the hierarchy expresses which logical data path elements exist in
   the FE and describes how these elements are interconnected.  We call
   these logical data path elements "stages."  The lower level of the
   hierarchy expresses the capabilities of each stage that the FE 
   provides.  In general, the lower level expresses these capabilities 
   in terms of five categories: 1) what information the stage uses to 
   classify packets, 2) once classified, the actions the stage can 
   perform on the packet, 3) the statistics the stage collects in this 
   process, 4) the asynchronous events the stage may send to the CE as 
   part of this process, and 5) the parameters that the stage uses to 
   control its overall behavior. 
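   One way to picture the two levels is as a per-stage record: the
   higher level names the logical function a stage implements, and the
   lower level lists its capabilities in the five categories above.
   This is a sketch under assumed names; the field names and the
   capability strings (e.g., "dscp", "packets_in_profile") are
   hypothetical, and the USI is the stage identifier introduced in
   Section 5.3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StageCapabilities:
    """Lower-level description of one stage, in the five categories above."""
    classification_fields: List[str] = field(default_factory=list)  # 1) inputs to classification
    actions: List[str] = field(default_factory=list)                # 2) actions on matched packets
    statistics: List[str] = field(default_factory=list)             # 3) statistics collected
    events: List[str] = field(default_factory=list)                 # 4) asynchronous events to the CE
    parameters: List[str] = field(default_factory=list)             # 5) stage-wide parameters


@dataclass
class StageDescription:
    """Higher-level entry: which logical function a stage implements."""
    usi: int                           # FE-unique stage identifier (Section 5.3)
    logical_function: str              # name from the logical function namespace
    capabilities: StageCapabilities


# Example: a meter that classifies on DSCP and can pass or drop packets.
meter = StageDescription(
    usi=7,
    logical_function="meter",
    capabilities=StageCapabilities(
        classification_fields=["dscp"],
        actions=["pass", "drop"],
        statistics=["packets_in_profile", "packets_out_of_profile"],
        parameters=["algorithm"],
    ),
)
print(meter.logical_function, meter.capabilities.actions)  # meter ['pass', 'drop']
```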
    
5.1. Introduction 
    
   The ForCES architecture allows Forwarding Elements (FEs) of varying 
   functionality to participate in a ForCES network element.  The 
   implication of this varying functionality is that CEs can make only
   minimal assumptions about the functionality provided by their FEs.
   Instead, CEs discover the capabilities of their FEs.  [FORCES-REQ]
   mandates that this capability information be expressed in the form
   of a FE model.  [FORCES-REQ] further requires that this FE model
   describe which logical functions (i.e., stages) are present in the
   FE and in which order these stages are performed.  See [FORCES-REQ]
   for the types of logical functions that this model must support.
   For each logical function, [FORCES-REQ] also requires that the FE
   model be able to describe that stage's "capabilities."
    
   A stage's capabilities clarify what the stage does but not how it 
   does it.  (There is a small exception to this described later for 
   the case where the FE allows the CE to choose which algorithm the 
   stage should use.)  For example, a forwarding function may perform a 
   lookup on destination IP address and mask to find a next hop IP 
   address and egress interface.  However, the fact that the forwarding 
   function uses a Patricia Trie or a CAM to accomplish this lookup is 
   not relevant to the CE.  Stage capabilities are best illustrated by 
   the following description of the logical packet-processing model of 
   a stage. 
    
   Stages logically process packets as follows.  
   First, the stage receives a packet and performs a classification 
   step on the packet.  This classification step finds the highest 
   priority rule (i.e., filter) in the stage's rule set (i.e., 
   classification or rule table) that matches the given packet.  Next, 
   the stage performs one or more actions associated with the matching 
   rule.  As part of this process, the stage may update certain 
   statistics (e.g., number of packets processed, number of packets 
   matching each filter rule) to reflect the types of packets it has 
   processed.  As one of the actions (or occasionally asynchronously), 
   the stage may generate an event for further processing by the CE.  
   For example, a stage may detect that the router alert IP option is 
   present in a packet and would then generate a "packet redirection" 
   event to send the packet to the CE.  Finally, some stages may have 
   tunable "knobs" that affect how they process packets.  For example, 
   a FE may provide various algorithms for performing a metering 
   function (e.g., average rate, exponentially weighted moving average, 
   token bucket). 
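   The logical processing steps above (classify by highest-priority
   matching rule, apply that rule's actions, update statistics, and
   report events the CE has registered for) can be sketched as follows.
   This is a simplified illustration, not a normative algorithm: packets
   are plain dictionaries, rules match on exact field values, and the
   Rule and Stage classes, the "packet-redirection" event name, and the
   ip_option field are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Packet = Dict[str, object]                  # header field -> value (simplified)
Action = Callable[[Packet], Optional[str]]  # an action may return an event name


@dataclass
class Rule:
    name: str
    priority: int                 # larger value means higher priority
    match: Dict[str, object]      # fields absent from `match` are wildcards
    actions: List[Action]

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(f) == v for f, v in self.match.items())


@dataclass
class Stage:
    rules: List[Rule] = field(default_factory=list)         # filters and actions
    stats: Dict[str, int] = field(default_factory=dict)     # statistics
    knobs: Dict[str, object] = field(default_factory=dict)  # parameters (unused here)
    registered: Set[str] = field(default_factory=set)       # events the CE asked for
    to_ce: List[str] = field(default_factory=list)          # events delivered to the CE

    def _count(self, key: str) -> None:
        self.stats[key] = self.stats.get(key, 0) + 1

    def process(self, pkt: Packet) -> None:
        self._count("packets")
        matching = [r for r in self.rules if r.matches(pkt)]
        if not matching:
            return
        rule = max(matching, key=lambda r: r.priority)  # classification step
        self._count("match:" + rule.name)
        for action in rule.actions:                     # action step
            event = action(pkt)
            if event is not None and event in self.registered:
                self.to_ce.append(event)


def redirect_to_ce(pkt: Packet) -> Optional[str]:
    return "packet-redirection"


def forward(pkt: Packet) -> Optional[str]:
    return None  # placeholder for a real forwarding action


stage = Stage(
    rules=[
        Rule("default", 0, {}, [forward]),
        Rule("router-alert", 10, {"ip_option": "router-alert"}, [redirect_to_ce]),
    ],
    registered={"packet-redirection"},
)
stage.process({"ip_option": "router-alert"})
stage.process({"dst": "10.0.0.1"})
print(stage.stats)  # {'packets': 2, 'match:router-alert': 1, 'match:default': 1}
print(stage.to_ce)  # ['packet-redirection']
```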
    
   From this process, we see that the capabilities of stages can be 
   modeled by describing the five logical sets of data maintained by 
   each stage.  The first two sets of data are the filtering rules and 
   associated actions that are applied to each packet as it passes
   through the stage.  The third set of data is the statistics
   maintained by the stage.  The fourth set is the current state of the 
   stage's tunable "knobs."  Finally, the fifth set is the set of 
   events for which the CE has registered to receive notifications from 
   the stage.  Manipulation of these five logical databases can be used 
   as a model for control of each stage. 
    
5.2. Model Approach 
    
   There are many ways that one could model the packet processing 
   capabilities of a FE.  However, as we shall see, there is often a 
   tradeoff between the flexibility of a FE model and the ease with 
   which the CE can interpret that model to provide services.  One 
   approach to this problem is to define a number of simple "device 
   types."  Each of these device types would have well-known components 
   connected together in well-known ways.  For example, we could define 
   a RFC1812 router device type that does a longest prefix match on 
   destination IP address and mask and forwards packets to the 
   associated next hop IP address.  However, since many services (e.g., 
   QoS, firewall, intrusion detection) are being added to network 
   devices, the number of possible device types would be exponential in 
   the number of services.  Writing a CE that understood exponentially 
   many device types would be a daunting task.  Therefore, one would 
   likely want to restrict the number of devices types to a small set 
   of "likely" devices.  Coming up with this set would be difficult.  
   Furthermore, restricting device types would seem to disallow vendors 
   from creating interesting new devices.  One could attempt to solve 
   this problem by allowing vendors to define their own proprietary 
   device types but this only leads to another explosion of device 
   types and introduces interoperability problems for CE vendors who do 
   not have access to the description of FE vendors' proprietary device 
   types. 
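   To make the exponential growth concrete: assuming each service can be
   independently present or absent, n services yield 2^n distinct device
   types.  The sketch below simply enumerates those combinations; the
   service names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence


def device_types(services: Sequence[str]) -> List[FrozenSet[str]]:
    """Every combination of optional services that a fixed device type could bundle."""
    return [
        frozenset(combo)
        for r in range(len(services) + 1)
        for combo in combinations(services, r)
    ]


services = ["qos", "firewall", "intrusion-detection", "nat"]
print(len(device_types(services)))  # 16, i.e. 2 ** 4
```

   Each additional service doubles the count; ten independent services
   would already call for 1024 device types.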
    
   The FE model proposed in this document tries to strike a balance 
   between flexibility of the model and ease of use by the CE.  The 
   model tries to strike this balance by describing packet processing 
   in two levels of detail.  The higher level of detail (Section 5.3) 
   uses the concept of logical functions to make it easier for CEs to 
   determine how to implement a service with a given model.  The lower 
   level of detail (Section 5.4) allows great flexibility to express 
   the realization of a logical function chosen by a FE.  The model 
   allows arbitrary topologies to be described.  While arbitrary 
   topologies make it harder for the CE to understand the FE, we assert
   that a static topology (or a small set of topologies) is
   insufficient to describe the types of devices already in use.
    
5.3. Logical Functions and Topology 
    
   There are two largely orthogonal parts to the FE model proposed in 
   this draft.  The first part provides a way to describe which logical 
   functions are present in a FE and how packets flow between these 
   logical functions.   The concept of a logical function is akin to 
   that of an abstract base class in object-oriented terminology.  By 
   saying that a FE supports a logical function, what we are really 
   saying is that the FE implements a specific concrete "derived class" 
   version of the logical function.  The following inheritance diagram 
   illustrates this concept. 
    
                             Stage 
                            /  |  \ 
                           /   |   \ 
                          /    |    \ 
                         /     |     \ 
                        /      |      \               Logical 
                Forwarder    Meter   Shaper <======== Function 
                 /  \          |        \             Level 
                /    \         |         \ 
               /      \        |          \ 
    RFC1812Fwder  WebSwitch  Token       Leaky  <===== Capability 
                             Bucket      Bucket        Level 
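   The inheritance diagram maps directly onto abstract and concrete
   classes.  In this minimal sketch, the logical function level consists
   of abstract classes that cannot be instantiated, while the capability
   level consists of concrete classes an FE would actually provide.  The
   describe() method, its strings, and the stages_for() helper are
   hypothetical illustrations, not part of the model.

```python
from abc import ABC, abstractmethod


class Stage(ABC):
    """Root of the hierarchy; every stage implements one logical function."""
    logical_function = "stage"

    @abstractmethod
    def describe(self) -> str:
        """Concrete (capability-level) behavior of this stage."""


# Logical function level: abstract, named in the shared namespace.
class Forwarder(Stage):
    logical_function = "forwarder"


class Meter(Stage):
    logical_function = "meter"


class Shaper(Stage):
    logical_function = "shaper"


# Capability level: concrete "derived classes" that an FE actually provides.
class RFC1812Fwder(Forwarder):
    def describe(self) -> str:
        return "longest-prefix match on destination IP address"


class WebSwitch(Forwarder):
    def describe(self) -> str:
        return "forwarding based on packet contents beyond the IP header"


class TokenBucket(Meter):
    def describe(self) -> str:
        return "token bucket metering"


class LeakyBucket(Shaper):
    def describe(self) -> str:
        return "leaky bucket shaping"


def stages_for(function_cls, stages):
    """What a CE might ask: which of the FE's stages implement this logical function?"""
    return [s for s in stages if isinstance(s, function_cls)]


fe_stages = [RFC1812Fwder(), TokenBucket(), LeakyBucket(), WebSwitch()]
print([s.describe() for s in stages_for(Forwarder, fe_stages)])
```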
    
   By describing the FE at this high level, the FE model is able to 
   give a broad overview of what processing a FE may perform on 
   packets.  The goal of this part of the FE model is to provide a way 
   for the CE to know which stage(s) to modify to achieve a given 
   service.  As such, this model allocates a namespace for the 
   specification of different logical functions.  (We expect about 15 
   to 20 logical functions to be defined initially, e.g., ingress port, 
   egress port, forwarder, meter, marker, shaper, scheduler, queue, 
   encapsulator, decapsulator, encrypter, decrypter, NAT, mux, demux, 
   and editor.)  Each FE allocates a FE-unique stage identifier (USI) 
   to each of its stages and passes the USI along with the 
   corresponding logical function name as part of the FE capability 
   description.  This allows there to be multiple instances of the same 
   logical function in each FE's model.  We will start with a simple 
   version of the model illustrating a capability exchange.  In 
   subsequent sections, we will expand the model and refine the same 
   capability exchange.  The following is the first version of the 
   capability exchange that indicates which logical functions are 
   present and how they are connected together. 
    
   - The number of stages supported. 
   - For each stage: 
     - The USI. 
     - The logical function name (from the namespace) that this stage 
      implements. 
     - The number of downstream stages to which this stage can send 
      packets. 
     - For each downstream stage: 
       - The USI of the downstream stage. 
       - A label for this exit point (i.e., target) from the stage. 
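   The capability exchange listed above can be sketched as a flattening
   of per-stage records into the listed field order.  This shows only
   the logical content of the exchange; the draft does not specify a
   wire encoding, and the StageEntry and Exit names and the exit labels
   (e.g., "if0") are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Exit:
    target_usi: int   # USI of the downstream stage
    label: str        # label for this exit point (target)


@dataclass
class StageEntry:
    usi: int                      # FE-unique stage identifier
    logical_function: str         # name from the logical function namespace
    exits: List[Exit] = field(default_factory=list)


def capability_fields(stages: List[StageEntry]) -> List[Union[int, str]]:
    """Flatten the topology into the field order of the list above."""
    out: List[Union[int, str]] = [len(stages)]   # number of stages
    for s in stages:
        out += [s.usi, s.logical_function, len(s.exits)]
        for e in s.exits:
            out += [e.target_usi, e.label]
    return out


# An ingress port feeding a forwarder that can send to either of two egress ports.
fe = [
    StageEntry(1, "ingress port", [Exit(2, "out")]),
    StageEntry(2, "forwarder", [Exit(3, "if0"), Exit(4, "if1")]),
    StageEntry(3, "egress port"),
    StageEntry(4, "egress port"),
]
print(capability_fields(fe))
```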
    
   This representation allows zero or more instances of each logical 
   function to be present in a FE model.  Furthermore, this 
   representation encodes the topology of the provided stages.  Since 
   it is not possible to represent all possible FEs' processing models 
   using a fixed topology, the model presented in this draft allows 
   functions to be connected with largely arbitrary topologies.  The 
   only restrictions on topology relate to the source and sink natures 
   of ingress and egress port functions, respectively.  Specifically,
   egress port functions must not have any downstream stages, whereas no
   other stage may refer to an ingress port function as one of its 
   downstream stages.  Cycles in the topology are permitted. 
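   Because these are the only topology restrictions, a CE (or a
   conformance tool) could check a received topology with a few lines of
   code.  The sketch below assumes the topology is given as two maps
   keyed by USI and uses the logical function names "ingress port" and
   "egress port" as they appear in Section 5.3.  It also flags references
   to undefined USIs, an extra sanity check not stated in the text, and
   deliberately does not reject cycles.

```python
from typing import Dict, List


def topology_errors(functions: Dict[int, str],
                    downstream: Dict[int, List[int]]) -> List[str]:
    """Check the topology restrictions above; an empty list means valid.

    functions:  USI -> logical function name
    downstream: USI -> USIs of the stages it can send packets to
    Cycles are not reported because the model permits them.
    """
    errors = []
    for usi, targets in downstream.items():
        if functions.get(usi) == "egress port" and targets:
            errors.append(f"egress port {usi} has downstream stages")
        for t in targets:
            if t not in functions:
                errors.append(f"stage {usi} refers to unknown USI {t}")
            elif functions[t] == "ingress port":
                errors.append(f"stage {usi} sends packets to ingress port {t}")
    return errors


funcs = {1: "ingress port", 2: "forwarder", 3: "meter", 4: "egress port"}
print(topology_errors(funcs, {1: [2], 2: [3, 4], 3: [2], 4: []}))  # [] (2 <-> 3 cycle is fine)
```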
    
5.4. Stage Capabilities 
    
   This section defines how the capabilities of all the stages in our 
   model can be expressed using a single methodology.  We achieve this 
   uniformity by viewing all stages as acting according to the 
   classification/action paradigm.  In this paradigm, when a packet 
   logically enters a stage, the stage first performs a classification 
   on the packet.  This classification is performed according to a 
   logical database of classification entries maintained by the stage.  
   Next, the stage performs one or more actions associated with the 
   matching classification entry.  Each classification entry contains 
   the set of actions that the stage should perform for all packets 
   that match the entry. 
    
   This paragraph provides several examples of how the stages 
   identified in Section 3 can be viewed as acting according to the 
   classification/action paradigm.  This paradigm is most naturally 
   applied to the generic filtering stages.  In those stages, 
   prioritized filters (e.g., ACLs) are installed in a stage's logical 
   database.  These filters specify which fields in the packet should 
   be evaluated and which values should be present in those fields for 
   the filter to match.  Each filter typically specifies a pass or drop 
   action that determines the disposition of packets matching the 
   filter.  This paradigm maps to classical layer 3 forwarding in the 
   following way.  The logical database of classification/action 
   entries corresponds to a forwarding table.  The entries in this 
   forwarding table typically consist of a network address, a network 
   mask, a next hop IP address, and an egress interface number.  The 
   network address and mask make up the classification portion of the 
   entry, while the next hop IP address and egress interface 
   correspond to a parameterized "forwarding decision" action.  The 
   longest-prefix match algorithms typically used by forwarding stages 
   are simply classification algorithms optimized for a masked match 
   against a packet's destination IP address.  Finally, the metering 
   stage can also be viewed in terms of classification and action.  
   Meters take a flow specification and some rate limiting parameters 
   (and optionally a rate limiting algorithm).  This flow specification 
   may be based on the DSCP, the 5-tuple, or some other arbitrary 
   packet contents.  In any case, this flow specification essentially 
   defines a classification entry.  The rate limiting parameters are 
   parameters to the specified rate limiting action (or to an assumed 
   rate limiting algorithm when one is not explicitly specified). 
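
   To make the forwarding-table mapping concrete, the following sketch 
   treats each route as a classification/action entry.  It is an 
   illustration only, not a prescribed FE algorithm.  The network 
   address and mask form the classification portion, and a 
   hypothetical ForwardingDecision tuple stands in for the 
   parameterized "forwarding decision" action.  For clarity, the sketch 
   scans linearly for the longest matching prefix.  A real forwarding 
   stage would typically use an optimized structure such as a trie.

```python
import ipaddress
from typing import NamedTuple, Optional


class ForwardingDecision(NamedTuple):
    """Parameters of the (hypothetical) "forwarding decision" action."""
    next_hop: str
    egress_interface: int


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network   # classification portion: network address + mask
    action: ForwardingDecision      # action portion: where matching packets go


def classify(table: list[Route], dst: str) -> Optional[ForwardingDecision]:
    """Longest-prefix match on the destination address; None if no entry matches."""
    addr = ipaddress.IPv4Address(dst)
    best: Optional[Route] = None
    for route in table:
        if addr in route.prefix and (best is None or route.prefix.prefixlen > best.prefix.prefixlen):
            best = route
    return best.action if best is not None else None


# Example table using documentation address ranges.
table = [
    Route(ipaddress.ip_network("0.0.0.0/0"), ForwardingDecision("192.0.2.1", 1)),
    Route(ipaddress.ip_network("198.51.100.0/24"), ForwardingDecision("192.0.2.9", 2)),
    Route(ipaddress.ip_network("198.51.100.128/25"), ForwardingDecision("192.0.2.17", 3)),
]
```

   With this table, a packet to 198.51.100.200 matches all three 
   entries, and the /25 entry is selected because it has the longest 
   prefix.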
    
   While most of the functionality of a stage can be described 
   according to the classification/action paradigm, some additional 
   functions remain.  These additional functions relate to how the 
   stage as a whole operates (as opposed to how the stage handles 
   individual flows), the kinds of asynchronous notifications that the 
   stage can send to the CE and the types of statistics the stage 
   maintains.  While we will often have no control over the algorithm 
   the stage uses to perform its function, there may be certain knobs 
   and dials that we can adjust to control the algorithm.  We call 
   these knobs and dials "parameters" to the stage because they 
   resemble parameters to algorithms.  For example, one can view an 
   ingress port stage as running an ARP algorithm that responds to ARP 
   requests.  In order for the ARP algorithm to know when to respond to 
   an ARP request, the ARP algorithm needs to know the IP addresses of 
   each port.  Thus, IP addresses can be viewed as parameters to the 
   ingress port stage.   
    
   Next, some stages can be viewed as the originators of asynchronous 
   notifications, i.e., events.  These events correspond to occurrences 
   that the CE cannot anticipate.  For example, the ingress and egress 
   port stages may be able to send the link up/down event when they 
   detect that their port link state has changed.  Likewise, one or 
   more stages may support the packet redirection event for sending 
   well-known control packets to the CE.  Since CEs may not want to 
   receive all the events that a FE may generate, the ForCES protocol 
   SHOULD support a registration/deregistration mechanism where the CE 
   can signal its interest in receiving the events that it has 
   discovered via this FE model.  Finally, stages may maintain certain 
   statistics related to their packet processing.     
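
   The ForCES protocol defines the registration/deregistration 
   mechanism itself.  The sketch below shows only the FE-side 
   bookkeeping that such a mechanism implies.  It assumes that events 
   are identified by names from the event namespace and that a CE may 
   register only for events that a stage advertised in the FE model.  
   The class EventRegistry and the event name strings are hypothetical.

```python
from collections import defaultdict


class EventRegistry:
    """FE-side record of which CEs want which events (hypothetical sketch)."""

    def __init__(self, supported: dict[int, set[str]]):
        # supported: stage USI -> event names that stage advertises in the FE model
        self._supported = supported
        # (stage USI, event name) -> set of registered CE identifiers
        self._subs = defaultdict(set)

    def register(self, ce: str, usi: int, event: str) -> None:
        """Record the CE's interest, rejecting events the stage never advertised."""
        if event not in self._supported.get(usi, set()):
            raise ValueError(f"stage {usi} does not advertise event {event!r}")
        self._subs[(usi, event)].add(ce)

    def deregister(self, ce: str, usi: int, event: str) -> None:
        """Withdraw the CE's interest; deregistering twice is harmless."""
        self._subs[(usi, event)].discard(ce)

    def recipients(self, usi: int, event: str) -> set[str]:
        """CEs to notify when `event` occurs at stage `usi`."""
        return set(self._subs.get((usi, event), set()))
```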
    
   In simplest terms, we describe the capabilities of each stage by 
   listing the names of the items in each of the five categories 
   (properties, actions, parameters, events, and statistics) that the 
   stage supports.  This approach is illustrated in the following 
   updated capability exchange. 
    
   - The number of stages supported. 
   - For each stage: 
     - The USI. 
     - The logical function name (from the namespace) that this stage 
      implements. 
      
     - The number of properties supported by the stage. 
     - For each property: 
       - The name of the property from the property namespace. 
        
     - The number of actions supported by the stage. 
     - For each action: 
       - The name of the action from the action namespace. 
        
     - The number of parameters supported by the stage. 
     - For each parameter: 
       - The name of the parameter from the parameter namespace. 
        
     - The number of events supported by the stage. 
     - For each event: 
       - The name of the event from the event namespace. 
        
     - The number of statistics supported by the stage. 
     - For each statistic: 
       - The name of the statistic from the statistic namespace. 
        
     - The number of downstream stages to which this stage can send 
      packets. 
     - For each downstream stage: 
       - The USI of the downstream stage. 
       - A label for this exit point (i.e., target) from the stage. 
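
   The listing above is a nested sequence of counts and names.  The 
   following Python sketch mirrors it with dataclasses, as an 
   illustration only: each count becomes the length of a list.  All 
   function, property, and action names used with it are placeholders, 
   not entries from any defined namespace.  The helper shows one way a 
   CE might use such a listing, in this case to find the stages that 
   advertise a given action.

```python
from dataclasses import dataclass, field


@dataclass
class StageCapabilities:
    """Names-only capabilities of one stage; list lengths play the role of the counts."""
    usi: int
    function: str                                               # logical function name
    properties: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    parameters: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)
    statistics: list[str] = field(default_factory=list)
    downstream: list[tuple[int, str]] = field(default_factory=list)  # (downstream USI, exit label)


def stages_supporting_action(fe: list[StageCapabilities], action: str) -> list[int]:
    """USIs of all stages that list the named action, in exchange order."""
    return [s.usi for s in fe if action in s.actions]
```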

   The following paragraphs describe in more detail how the 
   classification, action, parameter, event and statistics capabilities 
   are expressed. 
    
5.4.1. Classification Capabilities 
    
   The classification capabilities of a stage are expressed in our 
   model through a variable length sequence of "properties."  Each 
   property in the sequence indicates that the stage is capable of 
   including that property in any of the classification entries for 
   that stage.  Properties come in two varieties: packet properties and 
   metadata (tag) properties.  Packet properties are those protocol 
   fields that occur explicitly in packets.  For example, in the IP 
   protocol, the version, type of service bits, fragment offset, time-
   to-live, protocol, source address, and destination address are 
   potentially useful packet properties for classification.  Other 
   examples of useful packet properties include UDP source/destination 
   port, TCP source/destination port, and ICMP type and code fields.  
   Metadata (tag) properties are those values associated with a packet 
   that do not occur explicitly in the packet.  For example, the 
   "ingress port" tag may be associated with a packet by the ingress 
   stage.  This tag indicates the port through which the packet entered 
   the FE, and subsequent stages may find it useful for classification.  
   For example, some stages may give preferential treatment to packets 
   arriving on a certain port because that port is associated with a 
   customer receiving premium service.  Without the "ingress port" tag, 
   subsequent stages would have no way of knowing on which port a 
   packet entered the FE.  As another example, if the forwarder stage 
   is processing a multicast packet, that stage may need to know on 
   which port the packet arrived so that it does not send the packet 
   back along the original link.  In order to exchange property 
   information, we must agree on how to represent the presence or 
   absence of a property.  This model allocates a property namespace 
   for this purpose.  This namespace is shared across all stages 
   because many stages will classify on the same properties (e.g., 
   ingress/egress port number or destination IP address). 
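
   Packet properties and metadata tags share one property namespace, so 
   a stage can treat them uniformly during classification.  The sketch 
   below assumes a hypothetical packet representation with two 
   dictionaries, one for header fields and one for metadata tags.  An 
   entry matches only if every listed property has the required value, 
   which is the logical AND described in Section 5.5.  The property 
   names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    fields: dict[str, int]                              # packet properties (carried in the packet)
    tags: dict[str, int] = field(default_factory=dict)  # metadata properties (not in the packet)

    def lookup(self, prop: str):
        """Value of a property, whether it is a packet field or a metadata tag."""
        if prop in self.fields:
            return self.fields[prop]
        return self.tags.get(prop)  # None when the property is absent


def matches(entry: dict[str, int], pkt: Packet) -> bool:
    """True if every property in the entry has the required value (logical AND)."""
    return all(pkt.lookup(prop) == value for prop, value in entry.items())
```

   An entry such as {"ip-proto": 6, "ingress-port": 3} mixes a header 
   field with the "ingress port" tag.  This lets a stage give 
   preferential treatment to TCP traffic that arrives on a premium 
   customer's port.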
    
5.4.2. Action Capabilities 
    
   Similarly, the action capabilities of a stage are represented by a 
   logical sequence of "actions."  Each action in the sequence 
   indicates that the stage is capable of having that action associated 
   with one of the stage's classification entries.  Actions come in 
   three varieties.  The first type of action edits (e.g., changes a 
   field, inserts/removes a header) the current packet being processed.  
   The second type of action associates or dissociates a piece of 
   metadata (tag) with the packet being processed.  The third type of 
   action selects a target (i.e., downstream stage) for the packet.  
   For example, the action provided by the forwarder stage typically 
   associates the "forwarding decision" tag with a packet.  (The 
   forwarding decision tag is a parameterized tag that specifies the 
   interface(s) out of which the packet should be sent and the IP 
   address(es) of the next-hop router(s).)  The egress stage then 
   logically classifies on this forwarding decision tag to determine 
   out of which interface to send the packet.  As another example, the 
   Meter stage may be configured either to drop packets exceeding a 
   certain rate limit or simply to "tag" those packets (e.g., with the 
   "exceeding guaranteed rate" tag).  A subsequent stage may be 
   configured to drop or pass packets tagged this way depending on some 
   other characteristic of the system.  In contrast, NAT stages would 
   use the first type of action to edit the current packet by 
   rewriting the source or destination IP address.  Some stages may be 
   configured to drop packets matching certain classifiers.  Drop may 
   be seen as removing all the headers and payload from the packet and 
   removing all associated metadata properties as well.  As with 
   properties, this model allocates a namespace for the identification 
   of different actions.  This namespace is shared across all stages 
   because different stages may share the same action (e.g., drop). 
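
   The three varieties of action can be illustrated with a small, 
   hypothetical packet representation.  In the sketch below, each 
   action is a function with one of three effects: it edits the 
   packet's fields (as a NAT stage would), sets a metadata tag (as a 
   meter might), or selects a target.  Following the text, drop clears 
   all headers, payload, and metadata.  The action and tag names and 
   the packet layout are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Packet:
    fields: dict[str, object]                              # header fields
    payload: bytes = b""
    tags: dict[str, object] = field(default_factory=dict)  # metadata properties
    target: Optional[str] = None                           # exit label chosen by an action


def edit_field(name: str, value: object):
    """Type 1: edit the packet itself (e.g., NAT rewriting an address)."""
    def act(p: Packet) -> None:
        p.fields[name] = value
    return act


def set_tag(name: str, value: object = True):
    """Type 2: associate a piece of metadata with the packet."""
    def act(p: Packet) -> None:
        p.tags[name] = value
    return act


def select_target(label: str):
    """Type 3: choose the downstream stage (exit point) for the packet."""
    def act(p: Packet) -> None:
        p.target = label
    return act


def drop(p: Packet) -> None:
    """Drop viewed as removing all headers, payload, and metadata."""
    p.fields.clear()
    p.payload = b""
    p.tags.clear()


def apply_actions(p: Packet, actions) -> Packet:
    """Perform every action of a matching entry, in order."""
    for act in actions:
        act(p)
    return p
```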
    
5.4.3. Parameter Capabilities 
    
   The parameters supported by a stage are expressed by a logical 
   sequence of "parameters."  Each parameter in the sequence represents 
   one of the knobs or dials used by the stage.  A namespace is 
   allocated for the identification of parameters.  This namespace is 
   shared across all stages because stages may share the same 
   parameters. 
    
5.4.4. Event Capabilities 
    
   The events supported by a stage are expressed by a logical sequence 
   of "events."  Each event in the sequence represents one of the 
   events that the FE may be configured to send to the CE when the 
   event happens.  A namespace is allocated for the identification of 
   events.  This namespace is shared across all stages because stages 
   may share the same events (e.g., packet redirection or link 
   up/down). 
    
5.4.5. Statistics Capabilities 
    
   The statistics collected by a stage are expressed by a logical 
   sequence of "statistics."  Each statistic in the sequence represents 
   one of the statistics maintained by the stage.  A namespace is 
   allocated for the identification of statistics.  This namespace is 
   shared across all stages because stages may share the same 
   statistics (e.g., number of packets processed). 
    
5.5. Read-only Stages 
    
   The FE model must be able to express that certain stages in a FE may 
   not be modifiable by a CE.  However, the model cannot simply ignore 
   these stages, as it may be necessary to understand their 
   functionality to predict the behavior of the FE.  For example, 
   consider the following subset of a FE model.  While the FE may allow 
   the Demux to be configured to select different kinds of traffic to 
   be sent to the A, B, and X targets, the subsequent meters may not be 
   programmable.  However, the behavior of these meters must be known 
   so that the CE can make decisions as to which traffic should be sent 
   to which target (depending on the QoS desired for the traffic). 
    
                      +-----+   +-----+ 
                      |     |   |     |---------------> 
        Demux      +->|     |-->|     |     +-----+   
       +-----+     |  |     |   |     |---->|     |   
       |    A|------  +-----+   +-----+     +-----+    
   --->|    B|-----+  Marker1   Meter1      Absolute   
       |    X|---+ |                        Dropper1   
       +-----+   | |  +-----+   +-----+                
                 | |  |     |   |     |---------------> 
                 | +->|     |-->|     |     +-----+    
                 |    |     |   |     |---->|     |    
                 |    +-----+   +-----+     +-----+    
                 |    Marker2   Meter2      Absolute   
                 |                          Dropper2   
                 |    +-----+   +-----+                
                 |    |     |   |     |---------------> 
                 |--->|     |-->|     |     +-----+ 
                      |     |   |     |---->|     | 
                      +-----+   +-----+     +-----+ 
                      Marker3   Meter3      Absolute 
                                            Dropper3 
    
   Two additions to the model are necessary to support read-only 
   stages: first, a Boolean flag that indicates whether the stage is 
   read-only, and second, an agreed-upon way of expressing any static 
   classification/action entries.  (There may be static parameters as 
   well, which will need a similar expression.)  In each 
   classification/action entry, there are zero or more properties and 
   one or more actions.  When multiple properties are present, the 
   entry matches only if every property matches, i.e., the properties 
   are combined with a logical AND (e.g., destination IP address == X 
   AND IP protocol == TCP AND TCP destination port number == 80).  When 
   multiple actions are present, all of those actions are performed on 
   matching packets.  To represent each property or action, a 
   type/length/value (TLV) approach is used.  The names defined in the 
   property and action namespaces are suitable as the type in the TLV.  
   The length of the TLV is an appropriately sized integer and 
   represents the size of the "value" portion of the TLV.  The value 
   portion of the TLV may itself have some structure, and it is 
   therefore necessary to standardize a data structure that 
   corresponds to each type in the namespace.  Combining all these 
   concepts, the following model is used to express the static 
   classification/action entries: 
    
   - The number of static classification/action entries. 
   - For each entry: 
     - The number of properties. 
     - The number of actions. 
     - For each property: 
       - The name of the property. 
       - The length of the property. 
       - The value of the property (using the data structure 
          corresponding to the given name.) 
     - For each action: 
       - The name of the action. 
       - The length of the action. 
       - The value of the action (using the data structure 
          corresponding to the given name.) 
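
   The draft leaves the field widths open ("an appropriately sized 
   integer").  Purely as an assumption for illustration, the sketch 
   below uses 16-bit fields in network byte order for the counts, for 
   the type codes (which stand in for namespace names), and for the 
   lengths.  It encodes one static classification/action entry in the 
   order listed above.  The numeric type codes are hypothetical.

```python
import struct

# Hypothetical numeric codes standing in for names in the property/action namespaces.
PROP_IP_PROTO = 0x0001
PROP_TCP_DPORT = 0x0002
ACTION_DROP = 0x0101


def encode_tlv(type_code: int, value: bytes) -> bytes:
    """Type (16 bits), length of the value in octets (16 bits), then the value."""
    return struct.pack("!HH", type_code, len(value)) + value


def encode_entry(properties: list[tuple[int, bytes]],
                 actions: list[tuple[int, bytes]]) -> bytes:
    """Number of properties, number of actions, each property TLV, then each action TLV."""
    out = struct.pack("!HH", len(properties), len(actions))
    for type_code, value in properties + actions:
        out += encode_tlv(type_code, value)
    return out


def decode_entry(data: bytes):
    """Inverse of encode_entry: returns (properties, actions) as (type, value) lists."""
    n_props, n_acts = struct.unpack_from("!HH", data, 0)
    offset = 4
    items = []
    for _ in range(n_props + n_acts):
        type_code, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        items.append((type_code, data[offset:offset + length]))
        offset += length
    return items[:n_props], items[n_props:]
```

   Because each TLV carries its own length, a receiver can skip an 
   entry even if it lacks the data structure for one of its types.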
 
5.6. TLV Errata 
    
   The capability exchange shown in Section 5.4 represents an 
   all-or-nothing approach to the five categories of capabilities.  For 
   example, either a stage supports all types of classification (e.g., 
   equal to, not equal to, range matching, inverse range matching) for 
   all values of a property, or it supports no classification on that 
   property.  In practice, however, things are often not this simple.  
   For example, some stages may be able to classify on certain specific 
   values of a field but not on others, or a stage may be able to match 
   the IP protocol field for either TCP or UDP but nothing else.  The 
   FE model must therefore be capable of expressing these sorts of 
   restrictions on the values associated with any of the five 
   categories of capabilities.  To express these restrictions, we can 
   no longer describe capabilities simply by listing the names of 
   supported items in each of the five namespaces.  Instead, along with 
   each supported item, the model must describe any restrictions 
   associated with that item.  The model describes these restrictions 
   in the following way. 
    
   As in Section 5.5, a TLV structure is used.  However, each TLV 
   contains two values instead of one.  The first value represents the 
   bottom of a range of allowable values for the item, while the second 
   value represents the top of that range.  It is important to note the 
   difference between the ability to select one specific value in a 
   range between A and B and the ability to select a range of values, 
   C-D, between A and B (A < C < D < B).  The two values in the TLV 
   represent A and B but do not imply the ability to do range checking.  
   In fact, several different kinds of matching are possible within the 
   specified range of values.  There is "equal to" matching (e.g., does 
   field X have the value C, where A < C < B?), "not equal to" matching 
   (e.g., is X not equal to C?), "less than" matching, "not less than" 
   matching, "inside range" matching (e.g., is X in C-D?), and "not 
   inside range" matching (e.g., is X not in C-D?).  "Less than" and 
   "not less than" matching are specialized forms of range matching and 
   can be expressed in that form given an appropriate lower or upper 
   bound.  We therefore need four additional flags associated with each 
   specified range (i.e., A-B).  These flags 
   indicate whether equal to, not equal to, inside range, or not inside 
   range types of matching are allowed.  Using the property category as 
   an example, the capability expression model becomes the following: 
    
   - The number of properties supported by the stage. 
   - For each property: 
     - The name of the property from the property namespace. 
     - The length of the value portion associated with this property. 
     - A flag indicating whether "equal to" classification is allowed. 
     - A flag indicating whether "not equal to" classification is 
       allowed. 
     - A flag indicating whether "inside range" classification is 
       allowed. 
     - A flag indicating whether "not inside range" classification is 
       allowed. 
     - The bottom of a range of values, using the data structure 
       associated with the given property. 
     - The top of a range of values, using the data structure 
       associated with the given property. 
    
   The previous paragraph describes capabilities within one contiguous 
   range.  This paragraph describes how capabilities are represented 
   for non-contiguous ranges, such as the one that motivated this 
   section (i.e., matching the IP protocol field for TCP or UDP only).  
   To express capabilities for non-contiguous ranges, multiple 
   capability entries are used, each having the same name from the 
   chosen namespace.  For example, the motivating example is expressed 
   with the following two entries. 
    
   - Two property entries follow. 
   - Entry 1: 
     - Name: IP protocol 
     - Length: two octets. 
     - Equal to: True 
     - Not equal to: False 
     - Inside range: False 
     - Not inside range: False 
     - Bottom: 6, TCP 
     - Top: 6, TCP 
   - Entry 2: 
     - Name: IP protocol 
     - Length: two octets. 
     - Equal to: True 
     - Not equal to: False 
     - Inside range: False 
     - Not inside range: False 
     - Bottom: 17, UDP 
     - Top: 17, UDP 
      
   Unlike properties, the other four categories have no need for the 
   flags indicating the four types of classification.  However, the 
   other four categories still need the bottom and top of the range to 
   indicate the range of allowable values from which the CE may select 
   exactly one value. 
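
   Before configuring a stage, a CE could use these entries to check 
   whether a desired classification is supported.  The sketch below 
   assumes a hypothetical in-memory form of the property entries 
   described above.  It checks a request against every entry that has 
   the same property name, so non-contiguous capabilities such as the 
   TCP-or-UDP example are handled by trying each entry in turn.  The 
   text does not state whether the bounds are inclusive.  The sketch 
   treats them as inclusive, since the TCP and UDP entries, whose 
   bottom equals their top, only make sense that way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyCapability:
    name: str
    equal_to: bool
    not_equal_to: bool
    inside_range: bool
    not_inside_range: bool
    bottom: int  # bottom of the allowable range (assumed inclusive)
    top: int     # top of the allowable range (assumed inclusive)


def supports(caps: list[PropertyCapability], name: str, kind: str,
             low: int, high: Optional[int] = None) -> bool:
    """True if some entry for `name` allows `kind` matching on [low, high].

    kind is "equal_to", "not_equal_to", "inside_range" or "not_inside_range".
    For the two equality kinds only `low` (the value C) is used.
    """
    if kind in ("equal_to", "not_equal_to") or high is None:
        high = low
    for cap in caps:
        # The entry must name the property, permit this kind of matching,
        # and its range must contain the requested value or sub-range.
        if cap.name == name and getattr(cap, kind) and cap.bottom <= low <= high <= cap.top:
            return True
    return False
```

   With the two IP protocol entries above, "equal to 6" and "equal to 
   17" are accepted.  A request to match the range 6-17 is rejected, 
   because neither entry sets the "inside range" flag.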
    
5.7. Completed Capability Exchange 
    
   Having updated the capability exchange data model to express each 
   stage's capabilities according to the five categories, the 
   capability exchange consists of the following information: 
    
   - The number of stages supported. 
   - For each stage: 
     - The USI. 
     - The logical function name (from the namespace) that this stage 
      implements. 
      
     - The number of properties supported by the stage. 
     - For each property: 
       - The name of the property from the property namespace. 
       - The length of the value portion associated with this property. 
       - A flag indicating whether "equal to" classification is 
          allowed. 
       - A flag indicating whether "not equal to" classification is 
          allowed. 
       - A flag indicating whether "inside range" classification is 
          allowed. 
       - A flag indicating whether "not inside range" classification is 
          allowed. 
       - The bottom of a range of values, using the data structure 
          associated with the given property. 
       - The top of a range of values, using the data structure 
          associated with the given property. 
        
     - The number of actions supported by the stage. 
     - For each action: 
       - The name of the action from the action namespace. 
       - The length of the value portion associated with this action. 
       - The bottom of a range of values, using the data structure 
          associated with the given action. 
       - The top of a range of values, using the data structure 
          associated with the given action. 
        
     - The number of parameters supported by the stage. 
     - For each parameter: 
       - The name of the parameter from the parameter namespace. 
       - The length of the value portion associated with this 
          parameter. 
       - The bottom of a range of values, using the data structure 
          associated with the given parameter. 
       - The top of a range of values, using the data structure 
          associated with the given parameter. 
        
     - The number of events supported by the stage. 
     - For each event: 
       - The name of the event from the event namespace. 
       - The length of the value portion associated with this event. 
       - The bottom of a range of values, using the data structure 
          associated with the given event. 
       - The top of a range of values, using the data structure 
          associated with the given event. 
        
     - The number of statistics supported by the stage. 
     - For each statistic: 
       - The name of the statistic from the statistic namespace. 
       - The length of the value portion associated with this 
          statistic. 
       - The bottom of a range of values, using the data structure 
          associated with the given statistic. 
       - The top of a range of values, using the data structure 
          associated with the given statistic. 
        
     - A flag indicating whether the stage is read-only. 
      
     - The number of static classification/action entries. 
     - For each static classification/action entry: 
      - The number of properties. 
      - The number of actions. 
      - For each property: 
        - The name of the property. 
        - The length of the property. 
        - The value of the property (using the data structure 
           corresponding to the given name.) 
      - For each action: 
        - The name of the action. 
        - The length of the action. 
        - The value of the action (using the data structure 
           corresponding to the given name.) 
 
     - The number of static parameters. 
     - For each static parameter: 
       - The name of the parameter. 
       - The length of the parameter. 
       - The value of the parameter (using the data structure 
          corresponding to the given name.) 
      
     - The number of downstream stages to which this stage can send 
       packets. 
     - For each downstream stage: 
       - The USI of the downstream stage. 
       - A label for this exit point (i.e., target) from the stage. 
    
6. Applicability to RFC1812 
    
   [To be done.] 
    
7. Security Considerations 
    
   Significant security considerations need to be documented but were 
   not completed in time for submission.  The next revision will begin 
   to address these issues. 

 

8. References 

 

   [FORCES-REQ] T. Anderson, et al., "Requirements for Separation of 
                IP Control and Forwarding", work in progress, September 
                2001, <draft-anderson-forces-req-02.txt>. 

 

9. Authors' Addresses 

 

   Todd A. Anderson 
   Intel Labs 
   2111 NE 25th Avenue 
   Hillsboro, OR 97124 USA 
   Phone: +1 503 712 1760 
   Email: todd.a.anderson@intel.com 
 
   1. Abstract 
   2. Definitions 
   3. Introduction 
   4. Architecture 
      4.1. Control Elements 
      4.2. Forwarding Elements 
      4.3. CE Managers 
      4.4. FE Managers 
      4.5. Gl Reference Point 
      4.6. Gf Reference Point 
      4.7. Gc Reference Point 
      4.8. Gi Reference Point 
      4.9. Gp Reference Point 
      4.10. Gr Reference Point 
   5. FE Model 
      5.1. Introduction 
      5.2. Model Approach 
      5.3. Logical Functions and Topology 
      5.4. Stage Capabilities 
         5.4.1. Classification Capabilities 
         5.4.2. Action Capabilities 
         5.4.3. Parameter Capabilities 
         5.4.4. Event Capabilities 
         5.4.5. Statistics Capabilities 
      5.5. Read-only Stages 
      5.6. TLV Errata 
      5.7. Completed Capability Exchange 
   6. Applicability to RFC1812 
   7. Security Considerations 
   8. References 
   9. Authors' Addresses 