Internet Draft T. Anderson Expiration: April 2002 Intel Labs File: draft-anderson-forces-model-00.txt November 2001 Working Group: ForCES ForCES Architectural Framework and FE Functional Model draft-anderson-forces-framework-00.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.'' The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Conventions used in this document The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC-2119]. 1. Abstract This document defines an architecture for ForCES network elements and a functional model for ForCES forwarding elements. This model is used to describe the capabilities of ForCES forwarding elements within the context of the ForCES protocol. The architecture and forwarding element model defined herein is intended to satisfy the requirements specified in the ForCES requirements draft [FORCES- REQ]. Anderson [Page 1] 2. Definitions Most of these definitions are copied from the ForCES requirements document [FORCES-REQ]. Addressable Entity (AE) - A physical device that is directly addressable given some interconnect technology. For example, on Ethernet, an AE is a device to which we can communicate using an Ethernet MAC address; on IP networks, it is a device to which we can communicate using an IP address; and on a switch fabric, it is a device to which we can communicate using a switch fabric port number. Physical Forwarding Element (PFE) - An AE that includes hardware used to provide per-packet processing and handling. This hardware may consist of (but is not limited to) network processors, ASIC's, or general-purpose processors. For example, line cards in a forwarding backplane are PFEs. PFE Partition - A logical partition of a PFE consisting of some subset of each of the resources (e.g., ports, memory, forwarding table entries) available on the PFE. This concept is analogous to that of the resources assigned to a virtual router [REQ-PART]. Physical Control Element (PCE) - An AE that includes hardware used to provide control functionality. This hardware typically includes a general-purpose processor. PCE Partition - A logical partition of a PCE consisting of some subset of each of the resources available on the PCE. Forwarding Element (FE) - A logical entity that implements the ForCES protocol. FEs use the underlying hardware to provide per- packet processing and handling as directed by a CE via the ForCES protocol. FEs may use the hardware from PFE partitions, whole PFEs, or multiple PFEs. Proxy FE - A name for a type of FE that cannot directly modify its underlying hardware but instead manipulates that hardware using some intermediate form of communication (e.g., a non-ForCES protocol or DMA). A proxy FE will typically be used in the case where a PFE cannot implement (e.g., due to the lack of a general purpose CPU) the ForCES protocol directly. Control Element (CE) - A logical entity that implements the ForCES protocol and uses it to instruct one or more FEs as to how they should process packets. CEs handle functionality such as the execution of control and signaling protocols. CEs may use the hardware of PCE partitions or whole PCEs. (The use of multiple PCEs will usually be modeled as separate CEs.) Anderson [Page 2] Pre-association Phase - The period of time during which a FE does not know which CE is to control it and vice versa. Post-association Phase - The period of time during which a FE does know which CE is to control it and vice versa. ForCES Protocol - While there may be multiple protocols used within a device supporting ForCES, the term "ForCES protocol" refers only to the ForCES post-association phase protocol (see below). ForCES Post-Association Phase Protocol - The protocol used for post- association phase communication between CEs and FEs. This protocol does not apply to CE-to-CE communication, FE-to-FE communication, or to communication between FE and CE managers. The ForCES protocol is a master-slave protocol in which FEs are slaves and CEs are masters. FE Model - A model that describes the logical processing functions of a FE. FE Manager - A logical entity that operates only in the pre- association phase and is responsible for determining to which CE(s) a FE should communicate. This determination process is called CE discovery and may involve the FE manager learning the capabilities of available CEs. A FE manager may use anything from a static configuration to a pre-association phase protocol (see below) to determine which CE to use. Being a logical entity, a FE manager might be physically combined with any of the other logical entities mentioned in this section. CE Manager - A logical entity that operates only in the pre- association phase and is responsible for determining to which FE(s) a CE should communicate. This determination process is called FE discovery and may involve the CE manager learning the capabilities of available FEs. A CE manager may use anything from a static configuration to a pre-association phase protocol (see below) to determine which FE to use. Being a logical entity, a CE manager might be physically combined with any of the other logical entities mentioned in this section. Pre-association Phase Protocol - A protocol between FE managers and CE managers that helps them determine which CEs or FEs to use. A pre-association phase protocol may include a CE and/or FE capability discovery mechanism. It is important to note that this capability discovery process is wholly separate from (and does not replace) that used within the ForCES protocol. However, the two capability discovery mechanisms may utilize the same FE model (see Section 5). Pre-association phase protocols are not discussed further in this document. ForCES Network Element (NE) - An entity composed of one or more CEs and one or more FEs. To entities outside a NE, the NE represents a single point of management. Similarly, a NE usually hides its Anderson [Page 3] internal organization from external entities. However, one exception to this rule is that CEs and FEs may be directly managed to transition them from the pre-association phase to the post- association phase. ForCES Protocol Element - A FE or CE. High Touch Capability - This term will be used to apply to the capabilities found in some forwarders to take action on the contents or headers of a packet based on content other than what is found in the IP header. Examples of these capabilities include NAT-PT, firewall, and L7 content recognition. Bootstrap CE - The first CE that a FE connects to in a ForCES NE. CE set - One or more equivalently capable CEs designed to operate concurrently (for load sharing) or in a 1+N failover mode (for redundancy). 3. Introduction [TBD] 4. Architecture This section defines a ForCES architectural framework. This ForCES framework consists primarily of ForCES NE's but also includes several ancillary components. ForCES NE's appear to external entities as monolithic pieces of network equipment, e.g., routers, NAT's, firewalls, or load balancers. (See [FORCESREQ], Section 5, Requirement 4.) Internally, however, ForCES NE's are composed of several logical components. By defining logical components and specifying the interactions between them, the ForCES architecture allows these components to be physically separated. This physical separation accrues several benefits to the ForCES architecture. For example, separate components would allow vendors to specialize in one component without having to become experts in all components. Scalability is also provided by this architecture in that additional forwarding or control capacity can be added to existing network elements without the need for forklift upgrades. The components of the ForCES architecture and their relationships are pictured in the following diagram. For convenience, the interactions between components are labeled by reference points Gp, Gc, Gf, Gr, Gl, and Gi. --------------------------------------- | ForCES Network Element | | ------------------- | | | CE Set 1 | | | | | | -------------- Gc | |-----------------| Gr ------------ | | CE Manager |---------+-| Head | CE 2..N |----| CE Set 2 | | Anderson [Page 4] -------------- | | CE | | | | | | | ------------------- ------------ | | Gl | |\ ---------/ | | | | Gp | \ / Gp | Gp | | | | --/----------\ | | -------------- Gf | -------------- -------------- | | FE Manager |---------+-| FE | Gi | FE | | -------------- \ | | |------| | | \ Gf | -------------- -------------- | ----+------------------------/ | --------------------------------------- 4.1. Control Elements This architecture permits multiple CEs to be present in a network element. These CEs may be used for any combination of redundancy, load sharing, or distributed control. Redundancy is the case where one or more CEs are prepared to take over should an active CE fail. Load sharing is the case where two or more CEs are concurrently active and where any request that can be serviced by one of the CEs can also be serviced by any of the other CEs. In both redundancy and load sharing, the CEs involved are equivalently capable. The only difference between these two cases is in terms of how many active CEs there are. Distributed control is the case where two or more CEs are concurrently active but where certain requests can only be serviced by certain CEs. To enable multiple CEs, control in a ForCES NE is handled by one or more CE sets. Each CE set can specialize in handling a particular subset of the control functions of a NE. For example, one CE set may handle routing functions while another may handle firewall or QoS functions. Each CE set is itself composed of multiple CEs. All of the CEs in a CE set are equivalently capable, meaning that each is capable of performing the same set of functions albeit with possibly different performance. The remaining members of a CE set may be used for load sharing or redundancy purposes. Communication between members of a CE set or between CE sets is discussed in Section 4.10. CEs are wholly responsible for coordinating amongst themselves to provide redundancy, load sharing, or distributed control, if desired. CEs are concerned with controlling the layer-3 and above capabilities of FEs. CEs are not concerned with controlling the layer-2 and below communication aspects of the FE. While the ForCES model allows for multiple CEs, the coordination of those CEs is beyond the current scope of ForCES. In cases where an implementations uses multiple CEs or CE sets, it is still required that an implementation must maintain the invariant that a single NE MUST NOT appear as multiple NEs even in the presence of link failures between FEs and/or CEs. Anderson [Page 5] 4.2. Forwarding Elements FEs are responsible for per-packet processing and handling as directed by its CEs. FEs have no initiative of their own. Instead, FEs are slaves to their CEs and only do as they are told (Section 4.9). FEs may communicate with one or more CEs, either from the same or different CE sets concurrently. However, FEs have no notion of CE redundancy, load sharing, or distributed control. Instead, FEs accept commands from any CE authorized to control them. This architecture mandates that a coarse grain mapping of requests to CE sets be possible but also allows finer grain mappings. For example, at a minimum, a CE must be able to specify a single CE set to which all requests generated by the FE should be sent. However, the architecture also allows different CE sets to be mapped to different types of requests if the FE is capable of differentiating between request types. This architecture permits multiple FEs to be present in a NE. Each of these FEs may potentially have a different set of capabilities. FEs express these capabilities using the ForCES FE model described in Section 5. FEs are responsible for establishing and maintaining layer-2 connectivity with other FEs or with entities external to the NE. Thus, FEs are also responsible for any signaling required at layer-2. 4.3. CE Managers CE managers are responsible for determining which FEs a CE should control. It is legitimate for CE managers to be hard-coded with the knowledge of with which FEs its CEs should communicate. Likewise, CE managers can communicate with any other entity or perform any kind of computation to make that determination. 4.4. FE Managers FE managers are responsible for determining to which CE any particular FE should initially communicate. Like CE managers, no restrictions are placed on how a FE manager decides to which CEs its FEs should communicate. The FE manager can be hard-coded with this information or communicate with any other entity to make that determination. 4.5. Gl Reference Point CE managers and FE managers may communicate with each other across the Gl reference point in order to help them decide which CEs and FEs should communicate with each other. Communication across the Gl reference point is entirely optional in this architecture. No requirements are placed on this reference point. CE managers and FE managers may be operated by different entities. The operator of the CE manager may not want to divulge, except to Anderson [Page 6] specified FE managers, any characteristics of the CEs it manages. Similarly, the operator of the FE manager may not want to divulge FE characteristics, except to authorized entities. As such, CE managers and FE managers may need to authenticate one another. Subsequent communication between CE managers and FE managers may require other security functions such as privacy, non-repudiation, freshness, and integrity. Once the necessary security functions have been performed, the CE and FE managers MAY communicate to determine which CEs and FEs should communicate with each other. In this process, the CE and FE managers will likely learn of the existence of available FEs and CEs respectively. This process is called discovery and will likely entail one or both managers learning the capabilities of the discovered ForCES protocol elements. 4.6. Gf Reference Point The Gf reference point is used to inform forwarding elements of the decisions made by FE managers. Only authorized entities may instruct a FE with respect to which CE should control it. Therefore, authentication is a necessary between FE managers and FEs. Privacy, integrity, and freshness are also required. Once the appropriate security has been established, FE managers may instruct FEs across this reference point to join a new NE or to disconnect from an existing NE. 4.7. Gc Reference Point The Gc reference point is used to inform control elements of the decisions made by CE managers. Only authorized entities may instruct a CE to control certain FEs. Privacy, integrity, and freshness are also required across this reference point. Once appropriate security has been established, the CE manager may instruct CEs as to which FEs they should control and how they should control them. 4.8. Gi Reference Point Packets that enter the NE via one FE and leave the NE via a different FE are transferred between FEs across the Gi reference point. (See [FORCESREQ], Section 5, Requirement 3.) 4.9. Gp Reference Point Based on the information acquired through CEs' control processing, CEs will frequently need to manipulate the packet-forwarding behaviors of their FE(s). This manipulation of the forwarding plane is performed across the Gp ("p" meaning protocol) reference point. In this architecture, the ForCES protocol is exclusively used for all communication across the Gp reference point. Anderson [Page 7] 4.10. Gr Reference Point Varying degrees of synchronization are necessary to provide redundancy, load sharing or distributed control. However, in all cases, consistency protocols between CEs take place across the Gr reference point and are out of the scope of this document. Likewise, detecting the inability to synchronize due to a loss of connectivity between CEs is out of the scope of this document. It is not necessary to define any protocols across the Gr reference point to enable simple control/forwarding separation (i.e., single CE and multiple FEs). However, to make it possible to define Gr at a later time, the concept of CE sets and the associated CE/FE behavior should be included in the first versions of the ForCES protocol. From the basic CE set building block concept, protocols across the Gr reference point can be defined to provide the desired effect. 5. FE Model This section describes a model that can be used to express the capabilities of a ForCES FE. (As we will see, this model can also be used as the basis to control a FE's capabilities.) This model satisfies the requirements set forth in ForCES requirements document [FORCES-REQ] with respect to FE modeling. Our model is composed of two level hierarchy of detail. The higher level of the hierarchy expresses which logical data path elements exist in the FE and describes how these elements are interconnected. We call these logical data path elements "stages." The lower level of the hierarchy expresses the capabilities of each stage that the FE provides. In general, the lower level expresses these capabilities in terms of five categories: 1) what information the stage uses to classify packets, 2) once classified, the actions the stage can perform on the packet, 3) the statistics the stage collects in this process, 4) the asynchronous events the stage may send to the CE as part of this process, and 5) the parameters that the stage uses to control its overall behavior. 5.1. Introduction The ForCES architecture allows Forwarding Elements (FEs) of varying functionality to participate in a ForCES network element. The implication of this varying functionality is that CEs can make only minimal assumptions about the functionality provided by its FEs. Instead, CEs discover the capabilities of their FEs. [FORCES-REQ] mandates that this capability information be expressed in the form of a FE model. [FORCES-REQ] further requires that this FE model describe which logical functions (i.e., stages) are present in the FE and in which order these stages are performed. See [FORCES-REQ] for types of logical functions that this model must support. For each logical function, [FORCES-REQ] also requires that the FE model be able to describe each stageÆs "capabilities." Anderson [Page 8] A stage's capabilities clarify what the stage does but not how it does it. (There is a small exception to this described later for the case where the FE allows the CE to choose which algorithm the stage should use.) For example, a forwarding function may perform a lookup on destination IP address and mask to find a next hop IP address and egress interface. However, the fact that the forwarding function uses a Patricia Trie or a CAM to accomplish this lookup is not relevant to the CE. Stage capabilities are best illustrated by the following description of the logical packet-processing model of a stage. Stages logically process packets using the following process. First, the stage receives a packet and performs a classification step on the packet. This classification step finds the highest priority rule (i.e., filter) in the stage's rule set (i.e., classification or rule table) that matches the given packet. Next, the stage performs one or more actions associated with the matching rule. As part of this process, the stage may update certain statistics (e.g., number of packets processed, number of packets matching each filter rule) to reflect the types of packets it has processed. As one of the actions (or occasionally asynchronously), the stage may generate an event for further processing by the CE. For example, a stage may detect that the router alert IP option is present in a packet and would then generate a "packet redirection" event to send the packet to the CE. Finally, some stages may have tunable "knobs" that affect how they process packets. For example, a FE may provide various algorithms for performing a metering function (e.g., average rate, exponentially weighted moving average, token bucket). From this process, we see that the capabilities of stages can be modeled by describing the five logical sets of data maintained by each stage. The first two sets of data are the filtering rules and associated actions that are applied to each packet as they pass through the stage. The third set of data is the statistics maintained by the stage. The fourth set is the current state of the stage's tunable "knobs." Finally, the fifth set is the set of events for which the CE has registered to receive notifications from the stage. Manipulation of these five logical databases can be used as a model for control of each stage. 5.2. Model Approach There are many ways that one could model the packet processing capabilities of a FE. However, as we shall see, there is often a tradeoff between the flexibility of a FE model and the ease with which the CE can interpret that model to provide services. One approach to this problem is to define a number of simple "device types." Each of these device types would have well-known components connected together in well-known ways. For example, we could define a RFC1812 router device type that does a longest prefix match on Anderson [Page 9] destination IP address and mask and forwards packets to the associated next hop IP address. However, since many services (e.g., QoS, firewall, intrusion detection) are being added to network devices, the number of possible device types would be exponential in the number of services. Writing a CE that understood exponentially many device types would be a daunting task. Therefore, one would likely want to restrict the number of devices types to a small set of "likely" devices. Coming up with this set would be difficult. Furthermore, restricting device types would seem to disallow vendors from creating interesting new devices. One could attempt to solve this problem by allowing vendors to define their own proprietary device types but this only leads to another explosion of device types and introduces interoperability problems for CE vendors who do not have access to the description of FE vendors' proprietary device types. The FE model proposed in this document tries to strike a balance between flexibility of the model and ease of use by the CE. The model tries to strike this balance by describing packet processing in two levels of detail. The higher level of detail (Section 5.3) uses the concept of logical functions to make it easier for CEs to determine how to implement a service with a given model. The lower level of detail (Section 5.4) allows great flexibility to express the realization of a logical function chosen by a FE. The model allows arbitrary topologies to be described. While arbitrary topologies make it harder for the CE to understand the FE, it is asserted that static topology (or small set of topologies) is insufficient to describe the types of devices already in use. 5.3. Logical Functions and Topology There are two largely orthogonal parts to the FE model proposed in this draft. The first part provides a way to describe which logical functions are present in a FE and how packets flow between these logical functions. The concept of a logical function is akin to that of an abstract base class in object-oriented terminology. By saying that a FE supports a logical function, what we are really saying is that the FE implements a specific concrete "derived class" version of the logical function. The following inheritance diagram illustrates this concept. Stage / | \ / | \ / | \ / | \ / | \ Logical Forwarder Meter Shaper <======== Function / \ | \ Level / \ | \ / \ | \ RFC1812Fwder WebSwitch Token Leaky <===== Capability Anderson [Page 10] Bucket Bucket Level By describing the FE at this high level, the FE model is able to give a broad overview of what processing a FE may perform on packets. The goal of this part of the FE model is to provide a way for the CE to know which stage(s) to modify to achieve a given service. As such, this model allocates a namespace for the specification of different logical functions. (We expect about 15 to 20 logical functions to be defined initially, e.g., ingress port, egress port, forwarder, meter, marker, shaper, scheduler, queue, encapsulator, decapsulator, encrypter, decrypter, NAT, mux, demux, and editor.) Each FE allocates a FE-unique stage identifier (USI) to each of its stages and passes the USI along with the corresponding logical function name as part of the FE capability description. This allows there to be multiple instances of the same logical function in each FE's model. We will start with a simple version of the model illustrating a capability exchange. In subsequent sections, we will expand the model and refine the same capability exchange. The following is the first version of the capability exchange that indicates which logical functions are present and how they are connected together. - The number of stages supported. - For each stage: - The USI. - The logical function name (from the namespace) that this stage implements. - The number of downstream stages to which this stage can send packets. - For each downstream stage: - The USI of the downstream stage. - A label for this exit point (i.e., target) from the stage. This representation allows zero or more instances of each logical function to be present in a FE model. Furthermore, this representation encodes the topology of the provided stages. Since it is not possible to represent all possible FEs' processing models using a fixed topology, the model presented in this draft allows functions to be connected with largely arbitrary topologies. The only restrictions on topology relate to the source and sink natures of ingress and egress port functions respectively. For example, egress port functions must not have any downstream stages whereas no other stage may refer to an ingress port function as one of its downstream stages. Cycles in the topology are permitted. 5.4. Stage Capabilities This section defines how the capabilities of all the stages in our model can be expressed using a single methodology. We achieve this uniformity by viewing all stages as acting according to the classification/action paradigm. In this paradigm, when a packet logically enters a stage, the stage first performs a classification Anderson [Page 11] on the packet. This classification is performed according to a logical database of classification entries maintained by the stage. Next, the stage performs one or more actions associated with the matching classification entry. Each classification entry contains this set of actions that the stage should perform for all packets that match the entry. This paragraph provides several examples of how the stages identified in Section 3 can be viewed as acting according to the classification/action paradigm. This paradigm is most naturally applied to the generic filtering stages. In those stages, prioritized filters (e.g., ACLs) are installed in a stageÆs logical database. These filters specify which fields in the packet should be evaluated and which values should be present in those fields for the filter to match. In each filter, a pass or drop action is typically specified that determines the disposition of packets matching the filter. This paradigm maps to classical layer 3 forwarding in the following way. The logical database of classification/action entries corresponds to a forwarding table. The entries in this forwarding table have typically consisted of a network address, a network mask, a next hop IP address, and an egress interface number. The network address and mask make up the classification portion of this entry while the next hop IP address and egress interface correspond to a parameterized "forwarding decision" action. The typical longest-prefix match algorithms utilized by forwarding stages are nothing but classification algorithms optimized for a masked match against a packetÆs destination IP address. Finally, the metering stage can also be viewed in terms of classification and action. Meters take a flow specification and some rate limiting parameters (and optionally a rate limiting algorithm). This flow specification may be based on DSCP, 5-tuple or some other arbitrary packet contents. In any case, this flow specification essentially defines a classification entry. The rate limiting parameters are parameters to the specified rate limiting action (or to an assumed rate limiting algorithm when one is not explicitly specified). While most of the functionality of a stage can be described according to the classification/action paradigm, some additional functions remain. These additional functions relate to how the stage as a whole operates (as opposed to how the stage handles individual flows), the kinds of asynchronous notifications that the stage can send to the CE and the types of statistics the stage maintains. While we will often have no control over the algorithm the stage uses to perform its function, there may be certain knobs and dials that we can adjust to control the algorithm. We call these knobs and dials "parameters" to the stage because they resemble parameters to algorithms. For example, one can view an ingress port stage as running an ARP algorithm that responds to ARP requests. In order for the ARP algorithm to know when to respond to an ARP request, the ARP algorithm needs to know the IP addresses of Anderson [Page 12] each port. Thus, IP addresses can be viewed as parameters to the ingress port stage. Next, some stages can be viewed as the originators of asynchronous notifications, i.e., events. These events correspond to occurrences that the CE cannot anticipate. For example, the ingress and egress port stages may be able to send the link up/down event when they detect that their port link state has changed. Likewise, one or more stages may support the packet redirection event for sending well-known control packets to the CE. Since CEs may not want to receive all the events that a FE may generate, the ForCES protocol SHOULD support a registration/deregistration mechanism where the CE can signal its interest in receiving the events that it has discovered via this FE model. Finally, stages may maintain certain statistics related to their packet processing. In simplest terms, we describe the capabilities of each stage simply by listing the names of the items in each of the five categories that that stage supports. This approach is illustrated in the following updated capability exchange. - The number of stages supported. - For each stage: - The USI. - The logical function name (from the namespace) that this stage implements. - The number of properties supported by the stage. - For each property: - The name of the property from the property namespace. - The number of properties supported by the stage. - For each action: - The name of the action from the action namespace. - The number of parameters supported by the stage. - For each parameter: - The name of the parameter from the parameter namespace. - The number of events supported by the stage. - For each event: - The name of the event from the event namespace. - The number of statistics supported by the stage. - For each statistic: - The name of the statistic from the statistic namespace. - The number of downstream stages to which this stage can send packets. - For each downstream stage: - The USI of the downstream stage. - A label for this exit point (i.e., target) from the stage. Anderson [Page 13] The following paragraphs describe in more detail how the classification, action, parameter, event and statistics capabilities are expressed. 5.4.1. Classification Capabilities The classification capabilities of a stage are expressed in our model through a variable length sequence of "properties." Each property in the sequence indicates that the stage is capable of including that property in any of the classification entries for that stage. Properties come in two varieties: packet properties and metadata (tag) properties. Packet properties are those protocol fields that occur explicitly in packets. For example, in the IP protocol, the version, type of service bits, fragment offset, time- to-live, protocol, source address, and destination address are potentially useful packet properties for classification. Other examples of useful packet properties include UDP source/destination port, TCP source/destination port, and ICMP type and code fields. Metadata (tag) properties are those values associated with a packet that do not occur explicitly in the packet. For example, the "ingress port" tag may be associated with a packet by the ingress stage. This tag indicates by which port the packet entered the FE. This tag may be useful to classify on in subsequent stages. For example, some stages may give preferential treatment to packets arriving on a certain port because that port is associated with a customer receiving premium service. Without the "ingress port" tag, subsequent stages would have no way of knowing on which port a packet entered the FE. As another example, if the forwarder stage is processing a multicast packet, that stage may need to know what port the packet came in on so that the forwarder does not send the packet back along the original link. In order to exchange property information, we must agree on how to represent the presence of absence of a property. This model allocates a property namespace for this purpose. This namespace is shared across all stages because many stages will classify on the same properties (e.g., ingress/egress port number or destination IP address). 5.4.2. Action Capabilities Similarly, the action capabilities of a stage are represented by a logical sequence of "actions." Each action in the sequence indicates that the stage is capable of having that action associated with one of the stageÆs classification entries. Actions come in three varieties. The first type of action edits (e.g., changes a field, inserts/removes a header) the current packet being processed. The second type of action associates or dissociates a piece of metadata (tag) with the packet being processed. The third type of action selects a target (i.e., downstream stage) for the packet. For example, the action provided by the forwarder stage typically associates the "forwarding decision" tag with a packet. (The Anderson [Page 14] forwarding decision tag is a parameterized tag that specifies which interface(s) the packet should be sent out and what the next hop IP address is of the next router(s).) The egress stage then logically classifies on this forwarding decision tag to determine which interface to send the packet out. As another example, the Meter stage may be configured to either drop packets exceeding a certain rate limit or it may be configured to simply "tag" those packets (e.g., with the "exceeding guaranteed rate" tag). A subsequent stage may be configured to drop or pass packets tagged this way depending on some other characteristic of the system. In contrast, NAT stages would use the first type of action to edit the current packet by rewriting the source or destination IP address. Some stages may be configured to drop packets matching certain classifiers. Drop may be seen as removing all the headers and payload from the packet and removing all associated metadata properties as well. Like properties, this model allocates a namespace for the identification of different actions. This namespace is shared across all stages because different stages may share the same action (e.g., drop). 5.4.3. Parameter Capabilities The parameters supported by a stage are expressed by a logical sequence of "parameters." Each parameter in the sequence represents one of the knobs or dials used by the stage. A namespace is allocated for the identification of parameters. This namespace is shared across all stages because stages may share the same parameters. 5.4.4. Event Capabilities The events supported by a stage are expressed by a logical sequence of "events." Each event in the sequence represents one of the events that the FE may be configured to send to the CE when the event happens. A namespace is allocated for the identification of events. This namespace is shared across all stages because stages may share the same events (e.g., packet redirection or link up/down). 5.4.5. Statistics Capabilities The statistics collected by a stage are expressed by a logical sequence of "statistics." Each statistic in the sequence represents one of the statistics maintained by the stage. A namespace is allocated for the identification of statistics. This namespace is shared across all stages because stages may share the same statistics (e.g., number of packets processed). 5.5. Read-only Stages The FE model must be able to express that certain stages in a FE may not be modifiable by a CE. However, the model cannot simply ignore Anderson [Page 15] these stages, as it may be necessary to understand their functionality to predict the behavior of the FE. For example, consider the following subset of a FE model. While the FE may allow the Demux to be configured to select different kinds of traffic to be sent to the A, B, and X targets, the subsequent meters may not be programmable. However, the behavior of these meters must be known so that the CE can make decisions as to which traffic should be sent to which target (depending on the QoS desired for the traffic). +-----+ +-----+ | | | |---------------> Demux +->| |-->| | +-----+ +-----+ | | | | |---->| | | A|------ +-----+ +-----+ +-----+ --->| B|-----+ Marker1 Meter1 Absolute | X|---+ | Dropper1 +-----+ | | +-----+ +-----+ | | | | | |---------------> | +->| |-->| | +-----+ | | | | |---->| | | +-----+ +-----+ +-----+ | Marker2 Meter2 Absolute | Dropper2 | +-----+ +-----+ | | | | |---------------> |--->| |-->| | +-----+ | | | |---->| | +-----+ +-----+ +-----+ Marker3 Meter3 Absolute Dropper3 Two additions to the model are necessary to support read-only stages: first, a Boolean flag that indicates whether the stage is read-only or not, and second, an agreed upon way of expressing any static classification/action entries. (There may be static parameters as well, which will need a similar expression.) In each classification/action entry, there are zero or more properties and one or more actions. When multiple properties are present, the result is a logical AND of each property (e.g., if destination IP address==X AND IP protocol==TCP AND TCP destination port number==80). When multiple actions are present, all those actions are performed on matching packets. To represent each property or action, a type/length/value (TLV) approach is used. The names defined the property and action namespaces are suitable as the type in the TLV. The length of the TLV is an appropriately sized integer and represents the size of the "value" portion of the TLV. The value portion of the TLV may itself have some structure and it is therefore necessary to standardize a data structure that corresponds to each type in the namespace. Combining all these concepts together, the following model is used to express the static classification/action entries: Anderson [Page 16] - The number of static classification/action entries. - For each entry: - The number of properties. - The number of actions. - For each property: - The name of the property. - The length of the property. - The value of the property (using the data structure corresponding to the given name.) - For each action: - The name of the action. - The length of the action. - The value of the action (using the data structure corresponding to the given name.) 5.6. TLV Errata The capability exchange shown in Section 5.4 represents an all-or- nothing approach to the five categories of capabilities. For example, either you support all types of classification (e.g., equal to, not equal to, range matching, inverse range matching) for all values of a property or you support no classification for that field. However, in practice, things are often not as simple. For example, some stages may be able to classify on specific values for certain fields but no others, or a stage may be able to match the IP protocol field for either TCP or UDP but nothing else. The FE model must therefore be capable of expressing these sorts of restrictions on the values associated with any of the five categories of capabilities. To express these restrictions, no longer can we describe capabilities by listing the names of supported items in each of the five namespaces. Instead, along with each supported item, the model must describe any restrictions associated with that item. The model describes these restrictions in the following way. Like section 5.5, a TLV structure is used. However, each TLV contains two values instead of one. The first value represents the bottom of a range of allowable values for the item while the second value represents the top of a range of allowable values. It is important to note the difference between the ability to select one specific value in a range between A and B and the ability to select a range of values, C-D, between A and B (A < C < D < B). The two values in the TLV represent A and B but do not imply the ability to do range checking. In fact, several different kinds of matching are capable with the specific range of values. There is "equal to" matching (e.g., does field X have the value C, where A < C < B?), "not equal to" matching (e.g., is X not equal to C?), "less than" matching, "not less than" matching, "inside range" matching (e.g., is X in C-D?), and "not inside range" matching (e.g., is X not in C- D?). "Less than" and "not less than" matching are specialized forms of range matching and can be expressed in that form given an appropriate lower or upper bound. We therefore need four additional flags associated with each specified range (i.e., A-B). These flags Anderson [Page 17] indicate whether equal to, not equal to, inside range, or not inside range types of matching are allowed. Using the property category as an example, the capability expression model becomes the following: - The number of properties supported by the stage. - For each property: - The name of the property from the property namespace. - The length of the value portion associated with this property. - A flag indicating whether "equal to" classification is allowed. - A flag indicating whether "not equal to" classification is allowed. - A flag indicating whether "inside range" classification is allowed. - A flag indicating whether "not inside range" classification is allowed. - The bottom of a range of values, using the data structure associated with the given property. - The top of a range of values, using the data structure associated with the given property. The previous paragraph describes capabilities inside one contiguous range. This paragraph describes how capabilities are represented in non-contiguous ranges, as in the one that motivated this section (i.e., matching the IP protocol field for TCP or UDP only). To express capabilities for non-contiguous ranges, multiple capabilities entries are used, each having the same name from the chosen namespace. For example, to express our motivating example, the following two entries are used. - 2 properties entries to follow. - Entry 1: - Name: IP protocol - Length: two octets. - Equal to: True - Not equal to: False - Inside range: False - Not inside range: False - Bottom: 6, TCP - Top: 6, TCP - Entry 2: - Name: IP protocol - Length: two octets. - Equal to: True - Not equal to: False - Inside range: False - Not inside range: False - Bottom: 17, UDP - Top: 17, UDP Unlike properties, the other four categories have no need for the flags indicating the four types of classification. However, the other four categories still do need the bottom and top of range to Anderson [Page 18] indicate the range of allowable values from which the CE can select only one. 5.7. Completed Capability Exchange Having updated the capability exchange data model to express each stage's capabilities according to the five categories, the capability exchange consists of the following information: - The number of stages supported. - For each stage: - The USI. - The logical function name (from the namespace) that this stage implements. - The number of properties supported by the stage. - For each property: - The name of the property from the property namespace. - The length of the value portion associated with this property. - A flag indicating whether "equal to" classification is allowed. - A flag indicating whether "not equal to" classification is allowed. - A flag indicating whether "inside range" classification is allowed. - A flag indicating whether "not inside range" classification is allowed. - The bottom of a range of values, using the data structure associated with the given property. - The top of a range of values, using the data structure associated with the given property. - The number of actions supported by the stage. - For each action: - The name of the action from the action namespace. - The length of the value portion associated with this action. - The bottom of a range of values, using the data structure associated with the given action. - The top of a range of values, using the data structure associated with the given action. - The number of parameters supported by the stage. - For each parameter: - The name of the parameter from the parameter namespace. - The length of the value portion associated with this parameter. - The bottom of a range of values, using the data structure associated with the given parameter. - The top of a range of values, using the data structure associated with the given parameter. - The number of events supported by the stage. Anderson [Page 19] - For each event: - The name of the event from the event namespace. - The length of the value portion associated with this event. - The bottom of a range of values, using the data structure associated with the given event. - The top of a range of values, using the data structure associated with the given event. - The number of statistics supported by the stage. - For each statistic: - The name of the statistic from the statistic namespace. - The length of the value portion associated with this statistic. - The bottom of a range of values, using the data structure associated with the given statistic. - The top of a range of values, using the data structure associated with the given statistic. - A flag indicating whether the stage is read-only. - The number of static classification/action entries. - For each static classification/action entry: - The number of properties. - The number of actions. - For each property: - The name of the property. - The length of the property. - The value of the property (using the data structure corresponding to the given name.) - For each action: - The name of the action. - The length of the action. - The value of the action (using the data structure corresponding to the given name.) - The number of static parameters. - For each static parameter: - The name of the parameter. - The length of the parameter. - The value of the parameter (using the data structure corresponding to the given name.) - The number of downstream stages to which this stage can send packets. - For each downstream stage: - The USI of the downstream stage. - A label for this exit point (i.e., target) from the stage. 6. Applicability to RFC1812 [To be done.] Anderson [Page 20] 7. Security Considerations Significant security considerations need to be documented but were not done in time for submission. Next revision will begin to address these issues. 8. References [FORCES-REQ] T. Anderson, et. al., "Requirements for Separation of IP Control and Forwarding", work in progress, September 2001, . 9. Authors' Addresses Todd A. Anderson Intel Labs 2111 NE 25th Avenue Hillsboro, OR 97124 USA Phone: +1 503 712 1760 Email: todd.a.anderson@intel.com 1. Abstract........................................................1 2. Definitions.....................................................2 3. Introduction....................................................4 4. Architecture....................................................4 4.1. Control Elements...........................................5 4.2. Forwarding Elements........................................6 4.3. CE Managers................................................6 4.4. FE Managers................................................6 4.5. Gl Reference Point.........................................6 4.6. Gf Reference Point.........................................7 4.7. Gc Reference Point.........................................7 4.8. Gi Reference Point.........................................7 4.9. Gp Reference Point.........................................7 4.10. Gr Reference Point........................................8 5. FE Model........................................................8 5.1. Introduction...............................................8 5.2. Model Approach.............................................9 5.3. Logical Functions and Topology............................10 5.4. Stage Capabilities........................................11 5.4.1. Classification Capabilities..........................14 5.4.2. Action Capabilities..................................14 5.4.3. Parameter Capabilities...............................15 5.4.4. Event Capabilities...................................15 5.4.5. Statistics Capabilities..............................15 5.5. Read-only Stages..........................................15 5.6. TLV Errata................................................17 5.7. Completed Capability Exchange.............................19 6. Applicability to RFC1812.......................................20 7. Security Considerations........................................21 8. References.....................................................21 Anderson [Page 21] 9. Authors' Addresses.............................................21 Anderson [Page 22]