ForCES Working Group                 Jamal Hadi Salim
Internet Draft                       Znyx Networks
                                     Robert Haas
                                     IBM
                                     December 2002


                      Netlink2 as ForCES protocol
                  draft-jhsrha-forces-netlink2-00.txt


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Conventions used in this document


     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
     "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
     this document are to be interpreted as described in [RFC-2119].


1.  Abstract


     This document describes Netlink2, which is an extension of Linux
     Netlink [Netlink].  This document is intended as a proposal for the
     ForCES IETF working group protocol.

     ForCES attempts to define a clear separation between the two
     entities of the NE, the CE and the FE, in order to have them
     evolve separately as opposed to the current monolithic evolution.


2.  Introduction


     The concept of IP control and forwarding separation was first
     introduced in the early 1980s by the BSD 4.4 routing sockets
     [stevens].  The focus at that time was a simple IP(v4) forwarding
     service and how the control plane, either via a command line con-
     figuration tool or a dynamic route daemon, can control forwarding
     tables for that IPv4 forwarding service.

     The IP world has evolved considerably since then. Linux Netlink
     [Netlink], when observed from a service provisioning and management
     point of view, takes routing sockets one step further by breaking
     the narrow focus on IPv4 forwarding.  Since the Linux 2.1 kernel,
     Netlink has been providing the IP service abstraction to a few ser-
     vices other than classical RFC 1812 IPv4 forwarding.

     Netlink2 extends Linux Netlink to meet the requirements of the
     ForCES working group charter for a protocol. Netlink is extended to
     have a distributed addressing and transport scheme, and missing
     mechanisms are added to make Netlink2 meet the ForCES protocol
     requirements [forces_req].  We chose Netlink as the base because
     it is freely available.  Netlink is also already proven, having
     been widely deployed with the Linux operating system since the
     2.1 kernel.

     Netlink2 operates in a mode where knowledge of the NE, its topology
     and modeling MAY have already been discovered, or is discovered
     within the Netlink2 protocol.


2.1.  Why Netlink-derived?


     Netlink was designed with the goal of solving the forwarding and
     control separation problem. This means that many of the main
     issues have been thought through and resolved over the years. In
     other words, Netlink is proven as a protocol addressing separation
     of forwarding and control. Netlink is also network-ready because it
     uses packet formatting techniques and concepts (e.g., multicast
     addressing).  This and the availability of publicly running and
     tested code form a major motivator to base Netlink2 on Netlink.


2.2.  Definitions

     We use the definitions provided in [forces_req], as well as the
     following:

     Forwarding Element Component (FEC): same as Forwarding Engine
     Component as defined in [Netlink]. This is a component in the FE
     driven by the ForCES protocol in order to achieve a certain
     service.

     Control Element Component (CEC): same as Control Plane Component
     as defined in [Netlink]. This is a component in the CE that drives
     FEC(s) in order to achieve a certain service.


3.  Extensions to the Netlink Message Format


     To conform to the ForCES requirements [forces_req], the Netlink
     protocol [Netlink] is extended in the following respects:

     1) IP and Transport encapsulations to carry Netlink messages.

     2) Feature expandability extensions to accommodate current generic
     ForCES requirements and make it possible to add more in the future.
     This facilitates such aspects as authentication, checksumming,
     etc., when required.

     With these changes to complement the existing Netlink functional-
     ity, Netlink2 fulfills the requirements to become the ForCES proto-
     col.


3.1.  Netlink Header Extensions


     1) PID redefinition and addition

     In Netlink, PID 0 referred to the equivalent of the FE (kernel).
     The equivalent of the CE (user process) was referred to by its OS
     process id.

     In Netlink2 a PID of the unicastPID type is assigned to each FE and
     CE in the pre-association phase. Different types of PIDs are dis-
     cussed further below. In this way the CE uniquely identifies the FE
     and avoids any collision.  We maintain the name PID for historical
     purposes.

     - destination PID: the PID field is redefined as the destination
     PID field. This field identifies the parties on the wire that must
     process the message.

     - source PID: this field is introduced in the header to identify
     the source of the message.

     2) The Length field has been reduced to 16 bits, with length 0
     being reserved.  The rest of the old 32-bit Length field is now
     split between a new Version field and a new Extended Flags field.

     3) A Version field is introduced in the Netlink2 header.  This
     8-bit field consists of a 4-bit major number followed by a 4-bit
     minor number, in the form major:minor. For Netlink2 the value is
     0x20, i.e., major 2 and minor 0.

     4) A new Extended Flags field is introduced to take over the
     remaining 8 bits of the 16 bits freed from the Length field.
     Turning different bits on enables additional new features, such as
     announcing the presence of extended TLVs. Extended Flags also
     introduce the concept of a SYN message, which is issued by the FE
     as the first message after the pre-association phase to indicate
     its presence, and of a FIN flag, which the FE sets in its last
     message to indicate its departure.

     5) Netlink2-specific TLVs come right after the older Netlink header
     (refer to diagram further below).  They are optional, and their
     presence is indicated by the Extended Flags. A typical use of
     Netlink2-specific TLVs is to compensate for capabilities lacking
     in a transport. For example, in an IP network not deployed with
     IPsec, the Netlink2-specific authentication TLV could be used to
     emulate IPsec AH.

     Other than these changes, all mechanisms provided by Netlink are
     sufficient to meet the requirements for ForCES. The reader is
     encouraged to refer to [Netlink] as a companion to this document.
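
     As a rough illustration of changes 1) to 4), the Python sketch
     below packs and unpacks a header carrying the fields described
     above. The field order, the use of network byte order, the names
     (Netlink2Header, split_version) and the Extended Flags bit values
     are assumptions made for this example only; the normative layout
     is the one given by the header diagram of this document.

```python
import struct
from dataclasses import dataclass

# Assumed layout (not normative):
# length(16) | version(8) | extended flags(8) | type(16) | flags(16) |
# sequence(32) | destination PID(32) | source PID(32)
HEADER = struct.Struct("!HBBHHIII")

VERSION_NETLINK2 = 0x20  # major 2, minor 0, as stated in item 3)

# Hypothetical bit assignments for the Extended Flags field.
EXT_F_SYN = 0x01
EXT_F_FIN = 0x02
EXT_F_TLVS = 0x04


@dataclass
class Netlink2Header:
    length: int
    msg_type: int
    flags: int
    seq: int
    dst_pid: int
    src_pid: int
    version: int = VERSION_NETLINK2
    ext_flags: int = 0

    def pack(self) -> bytes:
        # Length is 16 bits wide and the value 0 is reserved.
        if not 0 < self.length <= 0xFFFF:
            raise ValueError("length must be in 1..65535")
        return HEADER.pack(self.length, self.version, self.ext_flags,
                           self.msg_type, self.flags, self.seq,
                           self.dst_pid, self.src_pid)

    @classmethod
    def unpack(cls, data: bytes) -> "Netlink2Header":
        (length, version, ext_flags, msg_type, flags,
         seq, dst_pid, src_pid) = HEADER.unpack_from(data)
        return cls(length, msg_type, flags, seq, dst_pid, src_pid,
                   version, ext_flags)


def split_version(version: int):
    """Return (major, minor) from the 8-bit major:minor Version field."""
    return version >> 4, version & 0x0F
```

     Under these assumptions the fixed header occupies 20 octets, four
     more than the 16-octet Netlink header, because of the added source
     PID.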


3.1.1.  Netlink2-specific TLVs


     1) Authentication

     [TBD]


     2) Checksum

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | TLV Type =12  | TLV Length =2 |       Checksum (16 bits)      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     This TLV is optional. To compute the correct checksum, an
     implementation MUST add the optional checksum TLV to the Netlink2
     message with an initial checksum value of 0, compute the checksum
     over the resulting Netlink2 message, and then store the result in
     the TLV. Refer to [RFC3358] for details on the Checksum TLV.
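
     The zero-then-compute procedure can be sketched as follows. The
     checksum algorithm itself is the one referenced in [RFC3358]; the
     16-bit one's-complement sum used here is only a stand-in chosen to
     keep the example short. The function names are hypothetical, and
     the caller is assumed to have already set the header Length so
     that it covers the added TLV.

```python
import struct

TLV_CHECKSUM = 12  # TLV Type from the diagram above


def ones_complement_sum16(data: bytes) -> int:
    """Stand-in checksum (assumption): 16-bit one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def add_checksum_tlv(header: bytes, rest: bytes,
                     checksum=ones_complement_sum16) -> bytes:
    """Place a Checksum TLV right after the header and fill it in."""
    # Step 1: build the message with the checksum value set to 0.
    zeroed = header + struct.pack("!BBH", TLV_CHECKSUM, 2, 0) + rest
    # Step 2: compute the checksum over that message.
    value = checksum(zeroed)
    # Step 3: store the result in the TLV.
    return header + struct.pack("!BBH", TLV_CHECKSUM, 2, value) + rest


def verify_checksum_tlv(message: bytes, header_len: int,
                        checksum=ones_complement_sum16) -> bool:
    """Recompute with the TLV value zeroed and compare with the stored one."""
    tlv_type, tlv_len, stored = struct.unpack_from("!BBH", message, header_len)
    if tlv_type != TLV_CHECKSUM or tlv_len != 2:
        return False
    zeroed = (message[:header_len + 2] + b"\x00\x00"
              + message[header_len + 4:])
    return checksum(zeroed) == stored
```

     A receiver repeats the sender's steps: it zeroes the value, runs
     the same algorithm over the whole message, and compares the result
     with the value it received.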


     3) Message Priority

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | TLV Type =13  | TLV Length =2 |      Priority  (16 bits)      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     This TLV is optional. It is used if the network does not support
     prioritization, in which case it indicates the priority of the
     message to the remote end.
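
     Both TLVs above share the same shape: an 8-bit type, an 8-bit
     length and a value. A minimal encoder and decoder for that shape
     might look as follows. It is assumed here that the Length octet
     counts only the value octets (as "TLV Length =2" next to a 16-bit
     value suggests) and that the receiver learns how many TLVs are
     present from elsewhere, for example from the Extended Flags; the
     function names are hypothetical.

```python
import struct

TLV_CHECKSUM = 12   # from the Checksum diagram
TLV_PRIORITY = 13   # from the Message Priority diagram


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV as type(8) | length(8) | value."""
    if len(value) > 0xFF:
        raise ValueError("value does not fit an 8-bit length")
    return struct.pack("!BB", tlv_type, len(value)) + value


def decode_tlvs(data: bytes, count: int):
    """Decode `count` TLVs; return ([(type, value), ...], octets consumed)."""
    tlvs, offset = [], 0
    for _ in range(count):
        tlv_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV")
        tlvs.append((tlv_type, value))
        offset += length
    return tlvs, offset
```

     For instance, encode_tlv(TLV_PRIORITY, struct.pack("!H", 3))
     yields the four octets 0d 02 00 03.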


3.2.  Addressing and Transport Extensions


     We extend Netlink to make it distributed. The focus is on making
     Netlink2 have a strong local scope view of the world while fitting
     well into a global scope when the hop distance between the FE and
     CE increases.

     If the network interconnecting the FE(s) and CE(s) is completely
     hidden from the outside (black-box view), for instance an internal
     Ethernet segment or a switching fabric in which CE(s) and FE(s) are
     connected within physical proximity, then communications between FE
     and CE are assumed to be of a local scope. On the other hand, if
     communications between FE and CE cross parts of the network that
     are not hidden from the outside, communications are considered to
     be of global scope.


3.2.1.  Transport Methods


     The ideal environment for Netlink2 is considered to be a multicast-
     capable medium with IP above it and with UDP/TCP/SCTP running over
     IP.

     Netlink2 will run over non-IP, non-multicast-capable environments;
     however, it will require extra processing and messaging by the
     ForCES layer to compensate for services that IP already offers.


3.2.1.1.  Why Multicast?


     Multicast is considered important to facilitate one-to-many/some
     communication.  For example, a single command from a CE can be mul-
     ticast to multiple FEs, which eases the scalability requirements
     mentioned in [forces_req]. This is discussed in later sections.

     When running Netlink2 over non-multicast-capable media, it is
     expected that mechanisms similar to those used in OSPF NBMA
     [RFC2328] networks will be put in place.

3.2.1.2.  Why IP?


     IP runs on virtually every link layer. Leveraging this fact alone
     helps deploy the protocol more widely and more quickly.

     IP also provides numerous services such as fragmentation and
     reassembly, prioritization, and security, which are inherent
     requirements for the ForCES protocol.  This means that
     successfully running over an alternative to IP requires that
     similar services be provided by whatever is underneath in order to
     meet the requirements.

     Netlink2-specific optional TLVs can be used to compensate for
     missing functionality when running on a network transport other
     than IP or directly on the link layer.

     Netlink already allows the definition of multipart messages, with
     IP segmenting and reassembling them when the path MTU is exceeded.
     When running on top of non-IP media, the Netlink2 message can be
     limited to not exceed the MTU; the multipart message facility can
     then be used to provide framing for assembling/segmenting.

     The Netlink2-specific Authentication TLV can be used to carry
     authentication signatures in a medium that does not have this
     capability.

     The Netlink2-specific Checksum TLV can be used to carry checksums
     in a medium that does not have this capability.

     The Netlink2-specific Message Priority TLV can be used to carry
     prioritization if transports are not capable of expressing
     priorities in their headers.


3.2.1.3.  Why UDP/TCP/SCTP?


     On a local scope, it is assumed that multicast UDP over IP is the
     preferred mode of operation.


     On a global scope it is expected that TCP or SCTP would be used for
     enhanced reliability and internet congestion friendliness.

     All three protocols provide 16-bit ports, which are further
     address-demultiplexing points. Also, all three protocols provide
     checksum capability to enhance integrity of the Netlink2 message.
     In the case of UDP, the checksum is optional (which fits the model
     that the local scope is less error-prone than global scope and
     hence the integrity check could be turned on only when needed).

3.2.2.  The Netlink2 wire and bundle


     A Netlink2 wire displays the same behavior as a Netlink wire. It
     interconnects FEs and CEs in order to support services they jointly
     offer.

     The only conceptual difference between a Netlink2 wire and a
     Netlink wire is that whereas the Netlink wire is localized, the
     Netlink2 wire is distributed.

     We also introduce the concept of a Netlink2 bundle.  A Netlink2
     bundle interconnects a set of FE(s) and/or CE(s) by means of one or
     more Netlink2 wires. Note that a Netlink2 bundle does not necessar-
     ily mean a full-mesh interconnection (see examples later on).


     Parties (FEs and CEs) on a Netlink2 bundle share common
     configuration, provisioning, and event-notification end goals.


     A Netlink2 wire MAY be constructed using a multicast connection, a
     unicast connection, or multiple multicast and unicast connections.
     A wire MUST belong to only one bundle. A bundle may have only a
     single wire (unicast or multicast). In most cases we believe there
     will only be one multicast address for a bundle, although
     scalability issues could require the use of unicast connections in
     addition.


     When a multicast IP address is used, a Netlink2 wire MUST run over
     UDP; a UDP port is used to uniquely identify the wire. There MAY
     be multiple wires using the same multicast address as long as they
     run over different UDP ports.
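
     The two rules above (a multicast wire is identified by its UDP
     port on a multicast address, and a wire belongs to only one
     bundle) can be modeled with a small registry. This is a sketch
     only: the class names, the bundle names and the addresses used in
     the usage note are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MulticastWire:
    group: str     # multicast IP address
    udp_port: int  # identifies the wire on that address


@dataclass
class Bundle:
    name: str
    wires: set = field(default_factory=set)


class BundleRegistry:
    """Enforces that a wire belongs to only one bundle."""

    def __init__(self):
        self._owner = {}  # wire -> bundle that owns it

    def attach(self, bundle: Bundle, wire: MulticastWire) -> None:
        owner = self._owner.get(wire)
        if owner is not None and owner is not bundle:
            raise ValueError(
                f"{wire} already belongs to bundle {owner.name!r}")
        self._owner[wire] = bundle
        bundle.wires.add(wire)
```

     With this model, MulticastWire("239.1.1.1", 5000) and
     MulticastWire("239.1.1.1", 5001) are two distinct wires sharing
     one multicast address, and each can be attached to a different
     bundle.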

     When a unicast IP address is used, the description of how to con-
     nect to an endpoint (CE/FE) is subject to the agreement between the
     CE and FE.  The connection could be directly over IP (do we need an
     IP protocol number?) or via transport-layer ports (TCP/UDP/SCTP).

     In both unicast and multicast wires, the necessary parameters (such
     as IP address and port numbers) can be discovered by the involve-
     ment of the FE and CE Managers.

3.2.2.1.  What wires go in a bundle?


     Netlink2 provides flexibility to have a bundle of purely unicast
     wires or multicast wires or a hybrid of both. The decision of what
     goes into a bundle can be made in the pre-association phase.

     A good analogy is to think of a multicast wire as a broadcast link
     (as is done in Netlink) in which CE(s) and FE(s) are parties
     attached to that broadcast link.

     Depending on the number of FEs and CEs on an NE, a choice of a
     single multicast wire in the bundle may be sufficient. Multicast
     allows one-to-some messaging.  A single message sent by an
     originator is seen by all parties on the wire. This simplifies
     synchronization in an HA environment as well as implementation of
     the protocol.

     The fact that multicast messages are seen by all parties could
     cause scalability issues as the number of nodes grows. Parties need
     to filter out messages for which they are not the destination.
     This can take compute resources, or table resources if filtering
     is done in hardware. The extra messages also consume bandwidth
     unnecessarily for FE(s) and CE(s) not interested in seeing these
     messages.

     Unicast wires could be used to create point-to-point connections
     between the parties; when every party is connected to every other
     party, then this becomes a full mesh.

     A full unicast mesh topology removes the need to filter the
     unnecessary messages but introduces scalability concerns, as the
     number of connections required grows quadratically with the number
     of parties (FEs and CEs) present. This requires a lot more compute
     and state information to be maintained at each party.  A pure mesh
     topology also complicates HA because each party must maintain more
     state (for instance, the IP addresses of the CEs and FEs that are
     active and what their backups are) and therefore needs to perform
     extra processing to achieve failover. This remains transparent if
     multicast is used among all parties.
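
     The quadratic growth can be made concrete with a short
     calculation. The function names below are hypothetical; the
     formula is the usual count of pairs among n parties.

```python
def full_mesh_connections(parties: int) -> int:
    """Point-to-point connections needed to link every pair of parties."""
    return parties * (parties - 1) // 2


def per_party_peers(parties: int) -> int:
    """Number of peers for which each party keeps connection state."""
    return max(parties - 1, 0)
```

     For example, full_mesh_connections(16) evaluates to 120 and
     full_mesh_connections(64) to 2016, whereas a single multicast wire
     serves the same set of parties in both cases.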

     Netlink2 allows a bundle to have a hybrid of unicast and multicast
     connections. Note this is a model used by other protocols such as
     OSPF over broadcast links, where the Hello protocol is multicast
     but responses to LSA updates are unicast.


     We present some examples of Netlink2 bundles:

     1) A trivial case is a Netlink2 bundle consisting of a single uni-
     cast wire between the CE and FE it interconnects.

     2) Multiple FEs and a CE could be interconnected with a Netlink2
     bundle using a single multicast connection.

     3) In the same example as 2) above, the unicast address of the CE
     could in addition also be used, for instance, to deliver acknowl-
     edgments or notifications from the FEs to the CE, and not be seen
     by all other FEs. The unicast addresses of the FEs could also be
     used, for instance, to deliver certain messages only to a specific
     FE, such as a retransmission of a message in a two-phase commit
     only to an FE that did not respond.

     4) Multiple FEs and CEs could use a wire with two multicast
     connections: one for all FEs, the other for all CEs, so that
     messages only relevant to FEs are not seen by CEs and vice versa.


3.2.3.  Redefining the Netlink PID Semantics


     We maintain the name PID for historical purposes and introduce a
     destination PID and a source PID as mentioned earlier.

     For every message received by each party on the wire, the destina-
     tion PID field indicates the recipient of the message. The
     addressed party could be either an FE or a CE.

     In addition to Netlink2 wires (unicast or multicast) defining the
     destination of a particular message delivered, the PID types pro-
     vide further control, namely to define which entity actually has to
     process the message. So if the bundle uses only a single multicast
     wire, messages will be heard by all parties on the wire, but only
     those with a matching PID will actually process these messages. We
     introduce special-purpose PIDs addressed to specific listeners on
     the wire.

     The following types of PIDs are defined and can be used in the
     Netlink2 messages. The actual values for the PID of an FE or CE
     must be the same across all wires of the same bundle and must be
     established during the pre-association phase.

     Default values are given. PIDs must be unique within a Netlink2
     wire. They may also be unique within the NE.

     1) unicastPID: allows one to uniquely address an FE or CE. Each
     FE/CE must have such a unicast PID. Only the FE or CE assigned to
     this PID must process an incoming message with such a destination
     PID. Other parties MAY silently discard the message.

     Default value: none.

     2) logicalPID: in addition to unicastPID, an FE/CE MAY have zero or
     more logical PIDs assigned to it. A logicalPID can be used for
     active-backup pairs of FEs: for instance, the active and the backup
     FE have the same logical PID.

     Default value: none.

     3) broadcastPID: all parties on the wire must process an incoming
     message with such a destination PID. An example of a message that
     might be broadcast is a notification that a CE is being brought
     down for maintenance.

     Default value: 0xffffffff

     4) FEbroadcastPID: all FEs on the wire must process an incoming
     message with such a destination PID. A typical example is a route
     update from the CE to all FEs. Other parties (CEs) can silently
     discard the message.

     Default value: 0xefffffff

     5) CEbroadcastPID: all CEs on the wire must process an incoming
     message with such a destination PID. Other parties (FEs) can
     silently discard the message.

     Default value: 0xdfffffff

     A Netlink2 message must have as destination PID one of the PID
     types defined above. The source PID of a Netlink2 message must be
     of the unicastPID or logicalPID type. In addition, if the NLM_F_ACK
     flag is set, then every party processing the message MUST reply
     with an acknowledgment after processing the message.
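
     The receive-side filtering implied by the five PID types can be
     sketched as below, using the default values listed above. The
     Party type and the function name are hypothetical, and the sketch
     covers only the messages a party is required to process; it does
     not model a party that additionally listens to messages destined
     to others.

```python
from dataclasses import dataclass, field

BROADCAST_PID = 0xFFFFFFFF
FE_BROADCAST_PID = 0xEFFFFFFF
CE_BROADCAST_PID = 0xDFFFFFFF


@dataclass
class Party:
    role: str         # "FE" or "CE"
    unicast_pid: int  # assigned in the pre-association phase
    logical_pids: frozenset = field(default_factory=frozenset)


def must_process(party: Party, dst_pid: int) -> bool:
    """Decide whether `party` has to process a message sent to `dst_pid`."""
    if dst_pid == BROADCAST_PID:
        return True
    if dst_pid == FE_BROADCAST_PID:
        return party.role == "FE"
    if dst_pid == CE_BROADCAST_PID:
        return party.role == "CE"
    # unicastPID or logicalPID: only the parties holding that PID.
    return dst_pid == party.unicast_pid or dst_pid in party.logical_pids
```

     An active FE and its backup that share a logical PID both return
     True for that PID, while each one still answers alone to its own
     unicast PID.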


3.2.4.  Local Scope Addressing and Encapsulation


     At a local scope, the addressing used for a wire is a UDP port on
     top of a multicast IP address.

     Multiple wires can run on one multicast address, with a further
     demultiplexing level based on the UDP port.

     The wire addressing parameters MAY be discovered during the pre-
     association phase.


3.2.5.  Global Scope Addressing and Encapsulation


     When addressing a non-local scope, the Netlink2 message is
     encapsulated in a transport header and shuttled to the remote end,
     where it is decapsulated and processed as if originating from the
     local scope of that remote end.  The global scope addressing could
     use any transport protocol configured (SCTP, UDP or TCP) as agreed
     upon in the pre-association phase.

     This can be viewed as an extension of the local scope wires.


4.  Netlink2 Architecture


     An IP service accomplished by an FE is represented as an FE
     component (FEC) in the FE. CE components (CEC) in the CE interact
     with FECs over a Netlink2 bundle to execute a certain service. The
     interactions between FECs and CECs are specific to each service
     and are defined using templates as presented in [Netlink].

     For instance, the IPv4 Forwarding service (called NETLINK_ROUTE)
     defines a message template for handling IP routes and the message
     types to insert, remove, or get a route. The routing CEC(s) and the
     IPv4 Forwarding FEC(s) interact using these message templates and
     message types over the Netlink2 bundle to execute the IPv4
     Forwarding service.  The message types in Netlink2 messages allow
     the FE to demultiplex messages to the appropriate FEC.
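
     Demultiplexing on the message type can be pictured as a table
     lookup inside the FE. In the sketch below the message type values
     are those used by NETLINK_ROUTE in Linux rtnetlink; the classes,
     the method names and the representation of a route as an opaque
     prefix string are assumptions made for illustration.

```python
# Message types used by NETLINK_ROUTE in Linux rtnetlink.
RTM_NEWROUTE, RTM_DELROUTE, RTM_GETROUTE = 24, 25, 26


class Ipv4ForwardingFEC:
    """Toy FEC that stores routes as opaque prefix strings."""

    def __init__(self):
        self.routes = set()

    def insert(self, prefix):
        self.routes.add(prefix)
        return True

    def remove(self, prefix):
        self.routes.discard(prefix)
        return True

    def get(self, prefix):
        return prefix in self.routes


class ForwardingElement:
    """Hands each message to the FEC registered for its message type."""

    def __init__(self):
        self._handlers = {}

    def register(self, msg_type, handler):
        self._handlers[msg_type] = handler

    def dispatch(self, msg_type, payload):
        try:
            handler = self._handlers[msg_type]
        except KeyError:
            raise LookupError(
                f"no FEC registered for message type {msg_type}") from None
        return handler(payload)
```

     A routing CEC would then cause dispatch(RTM_NEWROUTE, ...) to be
     invoked on every FE that processes its route update.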

     Messages of a certain service destined to an FEC can travel on dif-
     ferent Netlink2 wires within the same bundle. Note that an FEC can
     process messages from different bundles.


     Netlink2 by itself does not constitute a protocol, but rather a set
     of base mechanisms that can be picked up depending on service
     requirements.

     The interaction between the FEC and the CEC, as in the Netlink
     context, would define a protocol.  Netlink2 provides mechanisms
     for the CEC and the FEC to define their own protocol. The FEC
     might continuously get updates from the CEC on how to operate the
     service (e.g., for IPv4 forwarding, route additions or deletions).

     Netlink2 messages and mechanisms are used to derive the protocol.
     For example, the FEC and CEC may choose to define a reliable or
     semi-reliable protocol between each other.  By default, however,
     Netlink2 provides unreliable communication.


4.1.  Protocol Logical Model


     In the diagram below we show a simple FEC<->CEC logical relation-
     ship.  We use the IPv4 Forwarding FEC as an example.


                              CE-----------------------------------
                              |    /^^^^^\       /^^^^^\           |
                              |   |       |     / CEC-2 \          |
                              |   | CEC-1 |     | COPS  |          |
                              |   | ospfd |     |  PEP  |          |
                              |   \       /      \_____/           |
                              |    \_____/           |             |
                              |        |             |             |
                           ****************************************|
                           ************* NETLINK2 BUNDLE ***********
              FE---------- *****************************************.
              |       IPv4 Forwarding|    |           |             |
              |       FEC          |    |           |             |
              |       --------------/ ----|-----------|--------     |
              |       |            /      |           |       |     |
              |       |     .-------.  .-------.   .------.   |     |
              |       |     |ingress|  | IPv4  |   |Egress|   |     |
              |       |     |police |  |Forward|   | QoS  |   |     |
              |       |     |_______|  |_______|   |Sched |   |     |
              |       |                             ------    |     |
              |        ---------------------------------------      |
              |                                                     |
               -----------------------------------------------------


     Netlink2 logically models FECs and CECs in the form of service
     blocks interconnected to each other via a Netlink2 bundle.


     Acknowledgements and responses to messages do not have to be sent
     on the same wire on which the triggering messages arrived, but
     MUST be sent on the same bundle to the originating PID.  For
     instance, a wire interconnecting a CE with multiple FEs using a
     multicast address could be used to send route updates from the CE.
     On the other hand, independent unicast wires from each FE to the CE
     could be used to send back route events or acknowledgments. Note
     that sequencing is done per wire and source PID, and ACKs can
     travel back on any wire of a bundle.
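
     Sequencing per wire and source PID suggests that a receiver keeps
     one counter for each (wire, source PID) pair. The sketch below
     assumes a 32-bit sequence number that increases by one per message
     and wraps around, as in Netlink; the class name and the wire
     labels in the usage note are hypothetical.

```python
SEQ_MODULUS = 2 ** 32  # assumes a 32-bit sequence number, as in Netlink


class SequenceTracker:
    """Checks in-order arrival per (wire, source PID) pair."""

    def __init__(self):
        self._next = {}  # (wire, src_pid) -> next expected sequence number

    def accept(self, wire, src_pid, seq):
        """Return True if `seq` is the next expected value for this pair."""
        key = (wire, src_pid)
        # The first message seen on a pair sets the baseline.
        expected = self._next.get(key, seq)
        if seq != expected:
            return False
        self._next[key] = (seq + 1) % SEQ_MODULUS
        return True
```

     Because the key includes the wire, a CE that multicasts route
     updates on one wire and receives ACKs on per-FE unicast wires
     keeps an independent sequence space on each of them.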


     The Netlink2 wire can be shared or be specific to a service. There
     can be multiple Netlink2 wires in a bundle carrying messages of
     the same service.  Additional Netlink2 wires can be used to reduce
     the messaging seen by a party (for example, to avoid extra
     processing) or to restrict it for partitioning or security
     reasons. A possible partitioning is a Netlink2 bundle per service.
     In the example above the IPv4 Forwarding FEC would be considered a
     service.

     Assuming capabilities have been discovered during the pre-associa-
     tion phase (between the FEM and CEM), blocks (CECs or FECs as
     illustrated above) connect to the agreed wires on the Netlink2 bun-
     dle, and listen to receive specific messages. CECs may connect to
     multiple Netlink2 wires if it helps them to control the service
     better.  All blocks (CECs and FECs) dump packets on the Netlink2
     wires.

     FECs or CECs join Netlink2 wires and listen to messages of interest
     for processing or monitoring purposes.

     All messages addressed to the FEC (for example the IPv4 forwarding
     FEC illustrated above) will have the FE PID agreed upon by both the
     CE and the FE at the pre-association phase.

     FECs (as well as CECs) also process messages with the broadcast
     PIDs. They may also process messages destined to other FECs (as
     well as CECs) for availability synchronization purposes.

     A further demultiplexing point is the command type in the Netlink2
     message.  Each of the blocks in an FEC (e.g., the ingress police
     block above) knows how to respond to a specific command-set as
     defined by the Netlink2 message type (refer to the Netlink2 message
     format and messaging further below).

4.2.  The Message Format


     There are three mandatory levels to a Netlink2 message: the general
     Netlink2 message header, the IP-service-specific template, and the
     IP-service-specific data. Netlink2-specific TLVs and IP-service-
     specific TLVs are optional.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                   Netlink2 message header                     |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                   Netlink2-specific TLVs (optional)           |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                  IP Service Template                          |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                  IP-Service-specific data in TLVs             |
      |                          (optional)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


     The Netlink2 message header is generic for all services, whereas
     the IP Service Template header is specific to a service.  Each IP
     Service then carries parameterization data (CEC->FEC direction) or
     response (FEC->CEC direction). These parameterizations are in
     Type-Length-Value (TLV) format and are unique to the service.

     Note that we maintain the same IP Service Templates as in Netlink,
     i.e., nothing has changed here.


4.3.  Protocol Model


     This section expands on how Netlink2 provides the mechanism for
     service-oriented FEC and CEC interaction.

4.3.1.  General Messaging


     The Netlink2 message is used to communicate between the FEC and CEC
     for parameterization of the FECs, asynchronous event notification
     of FEC events to the CECs, and statistics querying/gathering (typi-
     cally by a CEC). Other activities include transfer of control pack-
     ets between FEC and CEC.


4.3.2.  Service Addressing


     Connecting to a service is achieved by connecting to a defined
     Netlink2 bundle by both the CEC and FEC.  This Netlink2 bundle is
     derived in the pre-association phase.

     A service would typically be related to a specific Netlink2 bundle.
     Command types would be used to configure different FECs (and
     blocks).  This allows reuse of the 16-bit command type with every
     new bundle.

     Once connected to a service, the CEC may at any point during the
     lifetime of the connection issue service-specific commands to the
     FEC, mostly for configuration or for statistics collection.  The
     FEC may also send event announcements to the CEC, and respond to
     or ACK queries issued by the CEC.


4.3.3.  Netlink2 Message Header

     Netlink2 messages are laid out exactly the same as Netlink mes-
     sages.  Each Netlink2 message contains a byte stream with a
     Netlink2 header followed by its associated payload.

     A single PDU may contain more than one Netlink2 message. This is
     referred to as batching. Netlink batching is reused in Netlink2 and
     allows for messages with different commands (such as adding routes
     and deleting a QoS policy) to be carried in the same batch message.

     A Netlink2 message may be split across multiple PDUs if it does not
     fit into a single PDU. This is referred to as a multipart Netlink2
     message and is also inherited from Netlink.

     For multipart messages, the first and all following headers have
     the NLM_F_MULTI Netlink header flag set, except for the last
     header, which has the Netlink header type NLMSG_DONE.
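
     A receiver could reassemble a multipart message along the
     following lines.  This is a minimal Python sketch, not a
     definitive implementation: it assumes that NLM_F_MULTI and
     NLMSG_DONE keep their Linux Netlink numeric values and that the
     messages arrive in order as (type, flags, payload) tuples.  The
     function name is hypothetical.

```python
# Numeric values as defined for Linux Netlink; assumed unchanged here.
NLM_F_MULTI = 0x2
NLMSG_DONE = 0x3


def collect_multipart(messages):
    """Return the payloads of one multipart message.

    messages is an iterable of (msg_type, flags, payload) tuples in
    arrival order.  Collection stops at the NLMSG_DONE message.
    """
    parts = []
    for msg_type, flags, payload in messages:
        if msg_type == NLMSG_DONE:
            return parts
        if not flags & NLM_F_MULTI:
            raise ValueError("message without NLM_F_MULTI in multipart run")
        parts.append(payload)
    raise ValueError("multipart message not terminated by NLMSG_DONE")
```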

     The Netlink2 message header is shown below.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       0               1               2               3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Length             |            flags_e            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Type              |             Flags             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Sequence Number                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Source PID                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Destination PID                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Optional TLVs                         |
      ~                                                               ~
      ~                                                               ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   The fields in the header are:


          Length: 16 bits
          The length of the Netlink2 message in bytes including the
          header.

          flags_e: 16 bits
          These are extended flags.
                   NLM_F_SYN   Set on the first message.
                               Interpreted as a boot message.
                   NLM_F_FIN   Set on the last message.
                               Interpreted as a departure message.
                   NLM_F_ETLV  Set to indicate presence of extended
                               TLVs.
                   NLM_F_PRIO  Message priority:
                               1 for high and 0 for low. Additional
                               QoS level set in QOS TLV.

                   NLM_F_ASTR  Set the ACK strategy: 1 for partial
                               ACKs and 0 for full ACKs


          Type: 16 bits
          This field describes the message content.
          It can be one of the standard message types:
               NLMSG_NOOP  message is ignored
               NLMSG_ERROR the message signals an error and the payload
                           contains an nlmsgerr structure. This can be
                           viewed as a NACK and is typically sent from
                           FEC to CEC.
               NLMSG_DONE  message terminates a multipart message

          Individual IP Services specify more message types; e.g., the
          NETLINK_ROUTE Service specifies several types such as
          RTM_NEWLINK, RTM_DELLINK, RTM_GETLINK, RTM_NEWADDR, RTM_DELADDR,
          RTM_NEWROUTE, RTM_DELROUTE, etc.

          Flags: 16 bits
          The standard flag bits used in Netlink are
                 NLM_F_REQUEST   Must be set on all request messages (typically
                                 from CE to FE)
                 NLM_F_MULTI     Indicates the message is part of a multipart
                                 message terminated by NLMSG_DONE
                 NLM_F_ACK       Request for an acknowledgment on success.
                                 Typical direction of request is from
                                 CEC to FEC.
                 NLM_F_ECHO      Echo this request. Typical direction of
                                 request is from CEC to FEC.

          Additional flag bits for GET requests on config information in
          the FEC.
                 NLM_F_ROOT     Return the complete table instead of a
                                single entry.
                 NLM_F_MATCH    Return all entries matching the criteria
                                passed in the message content.
                 NLM_F_ATOMIC   Return an atomic snapshot of the table being
                                referenced. This may require special privileges
                                because it has the potential to interrupt
                                service in the FE for a longer time.

          Convenience macros for flag bits:
                 NLM_F_DUMP     This is NLM_F_ROOT or'ed with NLM_F_MATCH

          Additional flag bits for NEW requests
                 NLM_F_REPLACE   Replace existing matching config object with
                                 this request.
                 NLM_F_EXCL      Do not replace the config object if it already
                                 exists.
                 NLM_F_CREATE    Create config object if it does not already
                                 exist.
                 NLM_F_APPEND    Add to the end of the object list.


          For those familiar with the BSD-style use of such operations in
          route sockets, the equivalent translations are:

                    - BSD ADD operation equates to NLM_F_CREATE or-ed
                      with NLM_F_EXCL
                    - BSD CHANGE operation equates to NLM_F_REPLACE
                    - BSD Check operation equates to NLM_F_EXCL
                    - BSD APPEND equivalent is actually mapped to
                      NLM_F_CREATE


          Sequence Number: 32 bits
          The sequence number of the message.

          Source PID: 32 bits
          The PID of the sender of the message (unicast or logical PID).

          Destination PID: 32 bits
          The PID of the destination of the message (unicast, logical,
          or broadcast PID).
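
          To make the field list concrete, the fixed part of the header
          could be packed and unpacked as in the Python sketch below.
          It assumes the field order of the header diagram, network
          byte order (no byte order is stated here), and the flag
          values defined for Linux Netlink.  Extended flags and
          optional header TLVs are left out because their encodings are
          not given.  The function and type names are hypothetical.

```python
import struct
from collections import namedtuple

# Assumed wire layout: fields in the order of the header diagram,
# network byte order (the draft does not state a byte order).
NL2_HDR = struct.Struct("!HHHHIII")
Netlink2Header = namedtuple(
    "Netlink2Header", "length flags_e type flags seq src_pid dst_pid")

# Standard flag values as defined for Linux Netlink.
NLM_F_REQUEST, NLM_F_MULTI, NLM_F_ACK, NLM_F_ECHO = 0x001, 0x002, 0x004, 0x008
# Modifiers for GET requests.
NLM_F_ROOT, NLM_F_MATCH, NLM_F_ATOMIC = 0x100, 0x200, 0x400
NLM_F_DUMP = NLM_F_ROOT | NLM_F_MATCH
# Modifiers for NEW requests (same bits, interpreted by message type).
NLM_F_REPLACE, NLM_F_EXCL, NLM_F_CREATE, NLM_F_APPEND = 0x100, 0x200, 0x400, 0x800

# BSD route-socket operations expressed as NEW-request modifiers.
BSD_EQUIVALENTS = {
    "ADD": NLM_F_CREATE | NLM_F_EXCL,
    "CHANGE": NLM_F_REPLACE,
    "CHECK": NLM_F_EXCL,
    "APPEND": NLM_F_CREATE,
}


def pack_message(msg_type, flags, seq, src_pid, dst_pid, payload=b"", flags_e=0):
    """Prepend a Netlink2 header; Length counts header plus payload."""
    length = NL2_HDR.size + len(payload)
    if length > 0xFFFF:
        raise ValueError("message does not fit the 16-bit Length field")
    header = NL2_HDR.pack(length, flags_e, msg_type, flags, seq, src_pid, dst_pid)
    return header + payload


def unpack_header(buf):
    """Decode the fixed 20-byte header at the start of buf."""
    return Netlink2Header(*NL2_HDR.unpack_from(buf))
```

          Note that the GET and NEW modifier bits overlap numerically,
          so a receiver has to look at the message type before
          interpreting them.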


4.3.4.  Mechanisms for Creating Protocols

     Mechanisms for creating reliable or unreliable protocols are
     provided. In addition, mechanisms for facilitating availability are
     embedded in Netlink2.


4.3.4.1.  Building Reliable Protocols

     By default the Netlink2 header flags NLM_F_PRIO and NLM_F_ACK are
     not set, so Netlink2 messages are sent with low priority and do
     not require acknowledgements.

     One could create a reliable protocol between an FEC and a CEC by
     using the combination of sequence numbers, ACKs and retransmit
     timers. Both sequence numbers and ACKs are provided by Netlink2.
     Timers are provided by the operating system or hardware.
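
     One possible way to combine these three ingredients is sketched
     below in Python.  It is an illustration under stated assumptions
     rather than a prescribed algorithm: the class and method names are
     hypothetical, the transmit callable stands in for whatever sends a
     message with NLM_F_ACK set, and on_timer() is expected to be called
     by the operating system or hardware timer facility.

```python
class ReliableSender:
    """Sequence numbers + ACKs + retransmit timer, as a minimal sketch."""

    def __init__(self, transmit, max_retries=3):
        self.transmit = transmit        # callable(seq, payload)
        self.max_retries = max_retries
        self.next_seq = 1
        self.pending = {}               # seq -> [payload, retries_left]

    def send(self, payload):
        """Send a message and remember it until it is acknowledged."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = [payload, self.max_retries]
        self.transmit(seq, payload)
        return seq

    def on_ack(self, seq):
        """An ACK carrying the original sequence number was received."""
        self.pending.pop(seq, None)

    def on_timer(self):
        """Retransmit unacknowledged messages; return the seqs given up on."""
        failed = []
        for seq in list(self.pending):
            payload, retries_left = self.pending[seq]
            if retries_left == 0:
                del self.pending[seq]
                failed.append(seq)
            else:
                self.pending[seq][1] = retries_left - 1
                self.transmit(seq, payload)
        return failed
```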

     Prioritization is an orthogonal mechanism to reliability. When a
     node runs out of resources, a message sent with a higher priority
     will get preferential treatment. For instance, if an FE has only
     enough memory to allocate one response and must choose between two
     messages from the CE to respond to, it will use that memory for
     the request that was sent with the higher priority. This also
     applies to other resources such as computing cycles and bandwidth.
     In other words, NLM_F_PRIO covers more than the classical
     bandwidth prioritization of packets on a link.
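
     The FE example above might be sketched in Python as a queue that
     serves high-priority requests first and otherwise preserves
     arrival order.  The class name is hypothetical, and the priority
     bit is assumed to have been extracted from the NLM_F_PRIO extended
     flag by the caller.

```python
import heapq


class RequestQueue:
    """Serve requests with the priority bit set before all others."""

    def __init__(self):
        self._heap = []
        self._arrival = 0   # tie-breaker that preserves arrival order

    def push(self, prio_bit, request):
        """prio_bit is 1 for high priority and 0 for low priority."""
        heapq.heappush(self._heap, (-prio_bit, self._arrival, request))
        self._arrival += 1

    def pop(self):
        """Return the request that should get the scarce resource next."""
        return heapq.heappop(self._heap)[2]
```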

     Another orthogonal mechanism provided by Netlink2 is the ACK strat-
     egy which is selected by the NLM_F_ASTR flag.

     We define two types of acknowledgement strategies:


     1) partial ACKs (using multicast ACK slotting and damping
     techniques [xtp]): receivers multicast an ACK after a random time
     if they have not yet seen an ACK sent by another receiver. This
     limits the number of ACKs returned to the source of the message
     and improves performance.  For messages that a CE sends to a group
     of FEs, partial ACKs imply that an ACK from any one of the FEs is
     sufficient to deem the message delivered.

     2) full ACKs: each receiver sends an ACK back to the source. This
     allows the source to immediately detect problems with receivers.
     In two-phase commits it is important that all FEs respond, so the
     full ACK strategy should be used.
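
     The difference between the two strategies can be illustrated with
     a small Python model.  It is a simplification made for this
     example only: each receiver is given the random delay it has drawn
     (in milliseconds), and a multicast ACK is assumed to reach all
     other receivers after a fixed propagation time.  The function name
     and parameters are hypothetical.

```python
def acks_sent(strategy, ack_delays_ms, propagation_ms=0):
    """Return the sorted receivers that actually transmit an ACK.

    ack_delays_ms maps each receiver to the delay it waits before
    acknowledging.  With partial ACKs a receiver stays silent if it
    has already seen another receiver's ACK when its delay expires.
    """
    if strategy == "full":
        return sorted(ack_delays_ms)
    if strategy == "partial":
        first = min(ack_delays_ms.values())
        return sorted(
            receiver for receiver, delay in ack_delays_ms.items()
            if delay == first or delay < first + propagation_ms)
    raise ValueError("unknown ACK strategy")
```

     With delays of 300, 100 and 120 ms and a propagation time of 50
     ms, only the second and third receivers transmit under the partial
     strategy, whereas all three do under the full strategy.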


4.3.4.2.  Building Availability

     A protocol component or an application could passively listen to
     Netlink2 commands and events within one or several Netlink2 wires.
     Doing so allows a very simple way of building complex applications
     which are aware of all service components that affect them for HA
     reasons.

     To ensure transparent CE or FE redundancy for certain services, it
     is sufficient to ensure that the backup CEC/FEC is always attached
     to the same wires to which the active CEC/FEC is attached, so that
     the backup CEC/FEC receives all messages destined to the active
     CEC/FEC (whatever PID they are sent to) as well as all messages
     originating from the active CEC/FEC.

     One could create a heartbeat protocol between the FEC and CEC by
     using the NLM_F_ECHO flag and the NLMSG_NOOP message. The
     heartbeat, in addition to listening to FE or CE events, could be
     used to facilitate takeover.


     This topic is beyond the scope of ForCES and will not be discussed
     further here. Note, however, that Netlink2 provides the mechanisms
     needed to enable it.


4.3.4.3.  The ACK Netlink2 Message


     This message is actually used to denote both an ACK and a NACK.
     Typically the direction is from FEC to CEC (in response to an ACK
     request message). However, a CEC should be able to send ACKs back
     to an FEC when requested. The semantics for this are IP service
     specific.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       0               1               2               3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Netlink2 message header                 |
      |                       type = NLMSG_ERROR                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          error code                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       OLD Netlink2 message header             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


     Error code: integer (typically 32 bits)

     An error code of zero indicates that the message is an ACK
     response.  An ACK response message contains the original Netlink2
     message header that can be used to compare against (sent sequence
     numbers, etc).

     A non-zero error code message is equivalent to a Negative ACK
     (NACK).  In such a situation, the Netlink2 data that was sent down
     to the kernel is returned appended to the original Netlink2 message
     header.
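
     Building and parsing such a message could look like the Python
     sketch below.  It assumes a 20-byte header in the field order of
     Section 4.3.3, network byte order, a signed 32-bit error code, and
     the Linux Netlink value for NLMSG_ERROR; none of these encodings
     are spelled out here, and the function names are hypothetical.

```python
import struct

NLMSG_ERROR = 0x2                      # value as defined for Linux Netlink
NL2_HDR = struct.Struct("!HHHHIII")    # assumed header layout and byte order
ERR_CODE = struct.Struct("!i")         # assumed signed 32-bit error code


def build_ack(orig_msg, error, seq, src_pid, dst_pid):
    """Build an ACK (error == 0) or NACK (error != 0) for orig_msg."""
    orig_hdr = orig_msg[:NL2_HDR.size]
    # An ACK echoes only the original header; a NACK also returns the data.
    echoed = orig_hdr if error == 0 else orig_msg
    payload = ERR_CODE.pack(error) + echoed
    length = NL2_HDR.size + len(payload)
    return NL2_HDR.pack(length, 0, NLMSG_ERROR, 0, seq, src_pid, dst_pid) + payload


def parse_ack(msg):
    """Return (is_ack, error_code, original_sequence_number)."""
    msg_type = NL2_HDR.unpack_from(msg)[2]
    if msg_type != NLMSG_ERROR:
        raise ValueError("not an ACK/NACK message")
    (error,) = ERR_CODE.unpack_from(msg, NL2_HDR.size)
    orig = NL2_HDR.unpack_from(msg, NL2_HDR.size + ERR_CODE.size)
    return error == 0, error, orig[4]
```

     The returned original sequence number is what a sender would match
     against its list of outstanding requests.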


4.3.4.4.  Batching, Atomicity and Ordering of Transactions

     As mentioned earlier (and repeated here for clarity), standard
     Netlink multi-message batching looks as follows:

     NLMSG:NLMSG:NLMSG....

     where NLMSG is a Netlink2 header and its associated payload.


     This has the advantage of allowing inter-mixing of multiple
     commands (for example, adds and deletes), generally in a request
     from CE->FE.  It is also useful for batching multiple events from
     the FE->CE.
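
     A receiver could walk such a batch by using the Length field of
     each header, as in the following Python sketch.  It assumes the
     header layout of Section 4.3.3 in network byte order and, as in
     Netlink, that each message in a batch starts on a 4-byte boundary;
     the function name is hypothetical.

```python
import struct

NL2_HDR = struct.Struct("!HHHHIII")    # assumed header layout and byte order


def align4(n):
    """Round n up to the next multiple of 4 (assumed message alignment)."""
    return (n + 3) & ~3


def split_batch(pdu):
    """Split a PDU into the individual Netlink2 messages it carries."""
    messages = []
    offset = 0
    while offset < len(pdu):
        if offset + NL2_HDR.size > len(pdu):
            raise ValueError("truncated Netlink2 header")
        # Length is the first 16-bit field of the header.
        (length,) = struct.unpack_from("!H", pdu, offset)
        if length < NL2_HDR.size or offset + length > len(pdu):
            raise ValueError("bad Length field")
        messages.append(pdu[offset:offset + length])
        offset += align4(length)
    return messages
```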

     In a two-phase commit, messages are bound into a relationship.
     Typically, the first and all following headers have the
     NLM_F_MULTI Netlink2 header flag set, except for the last header,
     which has the Netlink2 header type NLMSG_DONE. Typically, in
     Netlink, the NLMSG_DONE shows up in a separate PDU to define a
     commit.

     Atomicity of a transaction, including that of a batch, is achieved
     by using the NLM_F_ATOMIC flag. Use of NLM_F_ATOMIC is expensive
     because it may necessitate locking access to tables (depending on
     the implementation).


5.  Protocol Architecture


     IP services are defined by using service templates.

     Refer to the Netlink document [Netlink] for the different templates
     used for IP services that fit within the current scope of the
     ForCES charter.

     ForCES in relation to NEs involves three phases: the Pre-Associa-
     tion phase, the association phase where the ForCES protocol oper-
     ates, and a termination phase where a party in the relationship
     leaves a bundle.

1)   The Pre-Association Phase

     In a simple setup, this phase is static. All the parameters for the
     association phase are well known (for example, the multicast groups
     for each Netlink2 bundle and its wires).

     In the case of dynamic discovery, the FE Manager and the CE Manager
     agree on all the parameters and clearly articulate topology and
     other information to each other.


     Vendors may use their own proprietary service discovery protocol.
     As a minimum, we assume a static configuration.

     On completion of the Service Discovery phase, the FEM will have
     established contact with the appropriate CEM component.
     Initialization and Authentication will be complete at this point.
     An FE is issued a service identifier which will be used for
     accounting, identification and authentication purposes. The
     identifier is translated as the PID in the association phase. The
     multicast and unicast addresses for communication are also known
     at this point.  All capabilities may also have been discovered at
     this point.

2)   The Association Phase

     In this phase, the FE and CP components cooperate to deliver the IP
     service.  The CP component might be registered (in the pre-associa-
     tion phase) to receive FE-specific services (such as link events).
     Essentially, in this phase, the IP service is provisioned and exe-
     cuting.  The FE component might continuously get updates from the
     control plane component on how to operate the service (for example,
     the V4 forwarding route additions or deletions).

     The association phase is where Netlink2 operates as the ForCES pro-
     tocol.

     On startup, a SYN Netlink2 message with an ACK flag set is issued
     by the FE on the bundle(s) to which the FE is connected. The con-
     trolling CE will respond (given the ACK flag in the request) with
     either an ACK to imply that the FE has been accepted by the CE or a
     NACK, which is interpreted as a rejection of the FE by the CE. If
     no response is received within a timeout period a retry is
     attempted. After a configurable number of retries without response,
     it is assumed that a CE does not exist and control is handed to the
     FEM.
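
     The startup exchange described above could be sketched in Python
     as follows.  The two callables are hypothetical stand-ins:
     send_syn() transmits the SYN message with the ACK flag set, and
     wait_response() returns the error code of the ACK/NACK message, or
     None if the timeout period expires first.

```python
def associate(send_syn, wait_response, max_retries=3):
    """Run the FE startup exchange.

    Returns "accepted" on an ACK, "rejected" on a NACK, and
    "hand-to-fem" if no CE answers after the configured retries.
    """
    for _attempt in range(1 + max_retries):
        send_syn()
        error = wait_response()
        if error is None:
            continue                    # timeout: retry
        return "accepted" if error == 0 else "rejected"
    return "hand-to-fem"
```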

     The SYN state is followed by the synchronization phase where the FE
     is loaded with updates to tables.


3)   Service Termination

     Service termination could be issued by either component of the ser-
     vice abstraction.  Normally it will be issued by the FE component
     so that the latter does not continue to get billed for services.
     The FE component may also issue the termination message if it wants
     to change to a comparatively better CP service provider.

     The FE or CE initiating the termination will issue a BOOT command
     with the FIN extended flag set. An ACK flag may be set if a
     response to the FIN is required.


6.  Putting together the base protocol for WG charter


7.  References


        [RFC1633]  R. Braden, D. Clark, and S. Shenker, "Integrated
     Services in the Internet Architecture: an Overview", RFC 1633,
     ISI, MIT, and PARC, June 1994.


        [RFC1812]  F. Baker, "Requirements for IP Version 4
     Routers", RFC 1812, June 1995.


        [RFC2475]  M. Carlson, W. Weiss, S. Blake, Z. Wang, D.
     Black, and E.  Davies, "An Architecture for Differentiated
     Services", RFC 2475, December 1998.


        [RFC2748] J. Boyle, R. Cohen, D. Durham, S. Herzog, R.
     Rajan, A. Sastry, "The COPS (Common Open Policy Service) Pro-
     tocol", RFC 2748, January 2000.


        [RFC2328] J. Moy, "OSPF Version 2", RFC 2328, April 1998.

        [RFC2844] T. Przygienda, P. Droz, R. Haas, "OSPF over ATM
     and Proxy-PAR", RFC 2844, May 2000.

        [RFC3358] T. Przygienda, "Optional Checksums in Intermedi-
     ate System to Intermediate System (ISIS)", RFC 3358, August
     2002.

        [RFC1157] J.D. Case, M. Fedor, M.L. Schoffstall, C. Davin,
     "Simple Network Management Protocol (SNMP)", RFC 1157, May
     1990.


        [RFC3036] L. Andersson, P. Doolan, N. Feldman, A. Fredette,
     B. Thomas "LDP Specification", RFC 3036, January 2001.


        [stevens] G.R Wright, W. Richard Stevens, "TCP/IP Illus-
     trated Volume 2, Chapter 20", June 1995.


        [netfilter] http://netfilter.samba.org

        [diffserv] http://diffserv.sourceforge.net

        [Netlink] J. H. Salim, H. Khosravi, A. Kleen, A. Kuznetsov,
     "Netlink as an IP Services Protocol", draft-ietf-forces-
     netlink-03.txt, June 2002.

        [forces_req] H. Khosravi, T. Anderson, "Requirements for
     Separation of IP Control and Forwarding", draft-ietf-forces-
     requirements-07.txt, October 2002.

        [xtp] XTP Forum, "Xpress Transport Protocol Specification,
     XTP Revision 4.0", March 1995.

8.  Authors' Addresses

   Jamal Hadi Salim
   Znyx Networks
   Ottawa, Ontario
   Canada
   hadi@znyx.com

   Robert Haas
   IBM Research
   Zurich Research Laboratory
   Saeumerstrasse 4
   CH-8803 Rueschlikon
   Switzerland
   rha@zurich.ibm.com


9.  Appendix 1: Sample Service Hierarchy


     In the diagram below we show a simple IP service, foo, and the
     interaction it has between CP and FE components for the service
     (labels 1-3).

     The diagram is also used to demonstrate CP<->FE addressing. In this
     section we illustrate only the addressing semantics. In Appendix 2,
     the diagram is referenced again to define the protocol interaction
     between service foo's CEC and FEC (labels 4-10).


       CP
      [--------------------------------------------------------.
      |   .-----.                                              |
      |  |                        . --------.                 |
      |  |  CLI   |               /                           |
      |  |        |              | CP protocol |               |
      |         /->> -.         |  component  | <-.           |
      |    __ _/      |         |   For       |   |           |
      |                |         | IP service  |   ^           |
      |                Y         |    foo      |   |           |
      |                |          ___________/    ^           |
      |                Y   1,4,6,8,9 /  ^ 2,5,10   | 3,7       |
       --------------- Y------------/---|----------|-----------
                       |           ^    |          ^
                     **|***********|****|**********|**********
                     ************* Netlink2 layer ************
                     **|***********|****|**********|**********
             FE        |           |    ^          ^
             .-------- Y-----------Y----|--------- |----.
             |                    |              /     |
             |                    Y            /       |
             |          . --------^-------.  /         |
             |          |FE component/module|/          |
             |          |  for IP Service   |           |
      --->---|------>---|     foo           |----->-----|------>--
             |           -------------------            |
             |                                          |
             |                                          |
              ------------------------------------------


     The control plane protocol for IP service foo does the following to
     connect to its FE counterpart.  The steps below are also numbered
     in the diagram above.


1)   Connect to IP service foo through a socket connect. A typical con-
     nection would be via a call to: socket(AF_NETLINK, SOCK_RAW,
     NETLINK_FOO)

2)   Bind to listen to specific async events for service foo

3)   Bind to listen to specific async FE events


     Note that a wrapper socket can be created on top of the real
     sockets: depending on the destination PID given, it chooses the
     most appropriate socket to send the packet onto (if there are two
     multicast groups, one for all FEs and one for all FEs and CEs, a
     packet from the CE to the FEs will use the first multicast group).
     The wrapper socket basically maps a message to the most
     appropriate wire in the bundle.
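
     The wire selection performed by such a wrapper could be sketched
     in Python as follows.  The selection rule used here (the smallest
     wire whose membership covers all destinations) is an assumption
     made for illustration, as are the function name and the
     representation of a bundle as a mapping from wire names to member
     sets.

```python
def choose_wire(wires, destinations):
    """Pick the most specific wire that reaches every destination.

    wires maps a wire name to the set of PIDs attached to it.
    """
    dests = frozenset(destinations)
    candidates = [
        (len(members), name)
        for name, members in wires.items()
        if dests <= members
    ]
    if not candidates:
        raise LookupError("no wire in the bundle reaches all destinations")
    # Fewest members first; ties are broken by wire name.
    return min(candidates)[1]
```

     With one wire for all FEs and one for all FEs and CEs, a message
     addressed to the FEs only is mapped to the first wire, matching
     the example in the text.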

10.  Appendix 2: Sample Protocol for the foo IP Service


     Our proverbial IP service "foo" is used again to demonstrate how
     one can deploy a simple IP service control using Netlink2.

     These steps are continued from Appendix 1 (hence the numbering).

4)   query for current config of FE component

5)   receive response to 4) via channel on 3)

6)   query for current state of IP service foo

7)   receive response to 6) via channel on 2)

9)   register the protocol specific packets you would like the FE to
     forward to you

10)  send specific service foo commands and receive responses for them
     if needed


10.1.  Interacting with Other IP Services


     The diagram in Appendix 1 shows another control component
     configuring the same service. In this case, it is a proprietary
     Command Line Interface.  The CLI may or may not be using the
     Netlink protocol to communicate with the foo component.  If the
     CLI should issue commands that will affect the policy of the FEC
     for service "foo", then the "foo" CEC is notified. It could then
     make algorithmic decisions based on this input. For example, if an
     FE allowed one service to delete policies installed by a different
     service, and a policy that foo installed was deleted by service
     bar, there might be a need to propagate this to all the peers of
     service "foo".

11.  Appendix 3: Examples


     In this example we show a simple configuration Netlink2 message
     sent from a TC CEC to an egress TC FIFO queue. This queue algorithm
     is based on packet counting and drops packets when the queue
     length exceeds the limit of 100 packets.  We assume the queue is in
     a hierarchical setup with a parent of 100:0 and a classid of 100:1,
     and that it is to be installed on the device with an ifindex of 4.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       0               1               2               3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Length             |            flags_e            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Type (RTM_NEWQDISC)      |      Flags (NLM_F_EXCL |      |
      |                               | NLM_F_CREATE | NLM_F_REQUEST) |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |              Sequence Number (arbitrary number)               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          Source PID                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Destination PID                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Family(AF_INET)|   Reserved1   |           Reserved1           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      Interface Index (4)                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Qdisc handle (0x1000001)                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                   Parent Qdisc (0x1000000)                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         TCM Info (0)                          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |        Type (TCA_KIND)        |           Length(4)           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Value ("pfifo")                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      Type (TCA_OPTIONS)       |           Length(4)           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Value (limit=100)                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
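
     For readers who prefer code to diagrams, the message could be
     assembled roughly as in the Python sketch below.  Several details
     are assumptions rather than part of this specification: network
     byte order, the Linux rtnetlink numeric values for the message
     type, flags and attribute types, and a TLV Length that covers the
     4-byte TLV header plus the unpadded value (so the computed Length
     values differ from the ones shown in the diagram).  The function
     names are hypothetical.

```python
import struct

# Assumed layouts, all in network byte order.
NL2_HDR = struct.Struct("!HHHHIII")    # length, flags_e, type, flags, seq, PIDs
TCMSG = struct.Struct("!BBHiIII")      # family, 2 reserved, ifindex, handle,
                                       # parent, info
TLV_HDR = struct.Struct("!HH")         # type, length

# Numeric values as defined by Linux rtnetlink; assumed unchanged here.
RTM_NEWQDISC = 36
NLM_F_REQUEST, NLM_F_EXCL, NLM_F_CREATE = 0x001, 0x200, 0x400
TCA_KIND, TCA_OPTIONS = 1, 2
AF_INET = 2


def tlv(tlv_type, value):
    """Encode one TLV, padded to a 4-byte boundary (assumption)."""
    length = TLV_HDR.size + len(value)
    padding = b"\0" * (-length % 4)
    return TLV_HDR.pack(tlv_type, length) + value + padding


def build_pfifo_request(seq, src_pid, dst_pid):
    """Build the request that installs a pfifo queue with limit 100."""
    body = TCMSG.pack(AF_INET, 0, 0, 4, 0x01000001, 0x01000000, 0)
    body += tlv(TCA_KIND, b"pfifo\0")
    body += tlv(TCA_OPTIONS, struct.pack("!I", 100))
    flags = NLM_F_EXCL | NLM_F_CREATE | NLM_F_REQUEST
    header = NL2_HDR.pack(NL2_HDR.size + len(body), 0, RTM_NEWQDISC,
                          flags, seq, src_pid, dst_pid)
    return header + body
```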

