ForCES Working Group                                    Jamal Hadi Salim
Internet Draft                                          Znyx Networks
Expires: December 2003                                  Robert Haas
                                                        IBM
                                                        Steven Blake
                                                        Ericsson
                                                        June 2003


                      Netlink2 as ForCES Protocol

                 <draft-jhsrha-forces-netlink2-01.txt>


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

     This document describes Netlink2, an extension of Linux Netlink
     [Netlink].  It is intended as a proposal for the protocol of the
     IETF ForCES working group.

     ForCES aims to define a clear separation between the two entities
     of a Network Element (NE), the Control Element (CE) and the
     Forwarding Element (FE), so that they can evolve separately rather
     than as a single monolithic system.

Conventions used in this document

     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
     "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
     this document are to be interpreted as described in [RFC-2119].


Salim/Haas/Blake          Expires December 2003                 [Page 1]


Internet-Draft         Netlink2 as ForCES Protocol             June 2003


1.  Introduction

     The concept of separating IP control from forwarding was introduced
     with the BSD routing sockets [Stevens].  The focus at that time was
     to provide a simple IPv4 forwarding service and to allow the
     control plane, either via a command-line configuration tool or a
     dynamic routing daemon, to control the forwarding tables of that
     IPv4 forwarding service.

     The IP world has evolved considerably since then.  Linux Netlink
     [Netlink], when observed from a service provisioning and management
     point of view, takes routing sockets one step further by breaking
     the narrow focus on IPv4 forwarding.  Since the Linux 2.1 kernel,
     Netlink has been providing the IP service abstraction for a few
     additional services other than classical RFC 1812 IPv4 forwarding.

     Netlink was designed with the goal of separating forwarding from
     control, and many of the main issues have been thought through and
     resolved over the years.  In other words, Netlink is a proven
     protocol for the separation of forwarding and control.  Netlink is
     also network-ready, because it uses packet formatting techniques
     and concepts (e.g., multicast addressing).  These properties,
     together with the availability of widely deployed, publicly
     available and tested code, are the main motivation for basing
     Netlink2 on Netlink.

     Netlink2 extends Linux Netlink to meet the requirements of the
     ForCES working group charter for a protocol.  Netlink is extended
     to have a distributed addressing and transport scheme, and missing
     mechanisms are added to make Netlink2 meet the ForCES protocol
     requirements [ForCES_REQ].

     Netlink2 operates in a mode where knowledge of the NE, its
     topology, and its model may either have been discovered already or
     be discovered through the Netlink2 protocol itself.


2.  Definitions

     We use the definitions provided in [ForCES_REQ], as well as the
     following:

     Logical Functional Block (LFB): same as Forwarding Engine Compo-
     nents as defined in [Netlink].  This is a forwarding datapath com-
     ponent in the FE driven by the ForCES protocol in order to achieve
     a certain service.

     Control Element Component (CEC): same as the Control Plane
     Component defined in [Netlink].  This is a component in the CE that
     drives LFB(s) in order to achieve a certain service.


3.  Netlink2 Overview

     An IP forwarding service provided by an FE is represented as a
     logical functional block (LFB) in the FE.  CE components (CECs) in
     the CE interact with LFBs over a Netlink2 bundle (described in
     Section 6.2) to execute a certain service.  The interactions
     between LFBs and CECs are specific to each service and are defined
     using templates as presented in [Netlink].

     Netlink2 messages are exchanged between the FE and CECs for
     configuration of LFBs, asynchronous notification of LFB events to
     CECs, and statistics querying and gathering (typically initiated by
     a CEC).  They are also used to transfer control packets between the
     FE and CECs.

     For instance, the IPv4 Forwarding service (called NETLINK_ROUTE)
     defines a message template for handling IP routes and the message
     types to insert, remove, or query a route.  The routing CEC(s) and
     the IPv4 Forwarding LFB(s) interact using these message templates
     and message types over the Netlink2 bundle to execute the IPv4 For-
     warding service.  The message types in Netlink2 messages allow the
     FE to demultiplex messages to the appropriate LFB.
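
     The demultiplexing step can be sketched in C.  This is a minimal
     illustration, not a definitive FE design: the numeric values of
     RTM_NEWROUTE, RTM_DELROUTE and RTM_GETROUTE are those of Linux
     <linux/rtnetlink.h> and are assumed to carry over to Netlink2,
     while the handler names and the dispatch table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Route message types, values as in Linux <linux/rtnetlink.h>. */
#define RTM_NEWROUTE 24
#define RTM_DELROUTE 25
#define RTM_GETROUTE 26

/* Hypothetical LFB entry points; each returns 0 on success. */
typedef int (*lfb_handler)(const void *payload, size_t len);

static int lfb_ipv4_add(const void *p, size_t n) { (void)p; (void)n; return 0; }
static int lfb_ipv4_del(const void *p, size_t n) { (void)p; (void)n; return 0; }
static int lfb_ipv4_get(const void *p, size_t n) { (void)p; (void)n; return 0; }

struct dispatch_entry {
    uint16_t type;          /* Netlink2 header Type field */
    lfb_handler handler;    /* LFB that serves this type */
};

static const struct dispatch_entry route_table[] = {
    { RTM_NEWROUTE, lfb_ipv4_add },
    { RTM_DELROUTE, lfb_ipv4_del },
    { RTM_GETROUTE, lfb_ipv4_get },
};

/* Return the LFB handler for a message type, or NULL if no LFB serves
 * it (an FE could then answer with an NLMSG_ERROR message). */
static lfb_handler fe_demux(uint16_t type)
{
    for (size_t i = 0; i < sizeof route_table / sizeof route_table[0]; i++)
        if (route_table[i].type == type)
            return route_table[i].handler;
    return NULL;
}
```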

     Messages of a given service destined to an LFB can travel on
     different Netlink2 wires within the same bundle.  Note also that an
     LFB can process messages from different bundles.

     Netlink2 by itself does not constitute a protocol, but rather a set
     of base mechanisms that can be utilized depending on service
     requirements.

     As in the Netlink context, the interaction between the LFB and the
     CEC defines a protocol: Netlink2 provides the mechanisms with which
     the CEC and the LFB define their own protocol.  For example, the LFB
     might continuously receive updates from the CEC on how to operate
     the service (e.g., route additions or deletions for IPv4
     forwarding).

     Netlink2 messages and mechanisms are used to derive that protocol.
     For example, the LFB and CEC may choose to define a reliable or
     semi-reliable protocol between each other.  By default, however,
     Netlink2 transactions are unreliable.
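
     As a sketch of such a semi-reliable exchange, a CEC could set
     NLM_F_ACK on its requests and remember which Sequence Numbers are
     still unacknowledged.  The bookkeeping below is hypothetical (names
     and fixed table size are assumptions); retransmission timers and
     the exact format of the acknowledgment are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 8

/* One request sent with NLM_F_ACK set and not yet acknowledged. */
struct pending {
    uint32_t seq;      /* Sequence Number of the request */
    unsigned tries;    /* transmissions so far */
    bool in_use;
};

struct ack_tracker {
    struct pending slot[MAX_PENDING];
};

/* Record a (re)transmitted request; false if the table is full. */
static bool tracker_sent(struct ack_tracker *t, uint32_t seq)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (t->slot[i].in_use && t->slot[i].seq == seq) {
            t->slot[i].tries++;            /* retransmission */
            return true;
        }
    }
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i] = (struct pending){ .seq = seq, .tries = 1, .in_use = true };
            return true;
        }
    }
    return false;
}

/* An acknowledgment carrying seq arrived: stop tracking that request. */
static void tracker_acked(struct ack_tracker *t, uint32_t seq)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (t->slot[i].in_use && t->slot[i].seq == seq)
            t->slot[i].in_use = false;
}

/* Requests still awaiting acknowledgment (retransmission candidates). */
static int tracker_outstanding(const struct ack_tracker *t)
{
    int n = 0;
    for (int i = 0; i < MAX_PENDING; i++)
        n += t->slot[i].in_use;
    return n;
}
```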


4.  Netlink2 Modifications to Netlink

     To conform to the ForCES requirements [ForCES_REQ], the Netlink
     protocol [Netlink] is extended in the following respects:

     1) Base header modifications

     2) Extensibility by means of optional header TLVs, which
     accommodate current generic ForCES requirements and make it
     possible to add more in the future.  This facilitates adding
     features such as authentication and checksumming when required.

     3) IP and Transport encapsulations to carry Netlink messages.

     With these complementary changes to the existing Netlink function-
     ality, Netlink2 fulfills the requirements to become the ForCES pro-
     tocol.


4.1.  Header Modifications

     1) PID field redefinition and addition

     In Netlink, PID 0 referred to the equivalent of the FE (the
     kernel), while the equivalent of the CE (a user process) was
     referred to by its OS process ID.

     In Netlink2, a PID of the unicastPID type is assigned to each FE
     and CE in the pre-association phase.  In this way, each FE and CE
     is uniquely identified and PID collisions are avoided.  The name
     PID is kept for historical reasons.

     - Destination PID: the PID field is redefined as the Destination
     PID field.  This field identifies the parties on the wire that must
     process the message.

     - Source PID: this field is introduced in the header to identify
     the source of the message.

     Different types of PIDs are discussed in Section 6.3.

     2) The Length field has been reduced to 16 bits, with length 0
     reserved.  The remaining 16 bits of the old 32-bit Length field are
     split between a new Version field and a new Extended Flags field.

     3) A Version field is introduced in the Netlink2 header.  This
     8-bit field consists of a 4-bit major number and a 4-bit minor
     number, in the form major:minor.  For Netlink2, its value is 0x20.

     4) A new Extended Flags field takes over the remaining 8 bits of
     the 16 bits freed from the original 32-bit Length field in Netlink.
     Setting individual bits enables additional features, such as
     signaling the presence of extended TLVs.  The Extended Flags also
     introduce the concept of a SYN message, which the FE issues as its
     first message after the pre-association phase to announce its
     presence.  Similarly, a message with the FIN flag set is issued
     last to announce the departure of the FE.

     5) Netlink2-specific TLVs directly follow the Netlink2 base header.
     They are optional, and their presence is indicated only by an
     Extended Flags bit.  A typical use of Netlink2-specific TLVs is to
     compensate for capabilities lacking in an underlying transport.
     For example, in an IP network deployed without IPsec, the
     Netlink2-specific authentication TLV could be used to emulate the
     features provided by the IPsec Authentication Header (AH).
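
     The SYN, FIN and extended-TLV semantics of items 4 and 5 could be
     handled on the CE side roughly as follows.  This document does not
     assign bit positions within the 8-bit Extended Flags field, so the
     values below are hypothetical, as is indexing FEs by a small
     integer (in practice an FE would be identified by its Source PID).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions in the 8-bit Extended Flags field;
 * this document does not assign values. */
#define NLM_F_SYN  0x01
#define NLM_F_FIN  0x02
#define NLM_F_ETLV 0x04

#define MAX_FES 16

/* CE-side view of which FEs are currently present. */
struct fe_table {
    bool present[MAX_FES];
};

/* Update FE presence from the Extended Flags of a message sent by FE
 * fe_id.  Returns true if Netlink2-specific TLVs follow the header. */
static bool ce_on_message(struct fe_table *t, unsigned fe_id, uint8_t flags_e)
{
    assert(fe_id < MAX_FES);
    if (flags_e & NLM_F_SYN)
        t->present[fe_id] = true;     /* first message: FE has joined */
    if (flags_e & NLM_F_FIN)
        t->present[fe_id] = false;    /* last message: FE is leaving */
    return (flags_e & NLM_F_ETLV) != 0;
}
```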

     Apart from these changes, the mechanisms provided by Netlink are
     sufficient to meet the ForCES requirements.  The reader is
     encouraged to use [Netlink] as a companion to this document.


4.2.  Addressing and Transport Extensions

     1) Support for UDP/TCP/SCTP transport over unicast/multicast IP
     (Section 6.1).

     2) Support for bundles (Section 6.2).

     3) Message recipient scoping using the Destination PID (Section
     6.3).

     4) Support for both local scope and global scope addressing (Sec-
     tions 6.4 and 6.5).


5.  Netlink2 Message Format

     There are three levels to a Netlink2 message: the general Netlink2
     message header, the IP-service-specific template, and the
     IP-service-specific data.  The header and the template are
     mandatory, whereas the Netlink2-specific TLVs and the
     IP-service-specific data (carried in TLVs) are optional.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                   Netlink2 message header                     |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                   Netlink2-specific TLVs (optional)           |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                  IP Service Template                          |
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                  IP-Service-specific data in TLVs             |
      |                          (optional)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


     The Netlink2 message header is generic for all services, whereas
     the IP Service Template header is specific to a service.  Each IP
     service then carries configuration parameters (CEC->LFB direction)
     or query responses (LFB->CEC direction).  These parameters are in
     Type-Length-Value (TLV) format and are specific to the particular
     service.

     The IP Service Templates are the same as in Netlink; nothing has
     changed here.
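
     The service TLVs can be walked as in local Netlink.  The sketch
     below assumes the Linux Netlink attribute layout (struct rtattr: a
     16-bit length covering header and value, a 16-bit type, host byte
     order, and 4-byte alignment between attributes); this document
     does not restate that layout, and a distributed deployment would
     also need to fix a byte order.  The names attr_hdr and attr_find
     are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed attribute header, as in Linux struct rtattr. */
struct attr_hdr {
    uint16_t len;    /* header + value, in bytes */
    uint16_t type;
};

#define ATTR_ALIGN(n) (((n) + 3u) & ~3u)   /* 4-byte alignment */

/* Find the first attribute of the given type in buf[0..buflen).
 * Returns a pointer to its value and stores the value length, or
 * returns NULL if it is absent or the buffer is malformed. */
static const uint8_t *attr_find(const uint8_t *buf, size_t buflen,
                                uint16_t type, size_t *vlen)
{
    size_t off = 0;
    while (off + sizeof(struct attr_hdr) <= buflen) {
        struct attr_hdr h;
        memcpy(&h, buf + off, sizeof h);      /* avoid unaligned reads */
        if (h.len < sizeof h || off + h.len > buflen)
            return NULL;                      /* truncated or bogus TLV */
        if (h.type == type) {
            *vlen = h.len - sizeof h;
            return buf + off + sizeof h;
        }
        off += ATTR_ALIGN(h.len);
    }
    return NULL;
}
```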


5.1.  Netlink2 Message Header

     Netlink2 messages are laid out like Netlink messages: each Netlink2
     message is a byte stream consisting of a Netlink2 header followed
     by its associated payload.

     A single PDU may contain more than one Netlink2 message.  This is
     referred to as batching.  Netlink batching is reused in Netlink2
     and allows for messages with different commands (such as adding
     routes and deleting a QoS policy) to be carried in the same batch
     message.
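
     A receiver separates batched messages by following each header's
     Length field.  The sketch below only counts the messages in a PDU.
     It assumes the 20-byte fixed header described in this section,
     network byte order for Length, and Netlink's 4-byte message
     alignment; none of these is mandated by the text above, and the
     names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NL2_HDRLEN 20u                     /* fixed header size */
#define NL2_ALIGN(n) (((n) + 3u) & ~3u)    /* assumed 4-byte alignment */

/* Length field: bytes 2-3 of a header, assumed network byte order. */
static uint16_t nl2_length(const uint8_t *msg)
{
    return (uint16_t)((msg[2] << 8) | msg[3]);
}

/* Count the Netlink2 messages batched in one PDU.  Returns -1 if a
 * Length is shorter than the header (including the reserved value 0)
 * or runs past the end of the PDU. */
static int nl2_count_batch(const uint8_t *pdu, size_t pdulen)
{
    size_t off = 0;
    int count = 0;
    while (off + NL2_HDRLEN <= pdulen) {
        uint16_t len = nl2_length(pdu + off);
        if (len < NL2_HDRLEN || off + len > pdulen)
            return -1;
        count++;                 /* a real receiver would dispatch here */
        off += NL2_ALIGN(len);
    }
    return count;
}
```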

     A Netlink2 message may be split across multiple PDUs if it does not
     fit into a single PDU.  This is referred to as a multipart Netlink2
     message and is also inherited from Netlink.

     For multipart messages, the first and all following headers have
     the NLM_F_MULTI Netlink header flag set, except for the last
     header, which has the Netlink header type NLMSG_DONE.
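
     Tracking a multipart message then reduces to watching the Type and
     Flags of successive headers.  The values of NLM_F_MULTI and
     NLMSG_DONE below are those of Linux <linux/netlink.h> and are
     assumed unchanged in Netlink2; payload reassembly is omitted and
     the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in Linux <linux/netlink.h>; assumed unchanged in Netlink2. */
#define NLMSG_DONE  0x3
#define NLM_F_MULTI 0x2

struct multipart {
    unsigned parts;     /* NLM_F_MULTI parts seen so far */
    bool complete;      /* NLMSG_DONE seen */
};

/* Feed the Type and Flags of each received header, in order. */
static void mp_feed(struct multipart *m, uint16_t type, uint16_t flags)
{
    if (m->complete)
        return;                  /* sequence already terminated */
    if (type == NLMSG_DONE)
        m->complete = true;      /* last header of the multipart message */
    else if (flags & NLM_F_MULTI)
        m->parts++;              /* first or intermediate part */
}
```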

     The Netlink2 message header is shown below.


     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    Version    |   Flags_E     |             Length            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             Type              |             Flags             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Sequence Number                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          Source PID                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Destination PID                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Optional TLVs                         |
    ~                                                               ~
    ~                                                               ~
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
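
     One way to read this diagram is as the C structure below, with
     explicit big-endian serialization.  The field widths add up to a
     20-byte fixed header (8+8+16+16+16+32+32+32 bits, optional TLVs
     excluded).  Network byte order is an assumption, since this
     document does not specify one, and all names are hypothetical.
     Decoding checks only the major version number and rejects Length
     values shorter than the header, which also covers the reserved
     value 0.

```c
#include <assert.h>
#include <stdint.h>

#define NL2_HDRLEN  20u
#define NL2_VERSION 0x20   /* major 2, minor 0 */

struct nl2_hdr {
    uint8_t  version;   /* major:minor, 4 bits each */
    uint8_t  flags_e;   /* extended flags */
    uint16_t length;    /* whole message incl. header; 0 reserved */
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
    uint32_t src_pid;
    uint32_t dst_pid;
};

static void put16(uint8_t *p, uint16_t v) { p[0] = v >> 8; p[1] = v & 0xFF; }
static void put32(uint8_t *p, uint32_t v) { put16(p, v >> 16); put16(p + 2, v & 0xFFFF); }
static uint16_t get16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }
static uint32_t get32(const uint8_t *p) { return (uint32_t)get16(p) << 16 | get16(p + 2); }

/* Write the fixed header in diagram order, big-endian. */
static void nl2_hdr_encode(const struct nl2_hdr *h, uint8_t out[NL2_HDRLEN])
{
    out[0] = h->version;
    out[1] = h->flags_e;
    put16(out + 2, h->length);
    put16(out + 4, h->type);
    put16(out + 6, h->flags);
    put32(out + 8, h->seq);
    put32(out + 12, h->src_pid);
    put32(out + 16, h->dst_pid);
}

/* Parse a fixed header; returns -1 on a foreign major version or a
 * Length shorter than the header. */
static int nl2_hdr_decode(const uint8_t in[NL2_HDRLEN], struct nl2_hdr *h)
{
    h->version = in[0];
    h->flags_e = in[1];
    h->length  = get16(in + 2);
    h->type    = get16(in + 4);
    h->flags   = get16(in + 6);
    h->seq     = get32(in + 8);
    h->src_pid = get32(in + 12);
    h->dst_pid = get32(in + 16);
    if ((h->version >> 4) != (NL2_VERSION >> 4))
        return -1;
    if (h->length < NL2_HDRLEN)
        return -1;
    return 0;
}
```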


   The fields in the header are:

          Version: 8 bits
          The Version field is split into major:minor (4:4 bits)
          subfields.  The value for Netlink2 is 0x20.

          Flags_E: 8 bits
          These are extended flags:
                   NLM_F_SYN   Set on the first message.
                               Interpreted as a boot message.
                   NLM_F_FIN   Set on the last message.
                               Interpreted as a departure message.
                   NLM_F_ETLV  Set to indicate the presence of
                               extended TLVs.
                   NLM_F_PRIO  Message priority: 1 for high and 0 for
                               low.  Finer priority levels can be
                               carried in the Message Priority TLV.
                   NLM_F_ASTR  ACK strategy: 1 for partial ACKs and
                               0 for full ACKs.

          Length: 16 bits
          The length of the Netlink2 message in bytes, including the
          header.

          Type: 16 bits
          This field describes the message content.
          It can be one of the standard message types:
               NLMSG_NOOP  message is ignored
               NLMSG_ERROR the message signals an error and the
                           payload contains an nlmsgerr structure.
                           This can be viewed as a NACK and is
                           typically sent from LFB to CEC.
               NLMSG_DONE  message terminates a multipart message

          Individual IP services specify further message types; for
          example, the NETLINK_ROUTE service specifies types such as
          RTM_NEWLINK, RTM_DELLINK, RTM_GETLINK, RTM_NEWADDR,
          RTM_DELADDR, RTM_NEWROUTE, RTM_DELROUTE, etc.

          Flags: 16 bits
          The standard flag bits used in Netlink are:
                 NLM_F_REQUEST   Must be set on all request messages
                                 (typically from CE to FE)
                 NLM_F_MULTI     Indicates the message is part of a
                                 multipart message terminated by
                                 NLMSG_DONE
                 NLM_F_ACK       Request an acknowledgment on success.
                                 Typically sent from CEC to LFB.
                 NLM_F_ECHO      Echo this request.  Typically sent
                                 from CEC to LFB.

          Additional flag bits for GET requests on configuration
          information in the LFB:
                 NLM_F_ROOT     Return the complete table instead of a
                                single entry.
                 NLM_F_MATCH    Return all entries matching the criteria
                                passed in the message content.
                 NLM_F_ATOMIC   Return an atomic snapshot of the table
                                being referenced.  This may require
                                special privileges because it has the
                                potential to interrupt service in the FE
                                for a longer time.

          Convenience macros for flag bits:
                 NLM_F_DUMP     This is NLM_F_ROOT OR-ed with
                                NLM_F_MATCH.

          Additional flag bits for NEW requests:
                 NLM_F_REPLACE   Replace existing matching config object
                                 with this request.
                 NLM_F_EXCL      Do not replace the config object if it
                                 already exists.
                 NLM_F_CREATE    Create config object if it does not
                                 already exist.
                 NLM_F_APPEND    Add to the end of the object list.

          For those familiar with the BSD use of such operations in
          routing sockets, the equivalent translations are:

                    - BSD ADD corresponds to NLM_F_CREATE OR-ed
                      with NLM_F_EXCL
                    - BSD CHANGE corresponds to NLM_F_REPLACE
                    - BSD CHECK corresponds to NLM_F_EXCL
                    - BSD APPEND corresponds to NLM_F_CREATE


          Sequence Number: 32 bits
          The sequence number of the message.

          Source PID: 32 bits
          The PID of the sender of the message (unicast or logical PID).

          Destination PID: 32 bits
          The PID of the destination of the message (unicast, logical,
          or broadcast PID).
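
     The BSD translations above can be written down directly in code.
     The sketch assumes Netlink2 keeps the flag values of Linux
     <linux/netlink.h>; the enum and function names are hypothetical,
     and setting NLM_F_ACK on every request is a choice of the sketch,
     not a requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values from Linux <linux/netlink.h>, assumed unchanged. */
#define NLM_F_REQUEST 0x001
#define NLM_F_ACK     0x004
#define NLM_F_ROOT    0x100   /* GET modifiers ...            */
#define NLM_F_MATCH   0x200
#define NLM_F_DUMP    (NLM_F_ROOT | NLM_F_MATCH)
#define NLM_F_REPLACE 0x100   /* ... NEW modifiers reuse the bits */
#define NLM_F_EXCL    0x200
#define NLM_F_CREATE  0x400

enum bsd_op { BSD_ADD, BSD_CHANGE, BSD_CHECK, BSD_APPEND };

/* Flags for a NEW request mimicking a BSD routing-socket operation. */
static uint16_t bsd_to_flags(enum bsd_op op)
{
    uint16_t f = NLM_F_REQUEST | NLM_F_ACK;   /* ask for an ACK */
    switch (op) {
    case BSD_ADD:    f |= NLM_F_CREATE | NLM_F_EXCL; break;
    case BSD_CHANGE: f |= NLM_F_REPLACE;             break;
    case BSD_CHECK:  f |= NLM_F_EXCL;                break;
    case BSD_APPEND: f |= NLM_F_CREATE;              break;
    }
    return f;
}
```

     Note that in Linux the GET and NEW modifiers share bit positions
     (for example, NLM_F_ROOT and NLM_F_REPLACE are both 0x100), so a
     receiver has to interpret Flags in the context of the message Type.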


5.2.  Netlink2-specific TLVs

5.2.1.  Authentication

     [TBD]

5.2.2.  Checksum


     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | TLV Type =12  | TLV Length =2 |       Checksum (16 bits)      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     This TLV is optional.  To compute the checksum, an implementation
     MUST add the optional Checksum TLV to the Netlink2 message with an
     initial checksum value of 0, compute the checksum over the
     resulting Netlink2 message, and then place the result in the
     Checksum field.  Refer to [RFC3358] for details on the Checksum
     TLV.
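
     The fill-and-verify procedure can be sketched as follows.  Only the
     steps (zero the field, checksum the whole message, store the
     result) come from the text above; the Fletcher-16 function is a
     stand-in chosen for illustration, and [RFC3358] should be
     consulted for the algorithm that actually applies.  Offsets, byte
     order and names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative Fletcher-16 (stand-in; see [RFC3358] for the real one). */
static uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 255;
        b = (b + a) % 255;
    }
    return (uint16_t)(b << 8 | a);
}

/* msg holds a complete Netlink2 message of len bytes; csum_off is the
 * offset of the 16-bit Checksum value inside the Checksum TLV. */
static void nl2_checksum_fill(uint8_t *msg, size_t len, size_t csum_off)
{
    msg[csum_off] = 0;                    /* 1. start from value 0 */
    msg[csum_off + 1] = 0;
    uint16_t c = fletcher16(msg, len);    /* 2. whole message */
    msg[csum_off] = c >> 8;               /* 3. store, big-endian */
    msg[csum_off + 1] = c & 0xFF;
}

/* Receiver: recompute with the field zeroed and compare. */
static int nl2_checksum_ok(uint8_t *msg, size_t len, size_t csum_off)
{
    uint16_t got = (uint16_t)(msg[csum_off] << 8 | msg[csum_off + 1]);
    nl2_checksum_fill(msg, len, csum_off);
    uint16_t want = (uint16_t)(msg[csum_off] << 8 | msg[csum_off + 1]);
    msg[csum_off] = got >> 8;             /* restore received value */
    msg[csum_off + 1] = got & 0xFF;
    return got == want;
}
```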

5.2.3.  Message Priority


     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | TLV Type =13  | TLV Length =2 |      Priority  (16 bits)      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     This TLV is optional.  It is intended for networks that do not
     support prioritization and is used to convey the message priority
     to the remote end.
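
     Encoding and locating Netlink2-specific TLVs can be sketched as
     below.  The sketch assumes, as the two diagrams suggest, an 8-bit
     type, an 8-bit length counting only the value bytes, no padding
     between TLVs, and big-endian values.  How the end of the TLV area
     is determined is not specified here, so the caller passes its
     length; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NL2_TLV_CHECKSUM 12
#define NL2_TLV_PRIORITY 13

/* Write a Message Priority TLV; returns the number of bytes written. */
static size_t nl2_put_priority(uint8_t *p, uint16_t prio)
{
    p[0] = NL2_TLV_PRIORITY;
    p[1] = 2;                 /* value length, per the diagram */
    p[2] = prio >> 8;         /* big-endian (assumed) */
    p[3] = prio & 0xFF;
    return 4;
}

/* Scan a region of Netlink2-specific TLVs for the priority value;
 * returns -1 if it is absent or the region is malformed. */
static int32_t nl2_get_priority(const uint8_t *p, size_t len)
{
    size_t off = 0;
    while (off + 2 <= len) {
        uint8_t type = p[off], vlen = p[off + 1];
        if (off + 2 + vlen > len)
            return -1;                        /* truncated TLV */
        if (type == NL2_TLV_PRIORITY && vlen == 2)
            return (int32_t)(p[off + 2] << 8 | p[off + 3]);
        off += 2 + (size_t)vlen;              /* skip e.g. checksum TLV */
    }
    return -1;
}
```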


6.  Addressing and Transport Extensions

     We extend Netlink to make it distributed.  The focus is on giving
     Netlink2 a strong local-scope view of the world while fitting well
     into a global scope as the hop distance between the FE and CE
     increases.

     If the network interconnecting the FE(s) and CE(s) is completely
     hidden from the outside (black-box view), for instance an internal
     Ethernet segment or a switching fabric in which CE(s) and FE(s) are
     connected within physical proximity, then communications between FE
     and CE are assumed to be of a local scope.  On the other hand, if
     communications between FE and CE cross parts of the network that
     are not hidden from the outside, communications are considered to
     be of global scope.


6.1.  Transport Methods

     The ideal environment for Netlink2 is considered to be a multicast-
     capable medium with IP above it and with UDP/TCP/SCTP running over
     IP.

     Netlink2 can also run over non-IP, non-multicast-capable
     environments; however, this requires extra processing and
     messaging by the ForCES layer to compensate for services that IP
     already offers.

6.1.1.  Why Multicast?

     Multicast is considered important to facilitate one-to-many/some
     communication.  For example, a single command from a CE can be
     multicast to multiple FEs, which eases the scalability requirements
     mentioned in [ForCES_REQ].  This is discussed in later sections.

     When running Netlink2 over non-multicast-capable media, it is
     expected that mechanisms similar to those used in OSPF NBMA
     [RFC2328] networks will be put in place.

6.1.2.  Why IP?

     IP runs on virtually every link layer.  Leveraging this fact alone
     helps deploy the protocol more widely and more quickly.

     IP also provides numerous services, such as fragmentation and
     reassembly, prioritization, and security, that are inherent
     requirements for the ForCES protocol.  Consequently, running
     Netlink2 successfully over an alternative to IP requires that
     similar services be provided by whatever lies underneath.

     Netlink2-specific optional TLVs can be used to compensate for lack-
     ing functionality if running on network transport other than IP or
     directly on the link layer.

     Over IP, messages that exceed the path MTU are segmented and
     reassembled by IP, and Netlink already allows the definition of
     multipart messages.  When running on top of non-IP media, each
     Netlink2 message can be limited so as not to exceed the MTU; the
     multipart message facility can then be used to provide framing for
     segmentation and reassembly.

     The Netlink2-specific Authentication TLV can be used to carry
     authentication signatures over a medium that lacks this capability.

     The Netlink2-specific Checksum TLV can be used to carry checksums
     over a medium that lacks this capability.

     The Netlink2-specific Message Priority TLV can be used to convey
     priorities if the transport cannot signal them in its headers.

6.1.3.  Why UDP/TCP/SCTP?

     Within a local scope, multicast UDP over IP is assumed to be the
     preferred mode of operation.

     Within a global scope, TCP or SCTP is expected to be used for
     enhanced reliability and Internet-friendly congestion control.

     All three protocols provide 16-bit ports, which serve as further
     address-demultiplexing points.  All three protocols also provide a
     checksum capability to protect the integrity of the Netlink2
     message.  In the case of UDP over IPv4, the checksum is optional,
     which fits the model that the local scope is less error-prone than
     the global scope, so the integrity check can be turned on only when
     needed.


6.2.  The Netlink2 wire and bundle

     A Netlink2 wire displays the same behavior as a Netlink wire.  It
     interconnects FEs and CEs in order to support services they jointly
     offer.

     The only conceptual difference between a Netlink2 wire and a
     Netlink wire is that whereas the Netlink wire is localized, the
     Netlink2 wire is distributed.

     We also introduce the concept of a Netlink2 bundle.  A Netlink2
     bundle interconnects a set of FE(s) and/or CE(s) by means of one or
     more Netlink2 wires.  Note that a Netlink2 bundle does not neces-
     sarily mean a full-mesh interconnection (see examples later on).

     Parties (FEs and CEs) on a Netlink2 bundle share a common configu-
     ration, provisioning and event-notification end goals.

     A Netlink2 wire MAY be constructed using a multicast connection, a
     unicast connection, or multiple multicast and unicast connections.
     A wire MUST belong to only one bundle.  A bundle may have only a
     single wire (unicast or multicast).  In most cases, we expect a
     bundle to use only one multicast address, although scalability
     issues could require the use of additional unicast connections.

     When a multicast IP address is used, a Netlink2 wire MUST run over
     UDP; a UDP port is used to uniquely identify the wire.  There MAY
     be multiple wires using the same multicast address as long as they
     run over different UDP ports.
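
     The multicast rules above lend themselves to a simple consistency
     check over a set of wire connections.  The sketch below is
     hypothetical in its names and representation (IPv4 only, addresses
     in host byte order); unicast connections are skipped because how
     to connect to them is left to agreement between CE and FE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum transport { T_UDP, T_TCP, T_SCTP };

/* One connection of a Netlink2 wire (names hypothetical). */
struct wire {
    uint32_t addr;          /* IPv4 address, host byte order */
    uint16_t port;          /* transport port */
    enum transport proto;
};

/* IPv4 multicast addresses are 224.0.0.0/4. */
static bool is_multicast(uint32_t addr)
{
    return (addr & 0xF0000000u) == 0xE0000000u;
}

/* Multicast wires must run over UDP, and wires sharing a multicast
 * address must use different UDP ports (the port identifies the wire). */
static bool wires_valid(const struct wire *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!is_multicast(w[i].addr))
            continue;                 /* unicast: left to CE/FE agreement */
        if (w[i].proto != T_UDP)
            return false;
        for (size_t j = i + 1; j < n; j++)
            if (w[j].addr == w[i].addr && w[j].port == w[i].port)
                return false;
    }
    return true;
}
```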

     When a unicast IP address is used, how to connect to an endpoint
     (CE/FE) is subject to agreement between the CE and FE.  The
     connection could be made directly over IP (do we need an IP
     protocol number?) or via transport-layer ports (TCP/UDP/SCTP).

     For both unicast and multicast wires, the necessary parameters
     (such as IP addresses and port numbers) can be discovered with the
     involvement of the FE and CE Managers.


6.2.1.  What wires go in a bundle?

     Netlink2 provides flexibility to have a bundle of purely unicast
     wires or multicast wires or a hybrid of both.  The decision of what
     goes into a bundle can be made in the pre-association phase.

     A good analogy is to think of a multicast wire as a broadcast link
     (as is done in Netlink) in which CE(s) and FE(s) are parties
     attached to that broadcast link.

     Depending on the number of FEs and CEs in an NE, a single multicast
     wire in the bundle may be sufficient.  Multicast allows one-to-some
     messaging: a single message sent by an originator is seen by all
     parties on the wire.  This simplifies synchronization in an HA
     environment as well as the implementation of the protocol.

     The fact that multicast messages are seen by all parties could
     cause scalability issues as the number of nodes grows.  Parties
     need to filter out messages for which they are not the
     destination.  This filtering consumes compute resources, or table
     resources if it is done in hardware.  The extra messages also
     consume bandwidth unnecessarily for FE(s) and CE(s) not interested
     in them.

     Unicast wires could be used to create point-to-point connections
     between the parties; when every party is connected to every other
     party, then this becomes a full mesh.

     A full unicast mesh topology removes the need to filter unnecessary
     messages but introduces scalability concerns, as the number of
     connections required grows quadratically with the number of
     parties (FEs and CEs) present.  Each party must then spend more
     compute resources and maintain more state.  A pure mesh topology
     also complicates HA, because each party must maintain more state
     (for instance, the IP addresses of the active CEs and FEs and of
     their backups) and perform extra processing to achieve failover.
     With multicast among all parties, failover remains transparent.
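
     To make the quadratic growth concrete: a full mesh needs one
     point-to-point connection per pair of parties, so for n parties

```latex
C_{\mathrm{mesh}}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad n = N_{\mathrm{FE}} + N_{\mathrm{CE}}
```

     For example, 10 parties need 45 connections and 20 parties need
     190, whereas a single multicast wire can be shared by all of them.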

     Netlink2 allows a bundle to have a hybrid of unicast and multicast
     connections.  Note this is a model used by other protocols such as
     OSPF over broadcast links where the Hello protocol is multicast but
     responses to LSA updates are unicasted.

     We present some examples of Netlink2 bundles:

     1) A trivial case is a Netlink2 bundle consisting of a single
     unicast wire between the CE and FE it interconnects.

     2) Multiple FEs and a CE could be interconnected with a Netlink2
     bundle using a single multicast connection.

     3) In the same example as 2) above, the unicast address of the CE
     could in addition also be used, for instance, to deliver acknowl-
     edgments or notifications from the FEs to the CE, and not be seen
     by all other FEs.  The unicast addresses of the FEs could also be
     used, for instance, to deliver certain messages only to a specific
     FE, such as a retransmission of a message in a two-phase commit
     only to an FE that did not respond.

     4) Multiple FEs and CEs could use a wire with two multicast
     connections: one for all FEs, the other for all CEs, so that
     messages only relevant to FEs are not seen by CEs and vice versa.


6.3.  Redefining the Netlink PID Semantics

     We maintain the name PID for historical purposes and introduce a
     Destination PID and a Source PID as mentioned earlier.

     For every message received by each party on the wire, the
     Destination PID field indicates the recipient of the message.  The
     addressed party could be either an FE or a CE, or, more
     specifically, an LFB or a CEC.

     Whereas the Netlink2 wire (unicast or multicast) determines where a
     message is delivered, the PID types provide further control by
     defining which entity actually has to process the message.  If the
     bundle uses only a single multicast wire, messages are heard by all
     parties on the wire, but only those with a matching PID actually
     process them.  We introduce special-purpose PIDs addressed to
     specific listeners on the wire.

     The following types of PIDs are defined and can be used in
     Netlink2 messages.  The actual PID values of an FE or CE must be
     the same across all wires of the same bundle and must be
     established during the pre-association phase.

     Default values are given where applicable.  PIDs must be unique
     within a Netlink2 wire and may also be unique within the NE.  Each
     32-bit PID is subdivided into two 16-bit subfields, named wire and
     party, written in the form wire:party.
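     The wire:party split can be illustrated with a few helper
     functions.  This is a minimal C sketch, not part of the Netlink2
     definition: the helper names are hypothetical, and it assumes that
     the wire subfield occupies the high-order 16 bits of the 32-bit PID
     (consistent with the default broadcast values below, which all
     share 0xffff in their high-order 16 bits).

```c
#include <stdint.h>

/* Hypothetical helpers.  Assumption: wire = high-order 16 bits,
 * party = low-order 16 bits, matching the written form wire:party. */
static inline uint32_t nl2_pid_make(uint16_t wire, uint16_t party)
{
    return ((uint32_t)wire << 16) | party;
}

static inline uint16_t nl2_pid_wire(uint32_t pid)
{
    return (uint16_t)(pid >> 16);
}

static inline uint16_t nl2_pid_party(uint32_t pid)
{
    return (uint16_t)(pid & 0xffffu);
}
```

     Under this assumption, nl2_pid_make(4, 1) yields 0x00040001, that
     is, party 1 (for instance, an LFB) behind wire subfield 4.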


     1) unicastPID: allows one to uniquely address an FE or CE.  Each
     FE/CE must have such a unicast PID.  Only the FE or CE assigned
     this PID must process an incoming message carrying it as the
     Destination PID; other parties MAY silently discard the message.
     The wire subfield is a unique identifier of the FE or CE.  The
     party subfield acts as a port number: it can, for instance, be
     used to further demultiplex a message to the appropriate process
     in a CE (CEC) or the appropriate LFB in an FE.

     Default value: none.

     2) logicalPID: in addition to its unicastPID, an FE/CE MAY have
     zero or more logical PIDs assigned to it.  A logicalPID can be
     used for active-backup pairs of FEs: for instance, the active and
     the backup FE have the same logical PID, or at least the same wire
     subfield.  The wire subfield identifies the group of participating
     FEs and/or CEs.  Pre-association configuration ensures that the
     same party identifier is not assigned twice to different CECs or
     LFBs on the same wire.

     Default value: none.

     3) broadcastPID: all parties on all wires must process an incoming
     message with such a Destination PID.  An example of a message that
     might be broadcast is a notification that a CE is being brought
     down for maintenance.

     Default value: 0xffffffff

     4) FEbroadcastPID: all FEs on all wires must process an incoming
     message with such a Destination PID.  A typical example is a route
     update from the CE to all FEs.  Other parties (CEs) can silently
     discard the message.

     Default value: 0xffffefff

     5) CEbroadcastPID: all CEs on all wires must process an incoming
     message with such a Destination PID.  Other parties (FEs) can
     silently discard the message.

     Default value: 0xffffdfff
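     The processing rules above can be summarized as a receive-side
     filter.  The following C sketch is illustrative only: struct
     nl2_party and nl2_should_process() are hypothetical names, logical
     PIDs are matched exactly, and a backup CEC/LFB that deliberately
     processes messages destined for its active peer (Section 7.5.2)
     would need an additional rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Default special-purpose PIDs, as listed above. */
#define NL2_PID_BROADCAST   0xffffffffu
#define NL2_PID_FEBROADCAST 0xffffefffu
#define NL2_PID_CEBROADCAST 0xffffdfffu

enum nl2_role { NL2_ROLE_FE, NL2_ROLE_CE };

/* PIDs assigned to this party during pre-association (hypothetical). */
struct nl2_party {
    enum nl2_role   role;
    uint32_t        unicast_pid;   /* exactly one per FE/CE */
    const uint32_t *logical_pids;  /* zero or more */
    size_t          n_logical;
};

/* True if this party must process a message with Destination PID dst;
 * false means it MAY silently discard the message. */
static bool nl2_should_process(const struct nl2_party *p, uint32_t dst)
{
    if (dst == NL2_PID_BROADCAST)
        return true;
    if (dst == NL2_PID_FEBROADCAST)
        return p->role == NL2_ROLE_FE;
    if (dst == NL2_PID_CEBROADCAST)
        return p->role == NL2_ROLE_CE;
    if (dst == p->unicast_pid)
        return true;
    for (size_t i = 0; i < p->n_logical; i++)
        if (p->logical_pids[i] == dst)
            return true;
    return false;
}
```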

     The Destination PID of a Netlink2 message must be one of the PID
     types defined above.  The Source PID of a Netlink2 message must be
     of the unicastPID or logicalPID type.  In addition, if the
     NLM_F_ACK flag is set, every party processing the message MUST
     reply with an acknowledgment after processing it, unless the
     NLM_F_ASTR flag is used to prevent ACK implosion (see Section
     7.5.1).


     Pre-configured translation tables are used to map a given PID into
     the underlying wire in a bundle, i.e., an IP unicast or multicast
     address.
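     A pre-configured translation table can be as simple as an array
     searched by Destination PID.  In this hypothetical C sketch, the
     unicast PID values are invented for illustration, the addresses
     are documentation (192.0.2.0/24) and administratively scoped
     multicast (239.255.0.0/16) examples, and the UDP port numbers are
     arbitrary placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical pre-configured PID-to-wire table. */
struct nl2_wire_map {
    uint32_t    pid;       /* Destination PID */
    const char *ip_addr;   /* unicast or multicast address of the wire */
    uint16_t    udp_port;  /* demultiplexing port on that address */
};

/* Example for a bundle like 3) above: one multicast wire to all FEs plus
 * unicast wires to the CE and to one FE (placeholder addresses/ports). */
static const struct nl2_wire_map example_bundle[] = {
    { 0xffffefffu, "239.255.0.1", 7000 },  /* FEbroadcastPID */
    { 0x00010001u, "192.0.2.1",   7001 },  /* the CE's unicastPID */
    { 0x00040001u, "192.0.2.4",   7001 },  /* one FE's unicastPID */
};

/* Return the wire for a Destination PID, or NULL if none is configured. */
static const struct nl2_wire_map *
nl2_wire_lookup(const struct nl2_wire_map *tbl, size_t n, uint32_t pid)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].pid == pid)
            return &tbl[i];
    return NULL;
}
```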


6.4.  Local Scope Addressing and Encapsulation

     At a local scope, a wire is addressed by a UDP port on top of a
     multicast IP address.

     Multiple wires can share one multicast address, with the UDP port
     providing a further level of demultiplexing.

     The wire addressing parameters MAY be discovered during the pre-
     association phase.
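     Setting up the address of a local-scope wire needs only standard
     POSIX socket calls.  The sketch below (nl2_local_wire_addr() is a
     hypothetical helper) fills in a struct sockaddr_in and rejects
     addresses outside the IPv4 multicast range 224.0.0.0/4; it does
     not open a socket.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in the address of a local-scope wire, i.e. a
 * UDP port on top of an IPv4 multicast group.  Returns 0 on success, or
 * -1 if the group is not a valid IPv4 multicast address. */
static int nl2_local_wire_addr(const char *group, uint16_t port,
                               struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    if (inet_pton(AF_INET, group, &out->sin_addr) != 1)
        return -1;                      /* not a dotted-quad address */
    if ((ntohl(out->sin_addr.s_addr) & 0xf0000000u) != 0xe0000000u)
        return -1;                      /* not in 224.0.0.0/4 */
    return 0;
}
```

     A receiver would typically bind() a UDP socket to this port and
     join the group with the IP_ADD_MEMBERSHIP socket option, while a
     sender can simply sendto() this address.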


6.5.  Global Scope Addressing and Encapsulation

     When addressing a party outside the local scope, the Netlink2
     message is encapsulated in a transport protocol and shuttled to
     the remote end, where it is decapsulated and processed as if it
     had originated in the local scope of that remote end.  Global
     scope addressing could use any configured transport protocol
     (SCTP, UDP, or TCP), as agreed upon in the pre-association phase.

     Global scope wires can thus be viewed as extensions of local scope
     wires.
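     Over SCTP or UDP, message boundaries are preserved by the
     transport; over TCP, the receiver has to delimit messages in the
     byte stream itself.  A minimal C sketch could frame the stream as
     follows, assuming the header layout shown in Appendix 3 (a 20-byte
     header whose 16-bit Length field at byte offset 2 is in network
     byte order and counts the whole message); the function name is
     hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NL2_HDRLEN 20  /* assumed header size (layout of Appendix 3) */

/* Examine the bytes received so far on a TCP connection.  Returns the
 * length of the first complete Netlink2 message, 0 if more bytes are
 * needed, or -1 if the Length field is invalid. */
static long nl2_stream_next(const uint8_t *buf, size_t avail)
{
    if (avail < 4)
        return 0;                     /* Length field not yet received */
    size_t len = ((size_t)buf[2] << 8) | buf[3];  /* big-endian Length */
    if (len < NL2_HDRLEN)
        return -1;                    /* corrupt: cannot resynchronize */
    if (avail < len)
        return 0;                     /* message not complete yet */
    return (long)len;
}
```

     A receiver would append TCP data to a buffer, call this function
     repeatedly, hand each complete message to the local-scope
     processing path, and close the connection on -1, since a corrupt
     Length field leaves no reliable way to find the next message.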


7.  Protocol Architecture

7.1.  Protocol Phases

     ForCES in relation to NEs involves three phases: the
     pre-association phase; the association phase, in which the ForCES
     protocol operates; and the termination phase, in which a party in
     the relationship leaves a bundle.

7.1.1.  The Pre-Association Phase

     In a simple setup, this phase is static: all the parameters for
     the association phase (for example, the multicast groups for each
     Netlink2 bundle and its wires) are well known.


     In the case of dynamic discovery, the FE Manager (FEM) and the CE
     Manager (CEM) agree on all the parameters and exchange topology
     and other information.

     Vendors may use their own proprietary service discovery protocol.
     As a minimum, we assume a static configuration.

     On completion of the service discovery phase, the FEM will have
     established contact with the appropriate CEM component.
     Initialization and authentication will be complete at this point.
     An FE is issued a service identifier, which is used for
     accounting, identification, and authentication purposes and which
     is translated into the FE's PID in the association phase.  The
     multicast and unicast addresses for communication are also known
     at this point, and all capabilities may have been discovered.

7.1.2.  The Association Phase

     In this phase, the FE and CP components cooperate to deliver the IP
     service.  The CP component might be registered (in the
     pre-association phase) to receive FE-specific services (such as
     link events).  Essentially, in this phase, the IP service is
     provisioned and executing.  The FE component might continuously
     get updates from the control plane component on how to operate the
     service (for example, IPv4 forwarding route additions or
     deletions).

     The association phase is where Netlink2 operates as the ForCES pro-
     tocol.

     On startup, the FE issues a SYN Netlink2 message with the ACK flag
     set on the bundle(s) to which it is connected.  Because the ACK
     flag is set, the controlling CE responds with either an ACK,
     indicating that the CE has accepted the FE, or a NACK, indicating
     that the CE has rejected it.  If no response is received within a
     timeout period, the SYN is retried.  After a configurable number of
     retries without a response, the FE assumes that no CE exists and
     hands control to the FEM.

     The SYN exchange is followed by the synchronization phase, in which
     the FE's tables are loaded with updates.
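     The FE side of this startup exchange can be expressed as a small
     state machine driven by the outcome of each wait for a reply.  This
     C sketch is illustrative: the state and function names are
     hypothetical, and actually sending the SYN and running the timer
     are left to the caller.

```c
#include <stdbool.h>

enum nl2_join_state {
    NL2_JOIN_WAITING,   /* SYN sent, waiting for the CE's reply */
    NL2_JOIN_ACCEPTED,  /* ACK received: proceed to synchronization */
    NL2_JOIN_REJECTED,  /* NACK received: the CE rejected this FE */
    NL2_JOIN_NO_CE      /* retries exhausted: hand control to the FEM */
};

enum nl2_reply { NL2_REPLY_TIMEOUT, NL2_REPLY_ACK, NL2_REPLY_NACK };

struct nl2_join {
    enum nl2_join_state state;
    unsigned            retries_left;
};

/* Call just before sending the first SYN (with the ACK flag set). */
static void nl2_join_start(struct nl2_join *j, unsigned max_retries)
{
    j->state = NL2_JOIN_WAITING;
    j->retries_left = max_retries;
}

/* Feed the outcome of waiting for a reply to the most recent SYN.
 * Returns true if the caller should send the SYN again now. */
static bool nl2_join_on_reply(struct nl2_join *j, enum nl2_reply r)
{
    if (j->state != NL2_JOIN_WAITING)
        return false;
    switch (r) {
    case NL2_REPLY_ACK:
        j->state = NL2_JOIN_ACCEPTED;
        return false;
    case NL2_REPLY_NACK:
        j->state = NL2_JOIN_REJECTED;
        return false;
    case NL2_REPLY_TIMEOUT:
        if (j->retries_left == 0) {
            j->state = NL2_JOIN_NO_CE;
            return false;
        }
        j->retries_left--;
        return true;
    }
    return false;
}
```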

7.1.3.  Service Termination

     Service termination could be issued by either component of the
     service abstraction.  Normally it will be issued by the FE
     component, so that the latter does not continue to get billed for
     services.  The FE component may also issue the termination message
     if it wants to change to a comparatively better CP service
     provider.

     The FE or CE initiating the termination issues a BOOT command with
     the FIN extended flag set.  The ACK flag may also be set if a
     response to the FIN is required.


7.2.  Protocol Logical Model

     In the diagram below we show a simple LFB<->CEC logical relation-
     ship.  We use the IPv4 Forwarding LFB as an example.


      CE-----------------------------------------------------.
      |                                                      |
      |          .-------.                .-------.          |
      |          | CEC-1 |                | CEC-2 |          |
      |          | ospfd |                | COPS  |          |
      |          |       |                |  PEP  |          |
      |          '-------'                '-------'          |
      |              |                        |              |
      '--------------|------------------------|--------------'
                     |                        |
      ********************************************************
      ******************* NETLINK2 BUNDLE ********************
      ********************************************************
                  |              |             |
      FE----------|--------------|-------------|-------------.
      |           |              |             |             |
      |  .--------|--------------|-------------|----------.  |
      |  |        |              |             |          |  |
      |  |    .-------.      .-------.      .------.      |  |
      |  |    |ingress|      | IPv4  |      |Egress|      |  |
      |  |    |police |      |Forward|      | QoS  |      |  |
      |  |    '-------'      '-------'      |Sched |      |  |
      |  |                                  '------'      |  |
      |  |  IPv4 Forwarding LFBs                          |  |
      |  '------------------------------------------------'  |
      '------------------------------------------------------'


     Netlink2 logically models LFBs and CECs as service blocks
     interconnected via a Netlink2 bundle.

     Acknowledgements and responses to messages do not have to be sent
     on the same wire on which the triggering messages arrived, but they
     MUST be sent on the same bundle, to the originating PID.  For
     instance, a wire interconnecting a CE with multiple FEs using a
     multicast address could be used to send route updates from the CE,
     while independent unicast wires from each FE to the CE could be
     used to send back route events or acknowledgments.  Note that
     sequencing is done per wire and Source PID, and that ACKs can
     travel back on any wire of a bundle.  A Netlink2 wire can be shared
     or be specific to a service, and a bundle can contain multiple
     Netlink2 wires carrying messages of the same service.  Additional
     Netlink2 wires can be used to reduce the messaging a party has to
     handle (for example, to avoid extra processing) or to restrict the
     messaging accessible to it for partitioning or security reasons.
     A possible partitioning is one Netlink2 bundle per service.  In the
     example above, the IPv4 Forwarding LFB would be considered a
     service.

     Assuming capabilities have been discovered during the
     pre-association phase (between the FEM and CEM), blocks (CECs or
     LFBs, as illustrated above) connect to the agreed-upon wires of the
     Netlink2 bundle and listen for specific messages.  CECs may connect
     to multiple Netlink2 wires if doing so helps them control the
     service better.  All blocks (CECs and LFBs) send their messages
     onto the Netlink2 wires.

     LFBs or CECs join Netlink2 wires and listen to messages of interest
     for processing or monitoring purposes.

     All messages addressed to an LFB (for example, the IPv4 forwarding
     LFB illustrated above) carry the FE PID agreed upon by the CE and
     the FE in the pre-association phase.

     LFBs (as well as CECs) also process messages with the broadcast
     PIDs.  They may also process messages destined for other LFBs (or
     CECs) for availability synchronization purposes.

     A further demultiplexing point is the command type in the Netlink2
     message.  Each LFB (e.g., the ingress police LFB above) knows how
     to respond to a specific command set, identified by the Netlink2
     message type.
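     Demultiplexing on the command type can be sketched as a per-LFB
     table mapping message types to handlers.  The C code below is
     hypothetical; in particular, the handler signature and the use of
     -1 for an unsupported command type are assumptions, not part of
     Netlink2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler: processes one message payload and returns 0 on
 * success or a non-zero error code (which could be reported in a NACK). */
typedef int (*nl2_handler_fn)(const uint8_t *payload, size_t len);

struct nl2_cmd {
    uint16_t       type;     /* Netlink2 message (command) type */
    nl2_handler_fn handler;
};

/* Find the entry for a command type in an LFB's command set. */
static const struct nl2_cmd *
nl2_lfb_find(const struct nl2_cmd *cmds, size_t n, uint16_t type)
{
    for (size_t i = 0; i < n; i++)
        if (cmds[i].type == type)
            return &cmds[i];
    return NULL;
}

/* Dispatch a message to the LFB; -1 means the command is unsupported. */
static int nl2_lfb_dispatch(const struct nl2_cmd *cmds, size_t n,
                            uint16_t type, const uint8_t *payload,
                            size_t len)
{
    const struct nl2_cmd *c = nl2_lfb_find(cmds, n, type);
    if (c == NULL || c->handler == NULL)
        return -1;
    return c->handler(payload, len);
}
```

     Because each service typically has its own bundle (Section 7.3),
     such a table only needs to be unique within one bundle.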


7.3.  Service Addressing

     Both the CEC and the LFB connect to a service by connecting to a
     defined Netlink2 bundle, which is derived in the pre-association
     phase.

     A service would typically be associated with a specific Netlink2
     bundle, and command types would be used to configure the different
     LFBs within it.  This allows the 16-bit command type space to be
     reused in every new bundle.

     After connecting to a service, and at any point during the
     lifetime of the connection, the CEC may issue service-specific
     commands to the LFB, mostly for configuration purposes or for
     statistics collection.  The LFB may also send event announcements
     to the CEC, and respond to or acknowledge queries issued by the
     CEC.


7.4.  IP Service Templates

     IP services are defined by using service templates.

     Refer to the Netlink document [Netlink] for the different templates
     used for IP services that fit within the current scope of the
     ForCES charter.


7.5.  Mechanisms for Creating Protocols

     Netlink2 provides mechanisms for creating both reliable and
     unreliable protocols.  In addition, mechanisms that facilitate
     availability are embedded in Netlink2.

7.5.1.  Building Reliable Protocols

     By default, the Netlink2 header flags NLM_F_PRIO and NLM_F_ACK are
     not set, so Netlink2 messages are sent with low priority and do not
     require acknowledgements.

     One could create a reliable protocol between an LFB and a CEC by
     using the combination of sequence numbers, ACKs and retransmit
     timers.  Both sequence numbers and ACKs are provided by Netlink2.
     Timers are provided by the operating system or hardware.
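     The sender side of such a reliable protocol might track each
     unacknowledged request by its sequence number, re-arming a
     retransmit timer until the matching ACK arrives.  This C sketch is
     illustrative only: the fixed window of eight outstanding requests,
     the millisecond clock supplied by the caller, and all names are
     assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NL2_WINDOW 8   /* assumed maximum number of unacknowledged requests */

struct nl2_pending {
    bool     in_use;
    uint32_t seq;       /* sequence number carried in the request header */
    uint64_t deadline;  /* time (ms) at which to retransmit */
    unsigned tries;     /* transmissions so far */
};

struct nl2_sender {
    uint32_t           next_seq;
    struct nl2_pending slot[NL2_WINDOW];
};

/* Record a request about to be sent with NLM_F_ACK set.
 * Returns false if the window is full. */
static bool nl2_track(struct nl2_sender *s, uint64_t now, uint64_t rto,
                      uint32_t *seq_out)
{
    for (int i = 0; i < NL2_WINDOW; i++) {
        struct nl2_pending *p = &s->slot[i];
        if (!p->in_use) {
            p->in_use = true;
            p->seq = s->next_seq++;
            p->deadline = now + rto;
            p->tries = 1;
            *seq_out = p->seq;
            return true;
        }
    }
    return false;
}

/* An ACK echoes the original header, whose sequence number identifies
 * the request being acknowledged. */
static void nl2_on_ack(struct nl2_sender *s, uint32_t seq)
{
    for (int i = 0; i < NL2_WINDOW; i++)
        if (s->slot[i].in_use && s->slot[i].seq == seq)
            s->slot[i].in_use = false;
}

/* Timer callback: reports one request whose retransmit timer expired and
 * re-arms its timer; returns false if nothing is due. */
static bool nl2_due(struct nl2_sender *s, uint64_t now, uint64_t rto,
                    uint32_t *seq_out)
{
    for (int i = 0; i < NL2_WINDOW; i++) {
        struct nl2_pending *p = &s->slot[i];
        if (p->in_use && now >= p->deadline) {
            p->deadline = now + rto;
            p->tries++;
            *seq_out = p->seq;
            return true;
        }
    }
    return false;
}
```

     A fuller implementation would bound the number of retransmissions
     (for example, using the tries counter) and, since sequencing is
     done per wire and Source PID, match ACKs on the PID as well.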

     Prioritization is a mechanism orthogonal to reliability.  When a
     node runs out of resources, a message sent with a higher priority
     gets preferential treatment.  For instance, if an FE has only
     enough memory to allocate one response and has to choose which of
     two CE messages to respond to, it uses that memory for the request
     sent with the higher priority.  This also applies to other
     resources, such as computing cycles and bandwidth.  In other words,
     NLM_F_PRIO means more than the classical bandwidth prioritization
     of packets on a link.

     Another orthogonal mechanism provided by Netlink2 is the ACK
     strategy, which is selected by the NLM_F_ASTR flag.

     We define two types of acknowledgement strategies:

     1) partial ACKs (using multicast ACK slotting and damping
     techniques [XTP]): receivers multicast an ACK after a random time
     if they have not yet seen an ACK sent by another receiver.  This
     limits the number of ACKs returned to the source of the message
     and improves performance.  For messages that a CE sends to a group
     of FEs, partial ACKs imply that an ACK from any one of the FEs is
     sufficient to deem the message delivered.

     2) full ACKs: each receiver sends an ACK back to the source.  This
     allows the source to immediately detect problems with receivers.
     In two-phase commits it is important that all FEs respond, so the
     full-ACK strategy should be used.
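     On the receiver side, the partial-ACK strategy amounts to arming a
     randomized timer and cancelling it if another receiver's ACK for
     the same message is heard first.  The following C sketch is a
     simplified illustration of that slotting-and-damping idea, not the
     XTP algorithm itself; it uses rand() for the random delay, the
     names are hypothetical, and it matches ACKs by sequence number
     only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Receiver state for one message received under the partial-ACK
 * strategy. */
struct nl2_damped_ack {
    uint32_t seq;        /* sequence number of the message to ACK */
    uint64_t fire_at;    /* ms timestamp at which our own ACK is due */
    bool     suppressed; /* another receiver's ACK was heard, or sent */
};

/* Arm a random delay in [0, max_delay_ms] after receiving the message. */
static void nl2_dack_arm(struct nl2_damped_ack *d, uint32_t seq,
                         uint64_t now, unsigned max_delay_ms)
{
    d->seq = seq;
    d->fire_at = now + (uint64_t)((unsigned)rand() % (max_delay_ms + 1u));
    d->suppressed = false;
}

/* Called for every ACK heard on the multicast wire. */
static void nl2_dack_heard(struct nl2_damped_ack *d, uint32_t acked_seq)
{
    if (acked_seq == d->seq)
        d->suppressed = true;    /* someone else already ACKed it */
}

/* Returns true exactly once, when this receiver should multicast its ACK. */
static bool nl2_dack_should_send(struct nl2_damped_ack *d, uint64_t now)
{
    if (d->suppressed || now < d->fire_at)
        return false;
    d->suppressed = true;        /* send at most once */
    return true;
}
```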

7.5.2.  Building Availability

     A protocol component or an application could passively listen to
     Netlink2 commands and events on one or several Netlink2 wires.
     This provides a very simple way of building complex applications
     that, for high-availability (HA) reasons, are aware of all service
     components that affect them.

     To ensure transparent CE or FE redundancy for certain services, it
     is sufficient to ensure that the backup CEC/LFB is always attached
     to the same wires to which the active CEC/LFB is attached, so that
     the backup CEC/LFB receives all messages destined to the active
     CEC/LFB (whatever PID they are sent to) as well as all messages
     originating from the active CEC/LFB.

     One could create a heartbeat protocol between the LFB and CEC by
     using the ECHO flags and the NLMSG_NOOP message.  The heartbeat, in
     addition to listening to FE or CE events, could be used to facili-
     tate takeover.

     This topic is beyond the scope of ForCES and will not be discussed
     further here.  Note, however, that Netlink2 provides the mechanisms
     needed to enable it.
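     The liveness check of such a heartbeat could be as simple as the
     following C sketch.  It assumes that the peer echoes periodic
     NLMSG_NOOP probes sent with the ECHO flag and that the caller
     supplies a monotonic millisecond clock; the structure, the names,
     and the missed-interval threshold are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Liveness tracking for a peer probed with echoed NLMSG_NOOP messages. */
struct nl2_heartbeat {
    uint64_t last_seen;   /* ms timestamp of the last echo from the peer */
    uint64_t interval;    /* probe period in ms */
    unsigned max_missed;  /* consecutive silent intervals tolerated */
};

/* Record that an echoed probe (or any other peer event) arrived. */
static void nl2_hb_echo_received(struct nl2_heartbeat *hb, uint64_t now)
{
    hb->last_seen = now;
}

/* True once the peer has been silent for more than max_missed intervals:
 * the point at which a backup CEC/LFB might initiate takeover. */
static bool nl2_hb_peer_dead(const struct nl2_heartbeat *hb, uint64_t now)
{
    return now - hb->last_seen > (uint64_t)hb->max_missed * hb->interval;
}
```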


7.5.3.  The ACK Netlink2 Message

     This message denotes both an ACK and a NACK.  Typically it is sent
     from an LFB to a CEC, in response to a message requesting an ACK.
     However, a CEC should also be able to send ACKs back to an LFB when
     requested.  The semantics of this are IP-service specific.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       0               1               2               3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Netlink2 message header                 |
      |                       type = NLMSG_ERROR                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          error code                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       OLD Netlink2 message header             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


     Error code: integer (typically 32 bits)

     An error code of zero indicates that the message is an ACK.  An
     ACK message contains the original Netlink2 message header, which
     the recipient can compare against what it sent (sequence numbers,
     etc.).

     A non-zero error code indicates a negative ACK (NACK).  In this
     case, the Netlink2 data of the original message is returned,
     appended to the original Netlink2 message header.
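     A receiver of this message could classify it as follows.  This C
     sketch assumes the 20-byte header layout shown in Appendix 3 in
     network byte order, a 32-bit error code, and an NLMSG_ERROR type
     value equal to the Linux value 2; the byte order and the type value
     are assumptions, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NL2_HDRLEN       20  /* assumed header size (Appendix 3 layout) */
#define NL2_NLMSG_ERROR   2  /* assumed equal to Linux NLMSG_ERROR */

enum nl2_ack_kind { NL2_NOT_ACK = -1, NL2_ACK = 0, NL2_NACK = 1 };

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Layout: header (type NLMSG_ERROR) | 32-bit error code | original
 * header.  For an ACK or NACK, returns the error code and the sequence
 * number from the echoed original header (Seq is at offset 8). */
static enum nl2_ack_kind nl2_parse_ack(const uint8_t *msg, size_t len,
                                       int32_t *err, uint32_t *orig_seq)
{
    if (len < 2 * NL2_HDRLEN + 4 || rd16(msg + 4) != NL2_NLMSG_ERROR)
        return NL2_NOT_ACK;
    *err = (int32_t)rd32(msg + NL2_HDRLEN);
    *orig_seq = rd32(msg + NL2_HDRLEN + 4 + 8);
    return *err == 0 ? NL2_ACK : NL2_NACK;
}
```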

7.5.4.  Batching, Atomicity and Ordering of Transactions

     As mentioned earlier (and repeated here for clarity), standard
     Netlink multi-message batching looks as follows:

     NLMSG:NLMSG:NLMSG....

     where NLMSG is a Netlink2 header and its associated payload.

     This allows multiple commands (for example, adds and deletes) to be
     intermixed, generally in a request from the CE to the FE.  It is
     also useful for batching multiple events from the FE to the CE.

     In a two-phase commit, messages are bound together.  Typically,
     every header in the batch except the last has the NLM_F_MULTI
     Netlink2 header flag set; the last header has the Netlink2 header
     type NLMSG_DONE.  In Netlink, NLMSG_DONE typically shows up in a
     separate PDU to define a commit.
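     Walking such a batch can be sketched as follows, assuming the
     20-byte header layout of Appendix 3 in network byte order, an
     NLMSG_DONE type value equal to the Linux value 3, and Linux-style
     padding of each message to a 4-byte boundary; the function name is
     hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NL2_HDRLEN      20  /* assumed header size (Appendix 3 layout) */
#define NL2_NLMSG_DONE   3  /* assumed equal to Linux NLMSG_DONE */

/* Walk a batch NLMSG:NLMSG:... in buf.  *count receives the number of
 * messages before the terminating NLMSG_DONE (or all messages if none
 * is present) and *done whether NLMSG_DONE was found.  Returns -1 if a
 * Length field is inconsistent with the buffer. */
static int nl2_batch_scan(const uint8_t *buf, size_t len,
                          size_t *count, int *done)
{
    size_t off = 0;

    *count = 0;
    *done = 0;
    while (off + NL2_HDRLEN <= len) {
        const uint8_t *h = buf + off;
        size_t mlen = ((size_t)h[2] << 8) | h[3];
        uint16_t type = (uint16_t)((h[4] << 8) | h[5]);

        if (mlen < NL2_HDRLEN || mlen > len - off)
            return -1;
        if (type == NL2_NLMSG_DONE) {
            *done = 1;          /* end of the batch: commit point */
            return 0;
        }
        (*count)++;             /* a real receiver would process h here */
        off += (mlen + 3) & ~(size_t)3;  /* assumed 4-byte alignment */
    }
    return 0;
}
```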

     Atomicity of a transaction, including that of a batch, is achieved
     by using the NLM_F_ATOMIC flag.  Using NLM_F_ATOMIC is expensive
     because, depending on the implementation, it may require locking
     access to tables.


8.  Putting together the base protocol for WG charter

     [TBF]


9.  References

     [RFC1633]  R. Braden, D. Clark, and S. Shenker, "Integrated Ser-
     vices in the Internet Architecture: an Overview", RFC 1633, ISI,
     MIT, and PARC, June 1994.

     [RFC1812]  F. Baker, "Requirements for IP Version 4 Routers", RFC
     1812, June 1995.

     [RFC2475]  M. Carlson, W. Weiss, S. Blake, Z. Wang, D. Black, and
     E.  Davies, "An Architecture for Differentiated Services", RFC
     2475, December 1998.

     [RFC2748] J. Boyle, R. Cohen, D. Durham, S. Herzog, R. Rajan, A.
     Sastry, "The COPS (Common Open Policy Service) Protocol", RFC 2748,
     January 2000.

     [RFC2328] J. Moy, "OSPF Version 2", RFC 2328, April 1998.

     [RFC2844] T. Przygienda, P. Droz, R. Haas, "OSPF over ATM and
     Proxy-PAR", RFC 2844, May 2000.

     [RFC3358] T. Przygienda, "Optional Checksums in Intermediate System
     to Intermediate System (ISIS)", RFC 3358, August 2002.

     [RFC1157] J.D. Case, M. Fedor, M.L. Schoffstall, C. Davin, "Simple
     Network Management Protocol (SNMP)", RFC 1157, May 1990.

     [RFC3036] L. Andersson, P. Doolan, N. Feldman, A. Fredette, B.
     Thomas, "LDP Specification", RFC 3036, January 2001.

     [Stevens] G. R. Wright, W. Richard Stevens, "TCP/IP Illustrated
     Volume 2, Chapter 20", June 1995.

     [Netfilter] http://netfilter.samba.org

     [Diffserv] http://diffserv.sourceforge.net

     [Netlink] J. H. Salim, H. Khosravi, A. Kleen, A. Kuznetsov,
     "Netlink as an IP Services Protocol",
     draft-ietf-forces-netlink-03.txt, June 2002.

     [ForCES_REQ] H. Khosravi, T. Anderson, "Requirements for Separation
     of IP Control and Forwarding",
     draft-ietf-forces-requirements-07.txt, October 2002.

     [XTP] XTP Forum, "Xpress Transport Protocol Specification, XTP
     Revision 4.0", March 1995.


10.  Authors' Addresses

   Jamal Hadi Salim
   Znyx Networks
   Ottawa, Ontario
   Canada
   hadi@znyx.com

   Robert Haas
   IBM Research
   Zurich Research Laboratory
   Saeumerstrasse 4
   CH-8803 Rueschlikon
   Switzerland
   rha@zurich.ibm.com

   Steven Blake
   Ericsson IP Infrastructure
   920 Main Campus Drive, Suite 500
   Raleigh, NC  27606
   steven.blake@ericsson.com


11.  Appendix 1: Sample Service Hierarchy

     In the diagram below we show a simple IP service, foo, and the
     interactions between its CP and FE components (labels 1-3).

     The diagram is also used to demonstrate CP<->FE addressing.  In
     this section we illustrate only the addressing semantics.  In
     Appendix 2, the diagram is referenced again to define the protocol
     interaction between service foo's CEC and LFB (labels 4-10).


          CP
          .--------------------------------------------------.
          |   .-------.       .-------------.                |
          |   |  CLI  |       | CP protocol |                |
          |   |       |       |  component  |<---.           |
          |   '-------'       |     for     |    |           |
          |       |           | IP service  |    |           |
          |       |           |     foo     |    |           |
          |       |           '-------------'    |           |
          |       |              |     ^         |           |
          |       |    1,4,6,8,9 |     | 2,5,10  | 3,7       |
          '-------|--------------|-----|---------|-----------'
                  v              v     ^         ^
          ****************************************************
          ****************** Netlink2 layer ******************
          ****************************************************
                  |              |     ^         ^
          FE      |              |     |         |
          .-------|--------------|-----|---------|-----------.
          |       |              v     |         |           |
          |       |     .-------------------.    |           |
          |       '---->|  FE component/    |----'           |
   ------>|------------>|  module for IP    |--------------->|------>
          |             |  service foo      |                |
          |             '-------------------'                |
          '--------------------------------------------------'


     The control plane protocol for IP service foo does the following to
     connect to its FE counterpart.  The steps below are also numbered
     in the diagram above.


1)   Connect to IP service foo by opening a socket.  A typical call
     would be: socket(AF_NETLINK, SOCK_RAW, NETLINK_FOO)

2)   Bind to listen for specific asynchronous events of service foo.

3)   Bind to listen for specific asynchronous FE events.


     Note that a wrapper socket can be created on top of the real
     sockets: depending on the given Destination PID, it chooses the
     most appropriate socket to send the packet on.  For example, if
     there are two multicast groups, one for all FEs and one for all
     FEs and CEs, a packet from the CE to the FEs will use the first
     multicast group.  The wrapper socket thus maps a message to the
     most appropriate wire in the bundle.
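     The wire-selection policy of such a wrapper socket could look like
     the following C sketch.  It is hypothetical throughout: each wire
     is described by the set of party types it reaches, only the three
     broadcast-type Destination PIDs are handled (unicast and logical
     PIDs would go through the translation tables of Section 6.3), and
     among the wires reaching every addressed party, the one reaching
     the fewest party types is chosen.

```c
#include <stddef.h>
#include <stdint.h>

#define NL2_PID_BROADCAST   0xffffffffu
#define NL2_PID_FEBROADCAST 0xffffefffu
#define NL2_PID_CEBROADCAST 0xffffdfffu

/* Bit mask of the party types a wire reaches. */
enum { NL2_AUD_FES = 1u << 0, NL2_AUD_CES = 1u << 1 };

struct nl2_wire {
    int      fd;        /* underlying socket of this wire */
    unsigned audience;  /* NL2_AUD_* bits */
};

static int nl2_width(unsigned audience)
{
    int n = 0;
    for (; audience != 0; audience &= audience - 1)
        n++;                         /* count bits set */
    return n;
}

/* Pick the socket of the narrowest wire that still reaches every party
 * addressed by dst_pid.  Returns -1 if dst_pid is not a broadcast-type
 * PID or if no wire covers it. */
static int nl2_pick_wire(const struct nl2_wire *w, size_t n, uint32_t dst_pid)
{
    unsigned need;
    int best = -1, best_width = 0;

    if (dst_pid == NL2_PID_FEBROADCAST)
        need = NL2_AUD_FES;
    else if (dst_pid == NL2_PID_CEBROADCAST)
        need = NL2_AUD_CES;
    else if (dst_pid == NL2_PID_BROADCAST)
        need = NL2_AUD_FES | NL2_AUD_CES;
    else
        return -1;                   /* unicast/logical: use a table */

    for (size_t i = 0; i < n; i++) {
        int width = nl2_width(w[i].audience);
        if ((w[i].audience & need) != need)
            continue;                /* misses some addressed party */
        if (best < 0 || width < best_width) {
            best = (int)i;
            best_width = width;
        }
    }
    return best < 0 ? -1 : w[best].fd;
}
```

     With one wire for all FEs and CEs and another for all FEs only, a
     message to FEbroadcastPID selects the FE-only wire, as in the
     example in the text.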

12.  Appendix 2: Sample Protocol for the foo IP Service


     Our proverbial IP service "foo" is used again to demonstrate how
     one can deploy a simple IP service control using Netlink2.

     These steps are continued from Appendix 1 (hence the numbering).

4)   Query the current configuration of the FE component.

5)   Receive the response to 4) via the channel bound in 3).

6)   Query the current state of IP service foo.

7)   Receive the response to 6) via the channel bound in 2).

9)   Register the protocol-specific packets you would like the FE to
     forward to you.

10)  Send specific service foo commands and, if needed, receive
     responses to them.


12.1.  Interacting with Other IP Services

     The diagram in Appendix 1 shows another control component
     configuring the same service, in this case a proprietary Command
     Line Interface (CLI).  The CLI may or may not use the Netlink
     protocol to communicate with the foo component.  If the CLI issues
     commands that affect the policy of the LFB for service "foo", the
     "foo" CEC is notified and could then make algorithmic decisions
     based on this input.  For example, if an FE allowed one service to
     delete policies installed by another, and a policy that foo
     installed was deleted by service bar, there might be a need to
     propagate this change to all the peers of service "foo".

13.  Appendix 3: Examples

     This example shows a simple Netlink2 configuration message sent
     from a TC CEC to an egress TC FIFO queue.  This queuing algorithm
     is based on packet counting: it drops arriving packets once the
     queue already holds 100 packets.  We assume that the queue is part
     of a hierarchical setup, with parent 100:0 and classid 100:1, and
     that it is to be installed on the device with interface index 4.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                       0               1               2             3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    Version    |    Flags_E    |             Length            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Type (RTM_NEWQDISC)           | Flags (NLM_F_EXCL |           |
      |                               |NLM_F_CREATE | NLM_F_REQUEST)  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      Sequence Number (arbitrary number)       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           Source PID                          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         Destination PID                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |Family(AF_INET)|  Reserved1    |         Reserved1             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                     Interface Index  (4)                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      Qdisc handle  (0x1000001)                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                     Parent Qdisc   (0x1000000)                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        TCM Info  (0)                          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Type (TCA_KIND)    |          Length(4)            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Value ("pfifo")                        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Type (TCA_OPTIONS) |          Length(4)            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Value (limit=100)                      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

