Network Working Group                                      R. R. Stewart
INTERNET-DRAFT                                             Cisco Systems
                                                                  Q. Xie
                                                                Motorola

expires in six months                                      June 01, 2001


                Aggregate Server Access Protocol (ASAP)

Status of This Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of [RFC2026]. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Aggregate Server Access Protocol (ASAP) in conjunction with ENRP [ENRP] provides a high availability data transfer mechanism over IP networks. ASAP uses a name-based addressing model which isolates a logical communication endpoint from its IP address(es), thus effectively eliminating the binding between the communication endpoint and its physical IP address(es) which normally constitutes a single point of failure.

In addition, ASAP defines each logical communication destination as a pool, providing full transparent support for server pooling and load sharing. It also allows dynamic system scalability: members of a server pool can be added or removed at any time without interrupting the service.

ASAP is designed to take full advantage of the network level redundancy provided by the Stream Control Transmission Protocol (SCTP) [SCTP].

The high availability server pooling is gained by combining two protocols, namely ASAP and the Endpoint Name Resolution Protocol (ENRP). ASAP provides the user interface for name to address translation, load sharing management, and fault management. ENRP defines the high availability name translation service.

Table Of Contents

Stewart, Xie                                                    [Page 1]
Internet Draft      Aggregate Server Access Protocol           June 2001

1. Introduction
   1.1 Definitions
   1.2 Organization of this document
   1.3 Scope of ASAP
      1.3.1 Extent of the name space
2. Conventions
3. Message Summary
   3.1 PE Parameter Definition
   3.2 REGISTRATION message
   3.3 DEREGISTRATION message
   3.4 REGISTRATION_RESPONSE message
   3.5 NAME_RESOLUTION message
   3.6 NAME_RESOLUTION_RESPONSE message
   3.7 NAME_UNKNOWN message
   3.8 UPDATE_POLICY_VALUE message
   3.9 ENDPOINT_KEEP_ALIVE message
   3.10 ENDPOINT_UNREACHABLE message
   3.11 SERVER_HUNT message
   3.12 SERVER_HUNT_RESPONSE message
4. The ASAP Interfaces
   4.1 Registration.Request Primitive
   4.2 Deregistration.Request Primitive
   4.3 Cache.Populate.Request Primitive
   4.4 Cache.Purge.Request Primitive
   4.5 Data.Send.Request Primitive
      4.5.1 Sending to a Pool
      4.5.2 Pool Element Selection
         4.5.2.1 Pool Selection Policy - Round Robin
         4.5.2.2 Pool Selection Policy - Least Used Policy
         4.5.2.3 Pool Selection Policy - Least Used with Degradation Policy
         4.5.2.4 Pool Selection Policy - Weighted Round Robin
      4.5.3 Sending to a Pool Element Handle
      4.5.4 Send by Transport Address
      4.5.5 Options
   4.6 Data.Received Notification
   4.7 Error.Report Notification
   4.8 SCTP primitives
      4.8.1 SCTP SEND Primitive
      4.8.2 SCTP RECEIVE Primitive
      4.8.3 SCTP SET.PRIMARY Primitive
      4.8.4 SCTP DATA.ARRIVE Notification
      4.8.5 SCTP SEND.FAILURE Notification
      4.8.6 SCTP COMMUNICATION.LOST Notification
      4.8.7 SCTP NETWORK.STATUS.CHANGE Notification
   4.9 Examples
      4.9.1 Send to an Unknown Name
      4.9.2 Send to a Cached Name
   4.10 Handle ASAP to ENRP Communication Failures
      4.10.1 SCTP Send Failure
      4.10.2 T1-ENRPrequest Timer Expiration
      4.10.3 Handle ENDPOINT_KEEP_ALIVE Messages
5. Variables, Timer Values, and Thresholds
   5.1 Timer values
   5.2 Thresholds
6. References
7. Acknowledgements
8. Authors' Addresses

1. Introduction

Aggregate Server Access Protocol (ASAP) in conjunction with ENRP [ENRP] provides a high availability data transfer mechanism over IP networks. ASAP uses a name-based addressing model which isolates a logical communication endpoint from its IP address(es), thus effectively eliminating the binding between the communication endpoint and its physical IP address(es) which normally constitutes a single point of failure.

When multiple receiver instances exist under the same name, a.k.a. a server pool, ASAP will select one Pool Element (PE), based on the current load sharing policy indicated by the server pool, and deliver the message to the selected PE.

While delivering the message, ASAP monitors the reachability of the selected PE. If it is found unreachable, before notifying the sender of the failure, ASAP can automatically select another PE (if one exists) under that Pool and attempt to deliver the message to that PE. In other words, ASAP is capable of transparent fail-over amongst instances of a server pool.

ASAP uses the Endpoint Name Resolution Protocol (ENRP) to provide a high availability name space. ASAP is responsible for the abstraction of the underlying transport technologies, load distribution management, and fault management, as well as for presenting a unified primitive interface to the upper layer (i.e., the ASAP user).
When SCTP [RFC2960] is used as the transport layer protocol, ASAP can seamlessly incorporate the network-level redundancy provided by SCTP.

This document defines the ASAP portion of the high availability server pool. ASAP depends on the services of a high availability name space, a.k.a. ENRP.

1.1 Definitions

This document uses the following terms:

Operation scope: The part of the network visible to pool users by a specific instance of the reliable server pooling protocols.

Server pool (or Pool): A collection of servers providing the same application functionality.

Pool handle (or pool name): A logical pointer to a pool. Each server pool will be identifiable in the operation scope of the system by a unique pool handle or "name".

Pool element (PE): A server entity having registered to a pool.

Pool user (PU): A server pool user.

Pool element handle (or endpoint handle): A logical pointer to a particular pool element in a pool.

ENRP server: A server program running on a host that manages the name space collectively with its peer ENRP servers and replies to the service requests from any Pool user or Pool Element.

Home ENRP server: The ENRP server to which a Pool element currently belongs. Pool elements normally choose the ENRP server on their local host as the home ENRP server (if one exists). A Pool element shall only have one home ENRP server at any given time. Both the PE and the server shall keep track of this master/slave relationship between them.

ENRP server takeover: The event that a remote ENRP server takes the ownership of all the Pool elements on a host and becomes their home server.

Caretaker ENRP server: The ENRP server on a remote host which takes ownership of all PEs on a host because of the absence of an active local ENRP server.

PU channel: The communication channel through which an ASAP Pool User requests ENRP service.
The PU channel is usually defined by the transport address of the home server and a well known port number.

ENRP server channel: Defined by a well known multicast IP address and a well known port number, or a well known list of transport addresses for a group of ENRP servers spanning an operation scope. All ENRP servers in an operation scope can communicate with one another through this channel.

ENRP name domain: Defined by the combination of the PU channel and the ENRP server channel in the operation scope.

Network Byte Order: Most significant byte first, a.k.a. Big Endian.

1.2 Organization of this document

Chapter 3 details the ASAP message formats. Chapter 4 gives the details of the ASAP interface, focusing on the communication primitives between the applications above ASAP and ASAP itself, and the communication primitives between ASAP and SCTP (or another transport layer). Relevant timers and configurable parameters are also included in this discussion as appropriate. Chapter 5 provides settable protocol values.

1.3 Scope of ASAP

The requirements for high availability and scalability do not imply requirements on shared state and data. ASAP does not provide transaction failover. If a host or application fails during the processing of a transaction, this transaction may be lost. Some services may provide a way to handle the failure, but this is not guaranteed. ASAP MAY provide hooks to assist an application in building a mechanism to share state, but ASAP in itself will NOT share any state.

1.3.1 Extent of the name space

The scope of ASAP/ENRP is NOT Internet-wide. The name space is neither hierarchical nor arbitrarily large like DNS. We propose a flat peer-to-peer model. Pools of servers will exist in different administrative domains. For example, suppose a PU wants to use ASAP/ENRP. First, the PU will use DNS to contact an ENRP server.
Suppose a PU in North America wishes to contact the server pool in Japan instead of the one in North America. The PU would use DNS to get the IP address of the Japanese server pool domain, that is, the address of an ENRP server (or servers) in Japan. From there the PU would query the ENRP server and then directly contact the PEs of interest.

2. Conventions

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [RFC2119].

3. Message Summary

All messages, as well as their fields described below, MUST be in Network Byte Order during transmission. For fields with a length bigger than 4 octets, a number in a pair of parentheses may follow the field name to indicate the length of the field in number of octets.

3.1 PE Parameter Definition

This parameter is used in multiple ASAP messages to represent a PE or endpoint and the associated information, such as its transport address(es), load control, and other operational status information of the PE. The parameter is defined to support PEs with up to 8 different IPv4 transport addresses.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         IP address #0                         |
   +-------------------------------+-------------------------------+
   |                         IP address #1                         |
   +-------------------------------+-------------------------------+
   \                                                               \
   /                                                               /
   \                                                               \
   +-------------------------------+-------------------------------+
   |                         IP address #7                         |
   +-------------------------------+-------------------------------+
   |           SCTP Port           |            Padding            |
   +-------------------------------+-------------------------------+
   |   Load sharing policy type    |         Policy Value          |
   +---------------+---------------+---------------+---------------+

The size of a PE Parameter is 40 octets.
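The PE Parameter layout above can be illustrated with a minimal Python sketch. It is not part of the protocol definition. The function names are hypothetical, and two points are assumptions not stated in this document: unused address slots are zero-filled, and the policy type and policy value each occupy 16 bits, as the diagram suggests.

```python
import socket
import struct

MAX_ADDRS = 8       # the parameter carries eight IPv4 address slots
PE_PARAM_LEN = 40   # 8 * 4 address octets + port, padding, policy type, policy value


def pack_pe_parameter(addrs, port, policy_type, policy_value):
    """Encode a PE Parameter in Network Byte Order (hypothetical helper)."""
    if not 1 <= len(addrs) <= MAX_ADDRS:
        raise ValueError("a PE carries between 1 and 8 IPv4 addresses")
    slots = [socket.inet_aton(a) for a in addrs]
    # Assumption: unused address slots are filled with zero octets.
    slots += [b"\x00" * 4] * (MAX_ADDRS - len(slots))
    # Assumption: policy type and policy value are 16-bit fields.
    tail = struct.pack("!HHHH", port, 0, policy_type, policy_value)
    return b"".join(slots) + tail


def unpack_pe_parameter(data):
    """Decode a 40-octet PE Parameter, skipping zero-filled address slots."""
    if len(data) != PE_PARAM_LEN:
        raise ValueError("a PE Parameter is exactly 40 octets")
    addrs = [socket.inet_ntoa(data[i:i + 4]) for i in range(0, 32, 4)]
    addrs = [a for a in addrs if a != "0.0.0.0"]
    port, _padding, policy_type, policy_value = struct.unpack("!HHHH", data[32:])
    return addrs, port, policy_type, policy_value
```

The policy type codes themselves are not listed in this section, so any value passed as `policy_type` is a placeholder.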
3.2 REGISTRATION message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0x3                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       PE Parameter (40)                       |
   |                                                               |
   +-------------------------------+-------------------------------+

The pool handle field specifies the name to be registered, which shall be composed of up to 32 characters. The PE Parameter field shall be filled in by the registrant endpoint to declare its transport addresses, server pooling policy and value, and other operation preferences.

3.3 DEREGISTRATION message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0x4                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       PE Parameter (40)                       |
   |                                                               |
   +-------------------------------+-------------------------------+

The endpoint sending the DEREGISTRATION shall fill in the name and the PE Parameter in order to allow the ENRP server to verify the identity of the endpoint.
3.4 REGISTRATION_RESPONSE message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0x5                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+
   |                     Result = (see below)                      |
   +-------------------------------+-------------------------------+
   |                Requested action = (see below)                 |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       PE Parameter (40)                       |
   |                                                               |
   +-------------------------------+-------------------------------+

In response to a REGISTRATION, the 'Requested action' field shall be set to 0x0, and the 'Result' field shall take the following values:

   0x0 -- registration granted
   0x1 -- registration rejected

In response to a DEREGISTRATION, the 'Requested action' field shall be set to 0x1, and the 'Result' field shall take the following values:

   0x2 -- de-registration granted
   0x3 -- de-registration rejected: endpoint not found
   0x4 -- de-registration rejected: other failures.

PE Parameter shall be filled in for verification purposes.
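As an illustration of the REGISTRATION and REGISTRATION_RESPONSE layouts, the following Python sketch builds one message and parses the other. The helper names are hypothetical. The sketch assumes that a pool handle shorter than 32 characters is padded with NUL octets, which this document does not state explicitly, and it treats the PE Parameter as an opaque 40-octet value.

```python
import struct

MSG_ID_1 = 0x18038688
MSG_ID_2 = 0x77734683
TYPE_REGISTRATION = 0x3
TYPE_REGISTRATION_RESPONSE = 0x5
HANDLE_LEN = 32
PE_PARAM_LEN = 40

# Result codes as listed for the REGISTRATION_RESPONSE message.
RESULTS = {
    0x0: "registration granted",
    0x1: "registration rejected",
    0x2: "de-registration granted",
    0x3: "de-registration rejected: endpoint not found",
    0x4: "de-registration rejected: other failures",
}


def pack_handle(name):
    """Encode a pool handle into its fixed 32-octet field."""
    raw = name.encode("ascii")
    if len(raw) > HANDLE_LEN:
        raise ValueError("pool handle is limited to 32 characters")
    # Assumption: shorter names are padded with NUL octets.
    return raw.ljust(HANDLE_LEN, b"\x00")


def build_registration(pool_handle, pe_param):
    """Build an 84-octet REGISTRATION message."""
    if len(pe_param) != PE_PARAM_LEN:
        raise ValueError("PE Parameter must be 40 octets")
    header = struct.pack("!III", MSG_ID_1, MSG_ID_2, TYPE_REGISTRATION)
    return header + pack_handle(pool_handle) + pe_param


def parse_registration_response(data):
    """Return (pool handle, requested action, result text, PE Parameter)."""
    fmt = "!III32sII40s"
    if len(data) != struct.calcsize(fmt):
        raise ValueError("unexpected REGISTRATION_RESPONSE length")
    id1, id2, mtype, handle, result, action, pe_param = struct.unpack(fmt, data)
    if (id1, id2) != (MSG_ID_1, MSG_ID_2) or mtype != TYPE_REGISTRATION_RESPONSE:
        raise ValueError("not a REGISTRATION_RESPONSE message")
    name = handle.rstrip(b"\x00").decode("ascii")
    return name, action, RESULTS.get(result, "unknown result"), pe_param
```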
3.5 NAME_RESOLUTION message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0x1                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                      requested name (32)                      |
   |                                                               |
   +-------------------------------+-------------------------------+

3.6 NAME_RESOLUTION_RESPONSE message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0x2                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+
   |               number of entries = n (see below)               |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                      PE Parameter 1 (40)                      |
   |                                                               |
   +-------------------------------+-------------------------------+
   /                                                               /
   \                                                               \
   /                                                               /
   +-------------------------------+-------------------------------+
   |                                                               |
   |                      PE Parameter n (40)                      |
   |                                                               |
   +-------------------------------+-------------------------------+

3.7 NAME_UNKNOWN message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0x0                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                      requested name (32)                      |
   |                                                               |
   +-------------------------------+-------------------------------+

3.8 UPDATE_POLICY_VALUE message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0x11                          |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       PE Parameter (40)                       |
   |                                                               |
   +-------------------------------+-------------------------------+
   |                       New policy value                        |
   +-------------------------------+-------------------------------+

3.9 ENDPOINT_KEEP_ALIVE message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0x6                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+

3.10 ENDPOINT_UNREACHABLE message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0xa                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+
   |                      Endpoint IP address                      |
   +-------------------------------+-------------------------------+
   |     Endpoint's SCTP port      |            padding            |
   +-------------------------------+-------------------------------+
   |                 Type of severity (see below)                  |
   +-------------------------------+-------------------------------+

The 'pool handle' is not required to be filled in for this message; however, the IP address and SCTP port number of the endpoint must be supplied in the message.

'Type of severity' shall take one of the following values:

   0x0 --- NORMAL_REPORT: warning to the server.
   0x1 --- FINAL_REPORT: the specified endpoint must be removed by the server immediately.

3.11 SERVER_HUNT message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0xb                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+
   |                    criticality (see below)                    |
   +-------------------------------+-------------------------------+

The 'criticality' field shall take one of the following values:

   0x1 --- LOW_CRITICALITY
   0x2 --- MED_CRITICALITY
   0x3 --- HIGH_CRITICALITY

3.12 SERVER_HUNT_RESPONSE message

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ENRP endpoint message identifier #1 = 0x18038688        |
   +-------------------------------+-------------------------------+
   |       ENRP endpoint message identifier #2 = 0x77734683        |
   +-------------------------------+-------------------------------+
   |                          Type = 0xc                           |
   +-------------------------------+-------------------------------+
   |                                                               |
   |                       pool handle (32)                        |
   |                                                               |
   +-------------------------------+-------------------------------+

4. The ASAP Interfaces

This chapter focuses primarily on the primitives and notifications that form the interface between the ASAP user and ASAP, and that between ASAP and its lower layer transport protocol (e.g., SCTP). Appropriate timers and recovery actions for failure detection and management are also discussed.

An ASAP user (PU or PE) passes primitives to the ASAP sub-layer to request certain actions. Upon the completion of those actions or upon the detection of certain events, ASAP will notify the ASAP user.

4.1 Registration.Request Primitive

Format: registration.request(endpointName)

where the endpointName parameter contains a NULL terminated ASCII string of fixed length.

The ASAP user invokes this primitive to add itself to the name space, thus becoming a Pool Element. The ASAP user must register its name with the ENRP server by using this primitive before other ASAP endpoints using the name space can send message(s) to this ASAP user by name or by Pool element handle (see Sections ? and ?).

In response to the registration primitive, the ASAP layer will send a REGISTRATION message to the home ENRP server (see Section ?), and start a T2-registration timer.

If the T2-registration timer expires before a REGISTRATION_RESPONSE message is received, or a SEND.FAILURE notification is received from the SCTP layer, the ASAP layer shall start the Server Hunt procedure (see Section ?) in an attempt to get service from a remote ENRP server.
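The registration behaviour described above can be pictured as a small state machine. The Python sketch below is illustrative only: the class, method, and state names are hypothetical, the actual timer and message transmission are left out, and the Server Hunt procedure itself is represented merely by a state.

```python
from enum import Enum, auto


class RegState(Enum):
    IDLE = auto()
    AWAITING_RESPONSE = auto()   # REGISTRATION sent, T2-registration running
    REGISTERED = auto()
    REJECTED = auto()
    SERVER_HUNT = auto()         # seeking service from a remote ENRP server


class RegistrationTracker:
    """Tracks one registration attempt (hypothetical helper)."""

    def __init__(self):
        self.state = RegState.IDLE
        self.t2_running = False

    def registration_request(self):
        # Here the ASAP layer would send REGISTRATION to the home ENRP server.
        self.state = RegState.AWAITING_RESPONSE
        self.t2_running = True

    def on_registration_response(self, result):
        if self.state is not RegState.AWAITING_RESPONSE:
            return  # late or unexpected response, ignored in this sketch
        self.t2_running = False
        granted = result == 0x0  # 0x0 -- registration granted
        self.state = RegState.REGISTERED if granted else RegState.REJECTED

    def on_t2_expired(self):
        self._start_server_hunt()

    def on_send_failure(self):
        self._start_server_hunt()

    def _start_server_hunt(self):
        if self.state is RegState.AWAITING_RESPONSE:
            self.t2_running = False
            self.state = RegState.SERVER_HUNT
```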
4.2 Deregistration.Request Primitive

Format: deregistration.request()

The ASAP user invokes this primitive to remove itself from the Server Pool. This should be used as a part of the graceful shutdown process by the application.

A DEREGISTRATION message will be sent by the ASAP layer to the home ENRP server (see Section ?).

4.3 Cache.Populate.Request Primitive

Format: cache.populate.request(destinationAddress, typeOfAddress)

If the address type is a Pool handle (or name) and a local name translation cache exists, the ASAP layer should initiate a mapping information query on the Pool handle and update its local cache when the response comes back from the ENRP server.

4.4 Cache.Purge.Request Primitive

Format: cache.purge.request(destinationAddress, typeOfAddress)

If the address type is a Pool handle (or name) and a local name translation cache exists, the ASAP layer should remove the mapping information on the Pool handle from its local cache.

4.5 Data.Send.Request Primitive

Format: data.send(destinationAddress, typeOfAddress, message, sizeOfMessage, Options);

This primitive requests ASAP to send a message to some specified Pool or Pool element within the current operation scope.

Depending on the address type used for the send request, the sender's ASAP layer may perform address translation and Pool Element selection before sending the message out.

The data.send primitive can take the following forms of address type.

4.5.1 Sending to a Pool

In this case the destinationAddress and typeOfAddress together indicate a Pool handle.

This is the simplest form of the data.send primitive. By default, this directs ASAP to send the message to one of the pool elements in the server pool.

Before sending the message out to the Pool, the sender's ASAP layer MUST first perform a name to address translation. It may also need to perform Pool Element selection if multiple Pool Elements exist in the Server Pool.
If the sender's ASAP implementation does not support a local cache of the translation information, or if it does not have the translation information on that Pool in its local cache, it will transmit a request for the name mapping information to the ENRP server, and MUST hold the outbound message in queue while awaiting the response from the ENRP server (any further send request to this Pool before the ENRP server responds SHOULD also be queued).

Once the necessary mapping information arrives from the ENRP server, the sender's ASAP will:

   A) map the name into a list of transport addresses of the destination PE(s),

   B) if multiple PEs exist in that Pool, choose one of them and transmit the message to it. In that case, the choice of the PE is made by the ASAP layer of the sender based on the server pooling policy as discussed in Section ?,

   C) if no association exists towards the destination, establish a new SCTP association. NOTE: if the underlying SCTP implementation supports implicit association setup, this step is not needed (see [RFC2960]),

   D) send out the queued message to the SCTP association using the SEND primitive (see [RFC2960]), and,

   E) if the local cache is implemented, append/update the local cache with the mapping information received in the ENRP server's response. Also, record the local SCTP association id, if a new association was created.

For more on the ENRP server request procedures see [ENRP].

Optionally, the ASAP layer of the sender may return a Pool Element handle of the selected PE to the application after sending the message. This handle can then be used for future transmissions to that PE (see Section ?).

Section ? defines the fail-over procedures for cases where the selected PE is found unreachable.
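The queue-then-resolve behaviour for sending to a Pool can be sketched as follows. This Python fragment is a simplified illustration, not a reference implementation: the class and callback names are hypothetical, PE selection is reduced to taking the first entry, and SCTP association setup (step C) is assumed to happen inside the `transmit` callback.

```python
from collections import deque


class PoolSender:
    """Holds outbound messages until name mapping information is available."""

    def __init__(self, query_enrp, transmit, use_cache=True):
        self.query_enrp = query_enrp   # hypothetical: sends a name resolution request
        self.transmit = transmit       # hypothetical: transmit(pe, message)
        self.use_cache = use_cache
        self.cache = {}                # pool handle -> list of PEs
        self.pending = {}              # pool handle -> queued messages

    def send(self, pool_handle, message):
        pes = self.cache.get(pool_handle) if self.use_cache else None
        if pes:
            self.transmit(pes[0], message)  # policy-based selection omitted
            return
        queue = self.pending.get(pool_handle)
        if queue is None:
            # First message for this Pool: queue it and query the ENRP server.
            self.pending[pool_handle] = deque([message])
            self.query_enrp(pool_handle)
        else:
            # A query is already outstanding: queue further sends as well.
            queue.append(message)

    def on_mapping(self, pool_handle, pes):
        """Called when the mapping information arrives from the ENRP server."""
        if not pes:
            raise ValueError("mapping information contains no PE")
        if self.use_cache:
            self.cache[pool_handle] = list(pes)      # step E
        queue = self.pending.pop(pool_handle, deque())
        while queue:
            self.transmit(pes[0], queue.popleft())   # steps B and D
```

Keeping one queue per Pool handle means that only a single request is outstanding for a given name while several sends wait for its answer.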
4.5.2 Pool Element Selection

Each time an ASAP user sends a message to a Pool that contains more than one PE, the sender's ASAP layer must select one of the PEs in the Pool as the receiver of the current message. The selection is done according to the current server pooling policy of the Pool to which the message is sent.

Note, no selection is needed if the ASAP_SEND_TOALL option is set (see Section ?).

When joining a Pool, along with its registration each PE specifies its preferred server pooling policy for receiving messages sent to this Pool. However, only the server pooling policy specified by the first PE joining the Pool will become the current server pooling policy of the group.

Moreover, together with the server pooling policy, each PE can also specify a Policy Value for itself at registration time. The meaning of the policy value depends on the current server pooling policy of the group. A PE can also change its policy value whenever it desires; see Section ? for details.

Note, if this first PE removes itself from the Pool (e.g., by de-registration from the name space) and the remaining PEs have specified conflicting server pooling policies at their corresponding registrations, it is implementation specific to determine the new current server pooling policy.

Four basic server pooling policies are defined in ASAP, namely Round Robin, Least Used, Least Used with Degradation, and Weighted Round Robin. The following sections describe each of these policies.

4.5.2.1 Pool Selection Policy - Round Robin

When an ASAP endpoint sends messages by Pool Handle and Round Robin is the current policy of that Pool, the ASAP layer of the sender will select the receiver for each outbound message by cycling through all the registered PEs in that Pool in round robin fashion, in an attempt to achieve an even distribution of outbound messages. Note that in a large server pool, the ENRP daemon may NOT send back all PEs to the ASAP client.
In this case the client or PU will be performing a round robin policy on a subset of the entire Pool.

4.5.2.2 Pool Selection Policy - Least Used Policy

When the destination Pool is under the Least Used server pooling policy, the ASAP layer of the message sender will select the PE that has the lowest policy value in the group as the receiver of the current message. If more than one PE from the group share the same lowest policy value, the selection will be done round robin amongst those PEs.

It is important to note that this policy means that the same PE will always be selected as the message receiver by the sender until the load control information of the Pool is updated and changed in the local cache of the sender (see Section ?).

4.5.2.3 Pool Selection Policy - Least Used with Degradation Policy

This policy is the same as the Least Used policy with the exception that, each time the PE with the lowest policy value is selected from the Pool as the receiver of the current message, its policy value is incremented, and thus it may no longer have the lowest value in the Pool.

This provides a degradation of the policy towards the round robin policy over time. As with the Least Used policy, every local cache update at the sender will bring the policy back to Least Used with Degradation.

4.5.2.4 Pool Selection Policy - Weighted Round Robin

[TBD]

4.5.3 Sending to a Pool Element Handle

In this case the destinationAddress and typeOfAddress together indicate an ASAP pool element handle.

This requests the ASAP layer to deliver the message to the PE identified by the pool element handle.

The pool element handle should contain the name and the primary destination transport address of the destination PE. The ASAP layer shall use the address to identify the SCTP association (or to set up a new one if necessary) and then invoke the SCTP SEND primitive to send the message to the PE.
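Returning briefly to the Pool Element selection policies of Sections 4.5.2.1 through 4.5.2.3, the following Python sketch shows one possible reading of them. The class and method names are hypothetical, a single rotating cursor is shared by all three policies for brevity, and the reset of policy values on a local cache update is not modelled.

```python
from dataclasses import dataclass


@dataclass
class PoolElement:
    name: str
    policy_value: int


class PoolSelector:
    """Selects a PE from the locally known members of one Pool."""

    def __init__(self, elements):
        self.elements = list(elements)
        self._cursor = 0  # rotating cursor used for round robin choices

    def _rotate(self, candidates):
        pe = candidates[self._cursor % len(candidates)]
        self._cursor += 1
        return pe

    def round_robin(self):
        # Cycle through every known PE in turn.
        return self._rotate(self.elements)

    def least_used(self):
        # Lowest policy value wins; ties are broken round robin.
        lowest = min(pe.policy_value for pe in self.elements)
        ties = [pe for pe in self.elements if pe.policy_value == lowest]
        return self._rotate(ties)

    def least_used_degrading(self):
        # Same as least used, but the chosen PE's value is incremented,
        # so repeated sends drift towards round robin behaviour.
        pe = self.least_used()
        pe.policy_value += 1
        return pe
```

For example, with PEs a and b at policy value 1 and c at 3, four degrading selections pick a, b, a, b and leave all three PEs at value 3.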
If a local cache is supported and the mapping information for the Pool found in the pool element handle is not available in the local cache, the sender's ASAP layer SHOULD, after sending the message, also transmit a request for the Pool handle mapping information to the ENRP server.

Once the necessary mapping information arrives from the ENRP server, the sender's ASAP will update its local cache with the newly received mapping information for that Pool handle.

Section ? defines the fail-over procedures for cases where the PE pointed to by the Pool Element handle is found unreachable.

Optionally, the ASAP layer may return the actual Pool Element handle to which the message was sent (this may be different from the Pool Element handle specified when the primitive is invoked, due to the possibility of automatic fail-over).

4.5.4 Send by Transport Address

In this case the destinationAddress and typeOfAddress together indicate an SCTP transport address.

This directs the sender's ASAP layer to send the message out to the specified transport address.

No endpoint fail-over is supported when this form of send request is used. This mode of sending effectively by-passes the ASAP layer.

4.5.5 Options

The Options parameter passed in the various forms of the above data.send primitive gives directions to the sender's ASAP layer on special handling of the message delivery.

Options can be grouped as follows:

   - PE fail-over (allowed, or prohibited),
   - whether to send to one PE or to the whole Pool,
   - whether to send to the same PE last sent to within the Pool, and
   - options passed to the SCTP transport protocol.

The complete list of Options is as follows:

ASAP_USE_DEFAULT: 0x0000
Use default setting.

ASAP_SEND_FAILOVER: 0x0001
Enables PE fail-over on this message.
   In the case where the first selected PE or the PE pointed to by the handle is found unreachable, this option allows the sender's ASAP layer to re-select an alternate PE from the same Pool if one exists, and silently re-send the message to this newly selected endpoint.

   Endpoint unreachability is normally indicated by the Communication Lost or Send Failure notification from SCTP.

ASAP_SEND_NO_FAILOVER: 0x0002
   This option prohibits the sender's ASAP layer from re-sending the message to any alternate PE in the case that the first selected PE or the PE pointed to by the handle is found unreachable. Instead, the sender's ASAP layer shall notify its upper layer about the unreachability with an Error.Indication and any unsent data.

ASAP_SEND_TO_LAST: 0x0004
   This option requests the sender's ASAP layer to send the message to the same PE in the Pool that the previous message was sent to.

ASAP_SEND_TOALL: 0x0008
   When sending by Pool Handle, this option directs the sender's ASAP layer to send a copy of the message to all the PEs in that Pool, except for the sender itself (if the sender is a PE).

ASAP_SEND_TOSELF: 0x0010
   This option only applies in combination with the ASAP_SEND_TOALL option. It permits the sender's ASAP layer to also deliver a copy of the message to itself (i.e., loopback).

ASAP_SCTP_BUNDLE: 0x0100
   This option allows the local SCTP transport layer to bundle the outbound messages whenever possible into bigger datagrams before transmitting them onto the network.

ASAP_SCTP_NO_BUNDLE: 0x0200
   This option prohibits the local SCTP transport layer from bundling outbound messages.

ASAP_SCTP_HB_ON: 0x0400
   This option instructs the local SCTP transport layer to turn on heartbeat on the SCTP association indicated by the destinationAddress parameter.

ASAP_SCTP_HB_OFF: 0x0800
   This option instructs the local SCTP transport layer to turn off heartbeat on the SCTP association indicated by the destinationAddress parameter.
ASAP_SCTP_UNORDER: 0x1000
   This option instructs the SCTP transport layer to send the current message using un-ordered delivery.

4.6 Data.Received Notification

Format: data.received(messageReceived, sizeOfMessage, senderAddress,
                      typeOfAddress)

When a new user message is received, the ASAP layer of the receiver uses this notification to pass the message to its upper layer.

Along with the message being passed, the ASAP layer of the receiver should also indicate to its upper layer the message sender's address. The sender's address can be in the form of either an SCTP association id or an ASAP pool element handle.

A) If the name translation local cache is implemented at the receiver's ASAP layer, a reverse mapping from the sender's IP address to the pool handle should be performed and, if the mapping is successful, the sender's ASAP pool element handle should be constructed and passed in the senderAddress field.

B) If there is no local cache or the reverse mapping is not successful, the SCTP association id should be passed in the senderAddress field.

4.7 Error.Report Notification

Format: error.report(destinationAddress, typeOfAddress, failedMessage,
                     sizeOfMessage)

An error.report should be generated to notify the ASAP user about failed message delivery as well as other abnormalities (see Section ? for details).

The destinationAddress and typeOfAddress together indicate to whom the message was originally sent. The address type can be an ASAP pool element handle, an association id, or a transport address.

The original message (or the first portion of it if the message is too big) and its size should be passed in the failedMessage and sizeOfMessage fields, respectively.

4.8 SCTP primitives

4.8.1 SCTP SEND Primitive

Basic Format: SEND(association id, buffer address, byte count, options)
              -> result

The outbound message will be held in the buffer when this primitive is invoked.
The ASAP layer shall identify the SCTP association that connects to the intended destination and fill in the 'association id'. The options field will hold the options destined to the SCTP transport layer (see ?). The returned 'result' can indicate whether there is any local error executing the primitive.

4.8.2 SCTP RECEIVE Primitive

Basic Format: RECEIVE(association id, buffer address, buffer size)
              -> byte count

This primitive reads the first user message out of the SCTP stack (if there is one available) and puts it into the specified buffer. The size of the message read, in octets, will also be returned.

4.8.3 SCTP SET.PRIMARY Primitive

Basic Format: SET.PRIMARY(association id, destination transport address)
              -> result

This can be used to instruct SCTP to use the specified destination transport address as the new primary destination address for sending messages.

4.8.4 SCTP DATA.ARRIVE Notification

The SCTP layer invokes this notification when a user message is successfully received and ready for retrieval. This shall prompt the ASAP layer to invoke the RECEIVE primitive to get the data (see ?).

4.8.5 SCTP SEND.FAILURE Notification

If a message cannot be delivered to the specified association id, for any reason, SCTP will invoke this notification to notify ASAP. In response, ASAP shall take the following steps:

A) If the message was originally sent by Pool handle or Pool Element handle and with option ASAP_SEND_FAILOVER set, retransmit the message to an alternate PE of the same Pool if one exists in the Server Pool. The proper server pooling policy shall be followed if more than one alternate exists in the group.

B) If no alternate exists or option ASAP_SEND_FAILOVER was not set when the message was originally sent, generate an Error.Report to report the failure to the ASAP user.
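Steps A) and B) of the SEND.FAILURE handling can be sketched as follows. This is only an illustration under stated assumptions: the function handle_send_failure and its parameters (sent_by_handle, alternates, select, send, error_report) are hypothetical names, the caller is assumed to supply the list of alternate PEs of the same Pool with the failed PE already excluded, and a message sent by transport address is treated as never eligible for fail-over, as stated in Section 4.5.4. Only the ASAP_SEND_FAILOVER value is taken from Section 4.5.5.

```python
ASAP_SEND_FAILOVER = 0x0001  # option value listed in Section 4.5.5


def handle_send_failure(message, options, sent_by_handle, alternates,
                        select, send, error_report):
    """Hypothetical reaction of the ASAP layer to an SCTP SEND.FAILURE.

    message        -- the user message that SCTP could not deliver
    options        -- Options bitmask given with the original request
    sent_by_handle -- True if sent by Pool handle or Pool Element handle
    alternates     -- other PEs of the same Pool (failed PE excluded)
    select         -- callable applying the Pool's server pooling policy
    send           -- callable(pe, message) that re-sends the message
    error_report   -- callable(message) that raises an Error.Report
    Returns the PE the message was re-sent to, or None.
    """
    failover_allowed = bool(options & ASAP_SEND_FAILOVER)
    if sent_by_handle and failover_allowed and alternates:
        # Step A: pick an alternate according to the pooling policy
        # and retransmit the message to it.
        pe = select(alternates)
        send(pe, message)
        return pe
    # Step B: no alternate exists, or fail-over was not requested.
    error_report(message)
    return None
```

Passing the policy, the send routine, and the error reporting as callables keeps the sketch independent of any particular SCTP API.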
4.8.6 SCTP COMMUNICATION.LOST Notification

When SCTP loses communication to a PE completely, or detects that the PE has performed an abort or graceful shutdown operation, it invokes this notification to notify the ASAP layer.

When handling this notification, ASAP shall report this event to its ENRP server via an ENDPOINT_UNREACHABLE message with the severity level set to NORMAL_REPORT (see ?).

If a local mapping cache is implemented, the ASAP layer should also mark the PE as unreachable in its local cache. If all the PEs in that Pool are marked as unreachable, the ASAP layer should remove the Pool from its local cache.

4.8.7 SCTP NETWORK.STATUS.CHANGE Notification

SCTP sends this notification to the ASAP layer when the reachability status of a transport address of a specific SCTP association has changed.

If the local mapping cache is supported, the ASAP layer, upon reception of this notification, should look up the information of this PE in its local cache and record the reachability change.

If the address in question becomes unreachable and is the primary address of the association, the ASAP layer MAY also elect a new primary for this association by invoking the SET.PRIMARY primitive (Section ?).

If the local cache is not supported or the reverse lookup does not succeed, ASAP takes no action.

4.9 Examples

4.9.1 Send to an Unknown Name

This example shows the event sequence when the Pool user, Endpoint 1 (EP1), sends the message "hello" to a name which is not in the local mapping cache (assuming local caching is supported).

  ENRP Server                       EP1         new-name:EP2

    |                                |                 |
    |                              +---+               |
    |                              | 1 |               |
    |  2. NAME_RESOLUTION          +---+               |
    |<-------------------------------|                 |
    |                              +---+               |
    |                              | 3 |               |
    |  4. NAME_RESOLUTION_RESPONSE +---+               |
    |------------------------------->|                 |
    |                              +---+               |
    |                              | 5 |               |
    |                              +---+  6. "hello"   |
    |                                |---------------->|
    |                                |                 |

1) The user at EP1 invokes:

      data.request("new-name", name-type, "hello", 5, 0);

   The ASAP layer, in response, looks up the name "new-name" in its local cache but fails to find it.

2) The ASAP layer of EP1 queues the message, and sends a NAME_RESOLUTION request to the ENRP server asking for all information about "new-name".

3) A T1-ENRPrequest timer is started while the ASAP layer is waiting for the response from the ENRP server.

4) The ENRP server responds to the query with a NAME_RESOLUTION_RESPONSE message that contains all the information about "new-name".

5) ASAP at EP1 cancels the T1-ENRPrequest timer and populates its local cache with information on "new-name".

6) Based on the server pooling policy of "new-name", ASAP at EP1 selects the PE (EP2), sets up, if necessary, an SCTP association towards EP2 (explicitly or implicitly), and sends out the "hello" message.

4.9.2 Send to a Cached Name

This shows the event sequence when the ASAP user at EP1 sends another message to the "new-name".

  ENRP Server                       EP1         new-name:EP2

    |                                |                        |
    |                              +---+                      |
    |                              | 1 |                      |
    |                              +---+  2. "hello world 2"  |
    |                                |----------------------->|
    |                                |                        |

1) The user at EP1 invokes:

      data.request("new-name", name-type, "hello world 2", 13, 0);

   The ASAP layer, in response, looks up the name "new-name" in its local cache and finds the mapping information.

2) Based on the server pooling policy of "new-name", ASAP at EP1 selects the PE (assume EP2 is selected again), and sends out the "hello world 2" message (assume the SCTP association is already set up).

4.10 Handle ASAP to ENRP Communication Failures

Three types of failure may occur when the ASAP layer at an endpoint tries to communicate with the ENRP server:

A) SCTP send failure

B) T1-ENRPrequest timer expiration

C) Registration failure

Registration failure is discussed in section ?.
4.10.1 SCTP Send Failure

This indicates that the SCTP layer failed to deliver a message sent to the ENRP server. In other words, the ENRP server is currently unreachable.

In such a case, the ASAP layer should not re-send the failed message. Instead, it should discard the failed message and start the ENRP server hunt procedure as described in Section ?.

4.10.2 T1-ENRPrequest Timer Expiration

When a T1-ENRPrequest timer expires, ASAP should re-send the original request to the ENRP server and re-start the T1-ENRPrequest timer. In parallel, a SERVER_HUNT message should be issued per Section ?.

This should be repeated up to 'max-request-retransmit' times. After that, an Error.Report notification should be generated to inform the ASAP user, and the ENRP request message associated with the timer should be discarded.

4.10.3 Handle ENDPOINT_KEEP_ALIVE Messages

At times, an ASAP endpoint may receive ENDPOINT_KEEP_ALIVE messages (see section 3.1.2.1?) from the ENRP server. This message requires no response and should be silently discarded by the ASAP layer. However, each time an ENDPOINT_KEEP_ALIVE is received, the receiver should update its home ENRP server to the sender of the latest Keep Alive message.

5. Variables, Timer Values, and Thresholds

The following is a summary of the variables, timer values, and pre-set thresholds used in the ASAP and ENRP protocols.

5.1 Timer values

T1-ENRPrequest - A timer started when a request is sent by ASAP to the ENRP server (provided that application information is queued). Normally set to 15 seconds.

T2-registration - A timer started when sending a registration request to the local ENRP server. Normally set to 30 seconds.

T3-registration-reattempt - If the registration cycle does not complete, this timer is started to restart the registration process. Normally set to 10 minutes.
T4-reregistration - This timer is started after successful registration into the ASAP name space and is used to cause a re-registration at a periodic interval. Normally set to 10 minutes.

5.2 Thresholds

Timeout-registration - Pre-set threshold; how long a PE will wait for the REGISTRATION_RESPONSE from its home ENRP server.

Timeout-server-hunt - Pre-set threshold; how long an endpoint will wait for the SERVER_HUNT_RESPONSE from an ENRP server.

num-of-serverhunts - The current count of server hunt messages that have been transmitted.

registration-count - The current count of attempted registrations.

max-reg-attempt - The maximum number of registration attempts to be made before a server hunt is issued.

max-request-retransmit - The maximum number of attempts to be made when requesting information from the local ENRP server before a server hunt is issued.

6. References

[RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang, and V. Paxson, "Stream Control Transmission Protocol", RFC 2960, October 2000.

[ENRP] Q. Xie, R. R. Stewart, "Endpoint Name Resolution Protocol", draft-ietf-rserpool-enrp-00.txt, work in progress.

7. Acknowledgements

The authors wish to thank John Loughney, Lyndon Ong, Maureen Stillman, and many others for their invaluable comments.

8. Authors' Addresses

Randall R. Stewart
24 Burning Bush Trail.
Crystal Lake, IL 60012
USA

Phone: +1-815-477-2127
EMail: rrs@cisco.com

Qiaobing Xie
Motorola, Inc.
1501 W. Shure Drive, #2309
Arlington Heights, IL 60004
USA

Phone: +1-847-632-3028
EMail: qxie1@email.mot.com