Network Working Group                                        M. Schwartz
Internet-Draft                                     Code On The Road, LLC
Expires: April 7, 2002                                   October 7, 2001

       The ANTACID Replication Service: Protocol and Algorithms
                  draft-schwartz-antacid-protocol-00
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026 except that the right to
produce derivative works is not granted. (If this document becomes
part of an IETF working group activity, then it will be brought into
full compliance with Section 10 of RFC 2026.)
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 7, 2002.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This memo specifies the protocol and algorithms of the ANTACID
Replication Service, designed to replicate hierarchically named
repositories of XML documents for business-critical, internetworked
applications.
ASCII and HTML versions of this document are available at
http://www.codeontheroad.com/papers/draft-schwartz-antacid-
protocol.txt and http://www.codeontheroad.com/papers/draft-schwartz-
antacid-protocol.html, respectively.
Schwartz Expires April 7, 2002 [Page 1]
Internet-Draft ANTACID Protocol October 2001
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 4
2. Walk-Through of Example ARS Interactions . . . . . . . . . 5
2.1 ARS Commit-and-Propagate Protocol (ars-c) . . . . . . . . 7
2.2 ARS Submission-Propagation Protocol (ars-s) . . . . . . . 12
2.3 ARS Encoding Negotiation Protocol (ars-e) . . . . . . . . 18
2.4 ARS Service Implementing All Three Sub-Protocols . . . . . 19
3. ARS Syntax and Semantics . . . . . . . . . . . . . . . . . 24
3.1 Identifiers, Data Representation, and Error Signaling . . 24
3.1.1 ARS Server Identification . . . . . . . . . . . . . . . . 24
3.1.2 Sequence Numbers . . . . . . . . . . . . . . . . . . . . . 26
3.1.3 DataWithOps Encoding . . . . . . . . . . . . . . . . . . . 27
3.1.4 ARSError . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 ARS Message Semantics . . . . . . . . . . . . . . . . . . 32
3.2.1 ARS Commit-and-Propagate Protocol (ars-c) . . . . . . . . 33
3.2.1.1 SubmitUpdate . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.1.2 SubmittedUpdateResultNotification . . . . . . . . . . . . 36
3.2.1.3 PushCommittedUpdates . . . . . . . . . . . . . . . . . . . 36
3.2.1.4 PullCommittedUpdates . . . . . . . . . . . . . . . . . . . 37
3.3 ARS Submission-Propagation Protocol (ars-s) . . . . . . . 38
3.3.1 PropagateSubmittedUpdate . . . . . . . . . . . . . . . . . 38
3.3.2 SubmittedUpdateResultNotification Extended Semantics . . . 39
3.4 ARS Encoding Negotiation Protocol (ars-e) . . . . . . . . 39
4. Algorithms and Implementation Details . . . . . . . . . . 41
4.1 ARS Meta-Data Management . . . . . . . . . . . . . . . . . 41
4.1.1 Document State . . . . . . . . . . . . . . . . . . . . . . 41
4.1.2 Committed Update State Management . . . . . . . . . . . . 41
4.1.3 Committed Update Collapsing . . . . . . . . . . . . . . . 42
4.1.4 Per Server Sequence Number State . . . . . . . . . . . . . 45
4.1.5 Locking . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.1.6 Server Configuration Data . . . . . . . . . . . . . . . . 47
4.1.6.1 Replication Topology (Normative) . . . . . . . . . . . . . 47
4.1.6.2 Local Implementation Settings (Non-Normative) . . . . . . 50
4.2 Protocol Processing . . . . . . . . . . . . . . . . . . . 51
4.2.1 ARS Commit-and-Propagate Protocol (ars-c) . . . . . . . . 51
4.2.1.1 SubmitUpdate Processing . . . . . . . . . . . . . . . . . 52
4.2.1.2 SubmittedUpdateResultNotification Processing . . . . . . . 55
4.2.1.3 PushCommittedUpdates Processing . . . . . . . . . . . . . 56
4.2.1.4 PullCommittedUpdates Processing . . . . . . . . . . . . . 56
4.2.1.5 Submitted Update Collapsing for Infrequently Synchronized
Peers . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.2 ARS Submission-Propagation Protocol (ars-s) Processing . . 58
4.2.2.1 Non-Primary SubmitUpdate Processing . . . . . . . . . . . 58
4.2.2.2 Non-Primary PropagateSubmittedUpdate Processing . . . . . 59
4.2.2.3 Primary PropagateSubmittedUpdate Processing . . . . . . . 61
4.2.2.4 PushCommittedUpdates and PullCommittedUpdates Scheduling . 62
4.2.2.5 PullCommittedUpdates Synchronization . . . . . . . . . . . 63
4.2.2.6 SubmittedUpdateResultNotification Synchronization . . . . 64
4.2.2.7 Submitted Update Reordering Details . . . . . . . . . . . 66
4.2.3 ARS Encoding Negotiation Protocol (ars-e) Processing . . . 68
4.3 Example State Transition Diagrams . . . . . . . . . . . . 69
5. Security Considerations . . . . . . . . . . . . . . . . . 72
References . . . . . . . . . . . . . . . . . . . . . . . . 73
Author's Address . . . . . . . . . . . . . . . . . . . . . 73
A. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 74
B. Future Enhancements and Investigations . . . . . . . . . . 75
C. ANTACID Replication Service Registration . . . . . . . . . 77
D. ARS Top-Level DTD . . . . . . . . . . . . . . . . . . . . 78
E. ars-c DTD . . . . . . . . . . . . . . . . . . . . . . . . 80
F. ars-s DTD . . . . . . . . . . . . . . . . . . . . . . . . 86
G. ars-e DTD . . . . . . . . . . . . . . . . . . . . . . . . 88
H. ARS Topology Configuration DTD . . . . . . . . . . . . . . 90
I. Current Encodings and Registration Procedures . . . . . . 93
I.1 Currently Defined Encodings . . . . . . . . . . . . . . . 93
I.2 Encoding Registration Procedures . . . . . . . . . . . . . 96
Full Copyright Statement . . . . . . . . . . . . . . . . . 97
1. Introduction
This document specifies the protocol and algorithms used to implement
the ANTACID Replication Service (ARS). Readers are referred to [1]
for a motivation of the problem addressed, the replication
architecture, and terminology used in the current document. The
current document assumes the reader has already read that document,
and that the reader is familiar with XML [2]. Moreover, since the
ARS protocol is defined in terms of a BEEP [3] profile, readers are
referred to that document for background.
We begin (Section 2) by walking through example ARS interactions, to
give the reader a concrete flavor for how the protocol works. We
then (Section 3) present the ARS syntax and semantics, and finally
(Section 4) provide algorithms and implementation details.
2. Walk-Through of Example ARS Interactions
ARS updates follow a simple pattern, with Submit Sequence Numbers
(SSN's) assigned by each submission server flowing up the DAG and
Commit Sequence Numbers (CSN's) assigned by the primary flowing back
down after a submission has committed at the primary. As an example,
consider the DAG illustrated below:
             svr3
             |  |
            \|/ \|/
          svr2<--svr4
           |       |
          \|/     \|/
          svr1    svr5
In this diagram, arc directions indicate the "is a downstream server
from" relationship. Thus, svr3 is the zone primary, svr1, svr2,
svr4, and svr5 are non-primaries, svr2 and svr4 are downstream from
svr3, svr2 is downstream from svr4, svr1 is downstream from svr2, and
svr5 is downstream from svr4.
Given this DAG, an update submitted at svr1 might be assigned SSN 1
by svr1, and then be propagated by svr1 to svr2, and then from svr2
to svr4, and then from svr4 to svr3. svr3 serializes the update
submission, commits the update, and assigns it a CSN of, say, 2. At
this point the committed update propagates back down the DAG, for
example first to svr4 and svr2 (from svr3), and then in parallel from
svr4 to svr5 and from svr2 to svr1. As this example illustrates, the
path by which committed updates propagate down the DAG may differ
from the path by which submissions are propagated up the DAG.
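The flow described above can be sketched in code. The following is an
illustrative simulation only (the Server and Primary classes and their
methods are assumptions of this sketch, not protocol elements): a
submission receives an SSN from the submission server, travels up the
DAG to the primary, and the primary serializes it and assigns the CSN
that later flows back down.

```python
class Server:
    """A non-primary ARS server that assigns SSNs to submissions."""
    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream   # next hop toward the zone primary
        self.next_ssn = 1

    def submit(self, update):
        ssn = self.next_ssn        # assign the Submit Sequence Number
        self.next_ssn += 1
        # Propagate the submission up the DAG until it reaches the primary.
        hop = self.upstream
        while hop.upstream is not None:
            hop = hop.upstream
        return hop.commit(self.name, ssn, update)

class Primary(Server):
    """The zone primary serializes submissions and assigns CSNs."""
    def __init__(self, name):
        super().__init__(name)
        self.next_csn = 2          # CSN 1 is the implicit pre-replication state
        self.committed = []

    def commit(self, submit_svr, ssn, update):
        csn = self.next_csn
        self.next_csn += 1
        self.committed.append((csn, submit_svr, ssn, update))
        return csn                 # flows back down with the committed update

# The DAG from the example: svr3 is primary; svr1 submits via svr2 and svr4.
svr3 = Primary("svr3")
svr4 = Server("svr4", upstream=svr3)
svr2 = Server("svr2", upstream=svr4)
svr1 = Server("svr1", upstream=svr2)

csn = svr1.submit("create doc-a")
```

As in the text, the first update submitted at svr1 gets SSN 1 there and
a CSN of 2 at the primary.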
This DAG represents a set of ARS servers that implement ars-c as well
as ars-s, which supports updates being submitted to non-primary
servers and propagated up to the primary. In an ARS service that
implements only ars-c all updates must be submitted to the primary.
For that case, the only propagation that occurs is when committed
updates propagate from the primary to all downstream servers.
Given this basic understanding of how submitted and committed updates
propagate across the DAG, we now walk through examples of the
protocol content exchanged between a set of ARS peers. We start with
a server that implements the minimal required ARS protocol elements
(ars-c). We then show the additional functionality of ars-s and ars-
e, each in turn.
The examples in this section are based on a pair of servers
configured as follows:
         /->svr1 (primary)
        /      |
  client       |
        \     \|/
         \->svr2 (non-primary)
In some of the examples the client makes requests of the primary. In
other examples the client makes requests of the non-primary. Here
the DAG between the servers is just a single edge, but in general
there could be many servers upstream and downstream from each server
(except for the primary, which never has upstream servers unless it
is also a non-primary for other zones).
In the examples we list the communication endpoints flush left on
the page, with the transmitted content indented, like so:
client->svr1:
...
The "client->svr1:" above is only for labeling the flow as going from
the client to svr1, and is not part of the transmitted ARS content.
The indented text is the transmitted ARS content.
The examples here all use the Blocks [4] name space.
2.1 ARS Commit-and-Propagate Protocol (ars-c)
For the simplest case, the interaction begins when the client
performs a "SubmitUpdate" request to the zone primary:
client->svr1:
...
...
Here the client generates and passes a request number that can be
used to correlate the response with the request, to support
concurrent requests. The client makes a SubmitUpdate request,
passing the host name and port number to which notification should be
sent when the update completes or fails, as well as a single
UpdateGroup containing all updates to be performed. The request uses
the DataWithOps encoding, since in this basic example the ARS client
and server do not support any other encodings. The DataWithOps
encoding contains a set of (in this case 2) documents, each of which
has an associated operation (in this case "create") to be performed
and attributes containing the document's name and CSN. Because the
data in this example come from the Blocks name space, the name and
CSN information are also contained as attributes within the Blocks.
This redundancy happens because Blocks require these attributes to be
present in the root XML element, as additional structure beyond that
imposed by ARS. Since ARS only assumes documents and not the more
constrained structure of Blocks, the name and CSN need to be included
in the ARS encoding. Finally, note that because the documents have
not yet been created in the datastore, the CSN is not meaningful.
The CSN value is only meaningful once the content has been created in
the datastore.
The server responds as follows:
svr1->client:
The ARSAnswer element contains the server's host name, port,
incarnation stamp, and a 64 bit Submit Sequence Number assigned by
the submission server. Together, these four pieces of data
constitute an identification of the update submission that is
globally unique for all time, called the GlobalSubmitID.
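The identifier structure just described can be sketched as follows.
This is a hedged illustration (the Python field names are assumptions;
the draft defines the GlobalSubmitID as the submission server's host,
port, and incarnation stamp, plus the assigned SSN):

```python
from typing import NamedTuple

class GlobalServerID(NamedTuple):
    host: str         # DNS name of the server
    port: int
    incarnation: int  # 64 bit incarnation stamp

class GlobalSubmitID(NamedTuple):
    server: GlobalServerID
    ssn: int          # 64 bit Submit Sequence Number

# Two submissions from the same server incarnation differ only in SSN,
# so the combined identifiers remain unique for all time.
svr1 = GlobalServerID("svr1.example.com", 10288, 1002477734)
a = GlobalSubmitID(svr1, 1)
b = GlobalSubmitID(svr1, 2)
```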
This ARSAnswer indicates that the submission was successfully
received, and that the server has entered into the client-server
promise described in [1]. If an error had occurred the response
would have contained an ARSError (Section 3.1.4) instead of an
ARSAnswer.
At some later time, the server performs a
SubmittedUpdateResultNotification request to notify the client that
the update has been successfully committed, and the client
acknowledges receipt of this notification:
svr1->client:
client->svr1:
The ReqNum here is 5 because the server happens to have performed 4
other requests before this one. The
SubmittedUpdateResultNotification element contains the four
attributes that constitute the GlobalSubmitID, as well as two other
attributes: the 64 bit CSN that was assigned by the primary when it
committed this update, and the URI [5] of the top node in the zone
within which this update occurred. The URI in effect names the zone.
It is needed because servers can handle multiple zones, and CSN's are
allocated per zone. Together, the URI and CSN constitute an
identification of the update commit event that is globally unique for
all time.
This ARSAnswer indicates that the submission was successfully
committed. If it had failed the SubmittedUpdateResultNotification
would have contained an ARSError element describing the error, and
the CSN would have been 0.
At a time determined by the local implementation's configuration
settings, the primary performs a PushCommittedUpdates request to
suggest to the non-primary that new committed updates are available
to be pulled. The non-primary acknowledges this PushCommittedUpdates
with an ARSResponse:
svr1->svr2:
svr2->svr1:
The PushCommittedUpdates request specifies the host and port from
which the request was initiated. This is done rather than relying on
looking up this information from the underlying transport service
(BEEP) because the transmission could arrive on a different port than
the advertised port on which the server accepts requests. In fact, a
local implementation may choose to split receiving and sending onto
separate machines to distribute load and failure modes, similar to
how some commercial email services split processing for POP [6] and
SMTP [7].
At this point, the non-primary performs a PullCommittedUpdates
request to request newly available updates:
svr2->svr1:
blocks:test.schwartz
0
Similar to the PushCommittedUpdates request, the PullCommittedUpdates
request specifies the host and port from which the request was
initiated. The PullCommittedUpdates request names the URI of the
zone for which it wants updates, and the last CSN it has seen for
that zone. By specifying a LastSeenCSN of 0, the non-primary is
requesting the entire zone content (the first valid CSN is defined to
be 1).
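The selection rule for answering a PullCommittedUpdates request can be
sketched as below. This is a minimal illustration (the function and
variable names are assumptions of this sketch): the responder returns
every committed update with a CSN greater than the requester's
LastSeenCSN, so LastSeenCSN=0 yields the entire zone.

```python
def pull_committed_updates(committed, last_seen_csn):
    """committed: mapping of CSN -> update payload for one zone."""
    return [(csn, committed[csn])
            for csn in sorted(committed)
            if csn > last_seen_csn]

# Toy zone state: three committed updates (CSNs start at 2).
zone = {2: "write doc-a", 3: "write doc-b", 4: "delete doc-a"}

everything = pull_committed_updates(zone, 0)   # full zone transfer
recent = pull_committed_updates(zone, 3)       # only updates after CSN 3
```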
The primary responds with the requested updates:
svr1->svr2:
...
...
Note that the documents have their CSN's set, per the value assigned
by the primary at commit time. Also, the operations sent are "write"
(rather than the "create" specified when the update was submitted) in
order to ensure that the operation succeeds in the case where update
collapsing (see [1]) is performed. Collapsing will be discussed in
more detail later.
2.2 ARS Submission-Propagation Protocol (ars-s)
We begin with the client submitting an update request to the non-
primary server:
client->svr2:
...
...
The content of this request is identical to that discussed in the
earlier example (Section 2.1). Only the destination of the request
has changed.
The server responds, noting that it has successfully received the
request:
svr2->client:
Again the content is identical to that shown in the earlier example
(Section 2.1), but with a different source (and SubmisSvrHost) for
the response.
At this point, the non-primary server relays the request by making a
PropagateSubmittedUpdate request to the primary server:
svr2->svr1:
...
...
The PropagateSubmittedUpdate request contains the GlobalSubmitID of
the request (indicating that the request was submitted at svr2), but
has re-written the NotifyHost and NotifyPort to refer to svr2, so
that it will find out when the request completes or fails.
Otherwise, the content of the request is identical to what svr2
received from the client.
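The relay step just described can be sketched as follows. This is an
illustrative sketch only (the dict representation and helper name are
assumptions; the attribute names mirror the protocol): the non-primary
leaves the GlobalSubmitID fields untouched but rewrites NotifyHost and
NotifyPort to itself before propagating upstream.

```python
def relay_submission(request, my_host, my_port):
    """Rewrite notification target while preserving the GlobalSubmitID."""
    relayed = dict(request)           # GlobalSubmitID fields pass through
    relayed["NotifyHost"] = my_host   # completion notices now come to us
    relayed["NotifyPort"] = my_port
    return relayed

# Toy request as received by svr2 from the client.
req = {"SubmisSvrHost": "svr2.example.com", "SubmisSvrPort": 10288,
       "SubmisSvrIncarn": 1002477734, "SSN": 1,
       "NotifyHost": "client.example.com", "NotifyPort": 4660}

upstream_req = relay_submission(req, "svr2.example.com", 10288)
```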
The primary then responds, acknowledging that it has successfully
received the PropagateSubmittedUpdate request and has entered into
the client-server promise, providing a chain of responsibility from
client to svr2 to svr1:
svr1->svr2:
At some later time, the primary commits the update and performs a
SubmittedUpdateResultNotification to inform the non-primary that the
request has completed successfully. The non-primary acknowledges
this SubmittedUpdateResultNotification with an ARSResponse:
svr1->svr2:
svr2->svr1:
Note that ars-s is re-using the SubmittedUpdateResultNotification
element defined by ars-c, for informing a downstream server about the
completion status of a pending update.
At some later time, the primary performs a PushCommittedUpdates, the
non-primary follows with a PullCommittedUpdates, and the primary
responds with the requested updates:
svr1->svr2:
svr2->svr1:
svr2->svr1:
blocks:test.schwartz
0
svr1->svr2:
...
...
At this point, the non-primary performs a
SubmittedUpdateResultNotification, to notify the client that its
update submission has successfully committed, and the client
acknowledges receipt of this notification:
svr2->client:
client->svr2:
Note that this SubmittedUpdateResultNotification indicates that the
update has now committed at the non-primary. This is important
because it means the client can now interact with the non-primary
copy and expect to see the committed update. The client can
correlate this response to the submission it had made based on the
GlobalSubmitID information (host, port, incarnation stamp, and SSN)
contained in the SubmittedUpdateResultNotification attributes.
2.3 ARS Encoding Negotiation Protocol (ars-e)
ContentEncodingNegotiation can be performed between any pair of ARS
peers, to determine if an expanded set of encodings is available
beyond the default DataWithOps encoding. As an example, the non-
primary server might perform a ContentEncodingNegotiation with the
primary as follows:
svr2->svr1:
DataWithOps
AllZoneData
EllipsisNotation
svr1->svr2:
DataWithOps
AllZoneData
The ContentEncodingNegotiation element contains a ZoneTopNodeName
attribute specifying the URI of the top node in the zone to which
this encoding is to apply, because the set of encodings supported may
vary by zone. The ContentEncodingNegotiation also contains one or
more ContentEncodingName elements corresponding to content encodings
the initiator supports. The responder sends back the subset of the
requested encodings that it supports.
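The negotiation rule amounts to an intersection, as sketched below.
(Preserving the initiator's proposal order in the response is an
assumption of this sketch, not a requirement stated by the protocol.)

```python
def negotiate_encodings(offered, supported):
    """Return the subset of the initiator's encodings the responder supports."""
    supported = set(supported)
    return [enc for enc in offered if enc in supported]

# Matches the example exchange: svr2 offers three encodings,
# svr1 supports only two of them.
offered = ["DataWithOps", "AllZoneData", "EllipsisNotation"]
svr1_supports = ["DataWithOps", "AllZoneData"]

agreed = negotiate_encodings(offered, svr1_supports)
```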
2.4 ARS Service Implementing All Three Sub-Protocols
Below we put together all of the protocol pieces discussed in the
last three sub-sections, showing how a system supporting all three
ARS sub-protocols might function:
client->svr2:
...
...
svr2->client:
svr2->svr1:
DataWithOps
AllZoneData
EllipsisNotation
svr1->svr2:
DataWithOps
AllZoneData
svr2->svr1:
...
...
svr1->svr2:
svr1->svr2:
svr2->svr1:
svr1->svr2:
svr2->svr1:
blocks:test.schwartz
0
svr1->svr2:
...
...
svr2->svr1:
svr2->client:
client->svr2:
Several subtleties of the protocol can be observed from this example:
o When it first connects to the primary, the non-primary performs a
ContentEncodingNegotiation, and finds that the primary supports
fewer content encodings than it does.
o The non-primary uses the DataWithOps encoding to propagate the update
submission, but the primary uses a different encoding
(AllZoneData) to propagate the committed updates. The AllZoneData
encoding is used because the non-primary's PullCommittedUpdates
request asked for all updates performed on the zone
(LastSeenCSN=0). (The use of the AllZoneData encoding is
discussed in more detail later (Appendix I.1).)
o The primary happened to perform a PushCommittedUpdates to inform
the non-primary that new committed updates are available BEFORE
the primary performs the SubmittedUpdateResultNotification for
this update. This can happen because these two types of messages
are asynchronous and decoupled from one another, for reasons that
will be discussed later (Section 4). For this reason, the non-
primary cannot perform the SubmittedUpdateResultNotification to
the client until it has received the
SubmittedUpdateResultNotification from the primary AND performed
the PullCommittedUpdates request (so that it knows the CSN
corresponding to the SSN it had assigned).
o The non-primary's PullCommittedUpdates request overlaps with the
non-primary's receipt of the SubmittedUpdateResultNotification
from the primary. This can happen because, again, result
notification and the scheduling of
PushCommittedUpdates/PullCommittedUpdates requests are
asynchronous processes.
o Once both the SubmittedUpdateResultNotification and the
PullCommittedUpdates requests have completed, the non-primary
detects that it is now time to notify the client of the committed
update's success -- which it does, providing both the original
GlobalSubmitID that it had assigned and the CSN that the primary
assigned for this update.
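The gating behavior in the last two bullets can be sketched as a small
state machine. This is an illustrative sketch (class and method names
are assumptions): the non-primary notifies the client only once it has
both received the primary's result notification and pulled committed
updates at least up through the assigned CSN; the two events may
arrive in either order.

```python
class PendingSubmission:
    def __init__(self, ssn):
        self.ssn = ssn
        self.result_csn = None      # from SubmittedUpdateResultNotification
        self.last_pulled_csn = 0    # progress of PullCommittedUpdates
        self.client_notified = False

    def on_result_notification(self, csn):
        self.result_csn = csn
        self._maybe_notify()

    def on_pull_complete(self, last_pulled_csn):
        self.last_pulled_csn = max(self.last_pulled_csn, last_pulled_csn)
        self._maybe_notify()

    def _maybe_notify(self):
        # Both conditions must hold before the client is told of success.
        if (self.result_csn is not None
                and self.last_pulled_csn >= self.result_csn):
            self.client_notified = True  # send notification with SSN + CSN

p = PendingSubmission(ssn=1)
p.on_pull_complete(2)            # pull finished first: CSN still unknown
assert not p.client_notified
p.on_result_notification(2)      # now both conditions hold
```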
3. ARS Syntax and Semantics
In this section we present the ARS syntax and semantics. We begin
with how ARS identifies and encodes information within its messages:
server identification, submitted and committed update sequence
numbers, default data encodings, and error signaling between ARS
peers. We then describe the structure and meaning of messages
exchanged between ARS peers.
3.1 Identifiers, Data Representation, and Error Signaling
3.1.1 ARS Server Identification
Each ARS server has a global server identifier (GlobalServerID),
which consists of a Domain Name System (DNS [8]) name, server
incarnation stamp, and port number. The GlobalServerID must be
unique for all time. If the server moves to a machine with a
different DNS name, its GlobalServerID changes. A level of naming
indirection can be used to minimize operational problems from this
(e.g., a DNS CNAME called ars.example.com that points to
host3.example.com).
A GlobalServerID-identified server must never use the same SSN for
two different update submissions. The incarnation stamp provides a
way for a server that loses track of its last assigned SSN (e.g., due
to a disk crash) to assign a new incarnation stamp and restart its
SSN allocation sequence. If not for the incarnation stamp, a server
losing its SSN state would be forced to move to a different host name
or port number, which would be an ARS peer-visible change. Note that
ARS peers contact each other using only the host and port
information. The incarnation stamp is only used as part of
GlobalServerID's, which in turn provide a key for looking up
replication state (such as the last seen SSN from a particular
server).
Note that if a new server incarnation is established, no ordering
constraints are defined with respect to the previous server
incarnation. For example, an update submitted to the newly
incarnated server might be serialized before an update that had been
submitted chronologically earlier at the previous server incarnation.
At present there is no recovery mechanism if a primary server loses
track of its last assigned CSN. Primary servers must therefore be
run with more failure-resilient technology than non-primary servers
-- for example using RAID-5 plus hot backups. Note that an
incarnation stamp approach would be problematic for primary servers
because it would mean that updates committed after server re-
incarnation would have no defined serialization relationship with
those committed before re-incarnation, which in turn violates
convergent consistency requirements.
The incarnation stamp is a 64 bit number generated from the time-of-
day clock on the server for which the incarnation stamp is being
generated. There is no clock synchronization requirement, since the
stamp for any particular server is always generated by a single
machine. Nor is there a requirement that the time stamp be formed
according to any particular clock format (e.g., the UNIX seconds-
since-midnight-1970 epoch -- although the examples in this document
use that format). The only requirement is that a newly generated
incarnation stamp must be at least one greater than the previously
assigned incarnation stamp for that server.
The reason for using a timestamp rather than a simple counter is that
using a timestamp reduces the chances for an administrative error
that would assign an incarnation number that had already been
assigned. In particular, the only state needed to generate a new
incarnation stamp is the current time-of-day clock, which is readily
available without access to any previous replication server state
(which may have been completely destroyed by a disk crash).
Incarnation stamp 0 is defined to be invalid, and thus can be used by
the server implementation as a pre-initialized value to ensure a
valid incarnation stamp has been received during later processing.
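A minimal sketch of stamp generation under these rules follows (the
helper name is an assumption of this sketch): the stamp comes from the
time-of-day clock, stamp 0 is reserved as invalid, and a new stamp
must be at least one greater than the previous one, which protects
against a clock that is stuck or has stepped backwards.

```python
import time

INVALID_INCARNATION = 0   # usable as a pre-initialized sentinel value

def new_incarnation_stamp(previous=INVALID_INCARNATION, clock=time.time):
    """Return a 64 bit stamp at least one greater than `previous`."""
    stamp = int(clock())                  # e.g. Unix seconds-since-1970
    return max(stamp, previous + 1) & 0xFFFFFFFFFFFFFFFF

first = new_incarnation_stamp()
# Even with a stuck (or backwards) clock, the stamp still advances:
second = new_incarnation_stamp(previous=first, clock=lambda: 0)
```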
3.1.2 Sequence Numbers
ARS uses 64 bit unsigned integer sequence numbers to provide unique-
for-all-time identification of submitted and committed updates being
processed by individual servers. For example, this counter size
would allow one million updates per second to a particular zone for
585,000 years without wrapping. There are two types of sequence
numbers:
1. SSN: used by ars-s, the Submit Sequence Number (SSN) is allocated
per submission server per zone to serialize all update
submissions to a zone/server pair. The SSN plus GlobalServerID
constitutes a GlobalSubmitID that uniquely identifies a
submission for all time. The SSN imposes a total ordering over
all updates submitted at that server, and a partial ordering over
all updates globally.
2. CSN: used by ars-c, the Commit Sequence Number (CSN) is allocated
per update per zone by the zone primary server after an entire
update submission has been received and checked for various
problems (discussed below). Each successfully committed
UpdateGroup is assigned a CSN (the value for which is
subsequently associated with all documents in that UpdateGroup),
which in turn serializes all update submissions to a zone so that
updates are committed in the same order globally. More formally,
the CSN imposes a global ordering on all updates that respects
the partial orderings imposed by the SSN's from all submission
servers for the zone.
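The wrap-around figure quoted above is easy to verify: a 64 bit
unsigned counter consumed at one million updates per second lasts
roughly 585,000 years.

```python
# Check the claim: 2**64 sequence numbers at 1,000,000 updates/second.
SECONDS_PER_YEAR = 3600 * 24 * 365
updates_per_second = 1_000_000

years_until_wrap = 2**64 // updates_per_second // SECONDS_PER_YEAR
```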
Note that there is no need for logical clocks [9] for sequence
numbers because updates are not applied at database replicas until
they have been serialized at the primary. In fact, logical clocks
must not be used because that would cause gaps in the SSN sequence,
which would appear to the primary as missing update submissions.
In the case of an UpdateGroup a single CSN must be assigned to the
entire update (rather than one CSN per document within the update
submission).
Sequence number 0 (for both SSN's and CSN's) is defined to be
invalid. It is used in three cases:
1. as the value of the CSN field in the
SubmittedUpdateResultNotification response for a failed update;
2. for documents that are not replicated (e.g., if local state
information about the replication system is stored in a Blocks
datastore, the CSN values for each of those documents should be
0); and,
3. as the value of LastSeenCSN when requesting an entire zone in the
PullCommittedUpdates request.
CSN 1 is defined to be the first valid Commit Sequence Number, and is
used only for the case of a data item that lacks a current CSN
attribute (i.e., CSN value 1 is the value used as the default for
this IMPLIED attribute). The first CSN assigned by an ARS server in
response to a successfully committed update is 2. This definition is
specifically used to allow a datastore not previously replicated by
ARS to be replicated without requiring a special tool to add CSN's
(see the SubmitUpdate Processing section (Section 4.2.1.1)).
Instead, ARS interprets a missing CSN attribute as '1', in effect
treating all previous updates applied to a non-replicated datastore
as being rolled up into a starting state with CSN=1. From then on,
ARS assigns CSN's for successfully committed updates starting at CSN
value 2.
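These CSN conventions can be sketched as follows (the attribute-dict
representation and helper name are assumptions of this sketch): CSN 0
is invalid, a missing CSN attribute defaults to 1 (the rolled-up
pre-replication state), and the first CSN assigned to a committed
update is 2.

```python
FIRST_ASSIGNED_CSN = 2    # CSN 1 is reserved for the pre-replication state

def effective_csn(doc_attrs):
    """Return a document's CSN, treating a missing attribute as 1."""
    return int(doc_attrs.get("CSN", 1))

legacy_doc = {"Name": "blocks:test.schwartz"}               # never replicated
replicated_doc = {"Name": "blocks:test.schwartz", "CSN": "7"}
```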
When a zone is divided/delegated, the newly created zone initializes
its CSN to the highest CSN value from the zone from which it has
been delegated. Doing this (rather than restarting the counting
sequence) preserves the monotonicity of CSN's and avoids the need for
renumbering sequence numbers assigned to documents within the new
zone. The original zone also continues allocating CSN's from this
high-water mark CSN. Note that once a zone is delegated, the fact
that the original and new zone have the same CSN implies nothing
about the relative orderings of updates applied in each. ARS defines
no ordering of updates across zones.
3.1.3 DataWithOps Encoding
ARS requires all clients and servers to support the DataWithOps
encoding. DataWithOps is used by ARS servers that do not support the
ars-e sub-protocol. It is also used in cases where ars-e is
supported but has not been performed between a pair of ARS peers.
Each DataWithOps element contains zero or more DatumAndOp elements
describing a set of update operations to be performed, such that
either all operations succeed or all operations fail (per the ANTACID
semantics defined in [1]). Each DatumAndOp element contains a set of
attributes concerning the update to be performed and the content of
the document being updated. The attributes are:
Name: the URI of the document to be updated;
CSN: the CSN of the document to be updated;
Action: one of:
create: verifies that the documents do not exist in the datastore
before creating them;
write: creates or overwrites the documents in the datastore (the
default);
update: verifies that the documents exist in the datastore before
overwriting them; or,
delete: removes the documents from the datastore.
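The four actions and the all-or-nothing UpdateGroup semantics can be
sketched against a toy datastore (a dict keyed by document URI). This
is a hedged illustration, not the normative processing rules: it
models atomicity by validating every operation before applying any,
and it assumes delete requires the document to exist.

```python
def apply_update_group(datastore, ops):
    """ops: list of (action, name, content); all succeed or none do."""
    for action, name, _ in ops:                 # validation pass
        if action == "create" and name in datastore:
            raise ValueError("create: %s already exists" % name)
        if action in ("update", "delete") and name not in datastore:
            raise ValueError("%s: %s does not exist" % (action, name))
    for action, name, content in ops:           # apply pass
        if action == "delete":
            del datastore[name]
        else:                                   # create, write, update
            datastore[name] = content
    return datastore

store = {}
apply_update_group(store, [("create", "doc-a", "<a/>"),
                           ("write", "doc-b", "<b/>")])
apply_update_group(store, [("update", "doc-b", "<b2/>"),
                           ("delete", "doc-a", None)])
```

Note how "write" succeeds whether or not the document exists, which is
why committed updates are propagated with "write" rather than
"create" when update collapsing may have occurred.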
3.1.4 ARSError
The ARSError element provides an error-signaling structure for
exchanging ARS profile-specific errors, providing specific detail
beyond BEEP error handling. The ARSError element contains three
attributes:
1. OccurredAtSvrHost specifies the DNS name or IP address of the
server that flagged the error;
2. OccurredAtSvrPort specifies the port number of the server that
flagged the error; and,
3. OccurredAtSvrIncarn specifies the incarnation stamp of the server
that flagged the error.
This can provide useful information when an update propagates up
several hops in a DAG, with multiple choices at each hop.
The ARSError element contains three elements:
1. ARSErrorCode, which must be filled in with values as enumerated
below. ARSErrorCode 0 is defined to be invalid. It can be used
by a server implementation as a pre-initialized value to ensure a
valid code was received during later processing.
2. ARSErrorText, which must be filled in.
3. ARSErrorSpecificsText, which may be filled in to provide
additional detail. The error code enumeration below provides
recommendations of what additional information should be filled
in for the ARSErrorSpecificsText in cases where additional detail
is warranted.
Non-zero ARSErrorCode's use a positional structure encoded in
unsigned 32-bit numbers, as follows:
First digit:
1: client problem
2: server problem
Second digit:
1: service failure
2: service refusal
Third digit:
1: security
2: timeout
3: mis-configuration
4: too expensive
5: implementation-specific failure
6: data conflict
7: protocol/format error
8: request for unimplemented feature
9: resource overload
0: other
Fourth-Sixth digits: three-digit enumeration of errors. For
example, error code 114001 is a client problem that caused a
service failure because the request was too expensive.
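The positional encoding can be decoded mechanically.  The following
non-normative Python sketch (the helper and table names are ours)
splits a six-digit code into its fields:

```python
# Non-normative sketch: decode the positional ARSErrorCode structure.
# Field names follow the enumeration above.

WHO = {1: "client problem", 2: "server problem"}
KIND = {1: "service failure", 2: "service refusal"}
CAUSE = {1: "security", 2: "timeout", 3: "mis-configuration",
         4: "too expensive", 5: "implementation-specific failure",
         6: "data conflict", 7: "protocol/format error",
         8: "request for unimplemented feature",
         9: "resource overload", 0: "other"}

def decode_ars_error(code):
    """Split a six-digit ARSErrorCode into its positional fields."""
    assert 100000 <= code <= 299999, "codes are six-digit values"
    digits = str(code)
    return (WHO[int(digits[0])], KIND[int(digits[1])],
            CAUSE[int(digits[2])], int(digits[3:]))
```

For example, decoding 114001 yields the "client problem / service
failure / too expensive" reading given above, plus enumeration 1.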
The error codes listed below are referenced throughout this document.
These error codes cover more failure conditions than those
specifically mentioned in the protocol and algorithm discussions in
this document, such as disk space exhaustion. Moreover, a variety of
local implementation failures are possible (such as data validity
assertion failures built into the code), which also are represented
in the ARSError list below.
The currently defined ARSErrorCode's are:
116001: Attempt to delete non-existent document. [ErrorSpecificsText
should specify the non-existent document URI.]
116002: Attempt to update non-existent document. [ErrorSpecificsText
should specify the non-existent document URI.]
117001: Missing URI in document store request.
121001: Authentication failure.
121002: Access denied.
123001: Request was made to submission server that does not hold zone
being requested.
123002: Request was made to upstream server that does not hold zone
being requested.
123003: Attempt to update documents spanning zone boundaries within a
single UpdateGroup.
123004: Request to update data in unknown name space.
126001: Write-write conflict detected. [ErrorSpecificsText should
show ID of server that last updated document before this conflict
was detected.]
126002: Request violates datastore operation semantics.
[ErrorSpecificsText should specify more details.]
126003: General datastore error. [ErrorSpecificsText should specify
more details.]
127001: Malformed client-server ARS protocol transmission.
[ErrorSpecificsText should show XML parser error output.]
210001: Unable to propagate update submission to any upstream
servers. [ErrorSpecificsText should provide some details about
how many attempts were made, over how long of a duration.]
212001: Timeout at zone primary waiting for submitted update re-
ordering.
212002: Timeout while waiting for zone lock.
212003: Timeout while waiting for upstream server to propagate
update.
212004: Timeout while trying to respond to request.
213001: Content encoding from upstream server not understood.
[ErrorSpecificsText should name the encoding.]
213002: No appropriate content encoding was available for the
requested operation. [ErrorSpecificsText should name the server
where the problem occurred, and the operation for which no content
encoding could be found.]
213003: Malformed ARS protocol transmission (client or server).
[ErrorSpecificsText should describe parse error (note: used for
cases where the underlying service can't tell whether it's a
client-to-server or server-to-server ARS parsing error).]
213004: General parsing error. [ErrorSpecificsText should describe
parse error (note: used for cases where the underlying service
can't determine whether it's server-to-server parsing or config
file parsing).]
213005: BEEP connection attempt to remote ARS end point relayed on
behalf of current request failed. [ErrorSpecificsText should
describe more detail about the nature of the failure.]
219001: Server resource overload. [ErrorSpecificsText should contain
detail about what resource(s) overloaded.]
223001: No content encodings available for full zone transfer.
223002: PropagateSubmittedUpdate request received from server not
configured as a downstream server.
223003: PushCommittedUpdates request received from server not
configured as an upstream server.
223004: PullCommittedUpdates request received from server not
configured as a downstream server.
223005: Requested ARS sub-protocol not supported.
223006: Update submission received at non-primary that does not
support ars-s.
225001: Implementation-specific failure. [ErrorSpecificsText should
provide detail.]
226001: Duplicate update submission detected -- could be a server
retransmitting after update submission has already been
successfully received, or a server configuration loop.
226002: Request for CSN before log truncation point. Full zone
transfer should be requested. [ErrorSpecificsText should show CSN
& truncation point.]
227001: Malformed server-server ARS protocol transmission.
[ErrorSpecificsText should show XML parser error output.]
3.2 ARS Message Semantics
ARS consists of three sub-protocols, only the first of which must be
implemented by all ARS servers: the Commit-and-Propagate Protocol
(ars-c), the Submission-Propagation Protocol (ars-s), and the
Encoding Negotiation Protocol (ars-e). The protocol syntax for a
server supporting any subset of these protocols is defined by a DTD
whose contents are constructed based on the top-level definition and
inclusion content (Appendix D). Here the operations to be supported
are defined in the "ARSREQUESTS" ENTITY, the DTD's for the supported
sub-protocol(s) is/are included, and, if ars-e is not supported, the
"UpdateGroup" ELEMENT is set to define the single required default
encoding for all ARS servers (DataWithOps).
The "ARSRequest" element contains a "ReqNum" attribute and one of a
subset of the following elements, the subset being defined by the
"ARSREQUESTS" ENTITY: a "SubmitUpdate" element, a
"SubmittedUpdateResultNotification" element, a "PushCommittedUpdates"
element, a "PullCommittedUpdates" element, a
"ContentEncodingNegotiation" element, and a
"PropagateSubmittedUpdate" element.
The "ReqNum" attribute (an integer in the range 1..4294967295) is
used to correlate "ARSRequest" elements sent by a BEEP peer acting in
the client role with the "ARSResponse" elements sent by a BEEP peer
acting in the server role. Request number 0 is defined to be
invalid, and thus can be used by the server implementation as a pre-
initialized value to ensure a request was received during later
processing.
The semantics of each of the elements within the ARSRequest are
defined in the following subsections.
3.2.1 ARS Commit-and-Propagate Protocol (ars-c)
ars-c defines four request elements: SubmitUpdate,
SubmittedUpdateResultNotification, PushCommittedUpdates, and
PullCommittedUpdates. For the time being we assume clients submit to
the primary server; submissions to non-primary servers are discussed
later (Section 3.3). For the time being we also assume that all
submitted and committed updates are transmitted between all ARS peers
using the DataWithOps encoding.  Support for other encodings is
discussed later (Section 3.4).
3.2.1.1 SubmitUpdate
Clients submit groups of documents and their associated operation
names to be performed in an ANTACID (see [1]) fashion using the
SubmitUpdate request.
The SubmitUpdate element contains three optional attributes:
NotifyHost specifies the DNS name or IP address to which asynchronous
notification is to be sent after the commit fails or succeeds;
NotifyPort specifies the port number for asynchronous notification;
and,
NotifyOkOnCurrentChannel specifies whether it is acceptable for the
server to send notification on the same channel that was used for
submitting the update, if that channel is still open at the time
the notification is ready to be sent. This flag allows the server
to avoid the overhead of opening a new BEEP channel for updates
that commit relatively quickly. The flag is needed because it is
possible that the submission arrives on a different host and port
than that specified by NotifyHost and NotifyPort, and different
applications may or may not want to allow notification to arrive
on the original submission channel. The default (IMPLIED) value
is "no", meaning that the server must open a new channel for
notification.
If a NotifyHost is specified then a NotifyPort must also be included.
If only one of these attributes is included the update must be
rejected with an ARSError containing ARSErrorCode=127001. The peer
that receives notification may differ from the original submitting
client, for example allowing a mobile client to perform update
submissions and an always-connected server to receive the
SubmittedUpdateResultNotification and convert it to an email message
for the user to pick up later.
If NotifyOkOnCurrentChannel='yes', then NotifyHost and NotifyPort
must also be specified. If NotifyOkOnCurrentChannel='yes' and
NotifyHost or NotifyPort is not specified, the update must be
rejected with an ARSError containing ARSErrorCode=127001. The
semantics when NotifyOkOnCurrentChannel='yes' are:
o If the submission channel is still open at the time notification
is ready to be sent, the server may send notification on the
submission channel or it may open a channel to the specified
NotifyHost and NotifyPort and send the notification on that
channel.
o If the submission channel is no longer open at the time
notification is ready to be sent, the server must open a channel
to the specified NotifyHost and NotifyPort and send the
notification on that channel. The server must not attempt to
determine the host and port for the original channel and open a
new connection to that host and port.
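The validity rules above -- NotifyHost and NotifyPort must be given
together, and NotifyOkOnCurrentChannel='yes' requires both -- can be
sketched non-normatively in Python (the function name is ours):

```python
# Non-normative sketch of the SubmitUpdate notification-attribute
# validity rules.  Returns None if valid, else the ARSErrorCode with
# which the submission must be rejected (always 127001 here).

def validate_notify_attrs(notify_host=None, notify_port=None,
                          notify_ok_on_current_channel="no"):
    # NotifyHost and NotifyPort must appear together or not at all.
    if (notify_host is None) != (notify_port is None):
        return 127001
    # NotifyOkOnCurrentChannel='yes' requires both to be present.
    if notify_ok_on_current_channel == "yes" and notify_host is None:
        return 127001
    return None
```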
The SubmitUpdate element also contains an UpdateGroup element. The
UpdateGroup contains one or more DataWithOps elements, structured as
noted in the DataWithOps Encoding section (Section 3.1.3). Although
the DTD allows for zero or more DataWithOps, if zero elements are
included in a SubmitUpdate request the update must be rejected with
an ARSError containing ARSErrorCode=127001. (The case of zero
elements is used elsewhere in the protocol.)
The response to a failed SubmitUpdate request contains an ARSError
describing the failure.  For example, an update submission requesting
deletion of a non-existent document might receive a response such as
the following:
   <ARSError>
       <ARSErrorCode>126002</ARSErrorCode>
       <ARSErrorText>Request violates datastore operation
           semantics</ARSErrorText>
       <ARSErrorSpecificsText>Request #1 [BlockNameAndStoreOp:
           name=test.schwartz.blk01, StoreOp=delete]
           failed</ARSErrorSpecificsText>
   </ARSError>
The response to a successful SubmitUpdate request contains an
ARSAnswer element, which in turn contains a GlobalSubmitID element.
The GlobalSubmitID contains four attributes:
SubmisSvrHost specifies the DNS name of the submission server (note
that unlike some other parts of ARS, the GlobalSubmitID allows
only DNS names, not IP addresses, in the host component);
SubmisSvrPort specifies the port number of the submission server;
SubmisSvrIncarn specifies the incarnation stamp of the submission
server; and,
SSN specifies the Submit Sequence Number assigned to this update
submission.
A success response to a SubmitUpdate request means that the server
has accepted the update and will begin processing it at some time in
the future. If a client wishes to be informed of success/failure of
the update commit operation it may request asynchronous notification,
as noted earlier.
3.2.1.2 SubmittedUpdateResultNotification
The SubmittedUpdateResultNotification element is used to notify the
client of success/failure of its submitted update. A
SubmittedUpdateResultNotification is sent to the client when this
status becomes known at the submission server (as opposed to when the
update has committed at the primary).
A SubmittedUpdateResultNotification for a successfully committed
update contains six attributes:
SubmisSvrHost specifies the DNS name of the submission server;
SubmisSvrPort specifies the port number of the submission server;
SubmisSvrIncarn specifies the incarnation stamp of the submission
server;
SSN specifies the Submit Sequence Number assigned to this update
submission;
CSN specifies the Commit Sequence Number that was assigned by the
primary for this update; and,
ZoneTopNodeName specifies the URI [5] of the top node in the zone
within which this update occurred.
A SubmittedUpdateResultNotification for an update that failed
contains the same six attributes above, except that the CSN number is
set to 0. In addition, the SubmittedUpdateResultNotification for a
failed update contains a single ARSError element describing the error
that occurred.
3.2.1.3 PushCommittedUpdates
A PushCommittedUpdates request is made from an upstream server to a
downstream server to suggest that the downstream server perform a
PullCommittedUpdates request from the upstream server. It provides a
means of propagating updates quickly without the downstream servers'
needing to poll the upstream server.
The PushCommittedUpdates element contains two attributes:
UpstreamHost specifies the DNS name or IP address of the upstream
server making the request; and,
UpstreamPort specifies the port number of the upstream server making
the request.
3.2.1.4 PullCommittedUpdates
The PullCommittedUpdates request specifies one or more ReplState
elements, corresponding to the zones that the downstream server
replicates from the upstream server and for which it is making the
PullCommittedUpdates request.  Each ReplState element contains two
elements:
TopNodeOfZoneToReplicate specifies the top node in the name tree for
the current zone being replicated; and,
LastSeenCSN specifies the last CSN the downstream server has seen.
The semantics are that the upstream server is to send committed
update content (discussed shortly) for each operation that has
occurred since that CSN (i.e., not including that CSN), optionally
using the collapsing notion defined in [1]. A request specifying
LastSeenCSN='0' indicates that the entire zone is to be
transferred.
The response to a failed PullCommittedUpdates request contains a
ARSError describing the failure.
The response to a successful PullCommittedUpdates request contains
zero or more UpdateGroup's:
o zero UpdateGroup's are sent in the case of a PullCommittedUpdates
request to an upstream server that has committed no new updates
since the last PullCommittedUpdates request performed by the
downstream server.
o one UpdateGroup is sent if a single UpdateGroup has been committed
by the upstream since the specified CSN.
o multiple UpdateGroup's are sent if multiple UpdateGroup's have been
committed by the upstream since the specified CSN.
Each UpdateGroup contains a set of committed updates, encoded in the
default DataWithOps encoding.
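A non-normative sketch of the upstream server's selection of
committed updates for one ReplState, assuming a per-zone log of
(CSN, operation) pairs in commit order (the log representation and
function name are ours):

```python
# Non-normative sketch: committed updates returned for one ReplState.

def updates_since(log, last_seen_csn):
    """Return committed updates with CSN strictly greater than
    LastSeenCSN; LastSeenCSN=0 therefore yields the entire zone."""
    return [(csn, op) for (csn, op) in log if csn > last_seen_csn]

log = [(2, "write a"), (3, "write b"), (4, "delete a")]
assert updates_since(log, 3) == [(4, "delete a")]
assert updates_since(log, 0) == log       # full zone transfer
```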
3.3 ARS Submission-Propagation Protocol (ars-s)
If the primary and a non-primary server both support ars-s, updates
may also be submitted to the non-primary server.
ars-s adds two new protocol requests to those defined by ars-c:
1. PropagateSubmittedUpdate, which is used by a non-primary to
forward an update submission up the replication Directed Acyclic
Graph (DAG) towards the primary; and,
2. SubmittedUpdateResultNotification (which is used by ars-s for
client notification) is used in an additional way, namely, to
provide asynchronous success/failure notification to a downstream
server of a request it had earlier submitted.
3.3.1 PropagateSubmittedUpdate
The PropagateSubmittedUpdate element contains six attributes:
SubmisSvrHost specifies the DNS name of the submission server;
SubmisSvrPort specifies the port number of the submission server;
SubmisSvrIncarn specifies the incarnation stamp of the submission
server;
SSN specifies the Submit Sequence Number assigned by the submission
server for this update submission;
NotifyHost specifies the DNS name or IP address to which asynchronous
notification is to be sent after the commit fails or succeeds;
and,
NotifyPort specifies the port number for asynchronous notification.
The PropagateSubmittedUpdate element also contains one of two
possible elements:
1. UpdateGroup containing the submitted update content, which
contains one or more DataWithOps elements.
2. FailedUpdateSubmission, which is used to indicate that all
attempts to perform a PropagateSubmittedUpdate request to
upstream servers have failed (after timing out/retrying a
configurable number of times).  A FailedUpdateSubmission can also
be generated by an administrative tool run to fail updates that
were submitted to a server that was not brought down cleanly, in
violation of the Client-Server Promise (see [1]).
3.3.2 SubmittedUpdateResultNotification Extended Semantics
Unlike its use in ars-c, with ars-s the notification destination
(host/IP and port) is required, so that downstream servers always
receive notification of update results. Note also that the
GlobalSubmitID contained in a SubmittedUpdateResultNotification
always specifies the globally unique identifier for the submission
server (including the unique SSN it generated), which should be used
by each server along the submission path as a key into a local state
table of in-progress update submissions (e.g., to find where to
propagate the response back down to the previous server on the
submission path).
As in the ars-c case, the SubmittedUpdateResultNotification includes
the ARSError if an error occurred, or the CSN that was assigned by
the primary for the given SSN if no error occurred.
3.4 ARS Encoding Negotiation Protocol (ars-e)
The ContentEncodingNegotiation element is optionally initiated by an
ARS peer that wishes to determine if an expanded set of encodings is
available beyond the default DataWithOps encoding.
The currently defined encodings and procedures for registering new
encodings are provided in an appendix (Appendix I.1).
The ContentEncodingNegotiation element contains a ZoneTopNodeName
attribute specifying the URI of the top node in the zone to which
this encoding is to apply, and one or more ContentEncodingName
elements corresponding to content encodings the initiator supports.
Each ContentEncodingName element contains an NMTOKEN specifying the
name of a defined encoding (such as "DataWithOps").
The responder sends back the subset of the requested encodings that
it supports.
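The responder's computation is a simple order-preserving
intersection; a non-normative sketch follows (the encoding name
"SomeDiffEncoding" is hypothetical, used only to show an unsupported
encoding being dropped):

```python
# Non-normative sketch of the responder side of
# ContentEncodingNegotiation: keep the initiator's encodings that this
# server also supports, preserving the initiator's order.

def negotiate(offered, supported):
    return [name for name in offered if name in supported]

# "SomeDiffEncoding" is a hypothetical encoding name.
assert negotiate(["DataWithOps", "SomeDiffEncoding"],
                 {"DataWithOps"}) == ["DataWithOps"]
```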
After the ContentEncodingNegotiation has completed, each ARS peer may
cache the list of ContentEncodingName's supported by the given peer
and for the given zone, for the duration of the ARS channel's
lifetime. Given a list of supported ContentEncodingName's, each ARS
peer may select an appropriate encoding in future message exchanges.
If no ContentEncodingNegotiation has taken place before an operation,
the DataWithOps encoding must be used. See Current Encodings section
(Appendix I.1) about cases where the DataWithOps may fail to meet the
needs of the current transmission.
4. Algorithms and Implementation Details
Below we discuss the basic state management needed to implement an
ARS server. We then discuss algorithm and implementation details for
each of the three sub-protocols.
4.1 ARS Meta-Data Management
A variety of meta-data must be managed to implement an ARS service.
This section discusses possible implementation approaches for
managing this meta-data.
4.1.1 Document State
ARS requires two pieces of meta-data to be associated with each
document: the name of the document and its current CSN. It is a
local implementation matter how these meta-data are stored. One
approach would be to store these meta-data in the datastore itself,
as attributes in the root element of each document. Another approach
would be to maintain a separate repository mapping document name to
the pair (physical address for document, CSN), where the physical
address might be a disk block address or a database row ID. This
approach is similar to how a UNIX file system uses a directory file
to map from hierarchical name to flat (inode) name plus protection
attributes.
4.1.2 Committed Update State Management
To be able to respond to PullCommittedUpdates requests, an ARS server
needs to track the set of operations that have committed on each
document, and the corresponding CSN's. Some type of index is needed
to locate all operations and these associated meta-data for which the
CSN is larger than a given CSN. It is a local implementation matter
how these meta-data are to be managed. We discuss two possibilities
here.
Similar to the case noted in the previous section, one approach would
be to store the meta-data as attributes in the datastore itself, in
each document. An additional complication with doing this for
managing committed update state is that there must be a way to track
deleted documents (so that "delete" operations can be returned in
response to a PullCommittedUpdates request after a delete has
committed at the ARS server). To do this, at "delete" time the
datastore could use another root element attribute to mark documents
as deleted, rather than physically removing them from the datastore.
Additionally, the datastore will need to provide a way for each
profile (SEP, ARS, etc.) that uses the datastore to choose whether to
retrieve deleted documents. For example, it must be possible to
service SEP queries such that deleted documents are not returned, but
it must be possible for ARS to retrieve the document name and
"delete" operations that have committed since a given CSN.
In addition to the above complication, there is a potential
performance problem with tracking deleted documents in the datastore.
Specifically, the "greater than CSN" lookup needs to be very
efficient, potentially retrieving millions of results. If the
underlying datastore supports only a text-based index (e.g., designed
primarily to support SEP textual queries), "greater than" queries
will probably be slow. In this case, it would be preferable to
implement a more specialized indexing structure to track committed
updates. That leads to the second approach, namely, tracking
committed updates in some type of log. The log could be implemented
as a flat file with a corresponding numeric index, or perhaps in a
relational database table.
If a log implementation is chosen, a local implementation decision
needs to be made about how far back in history to keep update logs.
Generally speaking, the larger the content held by an ARS server and
the more expensive the network links, the longer back in history the
server should retain logs.  Note also that systems supporting mobile
clients should provision for more log data to be kept, to accommodate
more clients, longer-running transactions, etc.
4.1.3 Committed Update Collapsing
Regardless of whether committed update state is tracked inside the
datastore or in an auxiliary log, ARS servers may choose to implement
"collapsing" updates as defined in [1]. Doing so could yield
significant savings in network transmissions as well as space
required for committed update state. For the sake of simplicity
below we describe only how to implement update collapsing assuming a
log-based implementation of committed update state management.
To implement update collapsing, the ARS server does as follows:
o The log stores a single entry per document that has been updated.
o If the most recent operation on the document was 'create',
'update' or 'write', the operation saved with this entry is
'write' (which is the datastore operation that overwrites the
document if it exists and creates it otherwise).
o If the most recent operation on the document was 'delete', the
operation saved with this entry is 'delete'.
o When transmitting a collapsed update, the upstream server sends a
null operation for each sequence number that has been elided by
the above create/update/write/delete substitution algorithm, so
that a complete up-counting sequence is transmitted to the
downstream server, allowing it to detect missing operations in an
up-counting set of sequence numbers.
In this fashion, for example, the following update sequence run at
the primary:
o write blk2 csn=1
o write blk2 csn=2
o delete blk2 csn=3
o create blk2 csn=4
o update blk2 csn=5
will be "played back" for the downstream server that requests all
operations since csn=1 as:
o noop csn=1
o noop csn=2
o noop csn=3
o noop csn=4
o write blk2 csn=5
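The collapsing steps above can be sketched non-normatively in
Python, assuming the log is a list of (CSN, operation, document)
triples in commit order (the representation and function name are
ours):

```python
# Non-normative sketch of committed-update collapsing.

def collapse(log, since_csn):
    """Return the collapsed playback for CSN's strictly after
    since_csn: one real operation per document (its most recent),
    'write' substituted for create/update/write, noops for the
    elided CSN's."""
    pending = [e for e in log if e[0] > since_csn]
    # The most recent CSN per document; later entries overwrite earlier.
    last_for_doc = {doc: csn for (csn, op, doc) in pending}
    playback = []
    for csn, op, doc in pending:
        if last_for_doc[doc] != csn:
            playback.append((csn, "noop", None))      # elided operation
        elif op == "delete":
            playback.append((csn, "delete", doc))
        else:                                         # create/update/write
            playback.append((csn, "write", doc))
    return playback

log = [(1, "write", "blk2"), (2, "write", "blk2"), (3, "delete", "blk2"),
       (4, "create", "blk2"), (5, "update", "blk2")]
# collapse(log, 0) reproduces the playback shown above:
# four noops, then a single write at csn=5.
```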
ARS servers are not required to perform update collapsing when
responding to a PullCommittedUpdates request. However, ARS servers
must be prepared to process PullCommittedUpdates responses that have
been collapsed. Specifically:
o The collapsing algorithm requires that downstream servers ignore
datastore "non-existent document deletion" failures, since, for
example, the sequence:
* create blk2 csn=4
* write blk2 csn=5
* update blk2 csn=6
* delete blk2 csn=7
will be replayed as:
* noop csn=4
* noop csn=5
* noop csn=6
* delete blk2 csn=7
but the downstream server does not have a blk2 at the time this
delete is performed (because it was created and deleted after the
last PullCommittedUpdates request the downstream server handled).
Note that datastore invalid document deletions ARE correctly
detected at the primary, and hence ARS does not negatively impact
datastore semantics for replicated datastores. It is only the
downstream servers that ignore document deletion failures.
o Upstream servers must transmit a complete up-counting sequence of
CSN's, starting with the "last seen" CSN. Downstream servers must
check that they always receive a complete up-counting sequence of
CSN's, starting with the "last seen" CSN.
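The downstream checks can be sketched non-normatively: verify a
complete up-counting CSN sequence and tolerate deletes of documents
that collapsing elided.  Here playback is assumed to be a list of
(CSN, operation, document) triples whose sequence starts just after
the last-seen CSN; the function name and placeholder content are
ours:

```python
# Non-normative sketch of downstream processing of a (possibly
# collapsed) PullCommittedUpdates playback.

def apply_playback(datastore, playback, last_seen_csn):
    expected = last_seen_csn + 1
    for csn, op, doc in playback:
        if csn != expected:                   # must be a complete sequence
            raise ValueError("gap in CSN sequence at %d" % csn)
        expected += 1
        if op == "delete":
            datastore.pop(doc, None)          # ignore delete of missing doc
        elif op == "write":
            datastore[doc] = "content@%d" % csn   # placeholder content
        # 'noop' entries only advance the sequence check
    return expected - 1                       # new last-seen CSN

ds = {}
new_csn = apply_playback(
    ds, [(4, "noop", None), (5, "noop", None),
         (6, "noop", None), (7, "delete", "blk2")], 3)
# blk2 never existed downstream; the delete is silently ignored.
```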
4.1.4 Per Server Sequence Number State
Sequence numbers are tracked as follows.  For each zone it handles,
an ARS server tracks the last SSN it has assigned for that zone.  In
addition,
o if it is a non-primary for the zone it tracks the last CSN it has
seen for the zone; or,
o if it is a primary for the zone it tracks:
* The last assigned CSN for the zone; and,
* The last seen SSN for each submission server for the zone.
If a site's ARS service is implemented by multiple physical servers
(all identified by a single DNS name at the site), those servers must
coordinate assignment of sequence numbers among each other to meet
the uniqueness requirement, for example by retrieving the SSN from a
shared backend database.
Note that per-server sequence number state need not be saved in the
datastore; in fact, for the sake of efficiency it should be saved to
a lighter-weight storage system such as flat files.  (The datastore
implements ACID semantics, which is overkill for managing individual
data items.)
4.1.5 Locking
A zone-wide lock is obtained in the process of committing an update.
This is accomplished by invoking 'lock' and 'release' primitives at
the top-level node for the zone before and after (respectively)
performing the individual document writes.  These primitives
implement the following semantics:
lock: specifies the URI of the document defining a subset of a zone
to which the requesting user instance is requesting exclusive
write access. The zone subset to be locked consists of the named
document and all documents beneath that document in the subtree,
down to but not including any zone delegation cut points in the
subtree. (If there are no zone delegation points, the zone subset
consists of the entire subtree under the specified node, down to
and including the leaves.) A lock must be performed successfully
before any document writes may be performed. While a zone subset
is locked, no other user instance may lock or write documents
successfully within the zone subset, and any document write
operations are journaled until a subsequent release operation.
release: specifies whether to commit or rollback any journaled
document write operations. All document write operations
performed while a zone subset is locked have atomic update
semantics -- either they all succeed or they all fail. If they
all succeed, they must all become visible to other clients of the
local datastore atomically.
Note: for performance reasons it may be preferable to implement a
more optimistic concurrency control technique so that write
operations from multiple updates can be overlapped and conflicts
cause rollback/replay. For simplicity we talk about zone-wide
locking in the current document.
If the ARS implementation is threaded, additional synchronization is
required, because datastore lock semantics disallow a single process
from locking nested subtrees (e.g., locking "a.b.c" when "a.b" is
already locked).  Threaded implementations therefore need to maintain
a table of threads currently holding or waiting for a lock, listing
the thread identifier and the tree node locked / to be locked. When
a new lock request is to be performed, this table needs to be checked
to see if any other threads currently hold locks on tree nodes above
or below the current request in the tree, and if so to create a queue
of such requests. When a lock is released, this table again needs to
be checked to see if any threads are currently waiting for a lock
that may now be allowed to issue the datastore lock request.
Finally, appropriate synchronization is needed around accesses to the
above table.
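The lock-table conflict test itself can be sketched non-normatively:
two requests conflict when one node is an ancestor of (or equal to)
the other in the dotted name hierarchy (the function name is ours):

```python
# Non-normative sketch of the thread lock-table conflict test.

def conflicts(node_a, node_b):
    """True if locking node_a and node_b would nest or collide,
    e.g. 'a.b' vs 'a.b.c', or 'a.b' vs itself."""
    return (node_a == node_b
            or node_b.startswith(node_a + ".")
            or node_a.startswith(node_b + "."))
```

A new lock request would be queued whenever conflicts() is true
against any entry already in the table of held or pending locks.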
4.1.6 Server Configuration Data
4.1.6.1 Replication Topology (Normative)
Appendix H provides the DTD for configuring the replication topology
of an ARS server.  While the storage management mechanism for this
configuration data (local file, database table, etc.) is a local
implementation matter, the document structure is defined here for two
reasons:
o This syntax provides a standard format for exporting server
information, which may be used to support server location (e.g.,
through export to DNS service location (SRV) records); and
o Having a uniform syntax allows for easier discussion, e.g., in
outside documents or email discussions.
Each server specifies the set of zones it handles, whether it is
primary for each zone, and the immediate upstream and downstream
servers for each zone it serves. The configuration data also
specifies the frequency of PushCommittedUpdates and
PullCommittedUpdates requests, as well as preferences for the order
that servers are to be contacted when propagating submitted updates.
As an example, the following is the configuration data for a primary
server running on host s1.example.com and port 5682, which replicates
content in the Blocks name space:
This is the primary server for the global name tree root, delegating
at cut points "doc.rfc" and "doc.edgar". It is replicated by two
downstream servers, running on s2.example.com and s3.example.com. It
pushes updates to those servers every 10 minutes.
Here is a configuration file for a non-primary server running on host
s2.example.com and port 5682, which also replicates content in the
Blocks name space:
<ZoneTopNode Name='blocks:.'/>
<TopNodeOfZoneToReplicate Name='blocks:.'/>
This server replicates the "." zone from two upstream servers
(s1.example.com and s6.example.com). It does not schedule any
periodic update pull requests from the upstream servers, because in
this set of servers only pushes are scheduled. The server specifies
preference weights for each upstream server, used to determine the
order that the upstream servers are tried when attempting to
propagate update submissions. Finally, this server is replicated by
two downstream servers, running on s4.example.com and s5.example.com,
respectively.
4.1.6.2 Local Implementation Settings (Non-Normative)
In addition to replica topology information, ARS servers will also
need various local configuration data. What follows is not part of
the normative specification for ARS, but rather is included to
provide a concrete example to implementors, based on the author's
server implementation. The author's ARS implementation has the
following local configuration data:
HomeDirectory: Root directory under which data, logs, and
configuration information are stored.
ValidateARSMessages: Whether to validate ARS protocol messages
against DTD. Note that this setting can adversely affect server
performance.
DetectWriteWriteConflicts: Whether to detect write-write conflicts.
This setting only matters at the zone primary. The specification
requires this check to be enabled, but the option to turn it off is
included to allow experimentation (since the check adds overhead)
and easier testing (since otherwise the client must present the
correct CSN before sending an update).
OutOfOrderTimeoutInSecs: How long to wait (in seconds) for out-of-
order update submissions while earlier submissions are propagated
before timing out the update for the current attempt period.
LockWaitTimeoutInSecs: Number of seconds to allow update submissions
to wait for the real subtree lock while trying to apply a
committed update before timing out.
SingleARSRequestAttemptTimeoutInSecs: How long to wait (in seconds)
for PropagateSubmittedUpdate requests to complete before timing
out the update for the current attempt period.
ServiceFailedTransmitRetryPeriodInSecs: How long to wait (in
seconds) after a retryable request fails due to service failure
before retrying.
ServiceFailedTransmitMaxAttempts: Number of times to retry a service
failed request before giving up and reporting failure to the client.
LogicallyIndentBlocks: If true, logical indentation is added to XML
document start and end elements (not the character content) as they
are written out; otherwise they are left-margin aligned. Note that
this setting is only meaningful for XML documents that are parsed.
CacheSeqNumBlocks: Enable/disable SeqNumBlock caching.
ARSDTDFileName: Location of ARS DTD.
ARSContentEncodingsFileName: File in which to find the ARS content
encodings DTD. This file can be edited locally to add new (non-
standardized) content encodings, and is included here so that the
ARS runtime can validate content encodings if ValidateARSMessages
is enabled.
4.2 Protocol Processing
An ARS server must implement ars-c, and may implement one or both of
ars-s and ars-e. As part of the required protocol handling support,
ARS servers must reject requests for a non-supported sub-protocol
with an ARSError containing ARSErrorCode=223005.
4.2.1 ARS Commit-and-Propagate Protocol (ars-c)
4.2.1.1 SubmitUpdate Processing
Upon receipt of a SubmitUpdate request, an ARS server performs the
following steps:
1. If a non-primary ARS server that does not support ars-s receives
an update submission (either via a SubmitUpdate or
PropagateSubmittedUpdate request), it must reject the request by
responding with an ARSError containing ARSErrorCode=223006.
2. If an ARS server receives an update submission specifying an
unsupported name space it must reject the request by responding
with an ARSError containing ARSErrorCode=123004.
3. The server parses the DataWithOps encoding, saves the enclosed
documents and their associated CSN's and update operations to
temporary stable storage (using temporary identifiers guaranteed
not to clash with other concurrently arriving updates), and
performs the following checks:
* access control denial;
* update request to a document in a zone not served by the
current ARS server; and,
* update request that spans more than one zone.
Note that the temporary document copies need not be saved in the
datastore, and in fact for the sake of efficiency should be saved
to a lighter weight storage system such as a journaling file
system. (The datastore implements ACID semantics, which is
overkill for managing temporary data.)
4. If no failures occurred during the above checks, the server
allocates a new GlobalSubmitID for the UpdateGroup, for the zone
within which the submission falls.
5. At this point the server responds to the client either with an
ARSError describing the error that occurred or an ARSAnswer
containing the GlobalSubmitID to indicate that the submission has
been successfully received. It also saves the
OptionalNotificationDest information provided (if any), for use
in asynchronous notification once the update has completed.
6. Now that the update has been completely received, the server
enqueues it for commit processing. The server processes elements
in this queue one at a time, as follows:
* If the local implementation uses log-based committed update
state management (Section 4.1.2), create a temporary list into
which document names, operation names and CSN's can be stored.
* Acquire a zone-wide lock, setting an implementation-specified
timeout period that will result in an ARSError containing
ARSErrorCode=212002 being sent to the client if the lock is
not acquired before the timeout expires. Note that this
coarse-grained locking is required to implement zone-wide
serialization, and can become a source of contention if the
operations performed while locking are not implemented
efficiently.
* Allocate a new CSN for this zone.
* Loop on all DatumAndOp elements within the UpdateGroup and
perform the following steps in the order the operations occur
in the UpdateGroup:
+ load data and operation from saved temporary state.
+ If no 'csn' attribute is currently set in the document,
treat that document as having CSN=1. Doing this allows a
datastore not previously replicated by ARS to be replicated
without running a special tool to add CSN's.
+ At this point the local implementation may perform write-
write conflict detection by comparing the value of the
'CSN' attribute contained in the DatumAndOp against the
corresponding value stored in the primary's local
datastore. If any of these values differ, the
implementation may reject the update by responding with a
ARSError containing ARSErrorCode=126001.
+ Update the 'csn' attribute in each document per the
assigned CSN, and then perform the needed datastore
operation, trapping any errors/exceptions that arise.
(Note that datastore lock/release semantics do not make the
operation visible until the corresponding release occurs).
+ If the local implementation uses log-based committed update
state management (Section 4.1.2), save the document name,
datastore operation name, and CSN in the temporary list.
+ Continue to the next operation.
* If an error/exception arises during the above loop:
+ Release the zone-wide lock, requesting that all contained
updates be aborted.
+ If the local implementation uses log-based committed update
state management (Section 4.1.2), discard the temporary
document name/operation list.
+ Reset the CSN counter so that this CSN will be used for the
next commit attempt. (Each CSN must represent a successful
update.)
+ Generate an ARSError to be transmitted to the client (if
notification was requested).
* Else, if the local implementation uses log-based committed
update state management (Section 4.1.2), append the temporary
document name/operation list to the log of all operations
performed on the datastore (which is used by the
PullCommittedUpdates request; details about this log are
discussed later (Section 4.1.3)). This log is only written
when the zone lock is held, and therefore the log will be
serialized in the same update order as applied to the local
datastore.
* Finally, release the zone-wide lock, requesting that all
contained updates be committed.
7. Upon completion (either successful or not), if notification was
requested the server performs a SubmittedUpdateResultNotification
operation (discussed below).
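As a non-normative illustration, the commit processing of step 6 can
be sketched as follows. The dictionary-based zone, datastore, and
operation log are stand-ins invented for this example; zone locking,
timeouts, and client notification are elided:

```python
class ConflictError(Exception):
    """Write-write conflict (reported as ARSErrorCode=126001)."""

def commit_update_group(zone, ops, datastore, oplog):
    """Commit one queued UpdateGroup atomically.  `ops` is a list of
    (doc_name, op_name, submitted_csn); `datastore` maps doc name to
    its stored CSN; `zone` holds the zone's CSN counter.  Returns
    True on commit, False on rollback."""
    csn = zone["next_csn"]            # allocate a new CSN for this zone
    staged = {}                       # journaled writes: invisible until release
    pending_log = []                  # temporary list for log-based state mgmt
    try:
        for name, op, submitted_csn in ops:
            stored = datastore.get(name, 1)  # docs without a 'csn' act as CSN=1
            if submitted_csn != stored:      # write-write conflict detection
                raise ConflictError(name)
            staged[name] = csn               # stamp the assigned CSN
            pending_log.append((name, op, csn))
    except ConflictError:
        return False                  # abort: discard staged writes; CSN reused
    datastore.update(staged)          # release: all writes visible atomically
    oplog.append(pending_log)         # appended only while the zone lock is held
    zone["next_csn"] = csn + 1        # each CSN represents a successful update
    return True
```

A failed commit leaves the datastore, the log, and the CSN counter
untouched, so the same CSN is used for the next commit attempt.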
Note: as an optimization for step 3 above, incoming documents can be
written directly to the datastore (rather than saving first to
temporary storage), and the update simply aborted if an error is
detected. However, we recommend against this approach because:
o It would require holding the zone-wide lock potentially an
arbitrarily long time, for example while a large update is
submitted across a congested link. By saving to temporary storage
first a bound is placed on how long the zone lock is held.
o This approach will not work if the ars-c implementation is
extended to support ars-s, since ars-s needs to propagate
submissions upstream before they are committed.
4.2.1.2 SubmittedUpdateResultNotification Processing
SubmittedUpdateResultNotification must be implemented as a timeout-
and-retry style of operation, so that if the client is temporarily
unreachable the server will retry over a period of time. The number
of retries and the timeout period are determined by the local
implementation.
For failed updates, the SubmittedUpdateResultNotification contains
the ARSError that occurred and the GlobalServerID where the error
occurred.
For successful updates, the SubmittedUpdateResultNotification
contains an empty ARSError element, as well as the CSN that was
assigned by the primary for the given SSN.
4.2.1.3 PushCommittedUpdates Processing
The processing of PushCommittedUpdates requests is implementation-
dependent. The downstream server may ignore the request, or may use
it to schedule (Section 4.2.2.4) a PullCommittedUpdates request.
4.2.1.4 PullCommittedUpdates Processing
Downstream servers must synchronize PullCommittedUpdates requests so
that at most one request/response is in progress for a given zone at
any time.
If a server maintains committed update state in a log (Section 4.1.2)
and a request is received for updates further back in history than
are stored in that log, the upstream server responds with an ARSError
containing ARSErrorCode=226002. In response the downstream server
may either re-issue the same request at a different ARS server, or
(if both servers support an appropriate ars-e encoding (Section 3.4))
request a full zone transfer. Note in particular that if a server
performs committed update log truncation it will be unable to support
new ARS replicas' requests to join the replication network (since
they will need to perform a request for all updates since CSN 0)
unless both servers also support an appropriate ars-e encoding. As a
consequence, an implementation that does not support ars-e and that
wishes to allow new replicas to join over time must not perform
committed update log truncation.
The upstream server must lock the requested zone while processing a
PullCommittedUpdates request, so that the underlying datastore
contents are not changed while the content is being sent (which could
result in inconsistent content being transmitted to the downstream
server). Since this may cause the zone to be locked for a long time,
an alternative implementation would be to lock the zone, make a copy
of the documents to be sent, and unlock, before transmitting those
documents. Copy-on-write implementations are also possible.
The upstream server must send the UpdateGroup's in increasing order
of CSN for that zone.
When a downstream server receives a committed set of UpdateGroup's
from an upstream server (in response to the PullCommittedUpdates
request) the downstream ARS server:
o should check that all CSN's contained within each UpdateGroup are
monotonically increasing.
o must apply a (possibly complete) prefix of the CSN's. Resource
overrun during acceptance of the updates must not leave the
downstream server in a state where a non-prefix subset of the
responses has been committed.
o must apply each UpdateGroup atomically, applying steps 1-3, 6 and
7 listed under SubmitUpdate Processing (Section 4.2.1.1), with one
exception: in step 6 rather than allocating a new CSN the
downstream server should use the CSN contained in each document
(which was allocated and written into the documents by the
primary).
o must update its local state based on the last CSN received and
committed from the committed UpdateGroup(s).
The downstream server must ignore datastore delete failures, in
order to function correctly with upstream servers that implement
update collapsing (Section 4.2.1.5).
4.2.1.5 Submitted Update Collapsing for Infrequently Synchronized Peers
If an ARS server performs write-write conflict detection, clients
cannot submit two updates in a row to a document without getting a
commit response after each submission. That can be an annoying
limitation for infrequently synchronized nodes, such as mobile PDAs.
To mitigate this problem ARS peers may collapse updates as follows.
If a pending update submission has not yet been propagated up the
DAG, the ARS server may choose to replace the pending submission with
another update to the same document, reusing the SSN. To maintain
the correct submitted update ordering, the SSN's for all updates
between the previous and the new submission must be reordered whenever
this algorithm is applied, by dropping the original submission,
shifting each of the following SSN's back by one, and decrementing
the current not-yet-assigned SSN at that ARS server. For example,
consider the update submission sequence: a.b.c (SSN 4), a.b.d (SSN
5), a.b.e (SSN 6), a.b.c (SSN 7). If none of these updates has yet
been propagated up the DAG, this update sequence can be replaced with
the sequence a.b.d (SSN 4), a.b.e (SSN 5), a.b.c (SSN 6), and then
reusing SSN 7 for the next update that is submitted.
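The collapsing algorithm above can be sketched as follows; this is a
non-normative illustration with invented names, representing the
pending (not yet propagated) queue as an ordered list:

```python
def submit_with_collapse(pending, next_ssn, doc, update):
    """Queue `update` to `doc`, collapsing any pending (not yet
    propagated) submission to the same document.  `pending` is an
    ordered list of [ssn, doc, update] entries.  Returns the next
    unassigned SSN at this ARS server."""
    for i, entry in enumerate(pending):
        if entry[1] == doc:
            pending.pop(i)                   # drop the original submission
            for later in pending[i:]:
                later[0] -= 1                # shift following SSN's back by one
            pending.append([next_ssn - 1, doc, update])  # reuse the freed SSN
            return next_ssn                  # unassigned SSN is unchanged
    pending.append([next_ssn, doc, update])  # nothing to collapse
    return next_ssn + 1
```

Applied to the example in the text, the queue a.b.c (SSN 4), a.b.d
(SSN 5), a.b.e (SSN 6) plus a second update to a.b.c becomes a.b.d
(SSN 4), a.b.e (SSN 5), a.b.c (SSN 6), with SSN 7 still unassigned.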
4.2.2 ARS Submission-Propagation Protocol (ars-s) Processing
ars-s requires more complex synchronization for performing the ars-c
SubmittedUpdateResultNotification operation. Each of these
operations is discussed below.
4.2.2.1 Non-Primary SubmitUpdate Processing
If a non-primary ARS server that supports ars-s receives a
SubmitUpdate request, it performs the following steps:
1. Steps 1-5 listed under SubmitUpdate Processing (Section 4.2.1.1).
Note: a SSN should not be used in place of a temporary identifier
in step 3 because if a failure occurs during these steps a
FailedUpdateSubmission request will have to be propagated
upstream (discussed below), adding additional load to all
upstream servers and delaying other update submissions until this
FailedUpdateSubmission has completed at the primary.
2. The server generates a PropagateSubmittedUpdate request,
consisting of the same content as the received submission, but
filling in the GlobalSubmitID attribute with the server's
SubmisSvrHost, SubmisSvrPort, SubmisSvrIncarn, and the SSN.
3. The server attempts to send this PropagateSubmittedUpdate request
to each upstream server in turn, until one successfully receives
it.
4. If the PropagateSubmittedUpdate request cannot be successfully
forwarded to any upstream server, a timer must be set to retry
the sequence of upstream servers again later (because of the
client-server promise discussed in [1]). The timeout duration
and number of attempts is determined by the local implementation.
5. Once the PropagateSubmittedUpdate transmission has completed, the
server saves stable state to indicate that the update has been
propagated, so that it can look up this state when the update
later completes (successfully or not) and notify the client if
notification was requested. The PropagateSubmittedUpdate request
must not be transmitted again once it has been successfully
received by an upstream server.
6. If all attempts to send/timeout/re-send the
PropagateSubmittedUpdate request upstream fail and notification
was requested, the server sends an ARSError containing
ARSErrorCode=210001. If all attempts fail the server always
generates a PropagateSubmittedUpdate request containing a
FailedUpdateSubmission element, which it attempts to send
upstream using the same timeout-and-retry logic as noted in step
(4), with the exception that it never stops trying until it
succeeds. The reason is that the primary must learn of the
failed update submission, else all future submissions from the
submission server will fail because of the requirement to
serialize updates by SSN (see below).
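The try-each-upstream-then-retry logic of steps 3 and 4 can be
sketched as follows; this is a non-normative illustration, with
`send` standing in for the actual PropagateSubmittedUpdate transport:

```python
import time

def propagate_upstream(upstreams, send, max_attempts, retry_secs):
    """Try each upstream server in preference order until one accepts
    the PropagateSubmittedUpdate request; on a full pass of failures,
    wait and retry the sequence.  `send(server)` returns True once a
    server has accepted the request."""
    for _ in range(max_attempts):
        for server in upstreams:      # preference-weighted order
            if send(server):
                return server         # accepted: never retransmit (step 5)
        time.sleep(retry_secs)        # all failed: retry later (step 4)
    return None                       # caller reports ARSErrorCode=210001
```

For the FailedUpdateSubmission case of step 6, the same loop would be
run with an unbounded number of attempts, since the primary must
eventually learn of the failed submission.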
4.2.2.2 Non-Primary PropagateSubmittedUpdate Processing
If a non-primary ARS server that supports ars-s receives a
PropagateSubmittedUpdate request (which came either from a non-
primary that received a SubmitUpdate request and generated a
corresponding PropagateSubmittedUpdate request, or from a server
propagating a PropagateSubmittedUpdate request it received), it does
the following:
1. If the request contains a "FailedUpdateSubmission" element, it
responds to the downstream server with an ARSAnswer containing
the GlobalSubmitID to indicate that it has successfully received
the request. It then attempts to send this
PropagateSubmittedUpdate request to each upstream server in turn,
until one successfully receives it. Note that at this point the
responsibility for completing the FailedUpdateSubmission
transmission has passed from the previous server to the current
server, so the current server must retry transmitting the request
indefinitely until an upstream server has accepted it.
2. Otherwise, the server performs steps 1-6 listed under Non-Primary
(Section 4.2.2.1), with four changes:
* In addition to the other checks performed during step 1 (more
specifically, during step 3 of SubmitUpdate Processing
(Section 4.2.1.1)), it checks whether the GlobalSubmitID
duplicates one already seen. This check is done in two places:
1. During step 3 of SubmitUpdate Processing (Section 4.2.1.1)
a check is made that the SSN contained within the given
GlobalSubmitID is greater than the SSN of the last
successfully committed update for the given zone &
submission GlobalServerID; and,
2. After step 3 of SubmitUpdate Processing (Section 4.2.1.1)
a check is made that the given GlobalSubmitID is not
currently being processed (which could happen if duplicate
submissions arrive so close together that one has started
processing and not yet completed).
This check ensures that:
+ DAG cycles (caused by configuration errors) cannot result
in infinite loops or deadlocks; and,
+ PropagateSubmittedUpdate operations are idempotent, which
provides greater resilience in dealing with partitions.
* It rewrites the NotifyHost with its own GlobalServerID, so
that the SubmittedUpdateResultNotification from the upstream
server to which the submission was propagated will be sent to
the current server.
* Upon completion (successful or not) the server sends a
SubmittedUpdateResultNotification to the server from which the
PropagateSubmittedUpdate request was received. See also the
discussion of SubmittedUpdateResultNotification
Synchronization (Section 4.2.2.6).
* If all attempts to send the PropagateSubmittedUpdate request
to upstream servers fail (step 6 of Non-Primary
PropagateSubmittedUpdate Processing (Section 4.2.2.2)) the
server sends the appropriate ARSError to the downstream server
from which the PropagateSubmittedUpdate request was received,
but does not generate the PropagateSubmittedUpdate request
containing a FailedUpdateSubmission element. That request
must be generated by the submission server.
Note that each server in the submission path assumes responsibility
for the client-server promise (see [1]) as the update submission is
passed up the tree. This promise allows an ARS server never to
retransmit a submission once it has been accepted by an upstream
server (step 5 under "Non-Primary SubmitUpdate Processing").
4.2.2.3 Primary PropagateSubmittedUpdate Processing
If a primary ARS server that supports ars-s receives a
PropagateSubmittedUpdate request, it performs the following steps:
o If the request does not contain a "FailedUpdateSubmission"
element, it performs the following steps:
1. Steps 1-3 listed under SubmitUpdate Processing (Section
4.2.1.1).
2. If the GlobalSubmitID for the submission is not one greater
than the last seen SSN from the submission server, enqueue the
submission to await receipt of the missing submission(s),
starting a timer to detect excessive delays. This ensures
that update serialization preserves the correct partial
ordering of updates. Note that this hold-and-re-order
mechanism is required because submissions transmitted up
varying replication DAG paths could arrive out of order. If a
timeout occurs, send an ARSError containing
ARSErrorCode=212001 and abort the update.
3. Steps 6-7 listed under SubmitUpdate Processing (Section
4.2.1.1), but sending the SubmittedUpdateResultNotification to
the downstream server from which the submission was received.
4. If the update completed successfully, it updates its local
state that tracks the last seen SSN from the given submission
server.
o Otherwise (FailedUpdateSubmission):
1. It responds to the downstream server with an ARSAnswer
containing the GlobalSubmitID to indicate that it has
successfully received the request.
2. Step 2 above, but without setting a timeout.
3. Step 4 above.
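The primary's hold-and-re-order mechanism of step 2 can be sketched
as follows; this is a non-normative illustration with invented names,
tracking one submission server's SSN sequence:

```python
def on_propagated_submission(state, ssn, commit):
    """Commit a propagated submission only when its SSN is exactly one
    greater than the last SSN committed for this submission server;
    queue later SSNs until the gap fills.  (A timeout, not shown,
    would instead send ARSErrorCode=212001 and abort the update.)
    Returns the list of SSNs committed by this call."""
    if ssn != state["last_ssn"] + 1:
        state["queued"][ssn] = commit      # hold: an earlier SSN is missing
        return []
    committed = []
    commit()
    state["last_ssn"] = ssn
    committed.append(ssn)
    # Drain any queued successors that are now in order.
    while state["last_ssn"] + 1 in state["queued"]:
        nxt = state["last_ssn"] + 1
        state["queued"].pop(nxt)()         # commit the held submission
        state["last_ssn"] = nxt
        committed.append(nxt)
    return committed
```

If SSN's 4 and 5 arrive before SSN 3 (as can happen when submissions
travel different DAG paths), they are held, then committed in order
once SSN 3 arrives.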
4.2.2.4 PushCommittedUpdates and PullCommittedUpdates Scheduling
ARS does not specify how PushCommittedUpdates and
PullCommittedUpdates operations are to be scheduled. As a local
implementation matter, ARS servers may schedule PushCommittedUpdates
and PullCommittedUpdates operations a variety of different ways,
perhaps offering configuration options that can support any/all of
the following:
1. Periodic PullCommittedUpdates requests.
2. PushCommittedUpdates requests down the submission path
immediately following any update that was propagated up that
path, to minimize committed update propagation latency back down
to the submitting client.
3. PushCommittedUpdates requests down to other servers immediately
following an update, to minimize committed update propagation
latency for servers that need to keep in close synchronization.
4. PullCommittedUpdates requests only upon new replica join, server
re-boot, mobile device reconnection, or partition repair, to
"catch up".
If a server implements (2) and/or (3) above, care should be taken to
prevent backlogging the downstream server with many
PushCommittedUpdates requests. For example, if the primary is
experiencing high update rates and performs a PushCommittedUpdates
each time it completes an update, it may not be possible to process
the ensuing PullCommittedUpdates requests that the downstream
server(s) make as fast as new PushCommittedUpdates requests are being
made. This can create excess network traffic and lock contention at
the primary, at precisely the worst time. To avoid this problem, the
following algorithm (reminiscent of delayed acknowledgements and
Nagle's algorithm used by TCP [10]) should be used:
o The upstream server uses a flag that tracks whether the downstream
server has run a PullCommittedUpdates request since the last
PushCommittedUpdates request from the upstream server.
o After sending a PushCommittedUpdates request to the downstream
server, the upstream server sets the flag to true.
o After processing a PullCommittedUpdates request, the upstream
server sets the flag to false.
o Each time the upstream server's PushCommittedUpdates code is
triggered it checks this flag and only performs the
PushCommittedUpdates request if the flag is false.
o The code that performs the PushCommittedUpdates and sets the flag
must be run within a critical section that guarantees that a
PullCommittedUpdates request cannot begin until the
PushCommittedUpdates completes (otherwise a deadlock can ensue
that stops any future PushCommittedUpdates requests from running).
With this approach, updates can be propagated when they complete, but
during times of high update submission load many PushCommittedUpdates
operations will be batched together.
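The flag-based batching algorithm above can be sketched as follows;
this is a non-normative illustration, with `send_push` standing in
for the actual PushCommittedUpdates transmission to one downstream
server:

```python
import threading

class PushScheduler:
    """One flag per downstream server: suppress further
    PushCommittedUpdates until that server has run a
    PullCommittedUpdates request."""

    def __init__(self, send_push):
        self._lock = threading.Lock()   # critical section: flag + push together
        self._push_since_pull = False   # True once a push has gone unanswered
        self._send_push = send_push

    def on_update_committed(self):
        with self._lock:
            if not self._push_since_pull:
                self._send_push()       # suggest a PullCommittedUpdates
                self._push_since_pull = True
            # else: batch -- the pending pull will pick this update up too

    def on_pull_processed(self):
        with self._lock:                # a pull cannot begin mid-push
            self._push_since_pull = False
```

Under a burst of commits, only the first triggers a push; the
downstream server's eventual pull collects the whole batch and clears
the flag.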
4.2.2.5 PullCommittedUpdates Synchronization
The downstream server must perform synchronization to ensure that at
most one PullCommittedUpdates request can be running at a time for a
given zone. For example, a server configured with two different
upstream servers for a zone must not run concurrent
PullCommittedUpdates requests from the two upstream servers. (This
synchronization requirement is one reason why PushCommittedUpdates is
simply a suggestion for a PullCommittedUpdates request to be
performed. If PushCommittedUpdates actually transmitted data, it
would be difficult to synchronize because the PushCommittedUpdates
and PullCommittedUpdates data transfers would be initiated by
different servers. Instead, the downstream server controls the
scheduling of committed update transmissions.) This is important not
only because concurrent data transfers for PullCommittedUpdates' for
the same zone would waste traffic and server load, but also because
this concurrency could result in incorrect committed state. For
example, consider the sequence:
o Start PullCommittedUpdates request from upstream server #1 with
LastSeenCSN=2.
o Start PullCommittedUpdates request from upstream server #2 with
LastSeenCSN=2.
o Upstream server #1 has last seen CSN 4, and begins sending content
for CSN's 2,3,4.
o Upstream server #2 has last seen CSN 5, and begins sending content
for CSN's 2,3,4,5.
o Upstream server #2 happens to finish first, committing updates 2-
5.
o Upstream server #1 then finishes, leaving the last-seen CSN=4 but
CSN 5 already applied. This is an incorrect state.
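The at-most-one-pull-per-zone rule can be enforced with a per-zone
lock; the following is a non-normative sketch with invented names,
where `do_pull` stands in for the actual PullCommittedUpdates
request/response processing:

```python
import threading

_pull_locks = {}                   # zone name -> lock serializing pulls
_registry = threading.Lock()       # guards the lock registry itself

def run_pull(zone, upstream, do_pull):
    """Run `do_pull(upstream)` only if no PullCommittedUpdates request
    is already in progress for `zone`; returns False if one is."""
    with _registry:
        lock = _pull_locks.setdefault(zone, threading.Lock())
    if not lock.acquire(blocking=False):
        return False               # at most one pull per zone at a time
    try:
        do_pull(upstream)          # transfers and applies UpdateGroup's
        return True
    finally:
        lock.release()
```

With this guard, a pull from upstream server #2 is refused while the
pull from upstream server #1 is still in progress, avoiding the
incorrect last-seen-CSN state shown in the example above.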
4.2.2.6 SubmittedUpdateResultNotification Synchronization
The scheduling of SubmittedUpdateResultNotification requests to a
downstream server is complicated by two factors:
1. Because the response to a PullCommittedUpdates request can
contain more than one UpdateGroup, receipt of a
PullCommittedUpdates request in a server that supports ars-s may
trigger multiple SubmittedUpdateResultNotification's to be
generated to downstream servers and/or clients.
2. It is possible that the committed update content for an update
reaches the downstream server before the
SubmittedUpdateResultNotification from its upstream server
reaches that downstream server.
To illustrate the second case above, consider the following
replication topology:
svr3
(primary)
| |
\|/ \|/
svr2 svr4
| |
\|/ \|/
svr1
Given this topology, consider the following event ordering sequence:
1. client->svr1: SubmitUpdate
2. svr1->svr2: PropagateSubmittedUpdate
3. link between svr1 and svr2 goes down
4. svr2->svr3: PropagateSubmittedUpdate
5. svr3->svr2: SubmittedUpdateResultNotification
6. svr3->svr2: PushCommittedUpdates
7. svr2->svr3: PullCommittedUpdates
8. svr3->svr4: PushCommittedUpdates
9. svr4->svr3: PullCommittedUpdates
10. svr4->svr1: PushCommittedUpdates
11. svr1->svr4: PullCommittedUpdates
12. link between svr1 and svr2 comes back up
13. svr2->svr1: SubmittedUpdateResultNotification
14. svr2->svr1: PushCommittedUpdates
15. svr1->svr2: PullCommittedUpdates
16. svr1->client: SubmittedUpdateResultNotification
Because the link between svr1 and svr2 goes down after the submitted
update has been propagated, the committed update content reaches svr1
via an alternate path through the DAG (svr3->svr4->svr1, completing
in event 11) before the SubmittedUpdateResultNotification reaches it
(event 13).
Because of these complications, SubmittedUpdateResultNotification (as
well as scheduling of PushCommittedUpdates operations to propagate
the newly arrived committed content downstream) should be triggered
as follows:
o Each server maintains a submitted update state table keyed by SSN
and sequentially ordered (in a secondary data structure such as a
tree) by CSN. It also maintains a counter of the last seen CSN
for the given zone.
o When a SubmittedUpdateResultNotification arrives from an upstream
server, the downstream server looks up the state table entry for
the given SSN, and sets the CSN for that table entry (adjusting
tree order accordingly).
o A thread runs periodically (say, once per second), scanning the
CSN-ordered tree for submitted updates whose CSN is less than the
last seen CSN for the given zone. For each entry, the downstream
server generates a SubmittedUpdateResultNotification to the next
server down the submission path.
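The state table and periodic scan above can be sketched as follows;
this is a non-normative illustration with invented names, assuming a
notification is releasable once its CSN is at or below the last
committed CSN applied locally:

```python
import bisect

class NotificationTracker:
    """Submitted-update state table ordered by CSN, plus a
    last-seen-CSN counter for the zone; scan() models the periodic
    thread that releases notifications downstream."""

    def __init__(self):
        self._csn_order = []     # sorted list of (csn, ssn) pairs
        self.last_seen_csn = 0   # advanced as committed updates are applied

    def record_notification(self, ssn, csn):
        # A SubmittedUpdateResultNotification arrived from upstream:
        # bind the SSN to its assigned CSN, keeping CSN order.
        bisect.insort(self._csn_order, (csn, ssn))

    def scan(self):
        """Return SSNs whose committed content has already been
        applied locally; their notifications may now be sent down the
        submission path."""
        ready = []
        while self._csn_order and self._csn_order[0][0] <= self.last_seen_csn:
            _, ssn = self._csn_order.pop(0)
            ready.append(ssn)
        return ready
```

This handles both complications: notifications arriving before or
after the committed content, in any order, are released only once the
content itself has been applied.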
Implementations should not attempt to simplify the synchronization
requirements here by forcing the SubmittedUpdateResultNotification to
complete before the committed content propagates, because doing so
could mean that a single unavailable downstream server would hold up
transmissions of committed updates to all servers in the network.
4.2.2.7 Submitted Update Reordering Details
Non-primary servers must not hold-and-re-order update submissions.
They simply forward all updates up the DAG, and the primary performs
any needed re-ordering. Non-primary servers need not hold-and-re-
order committed updates coming back down the DAG, because all ARS
servers are required to send committed updates in order and without
gaps in the numbering sequence since the requested CSN.
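Because committed updates are required to arrive in CSN order and
without gaps, the receiver-side check reduces to a counter comparison.
A minimal sketch (the function name is illustrative, not from the
spec):

```python
def check_committed_stream(requested_csn, received_csns):
    """Verify that a stream of committed updates is in CSN order with
    no gaps after the requested CSN; return the new last-seen CSN."""
    expected = requested_csn + 1
    for csn in received_csns:
        if csn != expected:
            raise ValueError(
                "committed update stream out of order or gapped: "
                f"expected CSN {expected}, got {csn}")
        expected += 1
    return expected - 1
```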
The following figure provides an example of the dynamics that can
result from the update submission re-ordering/time-out mechanism.
primary:E2 (E4, E5 queued)
/ \
/ \
\|/ \|/
repA repB
/ \
~ \
\|/ \|/
repC:E3 repD:E4,E5
~ /
| /
\|/ \|/
repE
In this figure, the last update to be serialized by the primary from
replica server E is E's SSN number 2 (denoted by "primary:E2").
After E2 committed, E propagated three more update submissions: it
propagated SSN 3 to replica server C, and SSNs 4 and 5 to replica
server D. Replica server C became partitioned from the network after
it accepted submission E3 (indicated by the tildes in the figure),
but SSNs 4 and 5 made it to the primary via repD->repA->primary.
Because it has not yet seen E3, the primary queues E4 and E5, waiting
for E's SSN 3 to arrive. If replica server C stays partitioned for a
long time, the primary will time out SSNs 4 and 5 (sending an
ARSError containing ARSErrorCode=127001 for each).
Replica server C might then repair its partition and propagate E3
upstream, at which point the primary will serialize and pass the
corresponding committed update back down. Replica server E could
therefore see E4 and E5 fail and then see E3 succeed. It is up to E
to decide whether and when to resubmit the failed submissions.
Possibilities include:
o Reflecting the state to the user;
o Pausing and resubmitting E4 and E5 some configurable number of
times; and,
o Re-propagating E3 (and E4 and E5), so the primary receives it
without waiting for C to repair its partition.
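The primary's hold-and-reorder queue in this example might be
structured as in the following sketch. The class and callback names
are hypothetical; only the behavior (commit in-order runs, time out
held submissions with ARSErrorCode=127001) comes from the text above.

```python
import time

ARS_ERR_REORDER_TIMEOUT = 127001  # error code from the example above

class PrimaryReorderQueue:
    """Sketch of the primary's per-replica hold-and-reorder queue."""

    def __init__(self, timeout_secs, commit, fail):
        self.timeout_secs = timeout_secs
        self.next_ssn = {}   # replica id -> next SSN expected
        self.held = {}       # replica id -> {ssn: (arrival_time, update)}
        self.commit = commit # callback: serialize and commit an update
        self.fail = fail     # callback: send an ARSError back down

    def submit(self, replica, ssn, update, now=None):
        now = time.monotonic() if now is None else now
        self.next_ssn.setdefault(replica, 1)
        held = self.held.setdefault(replica, {})
        held[ssn] = (now, update)
        # Commit any in-order run of submissions now available.
        while self.next_ssn[replica] in held:
            _, upd = held.pop(self.next_ssn[replica])
            self.commit(replica, self.next_ssn[replica], upd)
            self.next_ssn[replica] += 1

    def expire(self, now=None):
        # Time out submissions held too long (e.g. E4/E5 while E3 is
        # stuck behind replica server C's partition).
        now = time.monotonic() if now is None else now
        for replica, held in self.held.items():
            for ssn in sorted(held):
                arrived, _ = held[ssn]
                if now - arrived > self.timeout_secs:
                    held.pop(ssn)
                    self.fail(replica, ssn, ARS_ERR_REORDER_TIMEOUT)
```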
Submission re-ordering is not performed in the downstream direction.
Instead, updates are only propagated in the downstream direction in
CSN order. Note that this happens naturally because the primary
generates updates and sends them in CSN order, and its downstream
servers likewise send updates only up through the last CSN they have
seen, so all ARS servers will always see updates in complete CSN
order.
4.2.3 ARS Encoding Negotiation Protocol (ars-e) Processing
In response to a ContentEncodingNegotiation request, the responder
makes a zone-specific decision (e.g., different zones can have
different underlying databases, supporting correspondingly different
proprietary encoding formats). The local implementation may also
consider other issues (e.g., source IP address to decide if
encryption allowed based on country's export control restrictions).
Encoding negotiation results may be cached as long as a BEEP channel
is open to the remote server. Thus, to change the set of encodings
it supports a server must first close any open channels.
If a server that does not support the ars-e protocol receives a
ContentEncodingNegotiation request, it responds with an ARSError
containing ARSErrorCode=223005.
After receiving a response to the ContentEncodingNegotiation request,
the initiator should check that the returned set is indeed a subset
of the originally offered encodings.
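That subset check is a one-liner over sets. A sketch, with an
illustrative function name:

```python
def validate_negotiated_encodings(offered, returned):
    """Initiator-side check: the responder's encoding set must be a
    subset of the encodings originally offered."""
    extra = set(returned) - set(offered)
    if extra:
        raise ValueError(
            f"responder returned unoffered encodings: {sorted(extra)}")
    return list(returned)
```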
A request specifying LastSeenCSN='0' indicates that the entire zone
is to be transferred. This case may be used by the upstream server
to trigger a special full-zone encoding, if ars-e is supported by
both servers.
The basic algorithm used for 'plumbing' into a content encoding is to
define an API which the encoding can upcall to save documents to
their stable (temporary) storage, passing the document name, content,
CSN, and operation to be performed. On the reverse side (sending a
set of documents from stable storage out through an encoding), the
encoding upcalls to get a list of document names needing
transmission, and then upcalls to get the document data content for
each. The encoding can then perform whatever transformations are
needed on the way to/from stable storage. Importantly, the whole
process must be implemented as a pipeline so as not to assume an
entire update will fit in memory: documents should be saved to
stable storage as they arrive, and read back only as they are about
to be sent.
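The upcall API described above might be expressed as an abstract
interface like the following sketch. All names are illustrative; the
point is that each side handles one document at a time, so the
pipeline never needs a whole update in memory.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Tuple

class EncodingPlumbing(ABC):
    """Sketch of the API an encoding upcalls into (hypothetical names)."""

    @abstractmethod
    def save_document(self, name: str, content: bytes,
                      csn: int, op: str) -> None:
        """Receiving side: persist one document to stable (temporary)
        storage as it arrives, with its CSN and operation."""

    @abstractmethod
    def documents_to_send(self) -> Iterator[str]:
        """Sending side: names of documents needing transmission."""

    @abstractmethod
    def read_document(self, name: str) -> Tuple[bytes, int, str]:
        """Sending side: fetch one document's content, CSN, and
        operation just before it is transmitted."""
```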
4.3 Example State Transition Diagrams
The state transitions needed to implement ARS will depend on which
subset of ARS sub-protocols is implemented, and what scheduling and
synchronization mechanisms are implemented. The following three
figures provide a set of state transition diagrams that could be used
to implement all three ARS sub-protocols (ars-c, ars-s, and ars-e),
with support for PushCommittedUpdates requests down the submission
path immediately following an update that was propagated up that
path. In these state diagrams the paths running straight down
represent the transitions taken when the current state completes
successfully, while the paths to the left represent the transitions
when a failure occurs.
The first state transition diagram can be used for handling
submitted updates arriving at a non-primary from a client (via
SubmitUpdate) or from a downstream ARS server (via
PropagateSubmittedUpdate):
|
\|/
/<---Incomplete
/ |
/ \|/
/ /<-CompletelyReceived
/ / |
/ / \|/
/ /<-PropagatingUpstream<------\ timeout + retrans
| | | \______/ N times, then infinite
| f | \|/ FailedSubmissionPropagation
| a |<--AwaitingCommitNotif
| i | |
| l | \|/
| u |<--AwaitingLocalCommit
| r | |
| e | \|/
| s |<-KickingDownstreamScheds
| | |
| \ \|/
\ \-->NotifyingSubmitter<------\ timeout + retrans
\ | \______/ N times
\ \|/
\------->CleaningUp
|
\|/
Done
The second state transition diagram can be used for handling
submitted updates arriving at the primary from a client or from a
downstream ARS server:
|
\|/
/<---Incomplete
/ |
/ \|/
/ /<-CompletelyReceived
/ / |
/ / \|/
/ /<-QueuedForReordering
| | |
| f | \|/
| a |<---WaitingForLock
| i | |
| l | \|/
| u |<--ApplyingToLocalDatastore
| r | |
| e | \|/
| s |<-KickingDownstreamScheds
| | |
| \ \|/
\ \-->NotifyingSubmitter<------\ timeout + retrans
\ | \______/ N times
\ \|/
\------->CleaningUp
|
\|/
Done
The third state transition diagram can be used for handling
committed updates arriving at a non-primary from an upstream ARS
server:
|
\|/
/Incomplete
/ |
/ \|/
/<-CompletelyReceived
f / |
a / \|/
i /<---WaitingForLock
l | |
u | \|/
r |<--ApplyingToLocalDatastore
e | |
s \ \|/
\--->CleaningUp
|
\|/
Done
The AwaitingCommitNotif state is used to represent the case where a
submitted update has been propagated upstream and the local server
has not yet received notification that the update has committed
(along with the CSN). The AwaitingLocalCommit state is used to
represent the case where the commit notification has been received
but the committed update content has not yet been propagated back
down the DAG to the local server. (See the discussion of
SubmittedUpdateResultNotification Processing (Section 4.2.1.2).)
The KickingDownstreamScheds state is used to represent the case where
PushCommittedUpdates operations are scheduled to run periodically at
the upstream server, and when a new update arrives the schedules need
to be changed such that a PushCommittedUpdates runs immediately and
then the normal schedule period is re-started. Again, this is needed
for the case of an implementation that performs PushCommittedUpdates
requests down the submission path immediately following an update
that was propagated up that path.
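The first diagram's states and success-path transitions can be
transcribed directly; failure paths all converge on NotifyingSubmitter
and then CleaningUp. This sketch is a transcription of the diagram
only (enum and helper names are illustrative), not a full state
machine with the retransmission behavior.

```python
from enum import Enum, auto

class NonPrimarySubmissionState(Enum):
    INCOMPLETE = auto()
    COMPLETELY_RECEIVED = auto()
    PROPAGATING_UPSTREAM = auto()      # retransmits on timeout
    AWAITING_COMMIT_NOTIF = auto()
    AWAITING_LOCAL_COMMIT = auto()
    KICKING_DOWNSTREAM_SCHEDS = auto()
    NOTIFYING_SUBMITTER = auto()       # retransmits on timeout
    CLEANING_UP = auto()
    DONE = auto()

S = NonPrimarySubmissionState
SUCCESS_NEXT = {
    S.INCOMPLETE: S.COMPLETELY_RECEIVED,
    S.COMPLETELY_RECEIVED: S.PROPAGATING_UPSTREAM,
    S.PROPAGATING_UPSTREAM: S.AWAITING_COMMIT_NOTIF,
    S.AWAITING_COMMIT_NOTIF: S.AWAITING_LOCAL_COMMIT,
    S.AWAITING_LOCAL_COMMIT: S.KICKING_DOWNSTREAM_SCHEDS,
    S.KICKING_DOWNSTREAM_SCHEDS: S.NOTIFYING_SUBMITTER,
    S.NOTIFYING_SUBMITTER: S.CLEANING_UP,
    S.CLEANING_UP: S.DONE,
}

def on_failure(state):
    # The diagram's left-hand paths: failures converge on
    # NotifyingSubmitter, whose own failure path goes to CleaningUp.
    if state in (S.NOTIFYING_SUBMITTER, S.CLEANING_UP):
        return S.CLEANING_UP
    return S.NOTIFYING_SUBMITTER
```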
5. Security Considerations
See [1]'s Section 10 for a discussion of ARS security issues.
References
[1] Schwartz, M., "The ANTACID Replication Service: Rationale and
Architecture", draft-schwartz-antacid-service-00 (work in
progress), October 2001.
[2] World Wide Web Consortium, "Extensible Markup Language (XML)
1.0", W3C XML, February 1998.
[3] Rose, M., "The Blocks Extensible Exchange Protocol Framework",
draft-mrose-blocks-protocol-04 (work in progress), May 2000.
[4] Rose, M., Gazzetta, M. and M. Schwartz, "The Blocks Datastore
Model", Draft Technical Memo, January 2001.
[5] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax", RFC 2396, August
1998.
[6] Reynolds, J., "Post Office Protocol", RFC 918, Oct 1984.
[7] Postel, J., "Simple Mail Transfer Protocol", RFC 788, Nov 1981.
[8] Mockapetris, P., "Domain names - concepts and facilities", RFC
1034, STD 13, Nov 1987.
[9] Lamport, L., "Time, Clocks, and the Ordering of Events in a
Distributed System", Communications of the ACM Vol. 21, No. 7,
July 1978.
[10] Stevens, W., "TCP/IP Illustrated, Volume 1 - The Protocols",
Addison-Wesley Professional Computing Series , 1994.
[11] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail
Extensions) Part One: Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", RFC 1521, September
1993.
Author's Address
Michael F. Schwartz
Code On The Road, LLC
EMail: schwartz@CodeOnTheRoad.com
URI: http://www.CodeOnTheRoad.com
Appendix A. Acknowledgements
The author would like to thank the following people for their reviews
of this specification: Marco Gazzetta, Carl Malamud, Darren New, and
Marshall Rose.
Appendix B. Future Enhancements and Investigations
A possible future enhancement to the protocol and implementation
would be to use an attribute that specifies payload length for update
content. By doing this, an implementation could copy the payload
directly to stable storage instead of first parsing it. This could
provide a significant performance improvement, and would also allow
the update content to be saved in exactly the format it was sent in
(as opposed to the rewriting/reindenting/etc. that happen when XML
content is parsed and then output).
Another possible future enhancement to the protocol and
implementation would be to allow serialization-only primary servers,
whose only job is to serialize update submissions and distribute the
work for applying and propagating serialized updates among the first
tier of zone replica servers. That would offload query and update
processing from the inherently centralized serialization server.
A possible future enhancement to the protocol would be to allow a
replication topology containing cycles, rather than requiring DAGs.
This generalization would provide resilience to network partitions
with fewer servers than a DAG requires. For example, consider the
following cyclic replication topology:
s1
| \
| \
s2---s3
In this figure updates can flow to s2 if s3 is down and vice versa.
With ARS's DAG-based topology an additional server would be required
to achieve the same level of redundancy:
s1------>s4
| \ /
| \ /
| \ /
| / \
| / \
\|/\|/ \|/
s2----->s3
Another possible future enhancement to the protocol would be to allow
batching of submitted updates before propagating up the DAG.
An area for further work is defining SNMP-based monitoring/management
interfaces.
An area for further work is automating the approach to laying out
replication topology.
Appendix C. ANTACID Replication Service Registration
Profile Identification: http://xml.resource.org/profiles/ARS
Messages exchanged during Channel Creation: none
Messages in "REQ" frames: "ARSRequest"
Messages in positive "RSP" frames: "ARSResponse"
Messages in negative "RSP" frames: "ARSError"
Message Syntax: c.f., Appendix D, Appendix E, Appendix F, and
Appendix G.
Message Semantics: c.f., Section 3.2
Appendix D. ARS Top-Level DTD
%ARSC;
%ARSC;
%ARS;
%ARSC;
%ARSE;
%ARSC;
%ARS;
%ARSE;
Appendix E. ars-c DTD
Appendix F. ars-s DTD
Appendix G. ars-e DTD
Appendix H. ARS Topology Configuration DTD
Appendix I. Current Encodings and Registration Procedures
ARS encodings are defined as MIME [11] Content-Type
"application/ars", with the single parameter "encoding_name" naming
which encoding is being used (e.g., DataWithOps). ars-e is NOT a
MIME Content Transfer Encoding, since it is not application-
independent.
As is the case with MIME primary types, encodings being used
privately (that is, between peers that understand the encoding by
mutual prior arrangement) must be given names that begin with "X-" to
indicate the encodings' non-standard status and to avoid a potential
conflict with a future official name. Following the "X-" must be a
URI [5] that identifies the encoding uniquely (for example, X-
http://xml.resource.org/encodings/mysqlRaw.html). This URI should
refer to a document that describes the encoding (whether formally or
informally), but the existence of a document is not required. The
only requirement is that the URI must provide a globally unique
identification of the encoding, to prevent clashes in the name space
of privately defined encodings.
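The "X-" plus URI naming rule lends itself to a shallow syntactic
check. A sketch (the function name is illustrative, and only the
scheme/rest structure of the URI is checked; full URI validation per
[5] is out of scope here):

```python
def is_valid_private_encoding_name(name: str) -> bool:
    """Check that a private ARS encoding name is 'X-' followed by
    something shaped like a URI (scheme ':' remainder)."""
    if not name.startswith("X-"):
        return False
    uri = name[2:]
    scheme, sep, rest = uri.partition(":")
    return bool(sep) and scheme.isalnum() and bool(rest)
```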
ARS Encodings are afforded official status when they have been
registered with the Internet Assigned Numbers Authority (IANA), using
the template provided below. The currently defined ARS encodings are
also listed below, for convenience.
Note that ARS references the encoding_name within the
ContentEncodingsSupported and UpdateGroup elements, without the MIME
"Content-Type:" syntax.
I.1 Currently Defined Encodings
The AllZoneData encoding is used to send (and receive) all documents
within a datastore. It is used in two cases: (a) starting up a new
replica; and (b) updating a downstream replica from an upstream
server that uses log-based committed update state management, when
the downstream server's last seen CSN is earlier than the upstream
replica's log truncation point (Section 4.1.2). The encoding is
similar to that used for DataWithOps, with the following differences:
o with AllZoneData the receiving ARS server must delete the existing
documents for the zone before applying the updates; and,
o with AllZoneData the requestor must specify the zone for which
they want all zone data.
The EllipsisNotation encoding may be used during committed update
propagation when transmitting to a downstream server on the DAG path
along which an update was originally submitted. Instead of sending
the documents to be updated inside the UpdateGroup, the upstream
server sends the GlobalUpdateSubmitID that was assigned when the
update was originally submitted. The downstream server then commits
the content that it had saved in temporary stable storage. This
encoding avoids transmitting the update content down the same link(s)
along which it was originally submitted. When using this encoding it
is the responsibility of the upstream server to track where it
received updates in order to determine when the ellipsis notation may
be applied. It is a local implementation matter whether the state
needed for tracking this information is kept on stable storage or as
in-memory current-server-incarnation state.
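The upstream bookkeeping for ellipsis notation amounts to remembering
which downstream link each submission arrived on. A sketch with
illustrative names (an in-memory variant; a durable one would persist
the map):

```python
class EllipsisTracker:
    """Remember the downstream link each submission came up on, so
    committed-update propagation down that same link can send just
    the GlobalUpdateSubmitID instead of the full content."""

    def __init__(self):
        self.origin_link = {}   # GlobalUpdateSubmitID -> link id

    def note_submission(self, submit_id, link):
        self.origin_link[submit_id] = link

    def may_elide(self, submit_id, link):
        # Ellipsis notation applies only on the link the update was
        # originally submitted along.
        return self.origin_link.get(submit_id) == link
```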
I.2 Encoding Registration Procedures
Similar to the MIME IANA Registration Procedures, this appendix
provides an email template for registering new ARS encodings. Note
that this template has not yet been registered with the IANA.
To: IANA@isi.edu
Subject: Registration of new ARS Encoding (MIME Content-Type:
application/ars)
Encoding name:
Dependence on proprietary formats:
Security considerations:
Published specification:
(The published specification must be an Internet RFC or
RFC-to-be if a new top-level type is being defined, and must be
a publicly available specification in any case.)
Person & email address to contact for further information:
Full Copyright Statement
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Code On The Road, LLC expressly disclaims any and all warranties
regarding this contribution including any warranty that (a) this
contribution does not violate the rights of others, (b) the owners,
if any, of other rights in this contribution have been informed of
the rights and permissions granted to IETF herein, and (c) any
required authorizations from such owners have been obtained. This
document and the information contained herein is provided on an "AS
IS" basis and CODE ON THE ROAD, LLC DISCLAIMS ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
IN NO EVENT WILL CODE ON THE ROAD, LLC BE LIABLE TO ANY OTHER PARTY
INCLUDING THE IETF AND ITS MEMBERS FOR THE COST OF PROCURING
SUBSTITUTE GOODS OR SERVICES, LOST PROFITS, LOSS OF USE, LOSS OF
DATA, OR ANY INCIDENTAL, CONSEQUENTIAL, INDIRECT, OR SPECIAL DAMAGES
WHETHER UNDER CONTRACT, TORT, WARRANTY, OR OTHERWISE, ARISING IN ANY
WAY OUT OF THIS OR ANY OTHER AGREEMENT RELATING TO THIS DOCUMENT,
WHETHER OR NOT SUCH PARTY HAD ADVANCE NOTICE OF THE POSSIBILITY OF
SUCH DAMAGES.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.