Internet DRAFT - draft-langer-ldup-mdcr

draft-langer-ldup-mdcr




LDUP Multiple Draft Conflict Resolution                   Albert Langer
Internet-Draft                                        Directory Designs
Document: draft-langer-ldup-mdcr-00.txt                    7 April 2000
Intended Category: Informational            
Expires: 7 October 2000                     
 
 
    
             LDUP Multiple Draft Conflict Resolution (MDCR) 
                                      
 
   Copyright (C) The Internet Society (2000). All Rights Reserved. 
 
Status of this Memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026 [1].  
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that 
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of 
   six months and may be updated, replaced, or obsoleted by other 
   documents at any time. It is inappropriate to use Internet- Drafts 
   as reference material or to cite them other than as "work in 
   progress." 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt  
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
   This draft is an individual submission to the IETF LDUP Working 
   Group. Distribution of this document is unlimited. Comments should 
   be sent to the LDUP Replication mailing list <ldup@imc.org> and to 
   the author. 
 
   This Internet-Draft expires on 7 October 2000. 
    
    
1. Abstract 
    
   Concurrent changes in a Multi-Master directory are a partially 
   ordered tree. "LDUP Update Reconciliation Procedures" (URP) uses 
   timestamps to impose a strict linear order on this tree, and merges 
   changes to fit that imposed order. Merged changes violate the 
   semantics of the LDAP/X.500 data model. No specific "modifierName" 
   can be assigned to a merged change and requirements to support audit 
   trails and access controls and not to break applications or impede 
   future work on transactions cannot be met. The current LDUP 
   requirements draft avoids dealing with these critical issues. 
    
 
 Langer         Informational - Expires 7 October 2000        [Page 1] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   This memo, "Multiple Drafts Conflict Resolution" (MDCR) proposes an 
   alternative. MDCR stores the tree of conflicts as "multiple drafts" 
   of entries at each replica, and resolves the conflicts between the 
   drafts without merging, based on the actual ordering of the tree. 
   MDCR guarantees atomic, coherent, isolated, and serialized changes 
   to individual entries, with convergence to eventual consistency 
   between replicas. The trade off is that some suppressed conflicting 
   changes become ephemeral. A complete history enables notification 
   support for durability and rollbacks by DUAs. 
    
   The choices that must be made between maintaining coherence of 
   entries and minimizing ephemeral changes should be addressed in LDUP 
   requirements. 
    
    
2. Key terms used in this document 
    
   A "change" is an LDAP v3 protocol add, del, modify or modifyDN 
   request for which the requesting Directory User Agent (DUA) has been 
   sent a success response by an "originator" Directory Service Agent 
   (DSA) after schema and access control checks and application of the 
   change to a replicated entry. 
    
   A "report" is a specification of a change, published by the 
   originator and propagated in messages forwarded to and by other DSAs 
   that hold a replica of the same NamingContext (NC). 
    
   A "record" is a copy of a report stored by the originator or a 
   receiver. 
    
   An "update" is an alteration, performed in response to receiving a 
   report, on an entry within a replica by the DSA holding that 
   replica. 
     
   A "retraction" is an asynchronous notification that a previous 
   success response to a DUA has been invalidated due to conflicting 
   concurrent changes. 
    
   A "confirmation" is an asynchronous notification that no retraction 
   will be issued. 
    
   A "draft" is the result of any change or update for which no 
   confirmation has been issued. Drafts are visible to DUAs as the 
   current state of an entry at a particular replica and may be the 
   basis on which further changes are made by those DUAs. Transient 
   inconsistencies between replicas affect subsequent changes as well 
   as reads when multi-master replication is used. Entry drafts may not 
   just be out of date like shadow copies in a single master directory,  
   but also ephemeral like Internet-Drafts. It is inappropriate for 
   DUAs to rely on them as anything more than "work in progress". (It 
   is even more inappropriate for DSAs to pretend otherwise by 
   eclectically merging drafts and randomly assigning an author ;-). 
 
 Langer         Informational - Expires 7 October 2000        [Page 2] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
 
   These distinctions are maintained by a complete separation of report 
   propagation from update processing. Change propagation is the end 
   result of a deliberate synthesis of both, and may include the 
   propagation of retractions and confirmations to DUAs or their users. 
    
   The key words "MUST NOT IMPEDE" are a prohibition of specifications 
   that would make it significantly more difficult to later standardize 
   functionality generally described by the words immediately 
   following. Conformity is established by reference to some plausible 
   mechanism that might be used to implement that functionality, 
   despite the consequences of the conforming standard. Actual complete 
   specification of the future functionality and suggested mechanism is 
   of course left to future work. 
    
   The key words "SHOULD NOT IMPEDE" mean that the needs of future 
   standardization have been taken fully into account in drafting the 
   conforming standard. Conformity is established either by satisfying 
   "MUST NOT IMPEDE", or by showing that there is no known plausible 
   mechanism by which the functionality provided by the conforming 
   standard could do so, and by explaining the implications of the 
   choices made. 
 
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in 
   this document are to be interpreted as described in RFC-2119 [2]. 
    
    
3. Introduction 
    
   Multi-Master LDAP replication MUST NOT break applications that rely 
   on the LDAP v3 X.500 data model and SHOULD NOT IMPEDE extensions for 
   transactions, referential integrity, audit trails, signed 
   operations, management of replicas with heterogeneous conforming 
   server implementations etc. 
    
   The key goals of MDCR are to satisfy those requirements, which are 
   not satisfied by URP [3] and some of which are either omitted or 
   ambiguously stated in the current LDUP requirements draft [4]. 
    
   URP merges concurrent modifications made by different DUAs at 
   different replicas, to attributes and values of a single entry. This  
   inevitably breaks the semantics of "modifiersName", a mandatory 
   operational attribute actually understood by the directory service. 
   Grouping or transactions affecting multiple entries becomes 
   impossible when even single entries have lost coherence. 
    
   For the same reasons, URP must inevitably break the semantics of 
   other relationships between attributes values understood only by 
   applications. Among those applications are the management of 
   prescriptive access controls and other policy information, and hence 
   the viability of multi-master replication itself. 
 
 Langer         Informational - Expires 7 October 2000        [Page 3] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
    
   The relationships between a named set of attributes are precisely 
   what DUAs manipulate when they make changes to a directory entry. If 
   the attributes could be manipulated independently, they should be 
   stored in different entries. This can conveniently be done using 
   compound families of entries [5]. For example a DistributionList or 
   AccessControlGroup family parent could have a family child entry for 
   each member. This could be combined with translation of add and 
   delete attribute value operations for backward compatability with 
   existing DUAs that are unaware of multi-master replication issues. 
    
   An LDAP success response when making a change is not reliable if 
   replication updates cause the final result to differ from that 
   expected by each of the DUAs to which success is reported. A success 
   response is also not reliable if it can sometimes be followed by a 
   revocation. But at least applications can then be designed to take 
   some corrective action before end users are affected, rather than 
   after perplexing application behaviour has been reported from end 
   users to help desks.  
    
   The first thing anyone responding to a problem report would want to 
   know is "who changed what?". If Multi-Master replication makes it 
   impossible to answer that question, it is broken. System 
   administrators are in short supply and have better things to do. 
    
   The consequences of not supporting "modifiersName" are easily 
   understood by directory users and could turn out to be a major 
   competitive disadvantage for vendors making that choice (guess who 
   ;-) - unless of course such vendors are rescued by adoption of 
   standards endorsing their choice. 
    
   The following alternative proposals are not based on any 
   implementation or operational experience and so are necessarily half 
   baked. They are presented purely for feedback. Hopefully there is 
   enough here to enable meaningful discussion of feasability, and to 
   prompt clarification of requirements to specify just what guarantees 
   are actually provided by LDUP multi-master protocols and what are 
   not. Decisions to break the data model of LDAP v3 should not just be 
   glossed over or buried within architecture of procedural documents. 
   They should be clearly explained and justified in a requirements 
   document, so as to actively solicit input from the many other areas 
   that will be relying on directory services. 
    
    
4. Architecture Overview 
    
   Each replica reports each change it originates to all other 
   replicas. These reports are stored and propagated whether or not 
   they result in updates. The records for each entry are not a linear 
   or totally ordered log, but a partially ordered tree or forest. 
   Subsequent reports may result in a previously ignored report being 

 
 Langer         Informational - Expires 7 October 2000        [Page 4] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   used in an update, switching from draft entries in one branch of the 
   tree to another. 
    
   Each replica independently builds a tree faithfully reflecting the 
   actual state of conflicting concurrent changes to an entry at all 
   replicas, based on its knowledge from reports received. Each replica 
   independently applies deterministic procedures to update its own 
   copy of an entry based on the knowledge it has. This knowledge 
   includes the "modifiersName" of the DUA that made each change, so 
   that conflicts with concurrent changes to access control information 
   affecting a change can also be detected.  
 
   The knowledge converges as reports propagate, so the state of the 
   entry at all replicas converges when all conflicting changes have 
   been propagated. 
    
   The update procedures select a sequence of changes that could have 
   been made as atomic, isolated and serialized changes at a single 
   master DSA, even though they may have actually been made at 
   different replicas. These changes may become durable. Conflicting 
   concurrent changes, not part of a durable sequence, are suppressed 
   by convergence to the entry state resulting from a durable sequence. 
   The conflicting changes are ephemeral. 
    
   As each replica has a complete history of any conflict, DUAs whose 
   changes were ephemeral and those whose changes were durable, can be 
   notified with retractions and confirmations. The history can then be 
   discarded, but can also be kept as a valuable audit trail. This 
   prepares a foundation for enabling transactions that affect more 
   than one entry. 
    
   The use of protocolOps to describe changes instead of primitives 
   avoids impeding future standardization of signed operations. 
    
   Clock synchronization is not required. 
    
   Report propagation procedures are designed to minimize replication 
   bandwidth by eliminating duplication and making it feasible for 
   replicas that have failed to resynchronize after restoration from 
   backups, using incremental rather than full replication. (Full 
   replication is the main source of bandwidth problems because it 
   causes undesirable peaks). Sequential delivery of reports from each 
   originator is guaranteed, despite interleaved delivery of reports 
   from different originators concerning changes to the same entry. 
    
   Multi-Master DSAs (MMDSAs) must also act as suppliers to Single-
   Master DSAs (SMDSAs). Migration of existing single master 
   implementations to support single master LDUP and later multi-master 
   LDUP is made easier by the use of protocolOps and a ChangeLog in 
   MDCR. MMDSAs always know the previous state of an entry that they 
   have supplied to a consumer and can forward reports for the same 
   updates that they apply to that entry themselves. SMDSAs only 
 
 Langer         Informational - Expires 7 October 2000        [Page 5] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   receive reports that they can apply immediately as updates and do 
   not need to implement MMDSA conflict resolution procedures. They  
   implement exactly the same report propagation procedures (restricted 
   to one way propagation to consumer SMDSAs), providing a natural 
   migration path for later upgrade to support Multi-Master 
   replication. 
    
   Supplying read only shadow copies also involves partial and 
   fractional replicated areas not supported for Multi-Master 
   replication, so details are not discussed here. Essentially MMDSAs 
   forward reports to shadow consumers only when they are applied in 
   updates (and local changes) instead of whenever they are received. 
   SMDSAs forward reports to other secondary shadow consumer SMDSAs 
   whenever they are received, just as MMDSAs do among themselves. 
   "Relay" DSAs could also be supported, connecting groups of MMDSAs 
   but only providing read only copies to DUAs. Likewise separate 
   "Connector" DUAs could implement the report propagation procedures 
   between existing DSA implementations and LDUP compliant DSAs. 
    
   The complete separation of report propagation from update processing 
   also enables possible future use of broadcast or multicast 
   protocols. 
    
   The directory is intended for non-volatile data, so conflicts are 
   rare and the probability of a particular number of concurrent 
   conflicts decreases exponentially with the number. The goals of 
   Multi-Master replication are high availability and local 
   availability of changeable copies. Occasional concurrent changes to 
   the same entry are not a goal but an unavoidable consequence. 
    
   Much of the detail below is needed only to avoid manual repairs when 
   the unusual does happen, as it inevitably does in any large 
   distributed system. Actual performance is dominated by simple 
   changes with no conflicts, so there is no point in implementations 
   attempting to optimize the performance, as opposed to the 
   correctness, of handling conflicts. Typically an implementation 
   would just add change records to a simple log file and keep the 
   trees of transient conflicts in RAM, resulting in greater simplicity 
   and efficiency than state based approaches. 
    
   MDCR is aimed at improved handling of conflicting modify changes. It 
   does not directly offer any significant improvement for add, del or 
   modifyDN changes. These are already handled by URP, so details have 
   been omitted pending further discussion with the URP authors. 
    
   Actual use of the "modifiersName" preserved by MDCR to ensure 
   updates are consistent with concurrent changes to access controls 
   involves consideration of all four types of changes and access 
   control mechanisms. This has therefore also been deferred. 
    
   Actual use of the history preserved by MDCR to issue confirmations 
   and retractions is closely related to work on notifications and 
 
 Langer         Informational - Expires 7 October 2000        [Page 6] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   general grouping or transactions affecting more than one entry. This 
   has therefore also been deferred. 
    
    
6. Information Model 
    
   When performing a change, a numbered report is generated listing 
   where the change was made, which entry was changed, what version of 
   that entry was changed, who changed it, when and how. (The great 
   remaining question, "why?" is beyond the scope of directory 
   standardization). 
    
   report ::= SEQUENCE { 
        originatorRSN    ReportSequenceNumber   -- report number 
        entryID          UUID,                  -- which 
        originator       ReplicaNumber,         -- where 
        previous         ReplicaNumber,         -- what 
        version          INTEGER,               -- what 
        modifiersName    DistinguishedName,     -- who           
        modifyTimestamp  UTCTime,               -- when 
        protocolOp       OCTETSTRING,           -- how 
        oldParentID      UUID OPTIONAL,         -- how 
        newParentID      UUID OPTIONAL          -- how 
   } 
    
   The report number, originatorRSN, is described under Report 
   Propagation. It is not used for update processing. 
 
   6.1 Which 
    
   "EntryID" is the UUID of the entry that was the target DN of the 
   ProtocolOp at the originator. Having both entryID and target DN is 
   may be important for detecting certain conflicts resulting from 
   concurrent access control and possibly schema changes. These issues 
   not discussed in this draft. 
    
   6.2 Where 
    
   "Originator" is the ReplicaNumber of the DSA where the change was 
   performed and is stored as an operational attribute of each entry. 
    
    
   6.3 What 
    
   "Previous" is the value of originator that was stored with the entry 
   immediately prior to performing the change. Previous = originator 
   when an entry is first created. 
    
   "Version" is another operational attribute, initially 1 when an 
   entry is first created, and incremented by 1 for each change. A 
   change and incrementation may occur at one or more different 
   replicas, resulting in one or more different change reports with the 
 
 Langer         Informational - Expires 7 October 2000        [Page 7] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   same next higher version but different originators. Further 
   conflicts may add more branches to the tree until all conflicting 
   reports have been fully propagated.  
    
   The combination of entryID, version and originator uniquely 
   identifies a report since changes are atomic and an originator 
   increments the version to the next consecutive number when 
   performing a change. It also uniquely identifies the attributes and 
   values of an entry draft, as explained below. 
     
   Updates are not changes and do not increment the version. Updates do 
   replace the version with a version that is not smaller, as well as 
   replacing other attributes. The replacement version from an update 
   is usually, but not always, larger. Successive versions of an entry 
   are not always consecutive but are always monotonically increasing. 
   The lexical ordering of version, modifyTimestamp and originator of 
   an entry is always strictly monotonically increasing. Version has 
   priority, so modifyTimestamp may move backwards. 
    
   This priority is based on detailed research done for the Coda file 
   system, a successor to the Andrew File System, and adopted by 
   Microsoft in Active Directory. It eliminates the need for accurate 
   clock synchronization and has other advantages. The Coda work also 
   supports "sometimes connected" clients with notifications by servers 
   about concurrent changes to information they have cached and 
   resynchronization of servers when the replication topology has 
   become disconnected. It has many other important ideas that should 
   be studied for LDUP work and provides complete open source code for 
   implementors (included in FreeBSD and some Linux distributions). 
    
    
   6.5 Who, When and How 
    
   "ModifiersName" and "modifyTimestamp" correspond to the creatorsName 
   and createTimestamp when an entry is first created and to the usual 
   operational attributes later. 
 
   "ProtocolOp" is the BER encoding of the actual add, del, modify or 
   modifyDN successful request operation as specified for the 
   protocolOp component of LDAPMessage in 4.1.1 of RFC 2251 [3]. 
    
   OldParentID and NewParentID are not used for the modify operations 
   discussed in this draft. They are for processing updates where the 
   protocolOp is an add, del or modifyDN request. 
    
   "NewParentID" is the entryID of the parent of the entry at the 
   originator, immediately after performing a change. Present iff 
   protocolOp is a modifyDN that specifies a new parent DN. 
    
   "OldParentID" is the entryID of the parent of the entry at the 
   originator, immediately prior to performing a change. Present iff 
   protocolOp is an add, del or a modifyDN. 
 
 Langer         Informational - Expires 7 October 2000        [Page 8] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
    
    
7. Report Propagation 
    
   7.1 Report Sequence Numbers 
    
   A ReportSequenceNumber (RSN) is a 64 bit non-negative integer. 
    
   Each DSA maintains a currentRSN monotonically incremented when any 
   record is stored, whether from a report generated by a local change 
   or a report received. It is also incremented for each replication 
   session (so that it is incremented even if no change reports are 
   received). The currentRSN applies to a DSA as a whole and is not 
   specific to a replica within the DSA. 
    
   RSNs are not necessarily consecutive in any context. They are just 
   unique, ordered and strictly monotonically increasing. 
 
   The currentRSN of the originator is assigned as originatorRSN when a 
   change is performed, and is included in the report generated. 
    
   A localRSN is added to each report as a record number whenever a 
   record is stored at any replica. A record is just a report with the 
   localRSN record number added. The localRSN is the currentRSN of the 
   DSA holding the replica in which the record is stored. When the 
   replica is also the originator, localRSN is equal to originatorRSN. 
    
   The localRSN assigned to a report is also sent with any message 
   containing the report. The message sent is just a copy of the record 
   held by the sender. 
    
   The value of localRSN included in a message is not stored with the 
   record by the receiver. Instead the highest localRSN received from a 
   supplier during a replication session is noted by the consumer as a 
   HighwaterMark for that supplier. The consumer DSA's own currentRSN 
   is stored as the localRSN for the record of a report stored by the 
   consumer. 
    
    
   record ::= SEQUENCE { 
                localRSN ReqportSequenceNumber, 
                        -- sequential local record number 
                report 
                } 
    
   message ::= record 
        -- localUSN for a message is senders sequential message number 
    
   Each replica has an UpdatedMark for every replica. This holds, for 
   each originator, the highest originatorRSN of any record generated 
   by that originator which has been stored by the replica holding the 
   UpdatedMark. 
 
 Langer         Informational - Expires 7 October 2000        [Page 9] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
    
   Each replica also has a SeenMark for every replica. This holds, for 
   each originator, the lowest UpdatedMark for that originator, known 
   by the replica holding the SeenMark to be held by any replica. 
 
    
   7.2 Replication Sessions 
    
   The supplier HighwaterMark held by the consumer is transmitted to 
   the supplier at the start of the replication session and used by the 
   supplier to select only records that have a localUSN at least as 
   high as that HighwaterMark.  
    
   Messages MUST be transmitted from the supplier to the consumer in 
   strictly increasing order by localRSN, so that the new HighwaterMark 
   recorded by the consumer at the end of the session (including the 
   end of an interrupted session) enables resuming the next session 
   from where the last session left off. 
    
   Records MUST be stored in the same sequence as messages are 
   received, so that the sequence is preserved when applying updates 
   and propagating copies of records to other neighbors. 
    
   The full set of UpdatedMarks held by the consumer is also 
   transmitted to the supplier at the start of the replication session. 
   It is used to further select from among the records selected by the 
   HighwaterMark, only those that have not already been received by the 
   consumer via other replication agreements (including changes 
   previously reported to the supplier by the consumer through 
   replication in the opposite direction). This causes gaps in the 
   sequence of localUSNs of the messages that were sent from the 
   supplier to the consumer. 
    
   Since each replica propagates reports strictly in order of localRSN, 
   it also publishes the reports corresponding to local changes 
   strictly in order of originatorRSN, and this order is preserved by 
   subsequent propagation. Although reports can arrive via multiple 
   redundant paths, with interleaving of reports from different 
   originators, the originatorRSNs from each originator can only arrive 
   in order. Any gaps in the sequence of originatorRSNs received from a 
   particular originator can only result from non-consecutive 
   originatorRSNs issued by that originator, eg because it holds 
   multiple replicas or is distributed among multiple processors. 
    
   The correct sequence of originatorRSNs for each originator is 
   maintained because any consumer always requests earlier reports 
   before later ones, by sending their set of UpdatedMarks to the 
   supplier. Any supplier either obtained the reports in sequence as a 
   consumer or generated them in sequence as the originator and 
   therefore has the earlier reports and will deliver them to the 
   consumer in sequence. Interruption of a session and resumption from 
   another supplier will merely continue where the previous supplier 
 
 Langer         Informational - Expires 7 October 2000       [Page 10] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   left off, whether or not the substitute supplier also has later 
   reports than the failed supplier. The replacement supplier will have 
   to check more of its records because the consumer cannot supply a 
   suitable HighwaterMark, but it will only transmit those records not 
   yet seen by the consumer, and with the same sequence of 
   originatorRSN for each originator (but different interleaving among 
   originators). 
    
   If a failed replica is restored from backups any reports it holds 
   that have already been propagated will not be requested by its 
   neighbours and any that have not been propagated will have also 
   prevented the propagation of later reports from the same originator 
   that have a higher originatorRSN. 
    
   The failed replica will resynchronize through a normal incremental 
   session using it's restored HighwaterMark and set of UpdatedMarks 
   and will not cause a bandwidth spike by full replication. 
    
   Thus the highest originatorUSN stored as the UpdatedMark for an 
   originator guarantees that the replica holding that mark has 
   received every report from that originator with a lower or equal 
   originatorUSN and has not received any with a higher originatorRSN. 
    
   Likewise the SeenMark stored for an originator at any replica 
   guarantees than any records stored at that replica with that 
   originator and an equal or higher originatorUSN have also been 
   stored at every other replica. 
    
   Note: These invariants still hold when reports in transit are 
   destroyed by the complete failure of intermediate replicas without 
   backups, or transmitted late after restoration from backups, 
   provided the replication topology remains connected. It also applies 
   even if an originator is excluded from the set of replicas as a 
   result of being offline for too long, since that results in later 
   reports being discarded before the failed replica can rejoin, and 
   does not affect the earlier reports already in transit. However it 
   does not follow that any similar invariants hold with respect to the 
   sequence of version numbers of entries. The invariant only applies 
   to reports generated at each originator separately. 
    
   A supplier MUST transmit only records with a higher originatorUSN 
   than the UpdateMark for the originator of the same record, as 
   provided by the consumer. 
    
   The combination of a HighwaterMark and a full set of UpdatedMarks 
   corresponds to the replicaUpdateVector in the current LDUP protocol, 
   but has a different syntax and semantics. Likewise the set of 
   SeenMarks corresponds to the PurgeVector defined in the current LDUP 
   architecture, but is used differently, as explained in section 9 
   below. 
    

 
 Langer         Informational - Expires 7 October 2000       [Page 11] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   No significant change to the current LDUP protocol [6], as distinct 
   from the procedures, would be required by adoption of these 
   procedures. Nor are there any significant changes to the current 
   drafts for subschema [7], operation framing [8], architecture [9] or 
   information model [10], so far as they describe replication 
   agreements. The current draft architecture mixes update 
   reconciliation and report propagation, so sections 4.5, 4.6 and 10.5 
   are substantially affected, with minor changes elsewhere and to some 
   of the related syntax in the information model. 
 
   Note: The above approach to propagation can also be applied on a per 
   attribute or even a per attribute value basis, with replication of 
   primitives instead of operations, and update reconciliation instead 
   of conflict resolution. Active Directory does this on a per 
   attribute basis. The highest localRSN for any of the attributes of 
   an entry is copied to a changedRSN for the entry as a whole and the 
   HighwaterMark is based on changedRSN instead of localRSN. The 
   UpdatedMarks are used as above to eliminate changes to particular 
   attributes or attribute values that have an originatorRSN not 
   greater than the UpdatedMark for the corresponding originator. 
   (ActiveDirectory does not store a "modifiersName" per attribute. It 
   does not store one per entry either, because the semantics are 
   completely broken by replicating even entire attributes 
   independently, let alone attribute values). 
    
   Also note that each replica has all the information it needs to 
   generate the "primitives" used by URP, or any other update 
   processing method, when protocolOps are propagated instead of 
   primitives. Provided that the procedures at each replica are 
   deterministic and not dependent on clock synchronization, no reports 
   need to be obsoleted by update processing during report propagation 
   and likewise no additional reports need to be generated. This avoids 
   a combinatorial explosion of reports about merged reports, even if 
   merging is being done at each replica, provided it is done 
   deterministically and independently. 
    
   Thus it would be possible to combine the URP approach and the MDCR 
   approach to update processing described below, by allowing certain 
   operations in other branches of the tree to affect updates to 
   entries. For example "add self" and "delete self" operations could 
   be applied to the member attribute of an entry for a group of names, 
   even though they are not properly serialized with respect to other 
   concurrent changes to the entry. This might not break the semantics 
   of that operation and could avoid unnecessary suppression of 
   concurrent changes that could reasonably be merged. It would still 
   break the semantics of "modifierName", but in this particular case, 
   the DUA name is the added or deleted attribute value anyway. 
   Other variations and combinations are also possible. 
    
   Architectural decisions about report propagation should be 
   independent of update processing. 
    
 
 Langer         Informational -                                Expires 7 October 2000       [Page 12] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
    
8. Update Processing 
    
   Each replica stores a record for each new report as it is received, 
   to build a local complete history of the known changes made by all 
   replicas to each entry. Any resulting update is then applied to the 
   local entry based on that known history. 
    
   Each replica independently processes its current records about an 
   entry whenever a new record about that entry is stored, and appends 
   the record to a tree rooted at the record of the first change, ie 
   the creation of the entry. 
 
   The tree is usually trivial and becomes empty when no further 
   conflicting changes are being propagated for an entry. 
    
   Typically implementations would simply append records to a log file 
   and hash extracts from the records in RAM for each recently used the 
   entryID. The RAM extracts might contain shorter primitives 
   referencing only attribute value assertions and entryIDs, derived 
   from the protocolOps and other attributes held only in the log file 
   after initial processing. It could contain nothing but the links, 
   unsorted as a blob of memory for each recently changed entry. 
    
   The position of each record in this tree is specified by the 
   attributes version, originator and previous of the corresponding 
   report. Total overhead is 12 bytes per record, allowing for 4 
   billion changes per entry, 16 thousand replicas per NC and a pointer 
   into a 4 GB log file (whose relevant records would usually still be 
   in disk cache buffers when needed). 
 
   The previous attribute of a report is identical to the originator 
   attribute of its predecessor report. A predecessor report with that 
   originator attribute and the next lower version must exist, because 
   it was generated for the predecessor change that resulted in the 
   immediately previous state of the entry at the originator when a 
   change was made. The version is the depth of the record node in the 
   tree because consecutive changes result in consecutive version 
   numbers and updates do not generate reports. 
    
   The originator is unique among siblings, since the combination of 
   version and originator uniquely determines a report within an entry, 
   and all siblings have the same version. Each node is therefore named 
   by the sequence of originator attributes from the root. This 
   corresponds to a sequence of replicas at which fully serialized 
   atomic, isolated changes were applied to each previous draft to 
   produce the next, just as though the changes were made at a single 
   master DSA without any conflicts. For the initial add operation and 
   subsequent modify operations, applying the protocolOp in the report 
   will generate the attributes and values of the entry state 
   corresponding to each successive version from its predecessor.  
    
 
 Langer         Informational - Expires 7 October 2000       [Page 13] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   The tree constructed at any replica is consistent with that at any 
   other replica, ie if nodes with the same name exist in trees at two 
   replicas, they contain the same report. Again this follows because 
   nodes with the same name must have the same version as well as the 
   same originator and this uniquely determines a report. 
    
   Propagation delays result in different replicas having different 
   records and applying different updates. The trees converge as 
   reports propagate, resulting in convergence of the updates for 
   "eventual consistency". 
    
   Records may arrive out of order. The tree can be constructed despite 
   this, leaving any out of order records unattached until the gap is 
   filled. 
    
   When a new record is stored, if a record already attached to the 
   tree for the same entry has the next lower version and originator 
   equal to previous for the new record, the new record is attached as 
   a child. Otherwise it remains unattached. If it was attached, any 
   existing unattached records for that entry are checked to see if 
   they can now be attached, recursively. 
    
   There cannot be more than one predecessor to which a new child could 
   be attached because:  
    
   a) The previous attribute in a report is uniquely determined as the 
   originator attribute present in the entry immediately before the 
   change described by the report. 
    
   b) Any predecessor has an originator attribute equal to that 
   previous attribute and a version one less than the successor 
   version.  
    
   c) This combination of the predecessors originator and version 
   uniquely specifies a report within an entry. 
    
   The above pedantry establishes that the trees of records at each 
   replica based on each replica's limited current knowledge of reports 
   are always faithfully consistent with the state of the real trees of 
   conflicts. 
    
   A longest strand of the tree is selected, from the root to a leaf, 
   and the sequence of protocolOps in the records is applied in turn to 
   generate an update. This update replaces the current entry at that 
   replica. 
    
   The longest strands are those with the highest version number at the 
   leaf. If there is more than one longest strand, those with the 
   latest modifyTimestamp are selected. If there is still more than 
   one, the one with the highest originator is selected. This is unique 
   because the combination of version and originator is unique. 
    
 
 Langer         Informational - Expires 7 October 2000       [Page 14] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   The operational attributes listed in the report at the leaf of the 
   selected strand replace those attributes of the updated entry, 
   without modification. 
    
   Procedures for applying add, del and modifyDN protocolOps are not 
   discussed in this draft. Procedures for modify are obvious and 
   result in identical user attributes and values for entries that have 
   the same version and originator at different replicas. 
    
   An update is not a change. It is a replication of a previous state 
   of an entry, already made visible at some other replica. No new 
   report results from applying an update and no new conflicting draft 
   entries are created. 
    
 
9. Weeding and Summarizing 
    
   Weeding purges records of ephemeral changes that have been 
   suppressed and can no longer influence future updates to an entry. 
    
   Summarization consolidates records of durable changes that cannot be 
   suppressed by subsequent changes. 
    
   Notification procedures may be added to inform DUAs that made 
   changes about retraction of success responses for their ephemeral 
   changes or confirmation for their durable changes. 
    
   When a SeenMark for an originator rises above or equal to the 
   originatorRSN of a record of a change made by that originator, the 
   replica holding the SeenMark and record knows that all replicas must 
   have stored their own records of that change, so the record is 
   marked as "Seen". 
    
   Each replica MUST apply any update resulting from storing a change 
   record before the next increment of its currentRSN. This prevents it 
   from accepting a change from a DUA based on its version of the entry 
   prior to the update. That is necessary for the procedures below. 
    
   Note: A weaker requirement may be desirable to not constrain 
   implementations from optimizing their processing of updates. This 
   could need more complex specified delays and procedures than those 
   below. Any such specifications MUST be standardized as independent 
   optimizations are unlikely to interoperate correctly. 
    
   When a record and all its predecessors in the tree have been marked 
   "Seen", no subsequent change can occur unless it has a higher 
   version than that record. This follows because: 
    
   a) Each record in the sequence has a corresponding record stored at 
   every replica which has been appended to its predecessor in the tree 
   for its entry at that replica by the recursive update processing 

 
 Langer         Informational - Expires 7 October 2000       [Page 15] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   procedures, regardless of the order in which those records arrived 
   at any replica. 
    
   b) Each replica has also updated its copy of the entry when 
   necessary to ensure it has the highest version number among all 
   records in the tree. 
    
   c) The entry at all replicas therefore has a version not less than 
   the last of that sequence of records marked "Seen". 
    
   d) Therefore no originator can make a change that does not increment 
   the version to a higher version. 
 
   If no more conflicts are occurring for an entry, report propagation 
   must result in the SeenMark rising until all records have been 
   marked seen. 
    
   When the strand of the tree ending at the record with the same 
   version and originator as the entry, has all been marked as "Seen", 
   none of the other leaves with the same, highest, version number can 
   be visible at any replica, even if they have previously been marked 
   "Seen". This follows because they all have either an earlier 
   modifyTimestamp or the same modifyTimestamp and a smaller 
   originator. 
    
   Once that has happened, if, after a delay based on the maximum 
   possible time for propagation of a change from another replica, no 
   further reports have been received that cause an update to the 
   entry, any future updates to the entry will be based on the entry. 
    
   The delay is necessary because changes could have been made at some 
   other replica before the current state of the entry was seen, that 
   have reached a higher version and are still propagating to this 
   replica. 
    
   Note: The method of dynamically calculating the delay needs careful 
   specification. It ought to be possible to improve these procedures 
   when doing so. It may well be possible to do earlier weeding and 
   summarizing. 
    
   After the delay, the record with the same version and originator as 
   the entry and any predecessors in the tree are marked "Durable". 
   Those changes are the basis from which all future changes will be 
   made. All other records are marked "Ephemeral". They have no effect 
   on the current state of the entry at any replica and cannot have any 
   future effect. 
    
   Any DSA that has a record marked "Durable" or "Ephemeral" and is 
   originator of that record, is responsible for any notifications to 
   users of the DUAs listed as "modifierName" for those entries. 
   Notifications could include the protocolOp, the state of the entry 
   before and after the change and the current state. 
 
 Langer         Informational - Expires 7 October 2000       [Page 16] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
 
   Sending notifications would presumably depend on controls the DUAs 
   used in their operations, attributes in the directory entry, perhaps 
   in some other NC, for the DUAs DN (modifiersName) etc. Details 
   should be considered in connection with other work on notifications 
   and transactions as the complete history maintained by these 
   procedures are likely to be relevant to both.  
    
   "Ephemeral" records may be weeded after issueing any notifications, 
   or if the originator was a different replica. This may only involve 
   detaching them from the tree in RAM, while retaining a full audit 
   trail in the log file. 
    
   "Durable" records can likewise be detached from the tree, as they 
   are summarized by the current visible entry. However this 
   consolidation must be preserved as the root of the tree of records 
   when the next change or update is applied, so that the protocolOps 
   in subsequent records have a base to start from when calculating 
   further updates. (If record extracts are converted to primitives it 
   may be possible for implementations to work backwards from the 
   current entry, and then forwards to a replacement). 
    
 
11. Security Considerations 
 
   The directory is commonly used to hold security related information 
   and Multi-Master replication is particularly useful for providing 
   local and high availability of changeable authentication and 
   authorization information in a "single sign-on" environment. Great 
   care is required to maintain coherence between prescriptive and 
   entry access control information, other policy information and the 
   names and other attributes of entries, and to maintain an accurate 
   audit trail. This and other proposals for Multi-Master replication 
   need careful evaluation of security considerations, which has not 
   yet been done. 
    
    
12. Acknowledgements 
    
   The courtesy of the LDUP working group in postponing a final call on 
   the current requirements draft to consider this very late submission 
   from a previous non-participant is appreciated. Thanks also to 
   Steven Legge, Alison Payne, Ed Reed and Uppili Srinivasan for taking 
   time out for discussion during a hectic IETF meeting. 
    
   The proposed mechanisms for propagation dampening and prioritization 
   of change notifications are based on those developed as part of the 
   Coda file system project [11] and implemented in Microsoft Active 
   Directory [12]. 
    
   The preference for resolving rather than reconciling conflicts and 
   the basic concept of dynamically maintaining monolithic unity by a 
 
 Langer         Informational - Expires 7 October 2000       [Page 17] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
   distributed process of letting a hundred schools of thought contend, 
   selecting a most progressive path, weeding obsolete paths and 
   summing up while moving on, is of course derived from Mao [13]. 
    
   The related concept of "Multiple Drafts" contending for visibility, 
   with branching rather than linear causality, has connections to 
   theories concerning the emergence of consciousness in large neural 
   networks promoted by Daniel Dennett [14] as well as some 
   philosophies of quantum mechanics. 
    
    
13. References 
 
   1  Bradner, S., "The Internet Standards Process -- Revision 3", BCP 
      9, RFC 2026, October 1996. 
    
   2  Bradner, S., "Key words for use in RFCs to Indicate Requirement 
      Levels", BCP 14, RFC 2119, March 1997. 
    
   3  Legg, S. and Payne, A., "LDUP Update Reconciliation Procedures", 
      Internet-Draft, draft-ietf-ldup-urp-02.txt, October 1999. 
    
   4  Weiser, R.F. and Stokes, E. "LDAP V3 Replication Requirements", 
      Internet-Draft, draft ietf                           -    -ldup-replica-req-02.txt, October 1999. 
    
   5  Chadwick, D.W., "Compound (Families of) Entries", Internet-Draft, 
      draft-chadwick-families-00.txt, (expired) 5 December 1999. 
    
   6  Stokes, E. and Good, G., "The LDUP Replication Update Protocol", 
      Internet-Draft, draft-ietf-ldup-protocol-01.txt, March 2000. 
 
   7  Reed, E. "LDAP Subentry Schema", Internet-Draft, draft-ietf-ldup-
      subentry-02.txt, March 2000. 
    
   8  Stokes, E., Harrison, A. and Good, G., "Extended Operations for 
      Framing LDAP Operations", Internet-Draft, draft-ietf-ldup-
      framing-00.txt, March 2000. 
    
   9 Merrels, J., Reed, E. and Srinivasan, U. "LDAP Replication 
      Architecture", draft-ietf-ldup-model-03.txt, March 2000. 
    
   10 Reed, E., "LDUP Replication Information Model", Internet-Draft, 
      draft-ietf-ldup-infomod-01.txt, March 2000. 
    
   11 "Scientific Papers from the Coda Project", 
      http://www.cs.cmu.edu/afs/cs/project/coda/Web/docs-coda.html 
    
   12 "Active Directory Replication", Chapter 6 of "Microsoft Windows 
      2000 Server Distributed Systems Guide", Microsoft Press, 2000. 
      Also downloadable in file act_dir_guide.zip from www.msdn.com. 
    

 
 Langer         Informational - Expires 7 October 2000       [Page 18] 
INTERNET-DRAFT  Multiple Draft Conflict Resolution - MDCR  7 April 2000  
 
 
   13 Mao Tsetung, "Four Essays on Philosophy", Foreign Languages 
      Press, Peking, 1966. 
    
   14 Dennett, D.C,. "Consciousness Explained", Penguin 1992. 
    
    
14. Author's Address 
    
   Albert Langer 
   Directory Designs 
   PO Box 1116 
   North Fitzroy VIC 3068 
   AUSTRALIA 
   Email: Albert.Langer@Directory-Designs.Org 
    
    
 Full Copyright Statement 

   Copyright (C) The Internet Society (2000). All Rights Reserved. 
    
   This document and translations of it may be copied and furnished to 
   others, and derivative works that comment on or otherwise explain it 
   or assist in its implementation may be prepared, copied, published 
   and distributed, in whole or in part, without restriction of any 
   kind, provided that the above copyright notice and this paragraph 
   are included on all such copies and derivative works. However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other 
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be 
   followed, or as required to translate it into languages other than 
   English. 
    
   The limited permissions granted above are perpetual and will not be 
   revoked by the Internet Society or its successors or assigns. 
    
   This document and the information contained herein is provided on an 
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
    
    
    
    




 
 Langer         Informational - Expires 7 October 2000       [Page 19]