LDUP Multiple Draft Conflict Resolution Albert Langer Internet-Draft Directory Designs Document: draft-langer-ldup-mdcr-00.txt 7 April 2000 Intended Category: Informational Expires: 7 October 2000 LDUP Multiple Draft Conflict Resolution (MDCR) Copyright (C) The Internet Society (2000). All Rights Reserved. Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1]. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This draft is an individual submission to the IETF LDUP Working Group. Distribution of this document is unlimited. Comments should be sent to the LDUP Replication mailing list and to the author. This Internet-Draft expires on 7 October 2000. 1. Abstract Concurrent changes in a Multi-Master directory are a partially ordered tree. "LDUP Update Reconciliation Procedures" (URP) uses timestamps to impose a strict linear order on this tree, and merges changes to fit that imposed order. Merged changes violate the semantics of the LDAP/X.500 data model. No specific "modifierName" can be assigned to a merged change and requirements to support audit trails and access controls and not to break applications or impede future work on transactions cannot be met. The current LDUP requirements draft avoids dealing with these critical issues. Langer Informational - Expires 7 October 2000 [Page 1] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 This memo, "Multiple Drafts Conflict Resolution" (MDCR) proposes an alternative. MDCR stores the tree of conflicts as "multiple drafts" of entries at each replica, and resolves the conflicts between the drafts without merging, based on the actual ordering of the tree. MDCR guarantees atomic, coherent, isolated, and serialized changes to individual entries, with convergence to eventual consistency between replicas. The trade off is that some suppressed conflicting changes become ephemeral. A complete history enables notification support for durability and rollbacks by DUAs. The choices that must be made between maintaining coherence of entries and minimizing ephemeral changes should be addressed in LDUP requirements. 2. Key terms used in this document A "change" is an LDAP v3 protocol add, del, modify or modifyDN request for which the requesting Directory User Agent (DUA) has been sent a success response by an "originator" Directory Service Agent (DSA) after schema and access control checks and application of the change to a replicated entry. A "report" is a specification of a change, published by the originator and propagated in messages forwarded to and by other DSAs that hold a replica of the same NamingContext (NC). A "record" is a copy of a report stored by the originator or a receiver. An "update" is an alteration, performed in response to receiving a report, on an entry within a replica by the DSA holding that replica. A "retraction" is an asynchronous notification that a previous success response to a DUA has been invalidated due to conflicting concurrent changes. A "confirmation" is an asynchronous notification that no retraction will be issued. A "draft" is the result of any change or update for which no confirmation has been issued. Drafts are visible to DUAs as the current state of an entry at a particular replica and may be the basis on which further changes are made by those DUAs. Transient inconsistencies between replicas affect subsequent changes as well as reads when multi-master replication is used. Entry drafts may not just be out of date like shadow copies in a single master directory, but also ephemeral like Internet-Drafts. It is inappropriate for DUAs to rely on them as anything more than "work in progress". (It is even more inappropriate for DSAs to pretend otherwise by eclectically merging drafts and randomly assigning an author ;-). Langer Informational - Expires 7 October 2000 [Page 2] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 These distinctions are maintained by a complete separation of report propagation from update processing. Change propagation is the end result of a deliberate synthesis of both, and may include the propagation of retractions and confirmations to DUAs or their users. The key words "MUST NOT IMPEDE" are a prohibition of specifications that would make it significantly more difficult to later standardize functionality generally described by the words immediately following. Conformity is established by reference to some plausible mechanism that might be used to implement that functionality, despite the consequences of the conforming standard. Actual complete specification of the future functionality and suggested mechanism is of course left to future work. The key words "SHOULD NOT IMPEDE" mean that the needs of future standardization have been taken fully into account in drafting the conforming standard. Conformity is established either by satisfying "MUST NOT IMPEDE", or by showing that there is no known plausible mechanism by which the functionality provided by the conforming standard could do so, and by explaining the implications of the choices made. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [2]. 3. Introduction Multi-Master LDAP replication MUST NOT break applications that rely on the LDAP v3 X.500 data model and SHOULD NOT IMPEDE extensions for transactions, referential integrity, audit trails, signed operations, management of replicas with heterogeneous conforming server implementations etc. The key goals of MDCR are to satisfy those requirements, which are not satisfied by URP [3] and some of which are either omitted or ambiguously stated in the current LDUP requirements draft [4]. URP merges concurrent modifications made by different DUAs at different replicas, to attributes and values of a single entry. This inevitably breaks the semantics of "modifiersName", a mandatory operational attribute actually understood by the directory service. Grouping or transactions affecting multiple entries becomes impossible when even single entries have lost coherence. For the same reasons, URP must inevitably break the semantics of other relationships between attributes values understood only by applications. Among those applications are the management of prescriptive access controls and other policy information, and hence the viability of multi-master replication itself. Langer Informational - Expires 7 October 2000 [Page 3] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 The relationships between a named set of attributes are precisely what DUAs manipulate when they make changes to a directory entry. If the attributes could be manipulated independently, they should be stored in different entries. This can conveniently be done using compound families of entries [5]. For example a DistributionList or AccessControlGroup family parent could have a family child entry for each member. This could be combined with translation of add and delete attribute value operations for backward compatability with existing DUAs that are unaware of multi-master replication issues. An LDAP success response when making a change is not reliable if replication updates cause the final result to differ from that expected by each of the DUAs to which success is reported. A success response is also not reliable if it can sometimes be followed by a revocation. But at least applications can then be designed to take some corrective action before end users are affected, rather than after perplexing application behaviour has been reported from end users to help desks. The first thing anyone responding to a problem report would want to know is "who changed what?". If Multi-Master replication makes it impossible to answer that question, it is broken. System administrators are in short supply and have better things to do. The consequences of not supporting "modifiersName" are easily understood by directory users and could turn out to be a major competitive disadvantage for vendors making that choice (guess who ;-) - unless of course such vendors are rescued by adoption of standards endorsing their choice. The following alternative proposals are not based on any implementation or operational experience and so are necessarily half baked. They are presented purely for feedback. Hopefully there is enough here to enable meaningful discussion of feasability, and to prompt clarification of requirements to specify just what guarantees are actually provided by LDUP multi-master protocols and what are not. Decisions to break the data model of LDAP v3 should not just be glossed over or buried within architecture of procedural documents. They should be clearly explained and justified in a requirements document, so as to actively solicit input from the many other areas that will be relying on directory services. 4. Architecture Overview Each replica reports each change it originates to all other replicas. These reports are stored and propagated whether or not they result in updates. The records for each entry are not a linear or totally ordered log, but a partially ordered tree or forest. Subsequent reports may result in a previously ignored report being Langer Informational - Expires 7 October 2000 [Page 4] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 used in an update, switching from draft entries in one branch of the tree to another. Each replica independently builds a tree faithfully reflecting the actual state of conflicting concurrent changes to an entry at all replicas, based on its knowledge from reports received. Each replica independently applies deterministic procedures to update its own copy of an entry based on the knowledge it has. This knowledge includes the "modifiersName" of the DUA that made each change, so that conflicts with concurrent changes to access control information affecting a change can also be detected. The knowledge converges as reports propagate, so the state of the entry at all replicas converges when all conflicting changes have been propagated. The update procedures select a sequence of changes that could have been made as atomic, isolated and serialized changes at a single master DSA, even though they may have actually been made at different replicas. These changes may become durable. Conflicting concurrent changes, not part of a durable sequence, are suppressed by convergence to the entry state resulting from a durable sequence. The conflicting changes are ephemeral. As each replica has a complete history of any conflict, DUAs whose changes were ephemeral and those whose changes were durable, can be notified with retractions and confirmations. The history can then be discarded, but can also be kept as a valuable audit trail. This prepares a foundation for enabling transactions that affect more than one entry. The use of protocolOps to describe changes instead of primitives avoids impeding future standardization of signed operations. Clock synchronization is not required. Report propagation procedures are designed to minimize replication bandwidth by eliminating duplication and making it feasible for replicas that have failed to resynchronize after restoration from backups, using incremental rather than full replication. (Full replication is the main source of bandwidth problems because it causes undesirable peaks). Sequential delivery of reports from each originator is guaranteed, despite interleaved delivery of reports from different originators concerning changes to the same entry. Multi-Master DSAs (MMDSAs) must also act as suppliers to Single- Master DSAs (SMDSAs). Migration of existing single master implementations to support single master LDUP and later multi-master LDUP is made easier by the use of protocolOps and a ChangeLog in MDCR. MMDSAs always know the previous state of an entry that they have supplied to a consumer and can forward reports for the same updates that they apply to that entry themselves. SMDSAs only Langer Informational - Expires 7 October 2000 [Page 5] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 receive reports that they can apply immediately as updates and do not need to implement MMDSA conflict resolution procedures. They implement exactly the same report propagation procedures (restricted to one way propagation to consumer SMDSAs), providing a natural migration path for later upgrade to support Multi-Master replication. Supplying read only shadow copies also involves partial and fractional replicated areas not supported for Multi-Master replication, so details are not discussed here. Essentially MMDSAs forward reports to shadow consumers only when they are applied in updates (and local changes) instead of whenever they are received. SMDSAs forward reports to other secondary shadow consumer SMDSAs whenever they are received, just as MMDSAs do among themselves. "Relay" DSAs could also be supported, connecting groups of MMDSAs but only providing read only copies to DUAs. Likewise separate "Connector" DUAs could implement the report propagation procedures between existing DSA implementations and LDUP compliant DSAs. The complete separation of report propagation from update processing also enables possible future use of broadcast or multicast protocols. The directory is intended for non-volatile data, so conflicts are rare and the probability of a particular number of concurrent conflicts decreases exponentially with the number. The goals of Multi-Master replication are high availability and local availability of changeable copies. Occasional concurrent changes to the same entry are not a goal but an unavoidable consequence. Much of the detail below is needed only to avoid manual repairs when the unusual does happen, as it inevitably does in any large distributed system. Actual performance is dominated by simple changes with no conflicts, so there is no point in implementations attempting to optimize the performance, as opposed to the correctness, of handling conflicts. Typically an implementation would just add change records to a simple log file and keep the trees of transient conflicts in RAM, resulting in greater simplicity and efficiency than state based approaches. MDCR is aimed at improved handling of conflicting modify changes. It does not directly offer any significant improvement for add, del or modifyDN changes. These are already handled by URP, so details have been omitted pending further discussion with the URP authors. Actual use of the "modifiersName" preserved by MDCR to ensure updates are consistent with concurrent changes to access controls involves consideration of all four types of changes and access control mechanisms. This has therefore also been deferred. Actual use of the history preserved by MDCR to issue confirmations and retractions is closely related to work on notifications and Langer Informational - Expires 7 October 2000 [Page 6] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 general grouping or transactions affecting more than one entry. This has therefore also been deferred. 6. Information Model When performing a change, a numbered report is generated listing where the change was made, which entry was changed, what version of that entry was changed, who changed it, when and how. (The great remaining question, "why?" is beyond the scope of directory standardization). report ::= SEQUENCE { originatorRSN ReportSequenceNumber -- report number entryID UUID, -- which originator ReplicaNumber, -- where previous ReplicaNumber, -- what version INTEGER, -- what modifiersName DistinguishedName, -- who modifyTimestamp UTCTime, -- when protocolOp OCTETSTRING, -- how oldParentID UUID OPTIONAL, -- how newParentID UUID OPTIONAL -- how } The report number, originatorRSN, is described under Report Propagation. It is not used for update processing. 6.1 Which "EntryID" is the UUID of the entry that was the target DN of the ProtocolOp at the originator. Having both entryID and target DN is may be important for detecting certain conflicts resulting from concurrent access control and possibly schema changes. These issues not discussed in this draft. 6.2 Where "Originator" is the ReplicaNumber of the DSA where the change was performed and is stored as an operational attribute of each entry. 6.3 What "Previous" is the value of originator that was stored with the entry immediately prior to performing the change. Previous = originator when an entry is first created. "Version" is another operational attribute, initially 1 when an entry is first created, and incremented by 1 for each change. A change and incrementation may occur at one or more different replicas, resulting in one or more different change reports with the Langer Informational - Expires 7 October 2000 [Page 7] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 same next higher version but different originators. Further conflicts may add more branches to the tree until all conflicting reports have been fully propagated. The combination of entryID, version and originator uniquely identifies a report since changes are atomic and an originator increments the version to the next consecutive number when performing a change. It also uniquely identifies the attributes and values of an entry draft, as explained below. Updates are not changes and do not increment the version. Updates do replace the version with a version that is not smaller, as well as replacing other attributes. The replacement version from an update is usually, but not always, larger. Successive versions of an entry are not always consecutive but are always monotonically increasing. The lexical ordering of version, modifyTimestamp and originator of an entry is always strictly monotonically increasing. Version has priority, so modifyTimestamp may move backwards. This priority is based on detailed research done for the Coda file system, a successor to the Andrew File System, and adopted by Microsoft in Active Directory. It eliminates the need for accurate clock synchronization and has other advantages. The Coda work also supports "sometimes connected" clients with notifications by servers about concurrent changes to information they have cached and resynchronization of servers when the replication topology has become disconnected. It has many other important ideas that should be studied for LDUP work and provides complete open source code for implementors (included in FreeBSD and some Linux distributions). 6.5 Who, When and How "ModifiersName" and "modifyTimestamp" correspond to the creatorsName and createTimestamp when an entry is first created and to the usual operational attributes later. "ProtocolOp" is the BER encoding of the actual add, del, modify or modifyDN successful request operation as specified for the protocolOp component of LDAPMessage in 4.1.1 of RFC 2251 [3]. OldParentID and NewParentID are not used for the modify operations discussed in this draft. They are for processing updates where the protocolOp is an add, del or modifyDN request. "NewParentID" is the entryID of the parent of the entry at the originator, immediately after performing a change. Present iff protocolOp is a modifyDN that specifies a new parent DN. "OldParentID" is the entryID of the parent of the entry at the originator, immediately prior to performing a change. Present iff protocolOp is an add, del or a modifyDN. Langer Informational - Expires 7 October 2000 [Page 8] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 7. Report Propagation 7.1 Report Sequence Numbers A ReportSequenceNumber (RSN) is a 64 bit non-negative integer. Each DSA maintains a currentRSN monotonically incremented when any record is stored, whether from a report generated by a local change or a report received. It is also incremented for each replication session (so that it is incremented even if no change reports are received). The currentRSN applies to a DSA as a whole and is not specific to a replica within the DSA. RSNs are not necessarily consecutive in any context. They are just unique, ordered and strictly monotonically increasing. The currentRSN of the originator is assigned as originatorRSN when a change is performed, and is included in the report generated. A localRSN is added to each report as a record number whenever a record is stored at any replica. A record is just a report with the localRSN record number added. The localRSN is the currentRSN of the DSA holding the replica in which the record is stored. When the replica is also the originator, localRSN is equal to originatorRSN. The localRSN assigned to a report is also sent with any message containing the report. The message sent is just a copy of the record held by the sender. The value of localRSN included in a message is not stored with the record by the receiver. Instead the highest localRSN received from a supplier during a replication session is noted by the consumer as a HighwaterMark for that supplier. The consumer DSA's own currentRSN is stored as the localRSN for the record of a report stored by the consumer. record ::= SEQUENCE { localRSN ReqportSequenceNumber, -- sequential local record number report } message ::= record -- localUSN for a message is senders sequential message number Each replica has an UpdatedMark for every replica. This holds, for each originator, the highest originatorRSN of any record generated by that originator which has been stored by the replica holding the UpdatedMark. Langer Informational - Expires 7 October 2000 [Page 9] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 Each replica also has a SeenMark for every replica. This holds, for each originator, the lowest UpdatedMark for that originator, known by the replica holding the SeenMark to be held by any replica. 7.2 Replication Sessions The supplier HighwaterMark held by the consumer is transmitted to the supplier at the start of the replication session and used by the supplier to select only records that have a localUSN at least as high as that HighwaterMark. Messages MUST be transmitted from the supplier to the consumer in strictly increasing order by localRSN, so that the new HighwaterMark recorded by the consumer at the end of the session (including the end of an interrupted session) enables resuming the next session from where the last session left off. Records MUST be stored in the same sequence as messages are received, so that the sequence is preserved when applying updates and propagating copies of records to other neighbors. The full set of UpdatedMarks held by the consumer is also transmitted to the supplier at the start of the replication session. It is used to further select from among the records selected by the HighwaterMark, only those that have not already been received by the consumer via other replication agreements (including changes previously reported to the supplier by the consumer through replication in the opposite direction). This causes gaps in the sequence of localUSNs of the messages that were sent from the supplier to the consumer. Since each replica propagates reports strictly in order of localRSN, it also publishes the reports corresponding to local changes strictly in order of originatorRSN, and this order is preserved by subsequent propagation. Although reports can arrive via multiple redundant paths, with interleaving of reports from different originators, the originatorRSNs from each originator can only arrive in order. Any gaps in the sequence of originatorRSNs received from a particular originator can only result from non-consecutive originatorRSNs issued by that originator, eg because it holds multiple replicas or is distributed among multiple processors. The correct sequence of originatorRSNs for each originator is maintained because any consumer always requests earlier reports before later ones, by sending their set of UpdatedMarks to the supplier. Any supplier either obtained the reports in sequence as a consumer or generated them in sequence as the originator and therefore has the earlier reports and will deliver them to the consumer in sequence. Interruption of a session and resumption from another supplier will merely continue where the previous supplier Langer Informational - Expires 7 October 2000 [Page 10] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 left off, whether or not the substitute supplier also has later reports than the failed supplier. The replacement supplier will have to check more of its records because the consumer cannot supply a suitable HighwaterMark, but it will only transmit those records not yet seen by the consumer, and with the same sequence of originatorRSN for each originator (but different interleaving among originators). If a failed replica is restored from backups any reports it holds that have already been propagated will not be requested by its neighbours and any that have not been propagated will have also prevented the propagation of later reports from the same originator that have a higher originatorRSN. The failed replica will resynchronize through a normal incremental session using it's restored HighwaterMark and set of UpdatedMarks and will not cause a bandwidth spike by full replication. Thus the highest originatorUSN stored as the UpdatedMark for an originator guarantees that the replica holding that mark has received every report from that originator with a lower or equal originatorUSN and has not received any with a higher originatorRSN. Likewise the SeenMark stored for an originator at any replica guarantees than any records stored at that replica with that originator and an equal or higher originatorUSN have also been stored at every other replica. Note: These invariants still hold when reports in transit are destroyed by the complete failure of intermediate replicas without backups, or transmitted late after restoration from backups, provided the replication topology remains connected. It also applies even if an originator is excluded from the set of replicas as a result of being offline for too long, since that results in later reports being discarded before the failed replica can rejoin, and does not affect the earlier reports already in transit. However it does not follow that any similar invariants hold with respect to the sequence of version numbers of entries. The invariant only applies to reports generated at each originator separately. A supplier MUST transmit only records with a higher originatorUSN than the UpdateMark for the originator of the same record, as provided by the consumer. The combination of a HighwaterMark and a full set of UpdatedMarks corresponds to the replicaUpdateVector in the current LDUP protocol, but has a different syntax and semantics. Likewise the set of SeenMarks corresponds to the PurgeVector defined in the current LDUP architecture, but is used differently, as explained in section 9 below. Langer Informational - Expires 7 October 2000 [Page 11] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 No significant change to the current LDUP protocol [6], as distinct from the procedures, would be required by adoption of these procedures. Nor are there any significant changes to the current drafts for subschema [7], operation framing [8], architecture [9] or information model [10], so far as they describe replication agreements. The current draft architecture mixes update reconciliation and report propagation, so sections 4.5, 4.6 and 10.5 are substantially affected, with minor changes elsewhere and to some of the related syntax in the information model. Note: The above approach to propagation can also be applied on a per attribute or even a per attribute value basis, with replication of primitives instead of operations, and update reconciliation instead of conflict resolution. Active Directory does this on a per attribute basis. The highest localRSN for any of the attributes of an entry is copied to a changedRSN for the entry as a whole and the HighwaterMark is based on changedRSN instead of localRSN. The UpdatedMarks are used as above to eliminate changes to particular attributes or attribute values that have an originatorRSN not greater than the UpdatedMark for the corresponding originator. (ActiveDirectory does not store a "modifiersName" per attribute. It does not store one per entry either, because the semantics are completely broken by replicating even entire attributes independently, let alone attribute values). Also note that each replica has all the information it needs to generate the "primitives" used by URP, or any other update processing method, when protocolOps are propagated instead of primitives. Provided that the procedures at each replica are deterministic and not dependent on clock synchronization, no reports need to be obsoleted by update processing during report propagation and likewise no additional reports need to be generated. This avoids a combinatorial explosion of reports about merged reports, even if merging is being done at each replica, provided it is done deterministically and independently. Thus it would be possible to combine the URP approach and the MDCR approach to update processing described below, by allowing certain operations in other branches of the tree to affect updates to entries. For example "add self" and "delete self" operations could be applied to the member attribute of an entry for a group of names, even though they are not properly serialized with respect to other concurrent changes to the entry. This might not break the semantics of that operation and could avoid unnecessary suppression of concurrent changes that could reasonably be merged. It would still break the semantics of "modifierName", but in this particular case, the DUA name is the added or deleted attribute value anyway. Other variations and combinations are also possible. Architectural decisions about report propagation should be independent of update processing. Langer Informational - Expires 7 October 2000 [Page 12] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 8. Update Processing Each replica stores a record for each new report as it is received, to build a local complete history of the known changes made by all replicas to each entry. Any resulting update is then applied to the local entry based on that known history. Each replica independently processes its current records about an entry whenever a new record about that entry is stored, and appends the record to a tree rooted at the record of the first change, ie the creation of the entry. The tree is usually trivial and becomes empty when no further conflicting changes are being propagated for an entry. Typically implementations would simply append records to a log file and hash extracts from the records in RAM for each recently used the entryID. The RAM extracts might contain shorter primitives referencing only attribute value assertions and entryIDs, derived from the protocolOps and other attributes held only in the log file after initial processing. It could contain nothing but the links, unsorted as a blob of memory for each recently changed entry. The position of each record in this tree is specified by the attributes version, originator and previous of the corresponding report. Total overhead is 12 bytes per record, allowing for 4 billion changes per entry, 16 thousand replicas per NC and a pointer into a 4 GB log file (whose relevant records would usually still be in disk cache buffers when needed). The previous attribute of a report is identical to the originator attribute of its predecessor report. A predecessor report with that originator attribute and the next lower version must exist, because it was generated for the predecessor change that resulted in the immediately previous state of the entry at the originator when a change was made. The version is the depth of the record node in the tree because consecutive changes result in consecutive version numbers and updates do not generate reports. The originator is unique among siblings, since the combination of version and originator uniquely determines a report within an entry, and all siblings have the same version. Each node is therefore named by the sequence of originator attributes from the root. This corresponds to a sequence of replicas at which fully serialized atomic, isolated changes were applied to each previous draft to produce the next, just as though the changes were made at a single master DSA without any conflicts. For the initial add operation and subsequent modify operations, applying the protocolOp in the report will generate the attributes and values of the entry state corresponding to each successive version from its predecessor. Langer Informational - Expires 7 October 2000 [Page 13] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 The tree constructed at any replica is consistent with that at any other replica, ie if nodes with the same name exist in trees at two replicas, they contain the same report. Again this follows because nodes with the same name must have the same version as well as the same originator and this uniquely determines a report. Propagation delays result in different replicas having different records and applying different updates. The trees converge as reports propagate, resulting in convergence of the updates for "eventual consistency". Records may arrive out of order. The tree can be constructed despite this, leaving any out of order records unattached until the gap is filled. When a new record is stored, if a record already attached to the tree for the same entry has the next lower version and originator equal to previous for the new record, the new record is attached as a child. Otherwise it remains unattached. If it was attached, any existing unattached records for that entry are checked to see if they can now be attached, recursively. There cannot be more than one predecessor to which a new child could be attached because: a) The previous attribute in a report is uniquely determined as the originator attribute present in the entry immediately before the change described by the report. b) Any predecessor has an originator attribute equal to that previous attribute and a version one less than the successor version. c) This combination of the predecessors originator and version uniquely specifies a report within an entry. The above pedantry establishes that the trees of records at each replica based on each replica's limited current knowledge of reports are always faithfully consistent with the state of the real trees of conflicts. A longest strand of the tree is selected, from the root to a leaf, and the sequence of protocolOps in the records is applied in turn to generate an update. This update replaces the current entry at that replica. The longest strands are those with the highest version number at the leaf. If there is more than one longest strand, those with the latest modifyTimestamp are selected. If there is still more than one, the one with the highest originator is selected. This is unique because the combination of version and originator is unique. Langer Informational - Expires 7 October 2000 [Page 14] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 The operational attributes listed in the report at the leaf of the selected strand replace those attributes of the updated entry, without modification. Procedures for applying add, del and modifyDN protocolOps are not discussed in this draft. Procedures for modify are obvious and result in identical user attributes and values for entries that have the same version and originator at different replicas. An update is not a change. It is a replication of a previous state of an entry, already made visible at some other replica. No new report results from applying an update and no new conflicting draft entries are created. 9. Weeding and Summarizing Weeding purges records of ephemeral changes that have been suppressed and can no longer influence future updates to an entry. Summarization consolidates records of durable changes that cannot be suppressed by subsequent changes. Notification procedures may be added to inform DUAs that made changes about retraction of success responses for their ephemeral changes or confirmation for their durable changes. When a SeenMark for an originator rises above or equal to the originatorRSN of a record of a change made by that originator, the replica holding the SeenMark and record knows that all replicas must have stored their own records of that change, so the record is marked as "Seen". Each replica MUST apply any update resulting from storing a change record before the next increment of its currentRSN. This prevents it from accepting a change from a DUA based on its version of the entry prior to the update. That is necessary for the procedures below. Note: A weaker requirement may be desirable to not constrain implementations from optimizing their processing of updates. This could need more complex specified delays and procedures than those below. Any such specifications MUST be standardized as independent optimizations are unlikely to interoperate correctly. When a record and all its predecessors in the tree have been marked "Seen", no subsequent change can occur unless it has a higher version than that record. This follows because: a) Each record in the sequence has a corresponding record stored at every replica which has been appended to its predecessor in the tree for its entry at that replica by the recursive update processing Langer Informational - Expires 7 October 2000 [Page 15] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 procedures, regardless of the order in which those records arrived at any replica. b) Each replica has also updated its copy of the entry when necessary to ensure it has the highest version number among all records in the tree. c) The entry at all replicas therefore has a version not less than the last of that sequence of records marked "Seen". d) Therefore no originator can make a change that does not increment the version to a higher version. If no more conflicts are occurring for an entry, report propagation must result in the SeenMark rising until all records have been marked seen. When the strand of the tree ending at the record with the same version and originator as the entry, has all been marked as "Seen", none of the other leaves with the same, highest, version number can be visible at any replica, even if they have previously been marked "Seen". This follows because they all have either an earlier modifyTimestamp or the same modifyTimestamp and a smaller originator. Once that has happened, if, after a delay based on the maximum possible time for propagation of a change from another replica, no further reports have been received that cause an update to the entry, any future updates to the entry will be based on the entry. The delay is necessary because changes could have been made at some other replica before the current state of the entry was seen, that have reached a higher version and are still propagating to this replica. Note: The method of dynamically calculating the delay needs careful specification. It ought to be possible to improve these procedures when doing so. It may well be possible to do earlier weeding and summarizing. After the delay, the record with the same version and originator as the entry and any predecessors in the tree are marked "Durable". Those changes are the basis from which all future changes will be made. All other records are marked "Ephemeral". They have no effect on the current state of the entry at any replica and cannot have any future effect. Any DSA that has a record marked "Durable" or "Ephemeral" and is originator of that record, is responsible for any notifications to users of the DUAs listed as "modifierName" for those entries. Notifications could include the protocolOp, the state of the entry before and after the change and the current state. Langer Informational - Expires 7 October 2000 [Page 16] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 Sending notifications would presumably depend on controls the DUAs used in their operations, attributes in the directory entry, perhaps in some other NC, for the DUAs DN (modifiersName) etc. Details should be considered in connection with other work on notifications and transactions as the complete history maintained by these procedures are likely to be relevant to both. "Ephemeral" records may be weeded after issueing any notifications, or if the originator was a different replica. This may only involve detaching them from the tree in RAM, while retaining a full audit trail in the log file. "Durable" records can likewise be detached from the tree, as they are summarized by the current visible entry. However this consolidation must be preserved as the root of the tree of records when the next change or update is applied, so that the protocolOps in subsequent records have a base to start from when calculating further updates. (If record extracts are converted to primitives it may be possible for implementations to work backwards from the current entry, and then forwards to a replacement). 11. Security Considerations The directory is commonly used to hold security related information and Multi-Master replication is particularly useful for providing local and high availability of changeable authentication and authorization information in a "single sign-on" environment. Great care is required to maintain coherence between prescriptive and entry access control information, other policy information and the names and other attributes of entries, and to maintain an accurate audit trail. This and other proposals for Multi-Master replication need careful evaluation of security considerations, which has not yet been done. 12. Acknowledgements The courtesy of the LDUP working group in postponing a final call on the current requirements draft to consider this very late submission from a previous non-participant is appreciated. Thanks also to Steven Legge, Alison Payne, Ed Reed and Uppili Srinivasan for taking time out for discussion during a hectic IETF meeting. The proposed mechanisms for propagation dampening and prioritization of change notifications are based on those developed as part of the Coda file system project [11] and implemented in Microsoft Active Directory [12]. The preference for resolving rather than reconciling conflicts and the basic concept of dynamically maintaining monolithic unity by a Langer Informational - Expires 7 October 2000 [Page 17] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 distributed process of letting a hundred schools of thought contend, selecting a most progressive path, weeding obsolete paths and summing up while moving on, is of course derived from Mao [13]. The related concept of "Multiple Drafts" contending for visibility, with branching rather than linear causality, has connections to theories concerning the emergence of consciousness in large neural networks promoted by Daniel Dennett [14] as well as some philosophies of quantum mechanics. 13. References 1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996. 2 Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 3 Legg, S. and Payne, A., "LDUP Update Reconciliation Procedures", Internet-Draft, draft-ietf-ldup-urp-02.txt, October 1999. 4 Weiser, R.F. and Stokes, E. "LDAP V3 Replication Requirements", Internet-Draft, draft ietf - -ldup-replica-req-02.txt, October 1999. 5 Chadwick, D.W., "Compound (Families of) Entries", Internet-Draft, draft-chadwick-families-00.txt, (expired) 5 December 1999. 6 Stokes, E. and Good, G., "The LDUP Replication Update Protocol", Internet-Draft, draft-ietf-ldup-protocol-01.txt, March 2000. 7 Reed, E. "LDAP Subentry Schema", Internet-Draft, draft-ietf-ldup- subentry-02.txt, March 2000. 8 Stokes, E., Harrison, A. and Good, G., "Extended Operations for Framing LDAP Operations", Internet-Draft, draft-ietf-ldup- framing-00.txt, March 2000. 9 Merrels, J., Reed, E. and Srinivasan, U. "LDAP Replication Architecture", draft-ietf-ldup-model-03.txt, March 2000. 10 Reed, E., "LDUP Replication Information Model", Internet-Draft, draft-ietf-ldup-infomod-01.txt, March 2000. 11 "Scientific Papers from the Coda Project", http://www.cs.cmu.edu/afs/cs/project/coda/Web/docs-coda.html 12 "Active Directory Replication", Chapter 6 of "Microsoft Windows 2000 Server Distributed Systems Guide", Microsoft Press, 2000. Also downloadable in file act_dir_guide.zip from www.msdn.com. Langer Informational - Expires 7 October 2000 [Page 18] INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000 13 Mao Tsetung, "Four Essays on Philosophy", Foreign Languages Press, Peking, 1966. 14 Dennett, D.C,. "Consciousness Explained", Penguin 1992. 14. Author's Address Albert Langer Directory Designs PO Box 1116 North Fitzroy VIC 3068 AUSTRALIA Email: Albert.Langer@Directory-Designs.Org Full Copyright Statement Copyright (C) The Internet Society (2000). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Langer Informational - Expires 7 October 2000 [Page 19]