Internet DRAFT - draft-langer-ldup-mdcr
draft-langer-ldup-mdcr
LDUP Multiple Draft Conflict Resolution Albert Langer
Internet-Draft Directory Designs
Document: draft-langer-ldup-mdcr-00.txt 7 April 2000
Intended Category: Informational
Expires: 7 October 2000
LDUP Multiple Draft Conflict Resolution (MDCR)
Copyright (C) The Internet Society (2000). All Rights Reserved.
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 [1].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet- Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This draft is an individual submission to the IETF LDUP Working
Group. Distribution of this document is unlimited. Comments should
be sent to the LDUP Replication mailing list <ldup@imc.org> and to
the author.
This Internet-Draft expires on 7 October 2000.
1. Abstract
Concurrent changes in a Multi-Master directory are a partially
ordered tree. "LDUP Update Reconciliation Procedures" (URP) uses
timestamps to impose a strict linear order on this tree, and merges
changes to fit that imposed order. Merged changes violate the
semantics of the LDAP/X.500 data model. No specific "modifierName"
can be assigned to a merged change and requirements to support audit
trails and access controls and not to break applications or impede
future work on transactions cannot be met. The current LDUP
requirements draft avoids dealing with these critical issues.
Langer Informational - Expires 7 October 2000 [Page 1]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
This memo, "Multiple Drafts Conflict Resolution" (MDCR) proposes an
alternative. MDCR stores the tree of conflicts as "multiple drafts"
of entries at each replica, and resolves the conflicts between the
drafts without merging, based on the actual ordering of the tree.
MDCR guarantees atomic, coherent, isolated, and serialized changes
to individual entries, with convergence to eventual consistency
between replicas. The trade off is that some suppressed conflicting
changes become ephemeral. A complete history enables notification
support for durability and rollbacks by DUAs.
The choices that must be made between maintaining coherence of
entries and minimizing ephemeral changes should be addressed in LDUP
requirements.
2. Key terms used in this document
A "change" is an LDAP v3 protocol add, del, modify or modifyDN
request for which the requesting Directory User Agent (DUA) has been
sent a success response by an "originator" Directory Service Agent
(DSA) after schema and access control checks and application of the
change to a replicated entry.
A "report" is a specification of a change, published by the
originator and propagated in messages forwarded to and by other DSAs
that hold a replica of the same NamingContext (NC).
A "record" is a copy of a report stored by the originator or a
receiver.
An "update" is an alteration, performed in response to receiving a
report, on an entry within a replica by the DSA holding that
replica.
A "retraction" is an asynchronous notification that a previous
success response to a DUA has been invalidated due to conflicting
concurrent changes.
A "confirmation" is an asynchronous notification that no retraction
will be issued.
A "draft" is the result of any change or update for which no
confirmation has been issued. Drafts are visible to DUAs as the
current state of an entry at a particular replica and may be the
basis on which further changes are made by those DUAs. Transient
inconsistencies between replicas affect subsequent changes as well
as reads when multi-master replication is used. Entry drafts may not
just be out of date like shadow copies in a single master directory,
but also ephemeral like Internet-Drafts. It is inappropriate for
DUAs to rely on them as anything more than "work in progress". (It
is even more inappropriate for DSAs to pretend otherwise by
eclectically merging drafts and randomly assigning an author ;-).
Langer Informational - Expires 7 October 2000 [Page 2]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
These distinctions are maintained by a complete separation of report
propagation from update processing. Change propagation is the end
result of a deliberate synthesis of both, and may include the
propagation of retractions and confirmations to DUAs or their users.
The key words "MUST NOT IMPEDE" are a prohibition of specifications
that would make it significantly more difficult to later standardize
functionality generally described by the words immediately
following. Conformity is established by reference to some plausible
mechanism that might be used to implement that functionality,
despite the consequences of the conforming standard. Actual complete
specification of the future functionality and suggested mechanism is
of course left to future work.
The key words "SHOULD NOT IMPEDE" mean that the needs of future
standardization have been taken fully into account in drafting the
conforming standard. Conformity is established either by satisfying
"MUST NOT IMPEDE", or by showing that there is no known plausible
mechanism by which the functionality provided by the conforming
standard could do so, and by explaining the implications of the
choices made.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC-2119 [2].
3. Introduction
Multi-Master LDAP replication MUST NOT break applications that rely
on the LDAP v3 X.500 data model and SHOULD NOT IMPEDE extensions for
transactions, referential integrity, audit trails, signed
operations, management of replicas with heterogeneous conforming
server implementations etc.
The key goals of MDCR are to satisfy those requirements, which are
not satisfied by URP [3] and some of which are either omitted or
ambiguously stated in the current LDUP requirements draft [4].
URP merges concurrent modifications made by different DUAs at
different replicas, to attributes and values of a single entry. This
inevitably breaks the semantics of "modifiersName", a mandatory
operational attribute actually understood by the directory service.
Grouping or transactions affecting multiple entries becomes
impossible when even single entries have lost coherence.
For the same reasons, URP must inevitably break the semantics of
other relationships between attributes values understood only by
applications. Among those applications are the management of
prescriptive access controls and other policy information, and hence
the viability of multi-master replication itself.
Langer Informational - Expires 7 October 2000 [Page 3]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
The relationships between a named set of attributes are precisely
what DUAs manipulate when they make changes to a directory entry. If
the attributes could be manipulated independently, they should be
stored in different entries. This can conveniently be done using
compound families of entries [5]. For example a DistributionList or
AccessControlGroup family parent could have a family child entry for
each member. This could be combined with translation of add and
delete attribute value operations for backward compatability with
existing DUAs that are unaware of multi-master replication issues.
An LDAP success response when making a change is not reliable if
replication updates cause the final result to differ from that
expected by each of the DUAs to which success is reported. A success
response is also not reliable if it can sometimes be followed by a
revocation. But at least applications can then be designed to take
some corrective action before end users are affected, rather than
after perplexing application behaviour has been reported from end
users to help desks.
The first thing anyone responding to a problem report would want to
know is "who changed what?". If Multi-Master replication makes it
impossible to answer that question, it is broken. System
administrators are in short supply and have better things to do.
The consequences of not supporting "modifiersName" are easily
understood by directory users and could turn out to be a major
competitive disadvantage for vendors making that choice (guess who
;-) - unless of course such vendors are rescued by adoption of
standards endorsing their choice.
The following alternative proposals are not based on any
implementation or operational experience and so are necessarily half
baked. They are presented purely for feedback. Hopefully there is
enough here to enable meaningful discussion of feasability, and to
prompt clarification of requirements to specify just what guarantees
are actually provided by LDUP multi-master protocols and what are
not. Decisions to break the data model of LDAP v3 should not just be
glossed over or buried within architecture of procedural documents.
They should be clearly explained and justified in a requirements
document, so as to actively solicit input from the many other areas
that will be relying on directory services.
4. Architecture Overview
Each replica reports each change it originates to all other
replicas. These reports are stored and propagated whether or not
they result in updates. The records for each entry are not a linear
or totally ordered log, but a partially ordered tree or forest.
Subsequent reports may result in a previously ignored report being
Langer Informational - Expires 7 October 2000 [Page 4]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
used in an update, switching from draft entries in one branch of the
tree to another.
Each replica independently builds a tree faithfully reflecting the
actual state of conflicting concurrent changes to an entry at all
replicas, based on its knowledge from reports received. Each replica
independently applies deterministic procedures to update its own
copy of an entry based on the knowledge it has. This knowledge
includes the "modifiersName" of the DUA that made each change, so
that conflicts with concurrent changes to access control information
affecting a change can also be detected.
The knowledge converges as reports propagate, so the state of the
entry at all replicas converges when all conflicting changes have
been propagated.
The update procedures select a sequence of changes that could have
been made as atomic, isolated and serialized changes at a single
master DSA, even though they may have actually been made at
different replicas. These changes may become durable. Conflicting
concurrent changes, not part of a durable sequence, are suppressed
by convergence to the entry state resulting from a durable sequence.
The conflicting changes are ephemeral.
As each replica has a complete history of any conflict, DUAs whose
changes were ephemeral and those whose changes were durable, can be
notified with retractions and confirmations. The history can then be
discarded, but can also be kept as a valuable audit trail. This
prepares a foundation for enabling transactions that affect more
than one entry.
The use of protocolOps to describe changes instead of primitives
avoids impeding future standardization of signed operations.
Clock synchronization is not required.
Report propagation procedures are designed to minimize replication
bandwidth by eliminating duplication and making it feasible for
replicas that have failed to resynchronize after restoration from
backups, using incremental rather than full replication. (Full
replication is the main source of bandwidth problems because it
causes undesirable peaks). Sequential delivery of reports from each
originator is guaranteed, despite interleaved delivery of reports
from different originators concerning changes to the same entry.
Multi-Master DSAs (MMDSAs) must also act as suppliers to Single-
Master DSAs (SMDSAs). Migration of existing single master
implementations to support single master LDUP and later multi-master
LDUP is made easier by the use of protocolOps and a ChangeLog in
MDCR. MMDSAs always know the previous state of an entry that they
have supplied to a consumer and can forward reports for the same
updates that they apply to that entry themselves. SMDSAs only
Langer Informational - Expires 7 October 2000 [Page 5]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
receive reports that they can apply immediately as updates and do
not need to implement MMDSA conflict resolution procedures. They
implement exactly the same report propagation procedures (restricted
to one way propagation to consumer SMDSAs), providing a natural
migration path for later upgrade to support Multi-Master
replication.
Supplying read only shadow copies also involves partial and
fractional replicated areas not supported for Multi-Master
replication, so details are not discussed here. Essentially MMDSAs
forward reports to shadow consumers only when they are applied in
updates (and local changes) instead of whenever they are received.
SMDSAs forward reports to other secondary shadow consumer SMDSAs
whenever they are received, just as MMDSAs do among themselves.
"Relay" DSAs could also be supported, connecting groups of MMDSAs
but only providing read only copies to DUAs. Likewise separate
"Connector" DUAs could implement the report propagation procedures
between existing DSA implementations and LDUP compliant DSAs.
The complete separation of report propagation from update processing
also enables possible future use of broadcast or multicast
protocols.
The directory is intended for non-volatile data, so conflicts are
rare and the probability of a particular number of concurrent
conflicts decreases exponentially with the number. The goals of
Multi-Master replication are high availability and local
availability of changeable copies. Occasional concurrent changes to
the same entry are not a goal but an unavoidable consequence.
Much of the detail below is needed only to avoid manual repairs when
the unusual does happen, as it inevitably does in any large
distributed system. Actual performance is dominated by simple
changes with no conflicts, so there is no point in implementations
attempting to optimize the performance, as opposed to the
correctness, of handling conflicts. Typically an implementation
would just add change records to a simple log file and keep the
trees of transient conflicts in RAM, resulting in greater simplicity
and efficiency than state based approaches.
MDCR is aimed at improved handling of conflicting modify changes. It
does not directly offer any significant improvement for add, del or
modifyDN changes. These are already handled by URP, so details have
been omitted pending further discussion with the URP authors.
Actual use of the "modifiersName" preserved by MDCR to ensure
updates are consistent with concurrent changes to access controls
involves consideration of all four types of changes and access
control mechanisms. This has therefore also been deferred.
Actual use of the history preserved by MDCR to issue confirmations
and retractions is closely related to work on notifications and
Langer Informational - Expires 7 October 2000 [Page 6]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
general grouping or transactions affecting more than one entry. This
has therefore also been deferred.
6. Information Model
When performing a change, a numbered report is generated listing
where the change was made, which entry was changed, what version of
that entry was changed, who changed it, when and how. (The great
remaining question, "why?" is beyond the scope of directory
standardization).
report ::= SEQUENCE {
originatorRSN ReportSequenceNumber -- report number
entryID UUID, -- which
originator ReplicaNumber, -- where
previous ReplicaNumber, -- what
version INTEGER, -- what
modifiersName DistinguishedName, -- who
modifyTimestamp UTCTime, -- when
protocolOp OCTETSTRING, -- how
oldParentID UUID OPTIONAL, -- how
newParentID UUID OPTIONAL -- how
}
The report number, originatorRSN, is described under Report
Propagation. It is not used for update processing.
6.1 Which
"EntryID" is the UUID of the entry that was the target DN of the
ProtocolOp at the originator. Having both entryID and target DN is
may be important for detecting certain conflicts resulting from
concurrent access control and possibly schema changes. These issues
not discussed in this draft.
6.2 Where
"Originator" is the ReplicaNumber of the DSA where the change was
performed and is stored as an operational attribute of each entry.
6.3 What
"Previous" is the value of originator that was stored with the entry
immediately prior to performing the change. Previous = originator
when an entry is first created.
"Version" is another operational attribute, initially 1 when an
entry is first created, and incremented by 1 for each change. A
change and incrementation may occur at one or more different
replicas, resulting in one or more different change reports with the
Langer Informational - Expires 7 October 2000 [Page 7]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
same next higher version but different originators. Further
conflicts may add more branches to the tree until all conflicting
reports have been fully propagated.
The combination of entryID, version and originator uniquely
identifies a report since changes are atomic and an originator
increments the version to the next consecutive number when
performing a change. It also uniquely identifies the attributes and
values of an entry draft, as explained below.
Updates are not changes and do not increment the version. Updates do
replace the version with a version that is not smaller, as well as
replacing other attributes. The replacement version from an update
is usually, but not always, larger. Successive versions of an entry
are not always consecutive but are always monotonically increasing.
The lexical ordering of version, modifyTimestamp and originator of
an entry is always strictly monotonically increasing. Version has
priority, so modifyTimestamp may move backwards.
This priority is based on detailed research done for the Coda file
system, a successor to the Andrew File System, and adopted by
Microsoft in Active Directory. It eliminates the need for accurate
clock synchronization and has other advantages. The Coda work also
supports "sometimes connected" clients with notifications by servers
about concurrent changes to information they have cached and
resynchronization of servers when the replication topology has
become disconnected. It has many other important ideas that should
be studied for LDUP work and provides complete open source code for
implementors (included in FreeBSD and some Linux distributions).
6.5 Who, When and How
"ModifiersName" and "modifyTimestamp" correspond to the creatorsName
and createTimestamp when an entry is first created and to the usual
operational attributes later.
"ProtocolOp" is the BER encoding of the actual add, del, modify or
modifyDN successful request operation as specified for the
protocolOp component of LDAPMessage in 4.1.1 of RFC 2251 [3].
OldParentID and NewParentID are not used for the modify operations
discussed in this draft. They are for processing updates where the
protocolOp is an add, del or modifyDN request.
"NewParentID" is the entryID of the parent of the entry at the
originator, immediately after performing a change. Present iff
protocolOp is a modifyDN that specifies a new parent DN.
"OldParentID" is the entryID of the parent of the entry at the
originator, immediately prior to performing a change. Present iff
protocolOp is an add, del or a modifyDN.
Langer Informational - Expires 7 October 2000 [Page 8]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
7. Report Propagation
7.1 Report Sequence Numbers
A ReportSequenceNumber (RSN) is a 64 bit non-negative integer.
Each DSA maintains a currentRSN monotonically incremented when any
record is stored, whether from a report generated by a local change
or a report received. It is also incremented for each replication
session (so that it is incremented even if no change reports are
received). The currentRSN applies to a DSA as a whole and is not
specific to a replica within the DSA.
RSNs are not necessarily consecutive in any context. They are just
unique, ordered and strictly monotonically increasing.
The currentRSN of the originator is assigned as originatorRSN when a
change is performed, and is included in the report generated.
A localRSN is added to each report as a record number whenever a
record is stored at any replica. A record is just a report with the
localRSN record number added. The localRSN is the currentRSN of the
DSA holding the replica in which the record is stored. When the
replica is also the originator, localRSN is equal to originatorRSN.
The localRSN assigned to a report is also sent with any message
containing the report. The message sent is just a copy of the record
held by the sender.
The value of localRSN included in a message is not stored with the
record by the receiver. Instead the highest localRSN received from a
supplier during a replication session is noted by the consumer as a
HighwaterMark for that supplier. The consumer DSA's own currentRSN
is stored as the localRSN for the record of a report stored by the
consumer.
record ::= SEQUENCE {
localRSN ReqportSequenceNumber,
-- sequential local record number
report
}
message ::= record
-- localUSN for a message is senders sequential message number
Each replica has an UpdatedMark for every replica. This holds, for
each originator, the highest originatorRSN of any record generated
by that originator which has been stored by the replica holding the
UpdatedMark.
Langer Informational - Expires 7 October 2000 [Page 9]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
Each replica also has a SeenMark for every replica. This holds, for
each originator, the lowest UpdatedMark for that originator, known
by the replica holding the SeenMark to be held by any replica.
7.2 Replication Sessions
The supplier HighwaterMark held by the consumer is transmitted to
the supplier at the start of the replication session and used by the
supplier to select only records that have a localUSN at least as
high as that HighwaterMark.
Messages MUST be transmitted from the supplier to the consumer in
strictly increasing order by localRSN, so that the new HighwaterMark
recorded by the consumer at the end of the session (including the
end of an interrupted session) enables resuming the next session
from where the last session left off.
Records MUST be stored in the same sequence as messages are
received, so that the sequence is preserved when applying updates
and propagating copies of records to other neighbors.
The full set of UpdatedMarks held by the consumer is also
transmitted to the supplier at the start of the replication session.
It is used to further select from among the records selected by the
HighwaterMark, only those that have not already been received by the
consumer via other replication agreements (including changes
previously reported to the supplier by the consumer through
replication in the opposite direction). This causes gaps in the
sequence of localUSNs of the messages that were sent from the
supplier to the consumer.
Since each replica propagates reports strictly in order of localRSN,
it also publishes the reports corresponding to local changes
strictly in order of originatorRSN, and this order is preserved by
subsequent propagation. Although reports can arrive via multiple
redundant paths, with interleaving of reports from different
originators, the originatorRSNs from each originator can only arrive
in order. Any gaps in the sequence of originatorRSNs received from a
particular originator can only result from non-consecutive
originatorRSNs issued by that originator, eg because it holds
multiple replicas or is distributed among multiple processors.
The correct sequence of originatorRSNs for each originator is
maintained because any consumer always requests earlier reports
before later ones, by sending their set of UpdatedMarks to the
supplier. Any supplier either obtained the reports in sequence as a
consumer or generated them in sequence as the originator and
therefore has the earlier reports and will deliver them to the
consumer in sequence. Interruption of a session and resumption from
another supplier will merely continue where the previous supplier
Langer Informational - Expires 7 October 2000 [Page 10]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
left off, whether or not the substitute supplier also has later
reports than the failed supplier. The replacement supplier will have
to check more of its records because the consumer cannot supply a
suitable HighwaterMark, but it will only transmit those records not
yet seen by the consumer, and with the same sequence of
originatorRSN for each originator (but different interleaving among
originators).
If a failed replica is restored from backups any reports it holds
that have already been propagated will not be requested by its
neighbours and any that have not been propagated will have also
prevented the propagation of later reports from the same originator
that have a higher originatorRSN.
The failed replica will resynchronize through a normal incremental
session using it's restored HighwaterMark and set of UpdatedMarks
and will not cause a bandwidth spike by full replication.
Thus the highest originatorUSN stored as the UpdatedMark for an
originator guarantees that the replica holding that mark has
received every report from that originator with a lower or equal
originatorUSN and has not received any with a higher originatorRSN.
Likewise the SeenMark stored for an originator at any replica
guarantees than any records stored at that replica with that
originator and an equal or higher originatorUSN have also been
stored at every other replica.
Note: These invariants still hold when reports in transit are
destroyed by the complete failure of intermediate replicas without
backups, or transmitted late after restoration from backups,
provided the replication topology remains connected. It also applies
even if an originator is excluded from the set of replicas as a
result of being offline for too long, since that results in later
reports being discarded before the failed replica can rejoin, and
does not affect the earlier reports already in transit. However it
does not follow that any similar invariants hold with respect to the
sequence of version numbers of entries. The invariant only applies
to reports generated at each originator separately.
A supplier MUST transmit only records with a higher originatorUSN
than the UpdateMark for the originator of the same record, as
provided by the consumer.
The combination of a HighwaterMark and a full set of UpdatedMarks
corresponds to the replicaUpdateVector in the current LDUP protocol,
but has a different syntax and semantics. Likewise the set of
SeenMarks corresponds to the PurgeVector defined in the current LDUP
architecture, but is used differently, as explained in section 9
below.
Langer Informational - Expires 7 October 2000 [Page 11]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
No significant change to the current LDUP protocol [6], as distinct
from the procedures, would be required by adoption of these
procedures. Nor are there any significant changes to the current
drafts for subschema [7], operation framing [8], architecture [9] or
information model [10], so far as they describe replication
agreements. The current draft architecture mixes update
reconciliation and report propagation, so sections 4.5, 4.6 and 10.5
are substantially affected, with minor changes elsewhere and to some
of the related syntax in the information model.
Note: The above approach to propagation can also be applied on a per
attribute or even a per attribute value basis, with replication of
primitives instead of operations, and update reconciliation instead
of conflict resolution. Active Directory does this on a per
attribute basis. The highest localRSN for any of the attributes of
an entry is copied to a changedRSN for the entry as a whole and the
HighwaterMark is based on changedRSN instead of localRSN. The
UpdatedMarks are used as above to eliminate changes to particular
attributes or attribute values that have an originatorRSN not
greater than the UpdatedMark for the corresponding originator.
(ActiveDirectory does not store a "modifiersName" per attribute. It
does not store one per entry either, because the semantics are
completely broken by replicating even entire attributes
independently, let alone attribute values).
Also note that each replica has all the information it needs to
generate the "primitives" used by URP, or any other update
processing method, when protocolOps are propagated instead of
primitives. Provided that the procedures at each replica are
deterministic and not dependent on clock synchronization, no reports
need to be obsoleted by update processing during report propagation
and likewise no additional reports need to be generated. This avoids
a combinatorial explosion of reports about merged reports, even if
merging is being done at each replica, provided it is done
deterministically and independently.
Thus it would be possible to combine the URP approach and the MDCR
approach to update processing described below, by allowing certain
operations in other branches of the tree to affect updates to
entries. For example "add self" and "delete self" operations could
be applied to the member attribute of an entry for a group of names,
even though they are not properly serialized with respect to other
concurrent changes to the entry. This might not break the semantics
of that operation and could avoid unnecessary suppression of
concurrent changes that could reasonably be merged. It would still
break the semantics of "modifierName", but in this particular case,
the DUA name is the added or deleted attribute value anyway.
Other variations and combinations are also possible.
Architectural decisions about report propagation should be
independent of update processing.
Langer Informational - Expires 7 October 2000 [Page 12]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
8. Update Processing
Each replica stores a record for each new report as it is received,
to build a local complete history of the known changes made by all
replicas to each entry. Any resulting update is then applied to the
local entry based on that known history.
Each replica independently processes its current records about an
entry whenever a new record about that entry is stored, and appends
the record to a tree rooted at the record of the first change, ie
the creation of the entry.
The tree is usually trivial and becomes empty when no further
conflicting changes are being propagated for an entry.
Typically implementations would simply append records to a log file
and hash extracts from the records in RAM for each recently used the
entryID. The RAM extracts might contain shorter primitives
referencing only attribute value assertions and entryIDs, derived
from the protocolOps and other attributes held only in the log file
after initial processing. It could contain nothing but the links,
unsorted as a blob of memory for each recently changed entry.
The position of each record in this tree is specified by the
attributes version, originator and previous of the corresponding
report. Total overhead is 12 bytes per record, allowing for 4
billion changes per entry, 16 thousand replicas per NC and a pointer
into a 4 GB log file (whose relevant records would usually still be
in disk cache buffers when needed).
The previous attribute of a report is identical to the originator
attribute of its predecessor report. A predecessor report with that
originator attribute and the next lower version must exist, because
it was generated for the predecessor change that resulted in the
immediately previous state of the entry at the originator when a
change was made. The version is the depth of the record node in the
tree because consecutive changes result in consecutive version
numbers and updates do not generate reports.
The originator is unique among siblings, since the combination of
version and originator uniquely determines a report within an entry,
and all siblings have the same version. Each node is therefore named
by the sequence of originator attributes from the root. This
corresponds to a sequence of replicas at which fully serialized
atomic, isolated changes were applied to each previous draft to
produce the next, just as though the changes were made at a single
master DSA without any conflicts. For the initial add operation and
subsequent modify operations, applying the protocolOp in the report
will generate the attributes and values of the entry state
corresponding to each successive version from its predecessor.
Langer Informational - Expires 7 October 2000 [Page 13]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
The tree constructed at any replica is consistent with that at any
other replica, ie if nodes with the same name exist in trees at two
replicas, they contain the same report. Again this follows because
nodes with the same name must have the same version as well as the
same originator and this uniquely determines a report.
Propagation delays result in different replicas having different
records and applying different updates. The trees converge as
reports propagate, resulting in convergence of the updates for
"eventual consistency".
Records may arrive out of order. The tree can be constructed despite
this, leaving any out of order records unattached until the gap is
filled.
When a new record is stored, if a record already attached to the
tree for the same entry has the next lower version and originator
equal to previous for the new record, the new record is attached as
a child. Otherwise it remains unattached. If it was attached, any
existing unattached records for that entry are checked to see if
they can now be attached, recursively.
There cannot be more than one predecessor to which a new child could
be attached because:
a) The previous attribute in a report is uniquely determined as the
originator attribute present in the entry immediately before the
change described by the report.
b) Any predecessor has an originator attribute equal to that
previous attribute and a version one less than the successor
version.
c) This combination of the predecessors originator and version
uniquely specifies a report within an entry.
The above pedantry establishes that the trees of records at each
replica based on each replica's limited current knowledge of reports
are always faithfully consistent with the state of the real trees of
conflicts.
A longest strand of the tree is selected, from the root to a leaf,
and the sequence of protocolOps in the records is applied in turn to
generate an update. This update replaces the current entry at that
replica.
The longest strands are those with the highest version number at the
leaf. If there is more than one longest strand, those with the
latest modifyTimestamp are selected. If there is still more than
one, the one with the highest originator is selected. This is unique
because the combination of version and originator is unique.
Langer Informational - Expires 7 October 2000 [Page 14]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
The operational attributes listed in the report at the leaf of the
selected strand replace those attributes of the updated entry,
without modification.
Procedures for applying add, del and modifyDN protocolOps are not
discussed in this draft. Procedures for modify are obvious and
result in identical user attributes and values for entries that have
the same version and originator at different replicas.
An update is not a change. It is a replication of a previous state
of an entry, already made visible at some other replica. No new
report results from applying an update and no new conflicting draft
entries are created.
9. Weeding and Summarizing
Weeding purges records of ephemeral changes that have been
suppressed and can no longer influence future updates to an entry.
Summarization consolidates records of durable changes that cannot be
suppressed by subsequent changes.
Notification procedures may be added to inform DUAs that made
changes about retraction of success responses for their ephemeral
changes or confirmation for their durable changes.
When a SeenMark for an originator rises above or equal to the
originatorRSN of a record of a change made by that originator, the
replica holding the SeenMark and record knows that all replicas must
have stored their own records of that change, so the record is
marked as "Seen".
Each replica MUST apply any update resulting from storing a change
record before the next increment of its currentRSN. This prevents it
from accepting a change from a DUA based on its version of the entry
prior to the update. That is necessary for the procedures below.
Note: A weaker requirement may be desirable to not constrain
implementations from optimizing their processing of updates. This
could need more complex specified delays and procedures than those
below. Any such specifications MUST be standardized as independent
optimizations are unlikely to interoperate correctly.
When a record and all its predecessors in the tree have been marked
"Seen", no subsequent change can occur unless it has a higher
version than that record. This follows because:
a) Each record in the sequence has a corresponding record stored at
every replica which has been appended to its predecessor in the tree
for its entry at that replica by the recursive update processing
Langer Informational - Expires 7 October 2000 [Page 15]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
procedures, regardless of the order in which those records arrived
at any replica.
b) Each replica has also updated its copy of the entry when
necessary to ensure it has the highest version number among all
records in the tree.
c) The entry at all replicas therefore has a version not less than
the last of that sequence of records marked "Seen".
d) Therefore no originator can make a change that does not increment
the version to a higher version.
If no more conflicts are occurring for an entry, report propagation
must result in the SeenMark rising until all records have been
marked seen.
When the strand of the tree ending at the record with the same
version and originator as the entry, has all been marked as "Seen",
none of the other leaves with the same, highest, version number can
be visible at any replica, even if they have previously been marked
"Seen". This follows because they all have either an earlier
modifyTimestamp or the same modifyTimestamp and a smaller
originator.
Once that has happened, if, after a delay based on the maximum
possible time for propagation of a change from another replica, no
further reports have been received that cause an update to the
entry, any future updates to the entry will be based on the entry.
The delay is necessary because changes could have been made at some
other replica before the current state of the entry was seen, that
have reached a higher version and are still propagating to this
replica.
Note: The method of dynamically calculating the delay needs careful
specification. It ought to be possible to improve these procedures
when doing so. It may well be possible to do earlier weeding and
summarizing.
After the delay, the record with the same version and originator as
the entry and any predecessors in the tree are marked "Durable".
Those changes are the basis from which all future changes will be
made. All other records are marked "Ephemeral". They have no effect
on the current state of the entry at any replica and cannot have any
future effect.
Any DSA that has a record marked "Durable" or "Ephemeral" and is
originator of that record, is responsible for any notifications to
users of the DUAs listed as "modifierName" for those entries.
Notifications could include the protocolOp, the state of the entry
before and after the change and the current state.
Langer Informational - Expires 7 October 2000 [Page 16]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
Sending notifications would presumably depend on controls the DUAs
used in their operations, attributes in the directory entry, perhaps
in some other NC, for the DUAs DN (modifiersName) etc. Details
should be considered in connection with other work on notifications
and transactions as the complete history maintained by these
procedures are likely to be relevant to both.
"Ephemeral" records may be weeded after issueing any notifications,
or if the originator was a different replica. This may only involve
detaching them from the tree in RAM, while retaining a full audit
trail in the log file.
"Durable" records can likewise be detached from the tree, as they
are summarized by the current visible entry. However this
consolidation must be preserved as the root of the tree of records
when the next change or update is applied, so that the protocolOps
in subsequent records have a base to start from when calculating
further updates. (If record extracts are converted to primitives it
may be possible for implementations to work backwards from the
current entry, and then forwards to a replacement).
11. Security Considerations
The directory is commonly used to hold security related information
and Multi-Master replication is particularly useful for providing
local and high availability of changeable authentication and
authorization information in a "single sign-on" environment. Great
care is required to maintain coherence between prescriptive and
entry access control information, other policy information and the
names and other attributes of entries, and to maintain an accurate
audit trail. This and other proposals for Multi-Master replication
need careful evaluation of security considerations, which has not
yet been done.
12. Acknowledgements
The courtesy of the LDUP working group in postponing a final call on
the current requirements draft to consider this very late submission
from a previous non-participant is appreciated. Thanks also to
Steven Legge, Alison Payne, Ed Reed and Uppili Srinivasan for taking
time out for discussion during a hectic IETF meeting.
The proposed mechanisms for propagation dampening and prioritization
of change notifications are based on those developed as part of the
Coda file system project [11] and implemented in Microsoft Active
Directory [12].
The preference for resolving rather than reconciling conflicts and
the basic concept of dynamically maintaining monolithic unity by a
Langer Informational - Expires 7 October 2000 [Page 17]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
distributed process of letting a hundred schools of thought contend,
selecting a most progressive path, weeding obsolete paths and
summing up while moving on, is of course derived from Mao [13].
The related concept of "Multiple Drafts" contending for visibility,
with branching rather than linear causality, has connections to
theories concerning the emergence of consciousness in large neural
networks promoted by Daniel Dennett [14] as well as some
philosophies of quantum mechanics.
13. References
1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP
9, RFC 2026, October 1996.
2 Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
3 Legg, S. and Payne, A., "LDUP Update Reconciliation Procedures",
Internet-Draft, draft-ietf-ldup-urp-02.txt, October 1999.
4 Weiser, R.F. and Stokes, E. "LDAP V3 Replication Requirements",
Internet-Draft, draft ietf - -ldup-replica-req-02.txt, October 1999.
5 Chadwick, D.W., "Compound (Families of) Entries", Internet-Draft,
draft-chadwick-families-00.txt, (expired) 5 December 1999.
6 Stokes, E. and Good, G., "The LDUP Replication Update Protocol",
Internet-Draft, draft-ietf-ldup-protocol-01.txt, March 2000.
7 Reed, E. "LDAP Subentry Schema", Internet-Draft, draft-ietf-ldup-
subentry-02.txt, March 2000.
8 Stokes, E., Harrison, A. and Good, G., "Extended Operations for
Framing LDAP Operations", Internet-Draft, draft-ietf-ldup-
framing-00.txt, March 2000.
9 Merrels, J., Reed, E. and Srinivasan, U. "LDAP Replication
Architecture", draft-ietf-ldup-model-03.txt, March 2000.
10 Reed, E., "LDUP Replication Information Model", Internet-Draft,
draft-ietf-ldup-infomod-01.txt, March 2000.
11 "Scientific Papers from the Coda Project",
http://www.cs.cmu.edu/afs/cs/project/coda/Web/docs-coda.html
12 "Active Directory Replication", Chapter 6 of "Microsoft Windows
2000 Server Distributed Systems Guide", Microsoft Press, 2000.
Also downloadable in file act_dir_guide.zip from www.msdn.com.
Langer Informational - Expires 7 October 2000 [Page 18]
INTERNET-DRAFT Multiple Draft Conflict Resolution - MDCR 7 April 2000
13 Mao Tsetung, "Four Essays on Philosophy", Foreign Languages
Press, Peking, 1966.
14 Dennett, D.C,. "Consciousness Explained", Penguin 1992.
14. Author's Address
Albert Langer
Directory Designs
PO Box 1116
North Fitzroy VIC 3068
AUSTRALIA
Email: Albert.Langer@Directory-Designs.Org
Full Copyright Statement
Copyright (C) The Internet Society (2000). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Langer Informational - Expires 7 October 2000 [Page 19]