Network Working Group                                            J. Lyon
Internet-Draft                                                 Microsoft
<draft-lyon-itp-nodes-00.txt>                           November 4, 1996
Expires in 6 months

                     Transaction Internet Protocol


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited. Please send comments to
   the author at <JimLyon@Microsoft.Com>.


Abstract

   In many applications where different nodes cooperate on some work,
   there is a need to guarantee that the work happens atomically.  That
   is, each node must reach the same conclusion as to whether the work
   is to be completed, even in the face of failures.  This document
   proposes a simple, easily-implemented protocol for achieving this
   end.

Introduction

   The standard method for achieving atomic commitment is the two-phase
   commit protocol; see [1] for an introduction to atomic commitment and
   two-phase commit protocols.

   Numerous two-phase commit protocols have been implemented over the
   years.  However, none of them has become widely used in the Internet,
   due mainly to their complexity.  Most of that complexity comes from
   the fact that the two-phase commit protocol is bundled together with
   a specific program-to-program communication protocol, and that
   protocol lives on top of a very large infrastructure.

   This memo proposes a very simple two-phase commit protocol.  It
   achieves its simplicity by specifying only how different nodes agree
   on the outcome of a transaction; it allows (even requires) that the
   subject matter on which the nodes are agreeing be communicated via

Lyon                                                           [Page 1]

Internet-Draft      Transaction Internet Protocol      November 4, 1996

   other protocols.  By doing so, we avoid all of the issues related to
   application communication semantics, data representation, and
   security (to name just a few).

   It is envisioned that this protocol will be used mainly for a
   transaction manager on one Internet node to communicate with a
   transaction manager on another node.  While it is possible to use
   this protocol for application programs and/or resource managers to
   speak to transaction managers, this communication is usually
   intra-node, and most transaction managers already have more-than-
   adequate interfaces for the task.

   While we do not expect this protocol to replace existing ones, we do
   expect that it will be relatively easy for many existing
   heterogeneous transaction managers to implement this protocol for
   communication with each other.

   This protocol is layered on top of TCP.  It uses a different TCP
   connection for each simultaneous transaction that is shared between
   two nodes; however, after a transaction has ended, the TCP connection
   can be reused for a different transaction.


Transaction Identifiers

   Unfortunately, there is no globally-accepted standard for the format
   of a transaction identifier; various transaction managers have their
   own proprietary formats.  Therefore, for the purposes of this
   protocol, a transaction identifier is any sequence of printable
   ASCII characters (octets with values in the range 33 through 126,
   inclusive). A transaction manager may map its internal transaction
   identifiers into this printable sequence in any manner it sees fit.
   Furthermore, each party in a superior/subordinate relationship gets
   to assign its own identifier to the transaction; these identifiers
   are exchanged when the relationship is first established.  Thus, a
   transaction manager gets to use its own format of transaction
   identifier internally, but it must remember a foreign transaction
   identifier for each superior/subordinate relationship in which it is
   involved.
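
   As an illustration only, the identifier rule above can be checked
   mechanically.  The following Python sketch is not part of the
   protocol.  The function names are hypothetical, and it assumes that
   an identifier must be non-empty (the text above does not say so
   explicitly) and that hexadecimal is an acceptable way to map an
   internal identifier into the printable range.

```python
def is_valid_transaction_id(identifier: bytes) -> bool:
    """Return True if every octet lies in the range 33 through 126.

    Assumption: an empty identifier is rejected, because it could not
    appear as a word on a command line.
    """
    return len(identifier) > 0 and all(33 <= octet <= 126 for octet in identifier)


def export_transaction_id(internal_id: bytes) -> bytes:
    """Map an arbitrary internal identifier to printable ASCII.

    Hexadecimal is only one possible mapping; any encoding that yields
    octets in the range 33 through 126 would serve equally well.
    """
    return internal_id.hex().encode("ascii")
```

   For example, export_transaction_id(b"\x00\xff") yields b"00ff",
   which passes is_valid_transaction_id.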


Pushing vs. Pulling Transactions

   Suppose that some program on node "A" has created a transaction, and
   wants some program on node "B" to do some work as part of the
   transaction.  There are two classical ways to do this,
   referred to as the "push" model and the "pull" model.

   In the "push" model, the program on A first asks its transaction
   manager to export the transaction to node B.  A's transaction manager
   sends a message to B's TM asking it to instantiate the transaction as
   a subordinate of A, and return its name for the transaction.  The
   program on A then sends a message to its counterpart on B on the
   order of "Do some work, and make it part of the transaction that your
   transaction manager already knows of by the name ...".  Because A's
   TM knows that it sent the transaction to B's TM, A's TM knows to
   involve B's TM in the two-phase commit process.

   In the "pull" model, the program on A merely sends a message to B on
   the order of "Do some work, and make it part of the transaction that
   my TM knows by the name ...".  The program on B asks its TM to enlist
   in the transaction.  At that time, B's TM will "pull" the transaction
   over from A.  As a result of this pull, A's TM knows to involve B's
   TM in the two-phase commit process.

   The protocol proposed here supports both the "push" and "pull"
   models.
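
   The two models can be contrasted by the lines that the transaction
   managers exchange.  The Python sketch below is illustrative only.
   It uses the PUSH and PULL commands defined under "Commands and
   Responses"; the function names, the labels "A" and "B", and the
   sample identifiers are hypothetical.

```python
from typing import List, Tuple

# Each entry is (sending transaction manager, line sent).
Exchange = List[Tuple[str, str]]


def push_exchange(superior_id: str, subordinate_id: str) -> Exchange:
    """Push model: A's TM (the superior) connects to B's TM and sends PUSH."""
    return [
        ("A", f"PUSH {superior_id}"),
        ("B", f"PUSHED {subordinate_id}"),
    ]


def pull_exchange(superior_id: str, subordinate_id: str) -> Exchange:
    """Pull model: B's TM (the subordinate) connects to A's TM and sends PULL."""
    return [
        ("B", f"PULL {superior_id} {subordinate_id}"),
        ("A", "PULLED"),
    ]
```

   In both cases A's TM ends up as the superior.  The difference is
   which transaction manager opens the connection and sends the first
   command.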


Endpoint Identification

   In certain cases after TCP connection failures, one of the parties to
   the connection may have a responsibility to re-establish a connection
   to the other party in order to complete the two-phase-commit
   protocol.  If the party that initiated the original connection needs
   to re-establish it, the job is easy: he merely establishes a
   connection in the same way that he originally did it.  However, if
   the other party needs to re-establish the connection, he needs to
   know how to contact the initiator of the original connection.  He
   gets this information in one of two ways:

   1. If he had never received an IDENTIFY command on the original
      connection, he will use the IP address from the original
      connection, and the standard transaction manager port number.
      [UNDONE:  This standard port number is not yet assigned; we are
      temporarily using port 6789.]

   2. If he had received a valid IDENTIFY command on the original
      connection, he will use the IP address and port number specified
      (or implied) in that command.

   An <endpoint identifier> as used in the IDENTIFY (and a few other)
   commands has one of the following formats:
      <dns name>
      <ip address>
      <dns name>:<port number>
      <ip address>:<port number>

   A <dns name> is a standard name, acceptable to the domain name
   service. It must be sufficiently qualified to be useful to the
   receiver of the command.

   An <ip address> is an Internet address, in the usual form: four
   decimal numbers separated by period characters.

   The <port number> is a decimal number specifying the port at which
   the transaction manager is listening for requests to establish TCP
   sessions.  If the port number is omitted from the endpoint
   identifier, the standard transaction service port number is assumed.
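
   A receiver might split an endpoint identifier as in the following
   Python sketch.  This is a minimal illustration, not a normative
   parser: the function name is hypothetical, the default port is the
   temporary value 6789 mentioned above, and the check that a port
   lies between 1 and 65535 is an assumption based on the TCP port
   range rather than on anything stated in this memo.

```python
from typing import Tuple

# Temporary value quoted in this memo; no standard port has been assigned.
DEFAULT_TM_PORT = 6789


def parse_endpoint_identifier(text: str) -> Tuple[str, int]:
    """Split '<host>' or '<host>:<port>' into (host, port).

    The host part is returned unchanged, whether it is a DNS name or a
    dotted-decimal IP address; it is not resolved or validated here.
    """
    host, separator, port_text = text.partition(":")
    if not host:
        raise ValueError("endpoint identifier has no host part")
    if not separator:
        # Port omitted: fall back to the transaction service port.
        return host, DEFAULT_TM_PORT
    if not (port_text.isascii() and port_text.isdecimal()):
        raise ValueError("port number must consist of decimal digits")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError("port number out of range")
    return host, port
```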


States of a connection

   At any instant, only one party on the connection is allowed to send
   commands, while the other party is only allowed to respond to
   commands that he receives.  Throughout this document, the party that
   is allowed to send commands is called "primary"; the other party is
   called "secondary".  Initially, the party that initiated the
   connection is primary; however, a few commands cause the roles to
   switch.

   At any instant, a connection is in one of the following states.
   From the point of view of the secondary party, the state changes when
   he sends a reply; from the point of view of the primary party, the
   state changes when he receives a reply.

   Initial:  A newly created connection starts out in this state.  A
      connection also returns to this state when its previous work is
      completed.  Upon entry to this state, the party that initiated the
      connection becomes primary, and the other party becomes secondary.
      There is no transaction associated with the connection in this
      state.  From this state, the primary can send any of the following
      commands:  IDENTIFY, BEGIN, PUSH, PULL, PULLFROM, QUERY and
      RECONNECT.

   Begun:  In this state, the connection is associated with an active
      transaction, which can only be completed by a one-phase protocol.
      A BEGUN response to a BEGIN command places the connection into
      this state.  Failure of the connection in Begun state implies that
      the transaction will be aborted.  From this state, the primary can
      send an ABORT, COMMIT or PUSHTO command.

   Enlisted: In this state, the connection is associated with an active
      transaction, which can only be completed by a two-phase protocol.
      A PUSHED response to a PUSH command, a PULLED response to a PULL
      command, or a PULLEDAS response to a PULLFROM command places the
      connection into this state.  Failure of the connection in Enlisted
      state implies that the transaction will be aborted.  From this
      state, the primary can send an ABORT, PREPARE or PUSHTO command.

   Prepared: In this state, the connection is associated with a
      transaction that has been prepared. A PREPARED response to a
      PREPARE command, or a RECONNECTED response to a RECONNECT command
      places the connection into this state.  Unlike other states,
      failure of the connection in this state does not cause the
      transaction to automatically abort.  From this state, the primary
      can send an ABORT or COMMIT command.

   Error: In this state, a protocol error has occurred, and the
      connection is no longer useful.
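
   The state descriptions above amount to a table of which commands
   the primary may send in each state.  The following Python sketch
   shows one way to record that table.  The names State,
   ALLOWED_COMMANDS and may_send are hypothetical.  The entry for the
   Prepared state is taken from the descriptions of the ABORT and
   COMMIT commands, and the ERROR command, which is described as valid
   in any state, is deliberately left out of the table.

```python
from enum import Enum


class State(Enum):
    INITIAL = "Initial"
    BEGUN = "Begun"
    ENLISTED = "Enlisted"
    PREPARED = "Prepared"
    ERROR = "Error"


# Commands that the primary may send in each state.  The ERROR command
# is not listed; this sketch leaves its handling to the caller.
ALLOWED_COMMANDS = {
    State.INITIAL: {"IDENTIFY", "BEGIN", "PUSH", "PULL", "PULLFROM",
                    "QUERY", "RECONNECT"},
    State.BEGUN: {"ABORT", "COMMIT", "PUSHTO"},
    State.ENLISTED: {"ABORT", "PREPARE", "PUSHTO"},
    State.PREPARED: {"ABORT", "COMMIT"},
    State.ERROR: set(),  # the connection is no longer useful
}


def may_send(state: State, command: str) -> bool:
    """Return True if the primary may send this command in this state."""
    return command in ALLOWED_COMMANDS[state]
```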


Protocol Versioning

   This document describes version 1 of the protocol. In order to
   accommodate future versions, both parties in the conversation can
   exchange the highest version number that each understands.  After
   such an exchange, communication can occur using the smaller of the
   highest version numbers (i.e., the highest version number that both
   understand).  This exchange is optional, and occurs using the
   IDENTIFY command (and IDENTIFIED response).  In the absence of such
   an exchange, communication is assumed to be using version 1 of the
   protocol.

   If the highest version supported by one party is considered obsolete
   and no longer supported by the other party, no useful communication
   can occur.  In this case, the newer party should merely drop the
   connection.
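
   The version rule can be stated in a few lines of Python.  This is
   a sketch under stated assumptions: the function name and the
   own_lowest parameter (the oldest version that the local party
   still supports) are hypothetical, and a result of None stands for
   "drop the connection".

```python
from typing import Optional


def negotiate_version(own_highest: int,
                      peer_highest: Optional[int] = None,
                      own_lowest: int = 1) -> Optional[int]:
    """Pick the protocol version to use on a connection.

    peer_highest is None when no IDENTIFY/IDENTIFIED exchange took
    place, in which case version 1 is assumed.  Returns None when the
    agreed version is older than anything this party still supports.
    """
    agreed = 1 if peer_highest is None else min(own_highest, peer_highest)
    return agreed if agreed >= own_lowest else None
```

   For example, a party that understands version 3 and hears version
   2 from its peer would use version 2.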


Commands and Responses

   All commands and responses consist of one line of ASCII text, using
   only octets with values in the range 32 through 127 inclusive,
   followed by either a CR (an octet with value 13) or an LF (an octet
   with value 10).  Each line can be split up into one or more "words",
   where successive words are separated by one or more space octets
   (value 32).

   Arbitrary numbers of spaces at the beginning and/or end of each line
   are allowed, and ignored.

   Lines that are empty, or that consist entirely of spaces, are ignored.
   (One implication of this is that you can terminate lines with both a
   CR and an LF if desired; the LF will be treated as terminating an
   empty line, and ignored.)

   In all cases, the first word of each line indicates the type of
   command or response; all defined commands and responses consist of
   upper-case letters only.

   For some commands and responses, subsequent words convey parameters
   for the command or response; each command and response takes a fixed
   number of parameters.

   All words on a command or response line after the last defined word
   are totally ignored.  These can be used to pass human-readable
   information for debugging or other purposes.
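
   The lexical rules above are simple enough to sketch in Python.
   The code below is illustrative rather than normative.  The names
   are hypothetical, the parameter counts are copied from the command
   list that follows, and the input buffer is assumed to hold only
   complete lines (a real implementation would have to keep any
   trailing partial line until more data arrives).

```python
import re
from typing import Iterator, List, Tuple

# Number of parameters that each command takes.
COMMAND_PARAMETERS = {
    "ABORT": 0, "BEGIN": 0, "COMMIT": 0, "ERROR": 0, "IDENTIFY": 2,
    "PREPARE": 0, "PULL": 2, "PULLFROM": 3, "PUSH": 1, "PUSHTO": 1,
    "QUERY": 1, "RECONNECT": 1,
}


def split_lines(data: bytes) -> Iterator[List[bytes]]:
    """Yield the words of each non-empty line in data.

    Either CR or LF ends a line, so a CR LF pair simply produces an
    empty line, which is skipped.
    """
    for raw_line in re.split(rb"[\r\n]", data):
        if any(octet < 32 or octet > 127 for octet in raw_line):
            raise ValueError("octet outside the range 32 through 127")
        # Splitting on single spaces and dropping empty pieces handles
        # leading, trailing and repeated spaces alike.
        words = [word for word in raw_line.split(b" ") if word]
        if words:
            yield words


def parse_command(words: List[bytes]) -> Tuple[str, List[str]]:
    """Return (command name, parameters), ignoring any extra words."""
    name = words[0].decode("ascii")
    count = COMMAND_PARAMETERS.get(name)
    if count is None or len(words) - 1 < count:
        raise ValueError("unknown or malformed command: " + name)
    return name, [word.decode("ascii") for word in words[1:1 + count]]
```

   Given the input b"  PUSH   tx-1 debug note\r\n", the sketch yields
   the command PUSH with the single parameter "tx-1"; the words
   "debug" and "note" are ignored.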

   Following is a list of all valid commands, and all possible
   responses to each:

   ABORT

      This command is valid in the Begun, Enlisted, and Prepared states.
      It informs the secondary that the current transaction of the
      connection will abort.  Possible responses are:

      ABORTED
         The transaction has aborted; the connection enters Initial
         state, and the initiator of the connection becomes primary.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters the Error state.

   BEGIN

      This command is valid only in the Initial state. It asks the
      secondary to create a new transaction, which will be completed
      with a one-phase protocol.  Possible responses are:

      BEGUN  <transaction identifier>
         A new transaction has been successfully begun, and that
         transaction is now the current transaction of the connection.
         The connection enters Begun state.

      NOTBEGUN
         A new transaction could not be begun; the connection remains in
         Initial state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters the Error state.


   COMMIT

      This command is valid in the Begun or Prepared states.  In the
      Begun state, it asks the secondary to attempt to commit the
      transaction; in the Prepared state, it informs the secondary that
      the transaction has committed.  Possible responses are:

      ABORTED
         This response is possible only from the Begun state.  It
         indicates that some party has vetoed the commitment of the
         transaction, so it has been aborted instead of committing.  The
         connection enters the Initial state.

      COMMITTED
         This response indicates that the transaction has been
         committed, and that the primary no longer has any
         responsibilities to the secondary with respect to the
         transaction.  The connection enters the Initial state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters the Error state.


   ERROR

      This command is valid in any state; it informs the secondary that
      a previous response was not recognized or was badly formed.  A
      secondary should not respond to this command.  The connection
      enters Error state.

   IDENTIFY <protocol version> <primary's endpoint identifier>

      This command is valid only in the Initial state; it informs the
      secondary party of the highest protocol version supported by the
      primary, and of an address at which the primary can be reached
      should the secondary ever need to initiate a connection.  Possible
      responses are:

      IDENTIFIED <protocol version>
         The secondary has saved the identification.  The response
         contains the highest protocol version supported by the
         secondary.  All future communication is assumed to take place
         using the smaller of the protocol versions in the IDENTIFY
         command and the IDENTIFIED response.  The connection remains in
         Initial state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters Error state.

   PREPARE

      This command is valid only in the Enlisted state; it requests the
      secondary to prepare the transaction for commitment (phase one of
      two-phase commit).  Possible responses are:

      PREPARED
         The subordinate has prepared the transaction; the connection
         enters Prepared state.

      ABORTED
         The subordinate has vetoed committing the transaction.  The
         connection enters the Initial state, and the connection
         initiator becomes primary.  After this response, the superior
         has no responsibilities to the subordinate with respect to the
         transaction.

      READONLY
         The subordinate no longer cares whether the transaction commits
         or aborts.  The connection enters the Initial state, and the
         connection initiator becomes primary.  After this response, the
         superior has no responsibilities to the subordinate with
         respect to the transaction.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters the Error state.

   PULL  <superior's transaction identifier>
         <subordinate's transaction identifier>

      This command is only valid in Initial state.  This command seeks
      to establish a superior/subordinate relationship in a transaction,
      with the primary party of the connection as the subordinate (i.e.,
      he is pulling a transaction from the secondary party).  Possible
      responses are:

      PULLED
         The relationship has been established.  Upon receipt of this
         response, the specified transaction becomes the current
         transaction of the connection, and the connection enters
         Enlisted state.  Additionally, the roles of primary and
         secondary become reversed.  (That is, the superior becomes the
         primary for the connection.)

      NOTPULLED
         The relationship has not been established (possibly, because
         the secondary party no longer has the requested transaction).
         The connection remains in Initial state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters the Error state.

   PULLFROM  <meta-superior's endpoint identifier>
             <meta-superior's transaction identifier>
             <subordinate's transaction identifier>

      This command is valid only in Initial state.  It seeks to
      establish a superior/subordinate relationship in a transaction,
      with the primary party of the connection as the subordinate.  If
      the secondary party does not already know about the transaction,
      he is requested to PULL the transaction from the specified
      meta-superior, then establish the relationship.  Possible
      responses are:

      PULLEDAS  <superior's transaction identifier>
         The relationship has been established, and the superior is
         returning the identifier by which he knows the transaction
         (this may or may not be the same as the meta-superior's
         identifier).  The transaction becomes the current transaction
         of the connection, the connection enters Enlisted state, and
         the superior becomes primary on the connection.

      NOTPULLED
         The relationship could not be established.  The connection
         remains in Initial state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters the Error state.


   PUSH  <superior's transaction identifier>

      This command is valid only in Initial state.  It seeks to
      establish a superior/subordinate relationship in a transaction
      with the primary as the superior.  Possible responses are:

      PUSHED  <subordinate's transaction identifier>
         The relationship has been established, and the identifier by
         which the subordinate knows the transaction is returned.  The
         transaction becomes current for the connection, and the
         connection enters Enlisted state.

      ALREADYPUSHED  <subordinate's transaction identifier>
         The relationship has been established, and the identifier by
         which the subordinate knows the transaction is returned.
         However, the subordinate already knows about the transaction,
         and is expecting the two-phase commit protocol to arrive via a
         different connection.  In this case, the connection remains in
         Initial state.

      NOTPUSHED
         The relationship could not be established.  The connection
         remains in Initial state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters Error state.

   PUSHTO  <meta-subordinate's endpoint identifier>

      This command is valid only in Begun and Enlisted states. It
      requests the secondary to establish a superior/subordinate
      relationship with the specified third party.  Possible responses
      are:

      PUSHEDAS <meta-subordinate's transaction identifier>
         The relationship has been established, and the identifier by
         which the meta-subordinate knows the transaction is returned.
         The connection remains in Begun or Enlisted state.

      NOTPUSHED
         The secondary was unable to establish the relationship.  The
         connection remains in Begun or Enlisted state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters Error state.

   QUERY  <superior's transaction identifier>

      This command is valid only in the Initial state.  A subordinate
      uses this command to determine whether a specific transaction
      still exists at the superior.  Possible responses are:

      QUERIEDEXISTS
         The transaction still exists.  The connection remains in
         Initial state.

      QUERIEDNOTFOUND
         The transaction no longer exists.  The connection remains in
         Initial state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters Error state.

   RECONNECT <subordinate's transaction identifier>

      This command is valid only in the Initial state.  A superior uses
      the command to re-establish a connection for a transaction, when
      the previous connection was lost during Prepared state.  Possible
      responses are:

      RECONNECTED
         The subordinate accepts the reconnection.  The connection
         enters Prepared state.

      NOTRECONNECTED
         The subordinate no longer knows about the transaction.  The
         connection remains in Initial state.

      ERROR
         The command was issued in the wrong state, or was malformed.
         The connection enters Error state.
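
   The responses listed above can be collected into a single
   transition table keyed by command and response.  The following
   Python sketch is one possible arrangement, not a normative state
   machine.  The names TRANSITIONS, ROLE_SWAPS, next_state and
   roles_swap are hypothetical, states are written as plain strings,
   and the sketch does not check that the command was valid in the
   current state.

```python
# (command, response) -> state entered after the response.
TRANSITIONS = {
    ("ABORT", "ABORTED"): "Initial",
    ("BEGIN", "BEGUN"): "Begun",
    ("BEGIN", "NOTBEGUN"): "Initial",
    ("COMMIT", "ABORTED"): "Initial",
    ("COMMIT", "COMMITTED"): "Initial",
    ("IDENTIFY", "IDENTIFIED"): "Initial",
    ("PREPARE", "PREPARED"): "Prepared",
    ("PREPARE", "ABORTED"): "Initial",
    ("PREPARE", "READONLY"): "Initial",
    ("PULL", "PULLED"): "Enlisted",
    ("PULL", "NOTPULLED"): "Initial",
    ("PULLFROM", "PULLEDAS"): "Enlisted",
    ("PULLFROM", "NOTPULLED"): "Initial",
    ("PUSH", "PUSHED"): "Enlisted",
    ("PUSH", "ALREADYPUSHED"): "Initial",
    ("PUSH", "NOTPUSHED"): "Initial",
    ("QUERY", "QUERIEDEXISTS"): "Initial",
    ("QUERY", "QUERIEDNOTFOUND"): "Initial",
    ("RECONNECT", "RECONNECTED"): "Prepared",
    ("RECONNECT", "NOTRECONNECTED"): "Initial",
}

# Exchanges after which the superior, previously secondary, becomes primary.
ROLE_SWAPS = {("PULL", "PULLED"), ("PULLFROM", "PULLEDAS")}


def next_state(current: str, command: str, response: str) -> str:
    """Return the state of the connection after a command/response pair."""
    if command == "PUSHTO" and response in ("PUSHEDAS", "NOTPUSHED"):
        return current  # connection remains in Begun or Enlisted state
    # An ERROR response, or any pair not listed, leads to Error state.
    return TRANSITIONS.get((command, response), "Error")


def roles_swap(command: str, response: str) -> bool:
    """Return True if primary and secondary exchange roles."""
    return (command, response) in ROLE_SWAPS
```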


Error Handling

   If either party receives a line that it cannot understand (either a
   command or a response), it should respond with a line starting with
   the word "ERROR"; if either party receives a line starting with
   "ERROR", it should not send anything more, but should close the
   connection.  Receipt of an ERROR line indicates that the other party
   believes that you have not properly implemented the protocol.
   Regardless of which side is at fault, further communication is
   impossible.


Connection Failure

   Depending on the state of the connection, transaction managers will
   need to take various actions when the connection fails.

   If the connection fails in Initial state, the connection does not
   refer to a transaction.  No action is necessary.

   If the connection fails in Begun or Enlisted state, each party will
   abort the transaction.

   If the connection fails in Prepared state, then the appropriate
   action is different for the superior and subordinate in the
   transaction.

   If the superior determines that the transaction commits, then it
   must eventually establish a new connection to the subordinate, and
   send a RECONNECT command for the transaction.  If it receives a
   NOTRECONNECTED response, it need do nothing else.  However, if it
   receives a RECONNECTED response, it must send a COMMIT request and
   receive a COMMITTED response.


   If the superior determines that the transaction aborts, it is allowed
   to (but not required to) establish a new session and send a RECONNECT
   command for the transaction.  If it receives a RECONNECTED response,
   it should send an ABORT command.

   The above definition allows the superior to reestablish the
   connection before it knows the outcome of the transaction, if it
   finds that convenient.  Having succeeded in a RECONNECT command, the
   connection is back in Prepared state, and the superior can send a
   COMMIT or ABORT command as appropriate when it knows the transaction
   outcome.

   If a subordinate notices a connection failure in Prepared state, then
   it should periodically attempt to create a new connection to the
   superior and send a QUERY command for the transaction.  It should
   continue doing this until one of the following two events occurs:

   1. It receives a QUERIEDNOTFOUND response from the superior.  In this
      case, the subordinate should abort the transaction.

   2. The superior, on some connection that it initiated, sends a
      RECONNECT command for the transaction to the subordinate.  In this
      case, the subordinate can expect to learn the outcome of the
      transaction on this new connection.  If this new connection should
      fail before the subordinate learns the outcome of the transaction,
      it should again start sending QUERY commands.
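
   The recovery rules in this section can be summarized as a decision
   function.  The following Python sketch is a summary under stated
   assumptions, not an implementation: the function name and the
   returned labels are hypothetical, and the outcome argument is the
   superior's knowledge of the transaction outcome ("commit",
   "abort", or None if not yet known).

```python
from typing import Optional


def action_after_failure(state: str, role: str,
                         outcome: Optional[str] = None) -> str:
    """Return a label for what a party does after a connection failure."""
    if state == "Initial":
        return "none"  # no transaction is associated with the connection
    if state in ("Begun", "Enlisted"):
        return "abort"  # each party aborts the transaction
    if state != "Prepared":
        raise ValueError("no recovery rule for state: " + state)
    if role == "subordinate":
        # Periodically connect and send QUERY until QUERIEDNOTFOUND
        # arrives or the superior sends RECONNECT.
        return "query-until-resolved"
    if role != "superior":
        raise ValueError("unknown role: " + role)
    if outcome == "commit":
        return "must-reconnect-then-commit"
    if outcome == "abort":
        return "may-reconnect-then-abort"
    # Outcome not yet known: reconnecting early is allowed.
    return "may-reconnect-and-wait"
```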


References

   [1]  Gray, J. and A. Reuter (1993), Transaction Processing: Concepts
        and Techniques.  San Francisco, CA: Morgan Kaufmann Publishers.
        ISBN 1-55860-190-2.


Security Considerations

   If a system implements this protocol, it is in essence allowing any
   other system to attempt to reach an atomic agreement about some piece
   of work.  However, since this protocol itself does not cause the work
   to occur, the security implications are minimal.  In particular, they
   fall into the following two categories:

   1. Someone PUSHED a new transaction to us that we don't want.
      Depending on his correctness or intentions, he may or may not ever
      complete it.  Thus, an arbitrary computer may cause us to save a
      little bit of state.  An implementation concerned about this will
      probably drop any connection that has not been completed within a
      small time.

   2. Someone PULLED a transaction from us when we didn't want him to.
      In this case, he will become involved in the atomic commitment
      protocol.  At worst, he may cause a transaction to abort that
      otherwise would have committed.  Since transaction managers
      traditionally reserve the right to abort any transaction for any
      reason they see fit, this does not represent a disaster to the
      applications.  However, if done frequently, it may represent a
      denial-of-service attack.  Implementations that are concerned
      about this can use cryptographic techniques to generate
      hard-to-guess transaction identifiers.  (If an interloper cannot
      guess a transaction identifier, he can't join the transaction.)
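
   One way to obtain hard-to-guess identifiers is to draw them from a
   cryptographically strong random source.  The Python sketch below
   uses the standard "secrets" module.  The function name, the "tx"
   prefix and the choice of 16 random octets are assumptions made for
   illustration; this memo does not prescribe any particular
   technique.

```python
import secrets


def new_transaction_id(prefix: str = "tx") -> str:
    """Return a printable identifier that is impractical to guess.

    secrets.token_urlsafe emits only letters, digits, '-' and '_',
    all of which fall within the octet range 33 through 126 required
    of a transaction identifier.
    """
    return f"{prefix}-{secrets.token_urlsafe(16)}"
```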


Author's Address

   Jim Lyon
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA  98052-6399, USA

   Phone: +1 (206) 936 0867
   Fax:   +1 (206) 936 7329
   Email: JimLyon@Microsoft.Com

