

Private Working Group                                     John Duker
Internet Draft                                      Procter & Gamble
Intended status: Informational                           Dale Moberg
Expires: April 22, 2015                                 Orion Health
                                                    October 22, 2014


Operational Reliability for EDIINT AS2 
<draft-duker-as2-reliability-16.txt>

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the 
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering 
Task Force (IETF), its areas, and its working groups.  Note that      
other groups may also distribute working documents as Internet-
Drafts. 
    
Internet-Drafts are draft documents valid for a maximum of six months 
and may be updated, replaced, or obsoleted by other documents at any 
time.  It is inappropriate to use Internet-Drafts as reference 
material or to cite them other than as "work in progress." 
    
The list of current Internet-Drafts can be accessed at 
     http://www.ietf.org/ietf/1id-abstracts.html 
The list of Internet-Draft Shadow Directories can be accessed at 
     http://www.ietf.org/shadow.html. 

This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008.  The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process. 
Without obtaining an adequate license from the person(s)
controlling the copyright in such materials, this document may not
be modified outside the IETF Standards Process, and derivative
works of it may not be created outside the IETF Standards Process,
except to format it for publication as an RFC or to translate it
into languages other than English.
          
Any questions, comments, and reports of defects or ambiguities in 
this specification may be sent to the mailing list for the EDIINT 
working group of the IETF, using the address <ietf-ediint@imc.org>. 
Requests to subscribe to the mailing list should be addressed to 
<ietf-ediint-request@imc.org>. 
    

Duker & Moberg         Expires -  April 22, 2015              [Page 1]


Internet-Draft   Operational Reliability for EDIINT AS2   October 2014

Abstract

One goal of this document is to define approaches to achieve a "once 
and only once" delivery of messages. The EDIINT AS2 protocol is 
implemented by a number of software tools on a variety of platforms 
with varying capabilities and with varying network service quality. 
Although the AS2 protocol defines a unique "Message-ID", current 
implementations of AS2 do not provide a standard method to prevent 
the same message (re-transmitted by the initial sender) from reaching 
back-end business applications at the initial receiver. 

A second goal is to reduce retransmissions and failures when AS2 is used
in a synchronous mode for transmitting MDNs.  There can be a large 
latency between receipt of the POSTed entity body and the MDN response 
caused by the operations of decompressing, decrypting, and signature 
checks. Uncoordinated timeout policies and intermediate devices dropping 
connections have interfered with reliable data exchange. The use of an 
HTTP 102 (Processing) status code is described to mitigate these 
difficulties. Use of these reliability features is indicated by
presence of the "AS2-Reliability" value in the EDIINT-Features header.

Intended Status

The intent of this document is to be placed on the RFC track as an 
Informational RFC.

Feedback Instructions:
NOTE TO RFC EDITOR:  This section should be removed by the RFC editor 
prior to publication.

If you want to provide feedback on this draft, follow these 
guidelines:

-Send feedback via e-mail to the ietf-ediint list for discussion, 
with "AS2 Reliability" in the Subject field. To enter or follow the 
discussion, you need to subscribe to ietf-ediint@imc.org.

-Be specific as to what section you are referring to, preferably 
quoting the portion that needs modification, after which you state 
your comments.

-If you are recommending some text to be replaced with your suggested 
text, again, quote the section to be replaced, and be clear on the 
section in question.
 
Table of Contents

1. Introduction
1.1 Key Word Conventions
1.2 Terminology and Scope Limitations
2. AS2 Modes of Operation
3. AS2 Reliability Concepts
4. Basic Initial Sender Operation
5. Initial Sender Operation for Retry Situations
6. Initial Sender Operation for Resend Situations
7. Initial Receiver (Server) Operation
8. Additional Reliability Considerations with Synchronous MDNs
9. Security Considerations
10. IANA Considerations
11. Acknowledgements
Normative References
Informative References
Appendix


1. Introduction

AS2 Reliability has the goal of ensuring that the AS2 protocol 
succeeds in exchanging business data payloads exactly once, provided 
that the network routing and transport (IP and TCP) layers are fully 
functional. That is, the goals for reliability are, first, that 
errors associated with HTTP server operation and server initiated sub 
processes do not prevent delivering messages or their receipt 
responses (MDNs) at least once and, second, that retry or resending 
operations made to compensate for these errors do not result in the 
same message payloads being submitted for further processing more 
than once. 

1.1     Key Word Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
document are to be interpreted as described in [RFC2119].

1.2     Terminology and Scope Limitations

Initial Sender: The AS2 application ("sending implementation") which 
transmits the Message containing the business payload to the "initial 
receiver".

Initial Receiver: The AS2 application ("receiving implementation") 
which receives the Message containing the business payload. The 
initial receiver sends a MDN back to the initial sender.

 
Message: The business payload as embedded in a "wire" format and 
ready for transmission by a transfer protocol (including all MIME 
wrappings, headers, encodings, and security transformations).

Message Disposition Notification (MDN): The Internet messaging 
format used to convey a receipt. This term is used interchangeably 
with receipt. See [RFC3798].

Message Identifier (Message-ID): A globally unique identifier for a 
message. The sending implementation MUST guarantee that the 
Message-ID is unique for a given AS2-To and AS2-From pair. See 
[RFC5322].

Message Integrity Check (MIC): The name given to the quantity 
computed over the body part with a message digest or hash function, 
in support of the digital signature service. 

Payload: The business data exchanged between business applications. 

Retry: When attempting to send a message using the POST method, the 
initial sender can encounter transient exceptions that result in a 
failure to obtain a HTTP status code, or a transient HTTP error such 
as 503. "Retry" is the term used in this document to refer to an 
additional POST of the same message, with the same content (including 
the Message Integrity Check value) and with the same Message-ID 
value. A retry can occur after a delay of a few seconds or on a 
schedule. Retrying ceases when the message is sent (which is indicated 
by receiving a HTTP 200 range status code), or when a retry limit is 
exceeded. Retries of a given message are not allowed to run 
concurrently.

Resend:  The AS2 protocol normally requests a (signed or unsigned) 
MDN response in the HTTP response message body. When a MDN is not 
received in a timely manner, the initial sender may choose to resend 
the original message. Because the message has already been sent, but 
has presumably not been processed according to expectation, the same 
message, with the same content and the same Message-ID value is sent 
again. This operation is referred to as a resend of the message. This 
document will suggest guidelines to prevent AS2 software 
implementations that receive duplicate messages from distributing 
that message to back-end business applications, as well as guidelines 
on resend intervals and resend counts for various modes of AS2 
operation. Resending ends when the MDN is received or the resend 
count is reached.

Resubmit: Accidents happen, and the remote system may need a new copy 
(a "resubmit") of a message that was previously exchanged. In 
addition, neither resending nor retrying continues forever, but the 
data may still need to be exchanged at a later time, so a message may 
need to be resubmitted. When data that failed to be exchanged, or was 
exchanged but lost, is resubmitted in a new message (with a new 
Message-ID value, and possibly with new timestamps and boundary 
delimiters), it is called resubmission. Resubmission is normally a 
manual compensation and is not discussed further in this document.

Duplicate: Duplicate copies of the same message are messages between 
the same AS2-To and AS2-From organizations which have the same 
Message-ID. This document will recommend ways to respond to 
duplication in messages as indicated by messages being received with 
the same Message-ID values. These duplicates arise in some cases when 
retries and/or resends are allowed, and success indicators (such as 
HTTP "200 OK" or MDNs) are not received by the initial sender.


2. AS2 Modes of Operation

There are many user selectable options within the AS2 protocol, but 
there are two main modes of operation commonly used that differ in 
how the MDN is returned to the initial sender.

Transferring data via AS2 involves two organizations that are 
identified using AS2-To and AS2-From header values. The initial 
sender places its identifying value in the AS2-From header and POSTs 
an AS2 formatted message to the organization whose value is found in 
the AS2-To header. The initial sender can indicate that it wants to 
receive its MDN on a different connection by including a header, 
Receipt-Delivery-Option, with an http, https, or (rarely) mailto URL 
as its value. Use of the mailto URL is not further considered in this 
document.

When the MDN is requested to be returned on a distinct TCP connection 
using the included URL, the AS2 operation mode is called 
"asynchronous." An asynchronous MDN is returned when the initial 
receiver has the information and resources available to do so, and 
the formatted MDN is POSTed to the delivery URL with the initial 
sender identifier as the AS2-To value and the original recipient as 
the AS2-From value.

The AS2 protocol does not specify how asynchronous MDN delivery is 
scheduled; it is left to the receiving implementation to determine 
how MDNs will be returned. The protocol in effect uses the "200" 
level status code to determine that the initial message has been 
sent. The initial sender then enters into a state of "waiting" or 
"expecting" a MDN to be received.

While it is expected that a MDN be returned in a timely fashion, 
there has not been an agreed upon deadline, and receiving 
implementations have had flexibility in scheduling its return. It is 
therefore possible for an indefinite waiting period to occur when a 
MDN is "lost" (for whatever reason). Most implementations do 
eventually time out this wait for a lost MDN. Implementations also try 
to recover from this protocol failure by resending the original 
message. Under certain circumstances, therefore, duplicate messages 
can arrive at the recipient. 

When no Receipt-Delivery-Option is included in the original message, 
and a MDN is requested (whether signed or unsigned), the AS2 
operation mode is said to be "synchronous." This means that the 
protocol requires that the MDN arrive back on the same TCP 
connection, and in the MIME body of the HTTP reply message. When 
using the synchronous mode, there can be a timeout by either side 
waiting for the HTTP reply, and that timeout usually aborts the 
protocol by closing the connection. In such a case, the message has 
not been successfully sent, so the payload from the message should 
not be distributed to a back end business application, and the 
message can only be retried (or perhaps resubmitted, an option not 
discussed here). Therefore resend compensation will not be discussed 
for the synchronous mode, but instead retry compensation will be the 
main topic.
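
As an illustration, the mode selection described above can be sketched 
in Python. This is a minimal sketch under stated assumptions, not part 
of the protocol: the function name mdn_mode and its return labels are 
hypothetical, and the use of the Disposition-Notification-To header to 
request a receipt is taken from [AS2] rather than from this document.

```python
from urllib.parse import urlsplit


def mdn_mode(headers):
    """Classify how the MDN for an AS2 message is to be returned.

    `headers` maps header names to values. Returns one of the
    hypothetical labels "none", "synchronous", "asynchronous", or
    "unsupported" (mailto and other schemes, which are set aside here).
    """
    # Header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value.strip() for name, value in headers.items()}

    # Assumption from [AS2]: this header is what requests a receipt.
    if "disposition-notification-to" not in h:
        return "none"

    url = h.get("receipt-delivery-option")
    if not url:
        # No delivery URL: the MDN comes back in the HTTP reply body.
        return "synchronous"

    scheme = urlsplit(url).scheme.lower()
    if scheme in ("http", "https"):
        # The MDN is POSTed later on a distinct TCP connection.
        return "asynchronous"
    return "unsupported"
```

A sender could use such a classification to decide whether retry 
compensation alone applies (synchronous) or whether resend 
compensation applies as well (asynchronous).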


3. AS2 Reliability Concepts

Introducing timeouts for various wait states does not in itself 
promote the goal of delivering the message. Instead it simply cleans 
up the protocol state machine so that it can be restarted. Delivery 
thus requires that the message be sent again either in a retry or a 
resend operation.

It is important to have precise understanding of what "sending the 
same message again" means. When exchanging business data, there are 
both payload transaction identifiers and message identifiers. In AS2, 
the message identifier is the value of the Message-ID header, and the 
procedures described below will assume that each message has a unique 
Message-ID value in the message headers. Implementations MUST NOT 
change any content of the message when retrying or resending. This 
requirement allows implementations to use Message-ID values to detect 
duplicate messages, and avoid sending their payloads to the internal 
business applications that process the business data. [Note: 
duplicate payloads could still be sent, but they would have to be 
sent in different messages. Implementations MAY provide duplicate 
detection for payloads as well. Implementations will need to be 
informed about the specific business data (such as the interchange 
control numbers of the ISA [or UNB] header of ANSI X12 [or EDIFACT] 
payloads, or the InstanceIdentifier in the DocumentIdentification block 
of a Standard Business Document Header (SBDH) used in some XML 
messages) in order to offer a service for duplicate payload detection.]
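
The optional payload-level duplicate detection mentioned in the note 
above might, for ANSI X12 payloads, key on fields of the ISA header. 
The following Python sketch is illustrative only. The function name 
is hypothetical, and the element positions used (ISA06 sender ID, 
ISA08 receiver ID, ISA13 interchange control number) are an 
assumption drawn from the X12 interchange header layout, not from 
this document.

```python
def x12_interchange_key(payload):
    """Return (sender ID, receiver ID, control number) from an ISA header.

    `payload` is the raw X12 interchange as bytes. The triple can be
    used as a key for payload-level duplicate detection.
    """
    text = payload.decode("ascii", errors="replace")
    if len(text) < 4 or not text.startswith("ISA"):
        raise ValueError("payload does not start with an ISA segment")

    # The element separator is whatever character follows "ISA".
    separator = text[3]
    elements = text.split(separator, 16)
    if len(elements) < 17:
        raise ValueError("ISA segment has fewer than 16 elements")

    # Index 0 is the segment name, so ISA06 is elements[6], and so on.
    return elements[6].strip(), elements[8].strip(), elements[13].strip()
```

Two different AS2 messages (different Message-ID values) that yield 
the same key would then be candidates for treatment as duplicate 
payloads, subject to agreement between the trading partners.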
                                                                     
                                                                     
4. Basic Initial Sender Operation

Sending implementations MUST allow configuration of how long to wait 
for the HTTP status code reply before closing an unresponsive HTTP 
connection.

Sending implementations MUST retain an exact copy of every message 
(including the Message-ID value) which is attempted to be sent or is 
sent. Repackaging a payload will not necessarily produce the same 
message, because MIME boundary delimiter values, timestamps, and 
other dynamic data used in assembling messages may not be the same. 
It is implementation dependent how this copy is retained.
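
One possible way to retain the copy is sketched below in Python. It 
is a minimal in-memory illustration, and the class and method names 
are hypothetical. The key point it shows is that retries and resends 
read back the stored wire bytes rather than repackaging the payload.

```python
import hashlib


class OutboundStore:
    """Keeps the exact wire bytes of each message that was sent.

    Entries are keyed by (AS2-From, AS2-To, Message-ID), since the
    Message-ID is only required to be unique for that pair.
    """

    def __init__(self):
        self._messages = {}

    def retain(self, key, wire_bytes):
        # The same key must always map to the same bytes; refuse to
        # replace a retained message with different content.
        existing = self._messages.get(key)
        if existing is not None and existing != wire_bytes:
            raise ValueError("message already retained with other content")
        self._messages[key] = bytes(wire_bytes)

    def for_retransmission(self, key):
        # Retries and resends use the stored bytes unchanged.
        return self._messages[key]

    def digest(self, key):
        # A digest of the wire bytes, handy for diagnostic logging.
        return hashlib.sha256(self._messages[key]).hexdigest()
```

A production implementation would normally persist these bytes so 
that the copy survives a restart; that choice is left open here.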

Several parameters relating to the number and schedule of retries 
and resends are described in the following sections. Implementations 
MUST allow these parameters to be configured, but are allowed to use 
an implementation dependent "back off" algorithm for lengthening 
intervals.


5. Initial Sender Operation for Retry Situations

Relevant conditions:

Connection refused.
Connection closed before the HTTP reply is received.
Transient exception (such as 503) in the HTTP reply status code.
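
The conditions above can be sketched as a small classification 
function in Python. The names Outcome and classify_attempt are 
hypothetical. Treating only 503 as transient, and treating every 
other non-2xx status code as a failure that is not retried, are 
assumptions of this sketch; this document gives 503 only as an 
example.

```python
from enum import Enum


class Outcome(Enum):
    SENT = "sent"      # 200 range status code received
    RETRY = "retry"    # transient condition; a retry is appropriate
    FAILED = "failed"  # assumption: other status codes are not retried


# Assumption: extend this set to match local policy.
TRANSIENT_STATUS_CODES = frozenset({503})


def classify_attempt(status_code):
    """Classify one POST attempt.

    `status_code` is None when no HTTP status code was obtained, for
    example because the connection was refused or closed early.
    """
    if status_code is None:
        return Outcome.RETRY
    if 200 <= status_code < 300:
        return Outcome.SENT
    if status_code in TRANSIENT_STATUS_CODES:
        return Outcome.RETRY
    return Outcome.FAILED
```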

Behavior and Capabilities:

o Maximum number of retries.

Sending implementations MUST be capable of configuring a maximum 
number of retries, a total elapsed time for the retry duration, or 
both. Sending implementations MUST stop retrying when a successful 
send occurs, when reaching the maximum number of retries, or when 
reaching the total elapsed time for the retry duration. The count of 
retries SHALL begin with the first retry; the initial attempt is not 
counted. So, if five retries are allowed, a total of six attempts can 
be made to send the message, provided a retry does not succeed or 
otherwise stop first. 

o Minimum initial interval of retries.

Sending implementations MUST be capable of configuring a minimum 
retry interval. The minimum interval applies to the first retry and 
(depending on an implementation dependent algorithm) to the remaining 
ones. The function governing the intervals between retries MUST be 
monotonically non-decreasing (each interval stays the same or 
increases). The minimum retry interval begins after the failure to 
send is determined. 

o Maximum retry duration

Sending implementations MUST be capable of configuring a period 
within which all retries of the same message are attempted, a maximum 
number of retries, or both. After this period expires, a payload would 
have to be resubmitted to be exchanged. This period begins after the 
initial attempt to deliver is known not to have succeeded. (Success is 
marked by the reception of a 200 level HTTP status code.) A retry in 
progress when the maximum retry duration is reached does not have to 
be stopped. Consequently, the retry process may run past the maximum 
retry duration, and that duration may expire before the maximum 
number of retries is reached.

Implementations are permitted to restrict the range of values for the 
configurability of the above maximum and minimum values. 
Implementations should engage some kind of "back off" algorithm to 
avoid exacerbating resource use on heavily loaded servers. (High 
workloads are often behind the "connection refused" or "server busy" 
error conditions.) Implementations are also allowed to alter ranges of 
configurability for one range of value based upon a user selection of 
some other maximum or minimum value, but no requirements are made on 
implementations as to how these restrictions are defined.

o Diagnostic logging

Sending implementations SHOULD keep a record of the condition that 
caused a failure in sending a message as this log may help identify a 
cause of and a solution to a sending failure. For example, if the 
time involved in all retries of sending a message has approximately 
the same value and the error is reported as an unexpected close in a 
connection, then a review of the values governing closing 
connections on both sides, followed by their adjustment can be 
useful. Of course, other factors may be involved-- ranging from 
network congestion to unpredictably large payloads to be exchanged-- 
that may also need further tweaking. 
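
The retry parameters of this section can be illustrated with a small 
schedule calculation in Python. This is a sketch under stated 
assumptions: the names RetryPolicy and retry_offsets, the default 
values, and the multiplicative "back off" are all hypothetical 
choices, and both limits (count and duration) are applied together 
although an implementation may support either one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int = 5          # retries only; initial attempt excluded
    min_interval: float = 30.0    # seconds before the first retry
    backoff_factor: float = 2.0   # >= 1 so that intervals never shrink
    max_duration: float = 3600.0  # seconds, measured from first failure

    def __post_init__(self):
        if self.backoff_factor < 1.0:
            raise ValueError("backoff_factor must be >= 1")


def retry_offsets(policy):
    """Seconds after the first failure at which each retry may start.

    Simplification: each attempt is assumed to fail at once. Real code
    would measure each interval from the moment the previous failure
    is determined.
    """
    offsets = []
    elapsed = 0.0
    interval = policy.min_interval
    for _ in range(policy.max_retries):
        elapsed += interval
        if elapsed > policy.max_duration:
            break  # no new retry starts after the duration expires
        offsets.append(elapsed)
        interval *= policy.backoff_factor
    return offsets
```

With the defaults shown, five retries are scheduled, so at most six 
attempts (the initial attempt plus five retries) are made. Lowering 
max_duration cuts the schedule short before the retry count is 
reached.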


6. Initial Sender Operation for Resend Situations

Because successful delivery of the message in the synchronous MDN 
mode implies that the initial sender must receive a response which 
contains both a HTTP 200 level status code and a MDN in the body of 
the response, the resend operation is not defined for the synchronous 
MDN mode of operation.

Relevant conditions:

MDN not received after a resend interval has expired.
 
Behavior and Capabilities:

Sending implementations MUST note when successful sending has occurred 
(when using asynchronous MDNs), so that resending may conform with the 
Resend configuration parameters.

o Maximum number of Resends

Sending implementations MUST be capable of configuring a total 
number of resends, a total elapsed time for the resend duration, or 
both. Sending implementations MUST stop resending when a MDN is 
received, when reaching the total number of resends, or when exceeding 
the total allowed time for resends.

o Minimum interval between Resends

Sending implementations MUST be capable of configuring an interval of 
time separating resends. Implementations MUST ensure for a given 
message that retry and resend operations are not interwoven. For 
example, during a resend attempt, retries could occur. In this 
situation, the sending implementation MUST ensure that another resend 
does not start while retries are still occurring.

o Maximum resend duration.

Sending implementations MAY configure a total duration for resend 
operations and MUST NOT start additional resend attempts when that 
duration is exceeded. This interval begins when the first successful 
send operation occurs. (Success for the sender is determined by its 
reception of a 200 level HTTP status code.)
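
The resend rules of this section can be summarized as a single guard 
that decides whether another resend may start. The Python sketch 
below is illustrative only: the names ResendState and 
may_start_resend and the default values are hypothetical, and times 
are plain seconds supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class ResendState:
    first_sent_at: float            # first 200 level reply to the POST
    last_sent_at: float             # most recent 200 level reply
    resends_started: int = 0
    retry_in_progress: bool = False
    mdn_received: bool = False


def may_start_resend(state, now, *, max_resends=3,
                     resend_interval=900.0, max_duration=14400.0):
    """Return True when a new resend of the message may begin."""
    if state.mdn_received:
        return False  # the receipt arrived; resending ends
    if state.retry_in_progress:
        return False  # retries and resends must not be interwoven
    if state.resends_started >= max_resends:
        return False  # resend count reached
    if now - state.first_sent_at > max_duration:
        return False  # duration runs from the first successful send
    # Otherwise wait until the resend interval has passed.
    return now - state.last_sent_at >= resend_interval
```

The caller is expected to increment resends_started and to set 
retry_in_progress while a resend attempt is itself being retried.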

 
7. Initial Receiver (Server) Operation

Behavior and Capabilities:


Receiving implementations MUST return an appropriate MDN (when a MDN 
is requested) even when a message is detected as a duplicate.

Duplicate elimination is based on Message-ID values. Receiving 
implementations MUST retain Message-ID values for the pairs of 
organizations exchanging data, beginning with the successful receipt 
of the message. Successful receipt may possibly occur from the 
receiving implementation's point of view even if the initial sender 
does not see the HTTP reply status code, thereby causing the initial 
sender to initiate a retry.

 
Receiving implementations SHOULD retain Message-IDs until the initial 
sender has exhausted all retry and resend durations. Since the 
receiving implementation may not know these durations, the receiving 
implementation MUST retain each Message-ID for a minimum of five days 
unless users explicitly agree to configure a different retention 
period. 

Receiving implementations MUST be configurable so that backend 
business applications are not sent the contents (payload) from the 
same message more than once. It is recommended that implementers make 
this the default. (However, different messages, as determined by their 
Message-ID values, may still carry the same payload contents.)

Receiving implementations MAY be configurable so that backend 
business applications do not receive the same payload more than once 
(for mutually agreed upon business data types). This functionality 
is, however, not specified in this document.
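
The receiver behavior described in this section can be sketched in 
Python as follows. This is a minimal in-memory illustration with 
hypothetical names (DuplicateFilter, handle_message, and the deliver 
and build_mdn callbacks). It shows the two points that matter: the 
payload reaches the back end only on first receipt, and an MDN is 
produced every time, duplicates included.

```python
import time

RETENTION_SECONDS = 5 * 24 * 60 * 60  # the five day minimum


class DuplicateFilter:
    """Remembers Message-ID values per pair of trading partners."""

    def __init__(self, retention=RETENTION_SECONDS, clock=time.time):
        self._seen = {}
        self._retention = retention
        self._clock = clock

    def is_first_receipt(self, as2_from, as2_to, message_id):
        now = self._clock()
        # Forget entries that are older than the retention period.
        self._seen = {key: seen_at for key, seen_at in self._seen.items()
                      if now - seen_at < self._retention}
        key = (as2_from, as2_to, message_id)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


def handle_message(dup_filter, as2_from, as2_to, message_id,
                   payload, deliver, build_mdn):
    # Only the first copy of a message is passed to the back end.
    if dup_filter.is_first_receipt(as2_from, as2_to, message_id):
        deliver(payload)
    # An MDN is returned whether or not the message was a duplicate.
    return build_mdn(message_id)
```

A real implementation would keep the retained identifiers in durable 
storage so that duplicate detection survives a restart.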


8. Additional Reliability Considerations with Synchronous MDNs

There can be combinations of server and client behavior that, even 
when the network is fully functional, still interfere with reliable 
AS2 data exchange. 

When clients, their operating systems, or intermediate HTTP relay 
agents choose to close TCP connections before the server has had time 
to complete the processing needed to create the reply, repetition of 
the client HTTP request need not lead to a successful outcome, no 
matter how often the retry operation is repeated. 

Timeouts while waiting for a HTTP response may themselves create 
errors. The intent of these timeouts is to avoid waste of resources 
tied up in possibly indefinite delays ("hangs") in HTTP response. 
However, with short timeout periods and for very large files, the 
security processing required to be able to form the MDN may, 
especially under very heavy loads, lead to a particularly bad 
outcome. The initial sender may repeatedly retry its HTTP POST, 
creating additional load with no better outcome (a timeout before the 
MDN reply is received).

In order to avoid this timeout situation, receiving implementations 
MUST support the HTTP 102 (Processing) status code [HTTP-Codes]. The 
102 (Processing) status code is an interim response used to inform the 
client that the server has accepted the complete request, but has not 
yet completed it.  This status code SHOULD only be sent when the server 
has a reasonable expectation that the request will take significant 
time to complete. As guidance, if a method is taking longer than 20 
seconds (a reasonable, but arbitrary value) to process, the server MUST 
return a 102 (Processing) response. The server MUST also send a final 
status code indicating success (200 range) or otherwise after the 
request has been completed, as required by the HTTP protocol.

Receiving implementations MUST NOT delay the sending of a MDN in order 
to allow the message payload to be processed by a back end application.
The HTTP 102 status code is intended to keep a session active while the 
receiving implementation is processing the message to create a MDN.
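
The interim response behavior described in this section can be 
sketched in Python as follows. This is an illustration only, with 
several assumptions: the function name respond_with_interim and the 
write callback are hypothetical, repeating the 102 response once per 
interval is a choice made for this sketch, and the final response is 
reduced to a status line, a Content-Length header, and the MDN bytes 
(a real reply would carry the full MDN headers).

```python
from concurrent.futures import ThreadPoolExecutor, wait

INTERIM = b"HTTP/1.1 102 Processing\r\n\r\n"


def respond_with_interim(build_mdn, write, interval=20.0):
    """Create the MDN in a worker thread, keeping the client informed.

    `build_mdn` does the decompression, decryption, and signature
    checks and returns the MDN body as bytes. `write` sends raw bytes
    on the connection the request arrived on.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(build_mdn)
        while True:
            done, _ = wait([future], timeout=interval)
            if done:
                break
            # Still processing after `interval` seconds: say so.
            write(INTERIM)
        mdn = future.result()

    # The final status code always follows the interim responses.
    write(b"HTTP/1.1 200 OK\r\n"
          b"Content-Length: " + str(len(mdn)).encode("ascii")
          + b"\r\n\r\n" + mdn)
```

Note that build_mdn covers only the work needed to create the MDN; 
handing the payload to a back end application is not part of it.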


9. Security Considerations

None

10. IANA Considerations

None

11. Acknowledgements

The authors wish to extend gratitude to the following individuals:
- John Koehring of Axway for his comments on early drafts.
- Yury Bogucharov of Microsoft for his help clarifying retry and
resend approaches (clarified in version 4 of this draft).
The authors also wish to thank all of the participants in the Drummond 
Group AS2 Reliability Interoperability testing for their feedback and 
helpful comments.

Normative References

[AS2] RFC4130 "MIME-Based Secure Peer-to-Peer Business Data 
Interchange Using HTTP, Applicability Statement 2 (AS2)", D. Moberg, 
R. Drummond, July 2005.
  
[RFC2119] RFC2119 "Key words for use in RFCs to Indicate Requirement 
Levels", S. Bradner, March 1997.

[HTTP-Codes] "HTTP Status Code Registry", 
http://www.iana.org/assignments/http-status-codes, October 2007.

[RFC3798] RFC3798 "Message Disposition Notification", T. Hansen, 
G. Vaudreuil, May 2004.

[RFC5322] RFC5322 "Internet Message Format", P. Resnick, October 2008.

Informative References

[AS1] RFC3335 "MIME-based Secure Peer-to-Peer Business Data 
Interchange over the Internet using SMTP", T. Harding, R. 
Drummond, C. Shih, September 2002.

[AS3] RFC4823 "FTP Transport for Secure Peer-to-Peer
Business Data Interchange over the Internet", T. Harding, R. Scott, 
April 2007.

Authors' Addresses

John Duker
Procter & Gamble
2 Procter & Gamble Plaza
Cincinnati, OH 45202 USA
Email: john.duker.ecom@gmail.com

Dale Moberg
Orion Health
Evans Office Complex, Building C
Suite C-100, 7350 East Evans Rd
Scottsdale, Arizona 85260 USA
Email: dale.moberg@gmail.com

Copyright (c) 2014 IETF Trust and the persons identified as the document
authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions 
Relating to IETF Documents (http://trustee.ietf.org/license-info) in 
effect on the date of publication of this document. Please review these
documents carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this document 
must include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as 
described in the Simplified BSD License.
    
All IETF Documents and the information contained therein are provided on
an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.