Network Working Group                                        Jacob Palme
Internet Draft                                  Stockholm University/KTH
draft-ietf-mhtml-info-07.txt
Category-to-be: Informational
Expires: May 1998                                          November 1997


                 Sending HTML in MIME, an informational
                         supplement to the RFC:
         MIME Encapsulation of Aggregate HTML Documents (MHTML)


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.


1. Abstract

The memo "MIME Encapsulation of Aggregate HTML Documents (MHTML)"
(draft-ietf-mhtml-rev-02.txt) specifies how to send packaged
aggregate HTML objects in MIME format. This memo is an accompanying
informational document, intended to be an aid to developers. This
document is not an Internet standard.
Issues discussed are implementation methods, caching strategies,
problems with rewriting of URIs, making messages suitable both for
mailers which can and which cannot handle Multipart/related, and
handling recipients which do not have full Internet connectivity.


2. Table of Contents

 1. Abstract
 2. Table of Contents
 3. Introduction
 4. Implementation methods
    4.1 Method 1: Combining web browser and MIME receiving program
    4.2 Method 2: Rewriting the HTML
    4.3 Method 3: Using a translation table
    4.4 Method 4: Using a proxy HTTP server to retrieve referenced
        body parts
    4.5 Method 5: Putting the mail client into a proxy HTTP server
    4.6 Other methods
    4.7 Communication between web browser and mail client
 5. Problems with rewriting URIs when copying HTML documents
 6. Caching of body parts
 7. Recipients which cannot handle the Multipart/related
    Content-Type
 8. Use of the Content-Type: Multipart/alternative
    8.1 Multipart/alternative inside Multipart/related
    8.2 Multipart/alternative outside Multipart/related
    8.3 Comparing the two methods
 9. Recipient may not have full Internet connectivity
10. Encoding of non-ascii characters
11. Conversion from HTTP to MIME
12. Acknowledgments
13. References
14. Author's Address


Mailing List Information

Further discussion on this document should be done through the
mailing list MHTML@SEGATE.SUNET.SE.

To subscribe to this list, send a message to
LISTSERV@SEGATE.SUNET.SE which contains the text

    SUB MHTML

Archives of this list are available by anonymous ftp from

    FTP://SEGATE.SUNET.SE/lists/mHTML/

The archives are also available by email. Send a message to
LISTSERV@SEGATE.SUNET.SE with the text "INDEX MHTML" to get a list of
the archive files, and then a new message "GET " to retrieve the
archive files.

Comments on less important details may also be sent to the editor,
Jacob Palme.

More information may also be available at URL:

    HTTP://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.HTML


3. Introduction

[MHTML] specifies how to send packaged aggregate HTML objects in MIME
multipart format. This memo is an accompanying informational
document, intended to be an aid to developers. This document is not
an Internet standard.


4. Implementation methods

The [MHTML] standard has been intentionally written to be
implementable both in cases where a web browser and a program
receiving MIME objects, such as an email program, are combined, and
when they are separate programs. Implementation is of course easier
if the web browser is combined with the MIME receiving client.

Different implementation methods are described below. Real
implementations may sometimes combine ideas from more than one of
these methods.

Note: In the future, web browsers will probably be able to take a
whole document of Content-Type: message or Content-Type: multipart as
one single file to be displayed. When web browsers get this
functionality, the problems described below will be much easier to
handle: just send the whole combined MIME message as a single file to
the web browser.


4.1 Method 1: Combining web browser and MIME receiving program

This is the architecturally simplest approach. A web browser with a
built-in MIME receiving program (such as an email program) will be
able to use its own web browser capabilities to display
HTML-formatted messages. Since it is the same program, that program
will more easily be able to connect a URL in the HTML text to a body
part in the message.


4.2 Method 2: Rewriting the HTML

+---------+                           +--------+
| Web     |                           | Mail   |
| browser |                           | client |
+-------+-+                           +-+------+
        |                               |
     +--+-------------------------------+--+
     | +----------+ +--+ +--+              |
     | | Start    | |  | |  | Related      |   Figure 1
     | | HTML     | |  | |  | body         |
     | | document | |  | |  | parts        |
     | +----------+ +--+ +--+              |
     +-------------------------------------+

If the web browser is separate from the MIME receiving client, the
MIME client might turn over the HTML body part to the web browser and
ask it to display it (Figure 1). One way of doing this is to store
the HTML body part in a file, and ask the web browser to display this
file.

If multipart/related is used, this can be implemented by storing all
the body parts within the multipart/related in an otherwise empty
folder/directory. Before turning the HTML document over to the Web
browser, the mail client may have to rewrite the HTML, replacing
URI-s with (possibly relative) URL-s which the Web browser can
resolve as file names in the same directory/folder where the HTML
document itself is stored.

Problems with such rewriting of URIs are discussed in chapter 5
below.


4.3 Method 3: Using a translation table

+---------+                          +--------+
| Web     |                          | Mail   |
| browser |                          | client |
+-------+-+                          +-+------+
        |                              |
     +--+------------------------------+-+
     | +--------+ +--+ +--+              |
     | | Trans- | |  | |  | Related      |   Figure 2
     | | lation | |  | |  | body         |
     | | table  | |  | |  | parts        |
     | +--------+ +--+ +--+              |
     +-----------------------------------+

An alternative to rewriting the HTML file before turning it over to
the Web browser may be to use a translation table, in case the Web
browser has the capability to use such a table to rewrite URL-s on
the fly while displaying the document (Figure 2). This requires that
the Web browser is capable of receiving CID: URL-s and resolving them
using this translation table in the same way as for other URL-s.
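
The storage step of method 2 and the table of method 3 can be
sketched together in Python. This is only an illustration under
stated assumptions, not part of [MHTML]: the function name
unpack_related and the generated file names (part0.html, part1.gif,
...) are hypothetical, the message is assumed to be a top-level
multipart/related, and nested multiparts are skipped. The sketch
stores each body part in a directory and returns a table mapping
every URI by which a part can be referenced (its CID: URL and its
Content-Location) to the local file.

```python
import email
import mimetypes
from email import policy
from pathlib import Path


def unpack_related(raw_message: bytes, target_dir: Path) -> dict:
    """Store the parts of a multipart/related message as files.

    Returns a translation table mapping each URI by which a part can
    be referenced (cid: URL, Content-Location) to its local file.
    Hypothetical helper for illustration only.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    if msg.get_content_type() != "multipart/related":
        raise ValueError("expected a multipart/related message")

    table = {}
    for index, part in enumerate(msg.iter_parts()):
        if part.is_multipart():
            continue  # nested multiparts are out of scope for this sketch
        # Generated names avoid trusting file names chosen by the sender.
        suffix = mimetypes.guess_extension(part.get_content_type()) or ".bin"
        path = target_dir / f"part{index}{suffix}"
        # decode=True undoes base64 / quoted-printable transfer encoding.
        path.write_bytes(part.get_payload(decode=True) or b"")

        content_id = part.get("Content-ID")
        if content_id:
            table["cid:" + str(content_id).strip().strip("<>")] = path
        location = part.get("Content-Location")
        if location:
            table[str(location).strip().strip('"')] = path
    return table
```

A mail client following method 2 could use the returned table to
decide which URI-s to rewrite to local file names; a browser or
helper following method 3 could consult it at display time instead.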


4.4 Method 4: Using a proxy HTTP server to retrieve referenced body
    parts

+--------+       +-----------+       +--------+
| Proxy  |       | Data base |       | Mail   |
| web    |-------| of cached |-------| server |
| server |       | objects   |       |        |
+----+---+       +-----------+       +----+---+
     |                                    |
+----+----+                          +----+---+     Figure 3
| Web     |                          | Mail   |
| browser |                          | client |
+-------+-+                          +-+------+
        |                              |
     +--+------------------------------+-+
     |         Start HTML object         |
     +-----------------------------------+

Yet another method is to use a proxy web server, to which the web
browser requests are sent, and which will then use the cached body
parts instead of normal web retrieval from the network (Figure 3). If
the Web browser is set to use this proxy server for all URL-s,
including CID URL-s, no rewriting of the HTML will be necessary.


4.5 Method 5: Putting the mail client into a proxy HTTP server

+--------+--------+
| Proxy  | Mail   |
| HTTP   | client |
| server |        |
+--------+--------+
     |
     | HTTP protocol            Figure 4
     |
+----+----+
| Web     |
| browser |
+---------+

A mail client can also be included in an HTTP server (Figure 4). The
user will then not have to install any mail client software on a
personal computer; all the mail functionality is mapped onto HTTP and
HTML elements.


4.6 Other methods

The mail client and the web browser can of course communicate in
other ways, such as using inter-process communication.


4.7 Communication between web browser and mail client

Many web browsers have API-s to allow other programs to communicate
with them. There is however no accepted real or de-facto standard for
such API-s, which means that a mail program which relies on such
API-s will only be able to use those Web browsers whose API-s it
supports.

Note however that most of the methods described above can be
implemented with a very minimal API of this kind. The only API
function needed is the ability to tell a Web browser, when it is
started, to open a particular file. This API function is a
standardized part of the operating system on most platforms.
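
As an illustration of how small this interface can be, the following
Python sketch asks the platform's default browser to open a stored
HTML file. It is an assumption-laden example and not something the
memo prescribes: the helper names file_url and show_in_browser are
hypothetical, and the sketch relies on the standard library module
webbrowser, which delegates to whatever mechanism the operating
system provides.

```python
import webbrowser
from pathlib import Path


def file_url(html_path) -> str:
    """Return the FILE URL for a locally stored HTML document."""
    # as_uri() requires an absolute path, so resolve() is applied first.
    return Path(html_path).resolve().as_uri()


def show_in_browser(html_path) -> bool:
    """Ask the default Web browser to open the stored start document."""
    # webbrowser.open() hands the URL to the platform's browser launcher
    # and returns True if a browser could be started.
    return webbrowser.open(file_url(html_path))
```

A mail client would call show_in_browser() on the start HTML file
after storing the message parts next to it.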

In particular, methods 1 and 3 above use the fact that a relative URL
is resolved with the location of the base document as base. This
means that if the base document is a file, relative URL-s will be
resolved as FILE URL-s in the same directory/folder where the HTML
document itself is placed.

There is a need for buttons in the Web page which the user can use to
get back to the mail program again after reading the mail with the
Web browser. A common technique to achieve this is to define a new
MIME data type for this button. The Web browser is then configured to
transfer control to the mail client when the user pushes this button,
i.e. downloads a file of this new MIME type.


5. Problems with rewriting URIs when copying HTML documents

Sending of HTML-formatted messages is based on the assumption that an
HTML document, together with in-line objects like images, applets and
frames, can be copied into a MIME message. Such copying may require
rewriting of URIs containing references between the different message
parts. The MHTML standard [MHTML] has been carefully prepared to
allow existing web pages to be copied without such rewriting, through
the use of the Content-Base and Content-Location MIME content heading
fields.

There is however a problem if the source HTML document contains
relative URIs in parameters to objects and applets, such as in the
example below:

    From: foo1@bar.net
    To: foo2@bar.net
    Subject: A simple example
    Mime-Version: 1.0
    Content-Type: multipart/related; boundary="boundary-example-1";
                  type="Text/HTML"
    Content-Base: "http://www.ietf.cnri.reston.va.us"

    --boundary-example-1
    Content-Type: Text/HTML; charset=US-ASCII

    ... text of the HTML document...
    ...etc...

    --boundary-example-1
    Content-Location: "image.gif"
    Content-Type: IMAGE/GIF
    Content-Transfer-Encoding: BASE64

    R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
    ..etc...

    --boundary-example-1--

Only the object might know that the imageurl parameter is a relative
URI.
It is nearly impossible for the HTML parser to understand that the
parameter is a relative URI. Simply searching for "image.gif" is not
robust, as the string "image.gif" may be used elsewhere. URIs in
scripts can have similar problems.

One might envisage even more difficult cases: an applet might take a
parameter "subject" and another parameter "range", and when
subject="auto" and range="1-5" it could compute, and try to use,
auto1.gif, auto2.gif ... auto5.gif as relative URLs.

Some implementation methods described in chapter 4 above, for example
method 2 described in chapter 4.2, may require rewriting of the URIs
in the HTML document. There is no perfect solution to this problem.

One way of alleviating the problem is to produce the original
document using only absolute URIs, preferably of the CID type, since
they are more easily identifiable.

Another way of alleviating the problem is to make all URIs and
Content-Locations into simple relative URIs containing file names
only (without paths, preferably using a file name format common to
most platforms, i.e. 1-6 ascii letters or digits, a period, and 1-3
extension ascii letters or digits). An implementation using method 2
described in chapter 4.2 above can then just store the parts as files
in an empty directory on the recipient computer with the
Content-Locations as file names, and then turn the start HTML file
over to a web browser, and need not rewrite the URIs at all.

This simple variant of use of the MHTML standard is probably the most
robust, and those implementors who can control the production of the
HTML documents to be sent are thus recommended to use this variant.


6. Caching of body parts

Suppose a message contains body parts with the Content-Location
header as defined in [MHTML]. A receiving agent might then put such a
body part into a web cache, with the URI in the Content-Location as
its name, so that later retrievals of this URI use the cached body
part.
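
A minimal Python sketch of such a cache follows. It is an
illustration only: the class and method names are hypothetical, and
the choice to key every entry on the Message-ID as well as the URI is
an assumption made here so that a cached body part is only ever used
to resolve links within the message it arrived in, and so that all
parts of a message are purged together.

```python
class MessagePartCache:
    """In-memory cache of body parts, scoped to the message they came in."""

    def __init__(self):
        # Maps (message_id, uri) to the decoded body part.
        self._parts = {}

    def store(self, message_id: str, uri: str, body: bytes) -> None:
        """Remember a body part under its Content-Location URI."""
        self._parts[(message_id, uri)] = body

    def lookup(self, message_id: str, uri: str):
        """Return the cached part for this message, or None if absent."""
        return self._parts.get((message_id, uri))

    def purge_message(self, message_id: str) -> None:
        """Drop every part of one message at the same time."""
        for key in [k for k in self._parts if k[0] == message_id]:
            del self._parts[key]
```

With this layout, a lookup made on behalf of one message never
returns a part that arrived in another message, even if both parts
carry the same Content-Location.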
There is however no guarantee that such a cached item is correct.
Such caching is thus not recommended for any use other than
resolution of links within one particular MIME message. The MHTML
standard does not cover links between different messages, but if you
want to implement this, use of Content-ID and/or Message-ID, rather
than Content-Location, is recommended.

If incoming messages are stored in a store where messages can be
automatically deleted (purged), purging of body parts should not
occur before purging of the whole message to which they belong.

If an incoming message contains a body part which is linked via
Content-Location, then no HTTP lookup should be performed to check if
the body part is recent. The message should thus still contain the
old HTML document, even if the HTTP-available document has been
revised. (Example: "Here is the weather map of October 29, 1997").

Exceptions to this are:

(a) If the linked document is not enclosed in the message, but
    referred to via Content-Type: message/external-body, then the
    latest version should be shown using ordinary HTTP caching
    conventions.

(b) If a new message is sent with a Supersedes reference to the old
    message, the old message should still show the old version of all
    the body parts, but it might be wise to inform the user that a
    superseding message is available.


7. Recipients which cannot handle the Multipart/related Content-Type

A message sent according to the specifications in [MHTML] may have
recipients whose mailers cannot handle the Multipart/related
Content-Type in the way specified in [MHTML]. According to [MIME1], a
mailer which encounters an unknown subtype of Multipart should handle
it as Multipart/mixed. To improve on this, Multipart/alternative can
be used as discussed in section 8 of this memo.

Content-Disposition, as specified in [CONDISP] and in [MHTML],
section 10, can also be used as an aid to mailers which do not
understand Multipart/related.

Captions on images, which are included in the HTML text, might for
non-HTML-capable recipients be found in the Content-Description
header [CONDISP]. Do not assume, however, that HTML-capable user
agents will display the Content-Description header; they may assume
that this information is included in the HTML text instead.


8. Use of the Content-Type: Multipart/alternative

If the message is sent to recipients, not all of which may have
mailers capable of handling the Text/HTML content-type, then the
"Content-Type: Multipart/Alternative" [MIME1] can be used in two
ways:


8.1 Multipart/alternative inside Multipart/related

The Multipart/alternative is put inside the "Content-Type:
Multipart/related". Body parts can then be specified with
"Content-Type: Text/plain" as the first choice, and "Content-Type:
Text/HTML" as the second choice.

Example:

    Content-Type: Multipart/related; boundary="boundary-example-1";
                  type="MULTIPART/ALTERNATIVE"

    --boundary-example-1
    Content-Type: MULTIPART/ALTERNATIVE;
                  boundary="boundary-example-2"

    --boundary-example-2
    Content-Type: Text/plain

    ... plain text version of the document for recipients whose
    mailers cannot handle Text/HTML ...

    --boundary-example-2
    Content-Type: Text/HTML; charset=US-ASCII
    Content-ID: <content-id-example@example.host>

    ... text of the HTML document ...

    --boundary-example-2--

    --boundary-example-1
    Content-Type: Image/GIF

    ... a body part, to which the HTML document has a link ...

    --boundary-example-1--

Note that the type parameter of Multipart/related in this case should
be Multipart/alternative and not Text/HTML.


8.2 Multipart/alternative outside Multipart/related

The Multipart/alternative is put outside the Multipart/Related, with
Multipart/Related as one alternative and Multipart/Mixed as the other
alternative. Note however that [MHTML] does not recommend links from
inside Multipart/Related to objects outside of the Multipart/Related,
so putting inline images outside the Multipart/Related is not
suitable.
Instead, such inline images may have to be repeated in both branches
of the Multipart/alternative with this method.

Example:

    Content-Type: MULTIPART/ALTERNATIVE;
                  boundary="boundary-example-1"

    --boundary-example-1
    Content-Type: Multipart/mixed; boundary="boundary-example-3"

    --boundary-example-3
    Content-Type: Text/plain; charset=US-ASCII

    ... plain text version of the message for recipients whose
    mailers cannot handle Text/HTML ...

    --boundary-example-3
    Content-Type: Image/GIF

    ... A picture associated with the plain text message ...

    --boundary-example-3--

    --boundary-example-1
    Content-Type: Multipart/related; boundary="boundary-example-2";
                  type="Text/HTML"

    --boundary-example-2
    Content-Type: Text/HTML; charset=US-ASCII
    Content-ID: <content-id-example@example.host>

    ... text of the HTML document ...

    --boundary-example-2
    Content-Type: Image/GIF

    ... a body part, to which the HTML document has a link ...

    --boundary-example-2--

    --boundary-example-1--


8.3 Comparing the two methods

When choosing between these two methods of employing
Multipart/alternative, note the following:

(1) Clients which do not support Multipart/related, and which thus
    will interpret it as Multipart/mixed, will with choice 8.1
    display the inline objects. Thus, a recipient whose mailer can
    handle image/gif but not multipart/related will still be shown
    the images; they will not be suppressed by being inside a
    suppressed branch of the Multipart/alternative.

(2) Choice 8.2 will not show inline images in the Multipart/Related,
    unless this information is repeated in both branches of the
    Multipart/Alternative.

A general warning: Some mailers do not support "Content-Type:
Multipart/alternative", and may then interpret it as Multipart/mixed.


9. Recipient may not have full Internet connectivity

The recipient of a message sent by email may not always have full
Internet connectivity. The recipient may be behind a gateway or
firewall which prohibits or restricts Internet connectivity.
This means that the recipient may not be able to resolve URI-s in an
email message, unless the referred-to documents are included in the
email message itself. Thus, it is often suitable to include in an
email message all documents which are referred to (directly or
indirectly) by URI-s in the message.

This may of course not always be possible; in some cases the set of
referred-to documents (directly or indirectly) may be the whole WWW
document space, i.e. millions of documents. A choice must then be
made as to how much to include. Of course, it is most important to
include all inline objects, i.e. objects linked by such hyperlinks as
IMG, etc., which specify that the linked objects are to be shown to
the user immediately.

In the case of ACTION elements in HTML forms, by making these ACTION
elements of the "mailto:" URL type, rather than the "http:" URL type,
you will also enable recipients without full Internet connectivity to
fill in and send in your forms.

The HTML specification [HTML2] allows a default action when no ACTION
element is included, but this default action may not be suitable when
sending the HTML document via email. Thus, it is better to always put
an explicit ACTION element into HTML forms sent by email.


10. Encoding of non-ascii characters

       Displayed text                           Displayed text
              |                                        ^
              V                                        |
       +-------------+                        +----------------+
       | HTML editor |                        | HTML viewer    |
       |             |                        | or Web browser |
       +-------------+                        +----------------+
              |                                        ^
              V                                        |
         HTML markup                              HTML markup
              |                                        ^
              V                                        |
+---------+ +---------------+          +-------------+ +---------------+
| MIME    | | MIME content- |          | MIME        | | MIME content- |
| encap-  | | transfer-     |          | heading     | | transfer-     |
| sulator | | encoder       |          | interpreter | | decoder       |
+---------+ +---------------+          +-------------+ +---------------+
     |              |                            ^              ^
     V              V        +-----------+       |              |
MIME heading + MIME content->| Transport |->MIME heading + MIME content
                             +-----------+

                               Figure 5

Definitions (see Figure 5):

Displayed text     A visual representation of the intended text.

HTML markup        A sequence of characters formatted according to
                   the HTML specification [HTML2].

MIME content       A sequence of octets physically forwarded via
                   email; may use MIME content-transfer-encoding as
                   specified in [MIME1].

HTML editor        Software used to produce HTML markup.

MIME content-      Software used to encode non-US-ASCII characters
transfer-encoder   as specified in [MIME1].

MIME content-      Software used to decode non-US-ASCII characters
transfer-decoder   as specified in [MIME1].

MIME heading       Software used to interpret the information in
interpreter        MIME headings.

HTML viewer        Software used to display HTML documents to
                   recipients.

Some implementations may have a choice of whether to represent
non-ascii characters at the HTML layer (using "&" entity references
or numeric character references as defined in [HTML2] section 3.2.1)
or at the MIME layer (using Content-Transfer-Encoding as defined in
[MIME1] section 5). In choosing between these two representation
methods, note the following effects:

(1) Modifying HTML markup may disrupt security content integrity
    checks.

(2) The choice of modifying HTML markup may be more suitable for
    recipients whose mailers do not support MIME.

(3) Using MIME Content-Transfer-Encoding may be more suitable for
    recipients who have MIME-compliant mailers but do pass the text
    over to a web browser.


11. Conversion from HTTP to MIME

Information received or retrieved using HTTP cannot always be sent
unchanged as email using the "Content-Type: Text/HTML", because of
the restrictions which MIME places on the format of "Content-Type:
Text/HTML". The same problem may occur for documents retrieved via
HTTP which are in textual formats other than HTML. In particular,
note the following:

(a) Content-encodings allowed in HTTP, but not allowed in MIME, must
    be removed.

(b) HTTP allows line breaks as bare CRs or bare LFs or something
    else, while MIME only allows line breaks as CRLF in subtypes of
    the Text content-type.

(c) HTTP allows character sets like Unicode-1-1, which do not
    represent line breaks as CRLFs. Such text may have to be
    rewritten to character sets like Unicode-1-1-UTF-7, in which line
    breaks are represented as CRLFs.

A good overview of the differences between MIME and HTTP with regard
to the use of "Content-Type: Text" can be found in [HTTP] appendix C.

If you want to send HTTP unchanged via email, you might consider
using the "Content-Type: Message/HTTP" instead of the "Content-Type:
Text/HTML".


12. Acknowledgments

Harald Tveit Alvestrand, Richard Baker, Dave Crocker, Martin J.
Duerst, Roy Fielding, Lewis Geer, Al Gilman, Paul Hoffman, Alexander
Hopmann, Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel
LaLiberte, Ed Levinson, Jay Levitt, Albert Lunde, Larry Masinter,
Keith Moore, Gavin Nicol, Pete Resnick, Jon Smirl, Einar Stefferud,
Jamie Zawinski and several other people have helped us with preparing
this memo. I alone take responsibility for any errors which may still
be in the memo.


13. References

Temporary note: This list contains some references to Internet
drafts. It is anticipated that these Internet drafts will become
RFC-s before this memo. The references will then in this memo be
changed to refer to the corresponding RFC instead. This list also
includes some RFC-s which are not up to date, and which will be
replaced by new memos presently in ietf draft status.

Ref.       Author, title
---------  --------------------------------------------------------

[CONDISP]  R. Troost, S. Dorner: "Communicating Presentation
           Information in Internet Messages: The
           Content-Disposition Header", RFC 1806, June 1995.

[HOSTS]    R. Braden (editor): "Requirements for Internet Hosts --
           Application and Support", STD 3, RFC 1123, October 1989.

[HTML2]    T. Berners-Lee, D. Connolly: "Hypertext Markup Language -
           2.0", RFC 1866, November 1995.

[HTTP]     T. Berners-Lee, R. Fielding, H. Frystyk: "Hypertext
           Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

[MHTML]    J. Palme & A. Hopmann: "Packaging Aggregate HTML Objects
           in MIME Email", draft-ietf-mhtml-rev-02.txt, October
           1997.

[MIDCID]   E. Levinson: "Message/External-Body Content-ID Access
           Type", draft-ietf-mhtml-cid-v2-00.txt, July 1997.

[MIME1]    N. Freed & N. Borenstein: "MIME (Multipurpose Internet
           Mail Extensions) Part One: Mechanisms for Specifying and
           Describing the Format of Internet Message Bodies", RFC
           2045, November 1996.

[MIME2]    N. Freed & N. Borenstein: "Multipurpose Internet Mail
           Extensions (MIME) Part Two: Media Types", RFC 2046,
           November 1996.

[NEWS]     M.R. Horton, R. Adams: "Standard for interchange of
           USENET messages", RFC 1036, December 1987.

[REL]      Harald Tveit Alvestrand, Edward Levinson: "The MIME
           Multipart/Related Content-type", , August 1997.

[RELURL]   R. Fielding: "Relative Uniform Resource Locators", RFC
           1808, June 1995.

[RFC822]   D. Crocker: "Standard for the format of ARPA Internet
           text messages", STD 11, RFC 822, August 1982.

[SMTP]     J. Postel: "Simple Mail Transfer Protocol", STD 10, RFC
           821, August 1982.

[URL]      T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
           Resource Locators (URL)", RFC 1738, December 1994.

[URLBODY]  N. Freed and Keith Moore: "Definition of the URL MIME
           External-Body Access-Type", RFC 2017, October 1996.


14. Author's Address

Jacob Palme                          Phone: +46-8-16 16 67
Stockholm University and KTH         Fax: +46-8-783 08 29
Electrum 230                         Email: jpalme@dsv.su.se
S-164 40 Kista, Sweden

Working group chairman:

Einar Stefferud