Network Working Group                                      Jacob Palme
Internet Draft                                Stockholm University/KTH
draft-ietf-mhtml-info-10.txt
Category-to-be: Informational
Expires: December 1998                                       June 1998
Sending HTML in MIME, an informational supplement to the RFC:
MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''
To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe),
ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).
This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright (C) The Internet Society 1998. All Rights Reserved.
1. Abstract
The memo "MIME Encapsulation of Aggregate Documents, such as HTML
(MHTML)" (draft-ietf-mhtml-rev-05.txt) specifies how to send packaged
aggregate HTML objects in MIME format. This memo is an accompanying
informational document, intended to be an aid to developers. This
document is not an Internet standard.
Issues discussed are implementation methods, caching strategies,
problems with rewriting of URIs, making messages suitable both for
mailers which can and which cannot handle Multipart/related and handling
recipients which do not have full Internet connectivity.
Differences from the previous version 9 of this draft
(1) A paragraph about one disadvantage with MAILTO action elements has
    been added to section 10.
(2) A new section 13: Default font size has been added.
(3) A new temporary section "Issue list" immediately below has been
    added.
Issue list
Section in    Issue description
this draft
4             Should some more method of communication between html
              viewer and e-mail program be described? Are the methods
              correctly described?
5             Are there any more problems with rewriting URIs which
              should be described in section 5?
8             Is it OK to say that senders should not assume that
              recipients will show the value of Content-Description
              inside Multipart/Related (since HTML has other methods
              of showing this, for example the
And here is the IETF logo with transparent background:
--boundary-example-1
Content-Location: ietflogo.gif
Content-Base: http://www.ietf.cnri.reston.va.us/images/
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64
R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc...
--boundary-example-1--
.
Saving the above message as text might give the following file:
From: Alice
And here is the IETF logo with transparent background:
Saving the same text as aggregate might give the following file:
From: Alice
And here is the IETF logo with transparent background:
--boundary-example-1
Content-Location: ietflogo.gif
Content-Base: http://www.ietf.cnri.reston.va.us/images/
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64
R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc...
--boundary-example-1--
Saving the same text as archiving aggregate might give the following file
(where the missing body part is fetched through http and added to the
saved file):
From: Alice
And here is the IETF logo with transparent background:
--boundary-example-1
Content-Location: ietflogo.gif
Content-Base: http://www.ietf.cnri.reston.va.us/images/
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64
R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc...
--boundary-example-1
Content-Location: ietflogo2e.gif
Content-Base: http://www.ietf.cnri.reston.va.us/images/
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64
R0lGODlhGAGgANX/ACkpKTExMTk5OUJCQkpKSlJSUlpaWmNjY2tra3Nzc3t7e4
SEhIyMjJSUlJycnKWlpa2trbW1tcDAwM7Ozv/eQnNzjHNzlGtrjGNjhFpae1pa
etc...
--boundary-example-1--
Saving the same message as message might give the following file:
from:
And here is the IETF logo with transparent background:
--boundary-example-1
Content-Location: ietflogo.gif
Content-Base: http://www.ietf.cnri.reston.va.us/images/
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64
R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
etc...
--boundary-example-1--
--boundary-example-2--
8. Recipients which cannot handle the Multipart/related Content-Type
A message sent according to the specifications in [MHTML] may have
recipients whose mailers cannot handle the Multipart/related
Content-Type in the way specified in [MHTML].
According to [MIME1], a mailer which encounters an unknown subtype of
Multipart should handle it as Multipart/mixed.
To improve this, Multipart/alternative can be used as discussed in
section 9 of this memo.
Content-Disposition, as specified in [CONDISP] and in [MHTML], section
10, can also be used as an aid to mailers which do not understand
Multipart/related.
Captions on images, which for HTML-capable recipients are included in
the HTML text, can be provided to non-HTML-capable recipients in the
Content-Description header [CONDISP]. Do not assume, however, that
HTML-capable user agents will display the Content-Description header;
they may assume that this information is included in the HTML text
instead.
9. Use of the Content-Type: Multipart/alternative
If the message is sent to recipients, not all of whom may have mailers
capable of handling the Text/HTML content-type, then the "Content-Type:
Multipart/Alternative" [MIME1] can be used in two ways:
9.1 Multipart/alternative inside Multipart/related
The Multipart/alternative is put inside the "Content-Type:
Multipart/related". Body parts can be specified with "Content-Type:
Text/plain" as the first choice, and "Content-Type: Text/HTML" as the
second choice.
Example:
Content-Type: Multipart/related; boundary="boundary-example-1";
              type="MULTIPART/ALTERNATIVE"
--boundary-example-1
Content-Type: MULTIPART/ALTERNATIVE; boundary="boundary-example-2"
--boundary-example-2
Content-Type: Text/plain
... plain text version of the document for recipients
    whose mailers cannot handle Text/HTML ...
--boundary-example-2
Content-Type: Text/HTML; charset=US-ASCII
Content-ID: <content-id-example@example.host>
... text of the HTML document ...
--boundary-example-2--
--boundary-example-1
Content-Type: Image/GIF
... a body part, to which the HTML document has a link ...
--boundary-example-1--
Note that the type parameter of Multipart/related in this case should be
Multipart/alternative and not Text/HTML.
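The nesting described in 9.1 can be sketched with the Python standard "email" package. This is only an illustration under stated assumptions: the function name, the placeholder texts and image bytes, and the Content-Location value are hypothetical and are not prescribed by [MHTML].

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_related_with_inner_alternative(plain, html, gif_bytes):
    """Build Multipart/related whose root part is a Multipart/alternative."""
    # Outer container; the type parameter names the media type of the root.
    related = MIMEMultipart("related", boundary="boundary-example-1",
                            type="multipart/alternative")

    # Root body part: Text/plain as first choice, Text/HTML as second.
    alternative = MIMEMultipart("alternative", boundary="boundary-example-2")
    alternative.attach(MIMEText(plain, "plain", "us-ascii"))
    html_part = MIMEText(html, "html", "us-ascii")
    html_part["Content-ID"] = "<content-id-example@example.host>"
    alternative.attach(html_part)
    related.attach(alternative)

    # Inline object to which the HTML document has a link
    # (hypothetical Content-Location value).
    image = MIMEImage(gif_bytes, _subtype="gif")
    image["Content-Location"] = "ietflogo.gif"
    related.attach(image)
    return related


msg = build_related_with_inner_alternative(
    "plain text version of the document",
    "<html><body>HTML version of the document</body></html>",
    b"GIF89a placeholder bytes, not a real image")
print(msg.as_string())
```

A mailer which treats Multipart/related as Multipart/mixed would still reach the Image/GIF part in this layout, since it sits at the outer level.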
9.2 Multipart/alternative outside Multipart/related
The multipart/alternative is put outside the Multipart/Related, with
Multipart/Related as one alternative and Multipart/Mixed as the other
alternative. Note however that [MHTML] does not recommend links from
inside Multipart/Related to objects outside of the Multipart/Related, so
putting inline images outside the Multipart/Related is not suitable.
Instead, such inline images may have to be repeated in both branches of
the multipart/alternative with this method.
Example:
Content-Type: MULTIPART/ALTERNATIVE; boundary="boundary-example-1"
--boundary-example-1
Content-Type: Multipart/mixed; boundary="boundary-example-3"
--boundary-example-3
Content-Type: Text/plain; charset=US-ASCII
... plain text version of the message for recipients
    whose mailers cannot handle Text/HTML ...
--boundary-example-3
Content-Type: Image/GIF
... A picture associated with the plain text message ...
--boundary-example-3--
--boundary-example-1
Content-Type: Multipart/related; boundary="boundary-example-2";
              type="Text/HTML"
--boundary-example-2
Content-Type: Text/HTML; charset=US-ASCII
Content-ID: <content-id-example@example.host>
... text of the HTML document ...
--boundary-example-2
Content-Type: Image/GIF
... a body part, to which the HTML document has a link ...
--boundary-example-2--
--boundary-example-1--
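The layout of 9.2 can be sketched in the same way with the Python standard "email" package. The function name and the placeholder content are hypothetical; the sketch only illustrates that each multipart level needs its own boundary and that the image is repeated in both branches.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternative_around_related(plain, html, gif_bytes):
    """Build Multipart/alternative with a mixed and a related branch."""
    top = MIMEMultipart("alternative", boundary="boundary-example-1")

    # First branch: for mailers which cannot handle Text/HTML.
    mixed = MIMEMultipart("mixed", boundary="boundary-example-3")
    mixed.attach(MIMEText(plain, "plain", "us-ascii"))
    mixed.attach(MIMEImage(gif_bytes, _subtype="gif"))
    top.attach(mixed)

    # Second branch: the HTML document together with its inline image.
    related = MIMEMultipart("related", boundary="boundary-example-2",
                            type="text/html")
    html_part = MIMEText(html, "html", "us-ascii")
    html_part["Content-ID"] = "<content-id-example@example.host>"
    related.attach(html_part)
    # The image is repeated here, because links out of the
    # Multipart/related are not recommended.
    related.attach(MIMEImage(gif_bytes, _subtype="gif"))
    top.attach(related)
    return top


msg = build_alternative_around_related(
    "plain text version of the message",
    "<html><body>HTML version of the message</body></html>",
    b"GIF89a placeholder bytes, not a real image")
print(msg.as_string())
```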
9.3 Comparing the two methods
When choosing between these two methods of employing
multipart/alternative, note the following:
(1) Clients which do not support Multipart/related, and which thus
interpret it as Multipart/mixed, will with choice 9.1 display
the inline objects. Thus, a recipient whose mailer can handle
image/gif but not multipart/related will still be shown the images;
they will not be hidden inside a suppressed branch of
the Multipart/alternative.
(2) Choice 9.2 will not show inline images in the Multipart/Related,
unless this information is repeated in both branches of the
Multipart/Alternative.
A general warning: Some mailers do not support "Content-Type:
Multipart/alternative", and may then interpret it as Multipart/mixed,
even though support of multipart/alternative is required for MIME
conformance.
9.4 Reducing the download time
If a message is sent as multipart/alternative, the mail client would
normally download both variants and then show only one of them to the
user, which increases the download time. One way of avoiding this
problem is to use the FETCH command of IMAP, which allows a client to
download only certain body parts of a multipart message.
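A minimal sketch of such a partial download, using the Python standard "imaplib" module, is given here. The helper names fetch_item and fetch_part are hypothetical, and the sketch assumes an already opened connection with a selected mailbox. The returned octets are still in their content-transfer-encoding.

```python
import imaplib


def fetch_item(part_path):
    """Return the FETCH data item for one body part.

    part_path is a sequence of 1-based part numbers, so (1, 2) names
    the second part inside the first part.
    """
    if not part_path or any(n < 1 for n in part_path):
        raise ValueError("part numbers start at 1")
    return "BODY.PEEK[" + ".".join(str(n) for n in part_path) + "]"


def fetch_part(conn, msg_num, part_path):
    """Download only the named body part of message msg_num.

    conn is assumed to be an imaplib.IMAP4 (or IMAP4_SSL) object on
    which login() and select() have already succeeded.
    """
    typ, data = conn.fetch(str(msg_num), "(" + fetch_item(part_path) + ")")
    if typ != "OK":
        raise RuntimeError("FETCH failed")
    # For a literal response, data[0] is a (header, octets) pair.
    return data[0][1]


print(fetch_item((2,)))
print(fetch_item((1, 2)))
```

For a message whose top level is multipart/alternative with Text/plain as part 1 and Text/HTML as part 2, a client that prefers HTML could call fetch_part(conn, msg_num, (2,)) and never download part 1.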
10. Recipient may not have full Internet connectivity
The recipient of a message sent by email may not always have full
Internet connectivity. The recipient may be behind a gateway or firewall
which prohibits or restricts Internet connectivity.
This means that the recipient may not be able to resolve URI-s in an
email message, unless the referred-to documents are included in the email
message itself. Thus, it is often suitable to include in an email message
all documents which are referred to (directly or indirectly) by URI-s in
the message. This may of course not always be possible. In some cases the
set of referred-to documents (directly or indirectly) may be the whole
WWW document space, i.e. millions of documents. A choice must then be
made of how much to include. It is most important to include all
inline objects, i.e. objects linked by such hyperlinks as IMG, etc.,
which specify that the linked objects are to be shown to the user
immediately.
In the case of ACTION attributes in HTML forms, by making these ACTION
values of the "mailto:" URL type, rather than the "http:" URL type, you
will also enable recipients without full Internet connectivity to fill in
and send in your forms. The HTML specification [HTML2] defines a default
action when no ACTION attribute is included, but this default action may
not be suitable when sending the HTML document via email. Thus, it is
better to always put an explicit ACTION attribute into HTML forms sent by
email.
A disadvantage of the "mailto:" URL as ACTION, however, is that it
may not work if the user has not specified an e-mail address in the
preferences of the HTML viewer. This is common on multi-user
workstations.
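A sending program could check its forms against the advice above before the HTML is packaged. The following sketch uses the Python standard "html.parser" module; the class and function names and the wording of the messages are hypothetical.

```python
from html.parser import HTMLParser


class FormActionChecker(HTMLParser):
    """Collect FORM elements whose ACTION is missing or not mailto:."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "form":
            return
        action = dict(attrs).get("action")
        if not action:
            self.problems.append("FORM without explicit ACTION")
        elif not action.lower().startswith("mailto:"):
            self.problems.append("ACTION is not a mailto: URL: " + action)


def check_forms(html):
    """Return a list of problem descriptions for the forms in html."""
    checker = FormActionChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


print(check_forms('<FORM ACTION="mailto:survey@example.org"></FORM>'))
print(check_forms('<FORM METHOD="POST"></FORM>'))
```

Such a check only reports; whether an "http:" ACTION is acceptable depends on what is known about the connectivity of the recipients.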
11. Encoding of non-ascii characters
       Displayed text                           Displayed text
              |                                        ^
              V                                        |
       +-------------+                        +----------------+
       | HTML editor |                        | HTML viewer    |
       |             |                        | or Web browser |
       +-------------+                        +----------------+
              |                                        ^
              V                                        |
         HTML markup                              HTML markup
              |                                        ^
              V                                        |
+---------+ +---------------+          +-------------+ +---------------+
| MIME    | | MIME content- |          | MIME        | | MIME content- |
| encap-  | | transfer-     |          | heading     | | transfer-     |
| sulator | | encoder       |          | interpreter | | decoder       |
+---------+ +---------------+          +-------------+ +---------------+
     |              |                          ^                ^
     V              V        +-----------+     |                |
MIME heading + MIME content->| Transport |->MIME heading + MIME content
                             +-----------+
Figure 5
Definitions (see Figure 5):
Displayed text    A visual representation of the intended text.
HTML markup       A sequence of characters formatted according to the
                  HTML specification [HTML2].
MIME content      A sequence of octets physically forwarded via email,
                  may use MIME content-transfer-encoding as specified
                  in [MIME1].
HTML editor       Software used to produce HTML markup.
MIME content-     Software used to encode non-US-ASCII characters
transfer-encoder  as specified in [MIME1].
MIME content-     Software used to decode non-US-ASCII characters
transfer-decoder  as specified in [MIME1].
MIME heading      Software used to interpret the information in MIME
interpreter       headings.
HTML viewer       Software used to display HTML documents to recipients.
Some implementations may have a choice of whether to represent non-ascii
characters at the HTML layer (using "&" entity references or numeric
character references as defined in [HTML2] section 3.2.1) or at the MIME
layer (using Content-Transfer-Encoding as defined in [MIME1] section 5).
In choosing between these two representation methods, note the following
effects:
(1) Modifying HTML markup may disrupt security content integrity
checksums. If the checksums are computed between the HTML editor
and the MIME encapsulator, then making the encoding in the MIME
encapsulator will not break the checksums.
(2) The choice of modifying HTML markup may be more suitable for
recipients whose mailers do not support MIME.
(3) Using MIME Content-Transfer-Encoding may be more suitable for
recipients who have MIME-compliant mailers but do pass the text over
to a document viewer (web browser).
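The two representation methods can be compared on a short text. The sketch below uses only the Python standard library. The sample text, the choice of ISO-8859-1 as character set and the choice of quoted-printable as Content-Transfer-Encoding are assumptions made for the illustration.

```python
import html
import quopri

# Sample text with two non-ascii characters (a with ring, U+00E5).
text = "Sk\u00e5l p\u00e5 svenska"

# HTML layer: numeric character references; the markup itself stays
# pure US-ASCII and needs no Content-Transfer-Encoding.
html_layer = text.encode("ascii", "xmlcharrefreplace")

# MIME layer: the markup keeps the raw characters (here as ISO-8859-1
# octets) and the octets are encoded as quoted-printable.
mime_layer = quopri.encodestring(text.encode("iso-8859-1"))

print(html_layer)
print(mime_layer)

# Both representations can be reversed to the same displayed text.
assert html.unescape(html_layer.decode("ascii")) == text
assert quopri.decodestring(mime_layer).decode("iso-8859-1") == text
```

In the first case the HTML markup is modified. In the second case the markup is left unchanged and only the transfer representation differs, which is the distinction that point (1) above relies on.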
12. Conversion from HTTP to MIME
Information received or retrieved using HTTP cannot always be sent
unchanged as email using the "Content-Type: Text/HTML", because of the
restrictions which MIME places on the format of "Content-Type:
Text/HTML". The same problem may occur for documents retrieved via HTTP,
which are in other textual formats than HTML. In particular, note the
following:
(a) Content-encodings allowed in HTTP, but not allowed in MIME, must
be removed.
(b) HTTP allows line breaks as bare CRs or bare LFs or something
else, while MIME only allows line breaks as CRLF in subtypes
of the Text content-type.
(c) HTTP allows character sets like Unicode-1-1, which do not
represent line breaks as CRLFs. Such text may have to be
rewritten to character sets like Unicode-1-1-UTF-7, in which
line breaks are represented as CRLFs.
A good overview of the differences, with regard to the use of
"Content-Type: Text", between MIME and HTTP, can be found in [HTTP]
appendix C.
If you want to provide web documents which can be sent through e-mail
without modification (which might break integrity checksums), then you
SHOULD provide them in canonical form, with line breaks as CRLF,
and avoid lines longer than 76 characters.
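A publisher could test documents against these two conditions with a few lines of code. The following is a sketch only; the function names are hypothetical, and the text is assumed to be available as a character string rather than as raw octets.

```python
import re

MAX_LINE = 76  # the line length limit named in the text above


def to_canonical(text):
    """Rewrite CRLF, bare CR and bare LF line breaks as CRLF."""
    # CRLF is listed first so that it is not rewritten as two breaks.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def overlong_lines(text, limit=MAX_LINE):
    """Return the 1-based numbers of lines longer than limit characters."""
    lines = to_canonical(text).split("\r\n")
    return [n for n, line in enumerate(lines, 1) if len(line) > limit]


sample = "<HTML>\n<BODY>\r" + "x" * 80 + "\r\n</BODY>\n</HTML>"
print(repr(to_canonical(sample)[:20]))
print(overlong_lines(sample))
```

A document for which to_canonical returns its input unchanged and overlong_lines returns an empty list can be sent as Text/HTML without any rewriting of line breaks.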
If you want to send HTTP unchanged via email, you might consider using
the "Content-Type: Message/HTTP" instead of the "Content-Type:
Text/HTML". Note that with this Content-Type, the whole object, as sent
through HTTP, can be encoded as a single object with, for example, BASE64
encoding. After decoding of the BASE64, the resulting object can have
HTTP-specific formats, like a single LF or a single CR between lines.
However, some mailers may not be capable of handling the Message/HTTP
Content-Type.
Example: the body of the following message
Content-Type: message/http
Content-Transfer-Encoding: base64
SFRUUC8xLjEgMjAwIE9LDURhdGU6IFNhdCwgMTQgRmViIDE5OTggMTM6MDM6MzggR01U
DVNlcnZlcjogQXBhY2hlLzEuMi40DUxhc3QtTW9kaWZpZWQ6IFdlZCwgMjMgSnVsIDE5
... ... ...
might, when the base64 encoding above is decoded, yield:
HTTP/1.1 200 OK
Date: Sat, 14 Feb 1998 13:03:38 GMT
ETag: "43788-124-33d658c5"
Content-Length: 292
Accept-Ranges: bytes
Content-Type: text/html
... ...
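The decoding step can be reproduced with the Python standard "base64" module. Only the first encoded line of the example above is used here, because the rest of the body is elided in the example. The variable names are hypothetical.

```python
import base64

# First encoded line of the message/http body shown above.
first_line = ("SFRUUC8xLjEgMjAwIE9LDURhdGU6IFNhdCwgMTQgRmViIDE5"
              "OTggMTM6MDM6MzggR01U")

decoded = base64.b64decode(first_line)

# In this example the HTTP lines are separated by a single CR, one of
# the line break forms which MIME text types would not permit.
for line in decoded.split(b"\r"):
    print(line.decode("ascii"))
```

Because the whole HTTP object travels inside the BASE64 encoding, the bare CR line breaks reach the recipient unchanged.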
13. Default font size
Many HTML editors and viewers allow the user to specify the size of the
default font according to personal
wishes, for example 10 pt or 12 pt or 14 pt depending on eyesight and
screen distance. This setting should *not* cause a change in the FONT
SIZE= value in the generated HTML which is produced and sent. The reason
for this is that otherwise users may inadvertently send whole letters
with all the text in a non-default font size, which may be easy to
read for the sender but difficult to read for some recipients.
Similarly, a user choice of default FONT, for example GENEVA or ARIAL,
should not cause a FONT element naming that typeface to be sent. Users
who wish to send e-mail in a particular typeface must
explicitly specify this, for example using a FONT command in their HTML
editor or e-mail text editor.
14. Acknowledgments
Harald Tveit Alvestrand, Richard Baker, Dave Crocker, Martin J. Duerst,
Roy Fielding, Lewis Geer, Al Gilman, Paul Hoffman, Alexander Hopmann,
Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed
Levinson, Jay Levitt, Albert Lunde, Larry Masinter, Keith Moore, Gavin
Nicol, Pete Resnick, Jon Smirl, Einar Stefferud, Jamie Zawinski and
several other people have helped us with preparing this memo. I alone
take responsibility for any errors which may still be in the memo.
15. References
Temporary note: This list contains some references to Internet drafts. It
is anticipated that these Internet drafts will become RFC-s before this
memo. The references will then in this memo be changed to refer to the
corresponding RFC instead. This list also includes some RFC-s which are
not up to date, and which will be replaced by new memos presently in ietf
draft status.
Ref. Author, title
--------- -------------------------------------------------------
[CONDISP] R. Troost, S. Dorner: "Communicating Presentation
Information in Internet Messages: The Content-
Disposition Header", RFC 1806, June 1995.
[HOSTS] R. Braden (editor): "Requirements for Internet Hosts --
Application and Support", STD-3, RFC 1123, October
1989.
[HTML2] T. Berners-Lee, D. Connolly: "Hypertext Markup Language
- 2.0", RFC 1866, November 1995.
[HTTP] T. Berners-Lee, R. Fielding, H. Frystyk: Hypertext
Transfer Protocol -- HTTP/1.0. RFC 1945, May 1996.
[MHTML] J. Palme & A. Hopmann: "Packaging Aggregate HTML
Objects in MIME Email", draft-ietf-mhtml-rev-02.txt,
October 1997.
[MIDCID] E. Levinson: "Message/External-Body Content-ID Access
Type", draft-ietf-mhtml-cid-v2-00.txt, July, 1997.
[MIME1] N. Freed & N. Borenstein: "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying
and Describing the Format of Internet Message Bodies",
RFC 2045, November 1996.
[MIME2] N. Freed & N. Borenstein: "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types". RFC 2046,
November 1996.
[NEWS] M.R. Horton, R. Adams: "Standard for interchange of
USENET messages", RFC 1036, December 1987.
[REL] Harald Tveit Alvestrand, Edward Levinson: "The MIME
Multipart/Related Content-type",