Network Working Group                                          M. Duerst
Internet-Draft                                  Aoyama Gakuin University
Obsoletes: 2368 (if approved)                                L. Masinter
Expires: April 27, 2006                       Adobe Systems Incorporated
                                                             J. Zawinski
                                                              DNA Lounge
                                                        October 24, 2005


                         The mailto URI scheme
                       draft-duerst-mailto-bis-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 27, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document defines the format of Uniform Resource Identifiers (URIs) for references to electronic mail addresses. It updates the syntax of 'mailto' URIs from [RFC2368] for better compatibility with IRIs ([RFC3987]).



Duerst, et al.            Expires April 27, 2006                [Page 1]

Internet-Draft            The mailto URI scheme             October 2005


Table of Contents

   1.  Introduction
   2.  Syntax of a mailto URI
   3.  Semantics and Operations
   4.  Unsafe Headers
   5.  Encoding
   6.  Deployment of UTF-8-Based Percent-Encoding
   7.  Examples
     7.1.  Examples Conforming to RFC2368
     7.2.  Examples of Complicated Email Addresses
     7.3.  Examples Using UTF-8-Based Percent-Encoding
   8.  Security Considerations
   9.  IANA Considerations
   10. Change Log
     10.1.  Changes between draft 00 and draft 01
     10.2.  Changes from RFC 2368
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   The mailto URI scheme is used to identify resources that are reached using Internet mail. In its simplest form, a mailto URI contains an Internet mail address. For interaction with resources that require message headers or message bodies to be specified, the mailto URI scheme also allows setting mail header fields and the message body.

   This specification extends the previous scheme definition [RFC2368] to also allow character data to be percent-encoded based on UTF-8, which offers a better and more consistent way of dealing with non-ASCII characters.

   In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in [RFC2119].

2.  Syntax of a mailto URI

   The syntax of a "mailto" URI is described using the ABNF of [RFC4234], with non-terminal symbol definitions from [RFC3986]:

      mailtoURI  = "mailto:" [ to ] [ headers ]
      to         = [ addr-spec *("%2C" addr-spec ) ]
      headers    = "?" header *( "&" header )
      header     = hname "=" hvalue
      hname      = *urlc
      hvalue     = *urlc
      addr-spec  = local-part "@" domain
      local-part = dot-atom / quoted-string

   "addr-spec" is as specified in [RFC2822], i.e. it is a mail address, possibly including "phrase" and "comment" components. However, the following changes apply:

   1.  All characters that can appear in "addr-spec" but are not in the unreserved category of [RFC3986] have to be percent-encoded. Examples are parentheses, commas, and the percent sign ("%"), which commonly occur in the "addr-spec" syntax. Care has to be taken, both when encoding and when decoding, that these operations are applied only once.

   2.  "obs-local-part" and "NO-WS-CTL" as defined in [RFC2822] are not allowed.

   3.  Whitespace and comments within "local-part" are not allowed. They do not have any operational semantics.

   4.  Percent-encoding can be used to denote non-ASCII characters in the part of a "mailbox" that denotes a domain name, in order to denote an internationalized domain name. The considerations for reg-name in [RFC3986] apply. In particular, non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters. URI-producing applications must not use percent-encoding in domain names unless it is used to represent a UTF-8 character sequence. When the internationalized domain name is used to compose a message, the name must be transformed to the IDNA encoding [RFC3490].
       URI producers should provide these domain names in the IDNA encoding, rather than percent-encoded, if they wish to maximize interoperability with legacy mailto URI interpreters.

   5.  Percent-encoding in the LHS (local part) of an email address is reserved for potential future internationalization. Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters. Any other percent-encoding of non-ASCII characters is prohibited. When an LHS containing non-ASCII characters is used to compose a message, the LHS must be transformed to conform to whatever encoding may be defined in a future specification for the internationalization of email addresses.

   "hname" and "hvalue" are encodings of an [RFC2822] header name and value, respectively. As with "to", all URI reserved characters must be encoded.

   The special hname "body" indicates that the associated hvalue is the body of the message. The "body" hname should contain the content for the first text/plain body part of the message. The "body" hname is primarily intended for generation of short text messages for automatic processing (such as "subscribe" messages for mailing lists), not general MIME bodies.

   Within mailto URIs, the characters "?", "=", and "&" are reserved.

   Because the "&" (ampersand) character is reserved in HTML and XML, any mailto URI which contains an ampersand must be spelled differently in HTML and XML than in other contexts. A mailto URI which appears in an HTML or XML document must escape the "&", e.g. as "&amp;".

   Non-ASCII characters can be encoded in hvalue as follows:

   1.  MIME encoded words (as defined in [RFC2047]) are permitted in header values, but not in an hvalue of a "body" hname.

   2.  Non-ASCII characters can be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence is percent-encoded to be represented as URI characters. When hvalues encoded in this way are used to compose a message, the hvalue must be transformed into MIME encoded words, except for an hvalue of a "body" hname, which has to be encoded according to [RFC2045].

   Please note that for MIME encoded words and for bodies in composed email messages, encodings other than UTF-8 MAY be used as long as the characters are properly transcoded. MIME encoded words and UTF-8-based percent-encoding SHOULD NOT both be used in the same hvalue.

   Also note that it is legal to specify both "to" and an "hname" whose value is "to". That is,

      mailto:addr1%2C%20addr2

   is equivalent to

      mailto:?to=addr1%2C%20addr2

   is equivalent to

      mailto:addr1?to=addr2

3.  Semantics and Operations

   A mailto URI designates an "internet resource", which is the mailbox specified in the address. When additional headers are supplied, the resource designated is the same address, but with an additional profile for accessing the resource. While there are Internet resources that can only be accessed via electronic mail, the mailto URI is not intended as a way of retrieving such objects automatically.

   In current practice, resolving URIs such as those in the "http" scheme causes an immediate interaction between client software and a host running an interactive server. The "mailto" URI has unusual semantics because resolving such a URI does not cause an immediate interaction. Instead, the client creates a message to the designated address with the various header fields set as default. The user can edit the message, send this message unedited, or choose not to send the message. How a URI of any scheme is resolved is not mandated by the URI specifications.

4.  Unsafe Headers

   The user agent interpreting a mailto URI SHOULD choose not to create a message if any of the headers are considered dangerous; it may also choose to create a message with only a subset of the headers given in the URI. Only the Subject, Keywords, and Body headers are believed to be both safe and useful in the general case. In cases where the source of a URI is well known, and/or specific fields are limited to specific well-known values, other headers may be considered safe, too.

   The creator of a mailto URI cannot expect the resolver of a URI to understand more than the "subject" and "body" headers. Clients that resolve mailto URIs into mail messages should be able to correctly create [RFC2822]-compliant mail messages using the "subject" and "body" headers.

5.  Encoding

   [RFC3986] requires that many characters in URIs be encoded. This affects the mailto scheme for some common characters that might appear in addresses, headers or message contents. One such character is space (" ", ASCII hex 20). Note that the examples below use "%20" for space in the message body. Also note that line breaks in the body of a message MUST be encoded with "%0D%0A".

   People creating mailto URIs must be careful to encode any reserved characters that are used in the URIs so that properly-written URI interpreters can read them. Also, client software that reads URIs must be careful to decode strings before creating the mail message so that the mail messages appear in a form that the recipient will understand. These strings should be decoded before showing the message to the user.

   The mailto URI scheme is limited in that it does not provide for substitution of variables. Thus, a message body that must include a user's email address cannot be encoded using the mailto URI. This limitation also prevents mailto URIs that are signed with public keys and other such variable information.

6.  Deployment of UTF-8-Based Percent-Encoding

   UTF-8-based percent-encoding should only be used in actual mailto URIs once it is well deployed in software that interprets mailto URIs (such as mail user agents).

7.  Examples

7.1.  Examples Conforming to RFC2368

   URIs for an ordinary individual mailing address:

   A URI for a mail response system that requires the name of the file in the subject:

   A mail response system that requires a "send" request in the body:

   A similar URI could have two lines with different "send" requests (in this case, "send current-issue" and, on the next line, "send index".)

   An interesting use of mailto URIs is when browsing archives of messages. Each browsed message might contain a mailto URI like:

   A request to subscribe to a mailing list:

   A URI for a single user which includes a CC of another user:

   Another way of expressing the same thing:

   Note the use of the "&" reserved character, above. The following example, by using "?" twice, is incorrect:

      ; WRONG!

   According to [RFC2822], the characters "?", "&", and even "%" may occur in addr-specs. The fact that they are reserved characters in this URI scheme is not a problem: those characters may appear in mailto URIs, they just may not appear in unencoded form. The standard URI encoding mechanisms ("%" followed by a two-digit hex number) must be used in these cases.

   To indicate the address "gorby%kremvax@example.com" one would do:

      mailto:gorby%25kremvax@example.com

   To indicate the address "unlikely?address@example.com", and include another header, one would do:

   As described above, the "&" (ampersand) character is reserved in HTML and must be replaced, e.g. with "&amp;". Thus, a complex URI that has internal ampersands might look like:

      Click <a href="mailto:?to=joe@xyz.com&amp;cc=bob@xyz.com&amp;body=hello">mailto:?to=joe@xyz.com&amp;cc=bob@xyz.com&amp;body=hello</a> to send a greeting message to Joe and Bob.

7.2.  Examples of Complicated Email Addresses

   Following are a few examples of how to treat email addresses that contain complicated escaping syntax.

   Email address: "not@me"@example.org; corresponding mailto URI:

      mailto:%22not%40me%22@example.org

   Email address: "oh\\no"@example.org; corresponding mailto URI:

      mailto:%22oh%5C%5Cno%22@example.org

   Email address: "\\\"it's\ ugly\\\""@example.org; corresponding mailto URI:

      mailto:%22%5C%5C%5C%22it's%5C%20ugly%5C%5C%5C%22%22@example.org

7.3.  Examples Using UTF-8-Based Percent-Encoding

   Sending a mail with the subject "coffee" in French, i.e. "cafe" where the final e is an e-acute, using UTF-8 and percent-encoding:

      mailto:user@example.org?subject=caf%C3%A9

   The same subject, this time using an encoded-word (escaping the "=" and "?" characters used in the encoded-word syntax, because they are reserved):

      mailto:user@example.org?subject=%3D%3Futf-8%3FQ%3Fcaf%3DC3%3DA9%3F%3D

   The same subject, this time encoded as iso-8859-1:

      mailto:user@example.org?subject=%3D%3Fiso-8859-1%3FQ%3Fcaf%3DE9%3F%3D

   Going back to straight UTF-8 and adding a body with the same value:

      mailto:user@example.org?subject=caf%C3%A9&body=caf%C3%A9

   This mailto URI may result in a message looking like this:

      From: sender@example.net
      To: user@example.org
      Subject: =?utf-8?Q?caf=C3=A9?=
      Content-Type: text/plain;charset=utf-8
      Content-Transfer-Encoding: quoted-printable

      caf=C3=A9

   The software sending the email is not restricted to UTF-8, but can use other encodings. The following shows the same email using iso-8859-1 for both the subject and the body:

      From: sender@example.net
      To: user@example.org
      Subject: =?iso-8859-1?Q?caf=E9?=
      Content-Type: text/plain;charset=iso-8859-1
      Content-Transfer-Encoding: quoted-printable

      caf=E9

   Different content transfer encodings (e.g. "8bit" or "base64" instead of "quoted-printable") and different encodings in encoded words (e.g. "B" instead of "Q") can also be used.
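   The percent-encoded forms in the examples above can be reproduced with the following Python sketch. It is illustrative only and not part of this specification; it assumes Python 3.9 or later and uses only the standard library, and the helper names pct, q_encoded_word, build_mailto, parse_mailto, and idn_to_ascii are hypothetical. The "Q" encoder is deliberately minimal: it keeps only ASCII letters and digits and writes every other octet as "=XX", which [RFC2047] permits but which is not the most compact form. parse_mailto merges addresses from the "to" part and from any "to" hname, so the three equivalent forms given in Section 2 yield the same recipient list. idn_to_ascii relies on Python's built-in "idna" codec, which implements the IDNA rules of [RFC3490], and applies to the domain-name example that follows.

```python
import re
from urllib.parse import quote, unquote


def pct(text: str) -> str:
    # UTF-8-based percent-encoding: every octet outside the RFC 3986
    # "unreserved" set (ALPHA / DIGIT / "-" / "." / "_" / "~") becomes %XX.
    return quote(text, safe="")


def q_encoded_word(text: str, charset: str = "utf-8") -> str:
    # Minimal RFC 2047 "Q" encoded word: ASCII letters and digits are kept,
    # every other octet of the charset-encoded text is written as "=XX".
    body = "".join(
        chr(b) if b < 0x80 and chr(b).isalnum() else "=%02X" % b
        for b in text.encode(charset)
    )
    return "=?%s?Q?%s?=" % (charset, body)


def build_mailto(to: str, headers: dict[str, str]) -> str:
    # Assumption: `to` is an addr-spec that needs no escaping (such as
    # "user@example.org"); hnames and hvalues are percent-encoded.
    query = "&".join(pct(name) + "=" + pct(value) for name, value in headers.items())
    return "mailto:" + to + ("?" + query if query else "")


def _addresses(part: str) -> list[str]:
    # Addresses are separated by "%2C" (an encoded comma), then decoded.
    pieces = re.split("%2C", part, flags=re.IGNORECASE)
    return [unquote(p).strip() for p in pieces if p]


def parse_mailto(uri: str) -> tuple[list[str], dict[str, str]]:
    # Returns (recipients, other headers), percent-decoded as UTF-8.
    # Recipients from the "to" part and from "to" hnames are merged.
    if not uri.lower().startswith("mailto:"):
        raise ValueError("not a mailto URI: " + uri)
    to_part, _, query = uri[len("mailto:"):].partition("?")
    recipients = _addresses(to_part)
    headers: dict[str, str] = {}
    fields = query.split("&") if query else []
    for field in fields:
        name, _, value = field.partition("=")
        name = unquote(name).lower()  # header names are case-insensitive
        if name == "to":
            recipients += _addresses(value)
        else:
            headers[name] = unquote(value)
    return recipients, headers


def idn_to_ascii(domain: str) -> str:
    # Undo the UTF-8 percent-encoding, then apply IDNA (RFC 3490) ToASCII.
    return unquote(domain).encode("idna").decode("ascii")


print(build_mailto("user@example.org", {"subject": "caf\u00e9", "body": "caf\u00e9"}))
print(pct(q_encoded_word("caf\u00e9")))
print(pct(q_encoded_word("caf\u00e9", "iso-8859-1")))
print(idn_to_ascii("%E7%B4%8D%E8%B1%86.example.org"))
```

   Run as a script, the four lines printed should match the strings used in this section: "mailto:user@example.org?subject=caf%C3%A9&body=caf%C3%A9", the two encoded-word forms "%3D%3Futf-8%3FQ%3Fcaf%3DC3%3DA9%3F%3D" and "%3D%3Fiso-8859-1%3FQ%3Fcaf%3DE9%3F%3D", and the domain "xn--99zt52a.example.org".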

   For more examples of encoding the word coffee in different languages, see [RFC2324].

   The following example uses the Japanese word "natto" (U+7D0D U+8C46) as a domain name label, sending a mail to a user at "natto".example.org:

      mailto:user@%E7%B4%8D%E8%B1%86.example.org?subject=Test&body=NATTO

   When constructing the email, the domain name label is converted to punycode, the IDNA encoding of [RFC3490]. The resulting message may look as follows:

      From: sender@example.net
      To: user@xn--99zt52a.example.org
      Subject: Test
      Content-Type: text/plain
      Content-Transfer-Encoding: 7bit

      NATTO

8.  Security Considerations

   The mailto scheme can be used to send a message from one user to another, and thus can introduce many security concerns. Mail messages can be logged at the originating site, the recipient site, and intermediary sites along the delivery path. If the messages are not encrypted, they can also be read at any of those sites.

   A mailto URI gives a template for a message that can be sent by mail client software. The contents of that template may be opaque or difficult to read by the user at the time of specifying the URI. Thus, a mail client should never send a message based on a mailto URI without first showing the user the full message that will be sent (including all headers that were specified by the mailto URI), fully decoded, and asking the user for approval to send the message as electronic mail. The mail client should also make it clear that the user is about to send an electronic mail message, since the user may not be aware that this is the result of a mailto URI.

   A mail client should never send anything without complete disclosure to the user of what will be sent; it should disclose not only the message destination, but also any headers. Unrecognized headers, or headers with values inconsistent with those the mail client would normally send, should be especially suspect.
   MIME headers (MIME-Version, Content-*) are most likely inappropriate, as are those relating to routing (From, Bcc, Apparently-To, etc.).

   Note that some headers are inherently unsafe to include in a message generated from a URI. For example, headers such as "From:", "Bcc:", and so on, should never be interpreted from a URI. In general, the fewer headers interpreted from the URI, the less likely it is that a sending agent will create an unsafe message.

   Examples of problems with sending unapproved mail include:

      mail that breaks laws upon delivery, such as making illegal threats;

      mail that identifies the sender as someone interested in breaking laws;

      mail that identifies the sender to an unwanted third party;

      mail that causes a financial charge to be incurred on the sender;

      mail that causes an action on the recipient machine that causes damage that might be attributed to the sender.

   Programs that interpret mailto URIs should ensure that the SMTP "From" address is set and correct.

   The security considerations of [RFC3986], [RFC3490], [RFC3491], and [RFC3987] also apply.

9.  IANA Considerations

   This document changes the definition of the mailto: URI scheme; the registry of URI schemes should refer to this document rather than its predecessor, [RFC2368].

10.  Change Log

10.1.  Changes between draft 00 and draft 01

   Added clarification about permitted syntax and escaping on email address LHS, and more complicated examples.

   Added text about more safe headers in case the origin of mailto URIs is known.

   Fixed date of [RFC3986].

   Added a sentence referencing [RFC2119].

   Added Jamie back in as a co-author.

   Changed address/affiliation for Martin.

10.2.  Changes from RFC 2368

   For interoperability with IRIs ([RFC3987]), allowed percent-encoding, fixed to UTF-8, in the domain name part of an email address, in the LHS part of an address (currently reserved because not operationally usable), and in hvalue parts.

   Changed from 'URL' to 'URI'.

   Updated references: ABNF to [RFC4234]; message syntax to [RFC2822]; URI Generic Syntax to [RFC3986].

   Expanded "#mailbox", because the "#" shortcut is no longer available; needs checking.

11.  Acknowledgments

   This document was derived from [RFC2368]; the acknowledgments from that specification still apply. In addition, we thank Paul Hoffman for his work on [RFC2368].

   Valuable input on this document was received from (in no particular order): Paul Hoffman, Charles Lindsey, Tim Kindberg, Frank Ellermann, Etan Wexler, and Michael Haardt.

12.  References

12.1.  Normative References

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

   [RFC2047]  Moore, K., "MIME Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 4234, October 2005.

   [STD63]    Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.

12.2.  Informative References

   [RFC2324]  Masinter, L., "Hyper Text Coffee Pot Control Protocol (HTCPCP/1.0)", RFC 2324, April 1998.

   [RFC2368]  Hoffman, P., Masinter, L., and J. Zawinski, "The mailto URL scheme", RFC 2368, July 1998.

Authors' Addresses

   Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever possible, for example as "D&#252;rst" in XML and HTML.)
   Aoyama Gakuin University
   5-10-1 Fuchinobe
   Sagamihara, Kanagawa  229-8558
   Japan

   Phone: +81 466 49 1170
   Fax:   +81 466 49 1171
   Email: mailto:duerst@it.aoyama.ac.jp
   URI:   http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/


   Larry Masinter
   Adobe Systems Incorporated
   345 Park Ave
   San Jose, CA  95110
   USA

   Phone: +1-408-536-3024
   Email: LMM@acm.org
   URI:   http://larry.masinter.net/


   Jamie Zawinski
   DNA Lounge
   375 Eleventh Street
   San Francisco, CA  94103
   USA

   Email: jwz@jwz.org

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the Internet Society.