IETF                                                        N. Tomkinson
Internet-Draft                                             N. Borenstein
Intended status: Standards Track                            Mimecast Ltd
Expires: May 14, 2015                                  November 10, 2014


                     Multiple Language Content Type
                draft-tomkinson-slim-multilangcontent-00

Abstract

   This document defines an addition to the Multipurpose Internet Mail
   Extensions (MIME) standard to make it possible to send one message
   that contains multiple language versions of the same information.
   The translations would be identified by a language code and selected
   by the email client based on a user's language settings or locale.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 14, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Introduction

   Since the invention of email and the rapid spread of the internet,
   more and more people have been able to communicate in more and more
   countries and in more and more languages. But during this time of
   technological evolution, email has remained a single language
   communication tool, whether it is English to English, Spanish to
   Spanish or Japanese to Japanese.

   Also during this time, many corporations have established their
   offices in multi-cultural cities and formed departments and teams
   that span continents, cultures and languages so the need to
   communicate efficiently with little margin for miscommunication has
   grown exponentially.

   The objective of this document is to define an addition to the
   Multipurpose Internet Mail Extensions (MIME) standard, to make it
   possible to send a single message to a group of people in such a way
   that all of the recipients can read the email in their preferred
   language. The methods of translation of the message content are
   beyond the scope of this document, but the structure of the email
   itself is defined herein.

   Whilst this document depends on identification of language in message
   parts for non-real-time communication, there is a companion document
   that is concerned with a similar problem for real-time communication:
   [I-D.gellens-slim-negotiating-human-language].

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  The Content-Type Header Field

   When there is a requirement to send a message in a number of
   different languages and the translations are to be embedded in the
   same message, the multipart subtype "multipart/multilingual" SHOULD
   be used to help the receiving email client make sense of the message
   structure.
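
   As an illustration only, the following Python sketch assembles a
   container of this subtype with the standard library "email" package.
   The helper name build_multilingual and the sample texts are
   assumptions made for the example and are not defined by this
   document; the part layout it produces is the one described in
   Section 3, with the optional unmatched part left out.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_multilingual(subject, preface, translations):
    """Build a multipart/multilingual message (hypothetical helper).

    `translations` is a list of (language_tag, subject, body) tuples.
    """
    msg = MIMEMultipart("multilingual")
    msg["Subject"] = subject

    # First part: the explanatory preface, without Content-Language.
    msg.attach(MIMEText(preface, "plain", "utf-8"))

    # One part per translation, each carrying its own language tag
    # and its own translated subject.
    for lang, part_subject, body in translations:
        part = MIMEText(body, "plain", "utf-8")
        part["Content-Language"] = lang
        part["Subject"] = part_subject
        msg.attach(part)
    return msg


message = build_multilingual(
    "example of a message in Spanish and English",
    "This is a message in two languages: English and Spanish.",
    [
        ("en", "example of a message in Spanish and English",
         "Hello, this message content is provided in your language."),
        ("es", "ejemplo de mensaje",
         "Hola, el contenido de este mensaje esta disponible en su idioma."),
    ],
)
print(message.get_content_type())
```

   The boundary string is generated by the library when the message is
   serialised, so it is not set explicitly here.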

   The suggested multipart subtype "multipart/multilingual" has similar
   semantics to "multipart/alternative" (as discussed in RFC 2046
   [RFC2046]) in that each of the message parts is an alternative
   version of the same information. The primary difference between
   "multipart/multilingual" and "multipart/alternative" is that when
   using "multipart/multilingual", the message part to select for
   rendering is chosen based on the value of the Content-Language header
   field instead of the ordering of the parts and the Content-Types.

   The syntax for this multipart subtype conforms to the common syntax
   for subtypes of multipart given in Section 5.1.1 of RFC 2046
   [RFC2046]. An example "multipart/multilingual" Content-Type header
   field would look like this:

   Content-type: multipart/multilingual; boundary=01189998819991197253

3.  The Message Parts

   A multipart/multilingual message will have a number of message parts:
   exactly one multilingual preface, one or more language message parts
   and zero or one unmatched message part. The details of these are
   described below.

3.1.  The Multilingual Preface

   In order for the message to be received and displayed in
   non-conforming email clients, the message SHOULD contain an
   explanatory message part which MUST NOT be marked with a
   Content-Language field and MUST be the first of the message parts.
   Because non-conforming email clients are expected to treat the
   message as multipart/mixed (in accordance with Sections 5.1.3 and
   5.1.7 of RFC 2046 [RFC2046]) they may show all of the message parts
   sequentially or as attachments. Including and showing this
   explanatory part will help the message recipient understand the
   message structure.

   This initial message part SHOULD explain briefly to the message
   recipient that the message contains multiple languages and the parts
   may be rendered sequentially or as attachments.
   This SHOULD be presented in the same languages that are provided in
   the subsequent language message parts.

   Whilst this section of the message is useful for backward
   compatibility, it SHOULD only be shown when rendered by a
   non-conforming email client because conforming email clients SHOULD
   only show the single language message part identified by the user's
   preferred language (or locale) and the language message part's
   Content-Language.

   For an example of a Multilingual Preface, see the examples in
   Section 7.

3.2.  The Language Message Parts

   The language message parts are translations of the same message
   content. These message parts MAY be ordered so that the first part
   after the multilingual preface is in the language believed to be the
   most likely to be recognised by the recipient. All of the language
   message parts MUST have a Content-Language field and a Content-Type
   field and SHOULD have a Subject field.

   The Content-Type for each individual language part MAY be any MIME
   type (including multipart subtypes such as multipart/alternative).
   However, it is recommended that the Content-Type of the language
   parts is kept as simple as possible for interoperability with
   existing email clients. The language parts are not required to have
   matching Content-Types or multipart structures. For example, there
   might be an English part of type "text/html" followed by a Spanish
   part of type "application/pdf" followed by a Chinese part of type
   "image/jpeg". Whatever the Content-Type, the contents SHOULD be
   composed for optimal viewing in the specified language.

3.3.  The Unmatched Message Part

   If there is content intended for the recipient to see if they have a
   preferred language other than one of those specified in the language
   parts, another part MAY be provided. This would be useful when a
   language-independent graphic is available.
   When this unmatched part is present, it MUST be the last part, MUST
   NOT have a Content-Language field and SHOULD NOT have a Subject
   field.

4.  Message Part Selection

   The logic for selecting the message part to render and present to the
   recipient is quite straightforward and is summarised in the next few
   paragraphs.

   Firstly, if the email client does not understand
   multipart/multilingual then it SHOULD treat the message as if it were
   multipart/mixed and render message parts accordingly.

   If the email client does understand multipart/multilingual then it
   SHOULD ignore the multilingual preface and select the best match for
   the user's preferred language from the language message parts
   available. This may be implemented in a variety of ways and is
   dependent on how the email client manages its preferred language
   data. The ultimate goal is to render the most appropriate
   translation for the user. Similarly, the subject should be chosen
   from the matched language message part.

   If there is no match for the user's preferred language (or there is
   no preferred language information available) the email client SHOULD
   select the unmatched part (if one exists) or the first language part
   (directly after the multilingual preface) if an unmatched part does
   not exist. The Subject header field value should be used whenever a
   suitable translation cannot be identified.

   Additionally, interactive implementations MAY offer the user a choice
   from among the available languages.

5.  The Content-Language Field

   The Content-Language field in the individual language message parts
   is used to identify the language in which the message part is
   written. Based on the value of this field, a conforming email client
   can determine which message part to display (given the user's
   language settings or locale).
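
   The selection procedure of Section 4, driven by the Content-Language
   field just described, might be sketched in Python as follows. This
   is a non-normative illustration: the function names are
   hypothetical, the sample message is abbreviated, and the matching
   rule (exact tag, otherwise same primary subtag) is a simplifying
   assumption, since this document leaves the matching method to the
   implementation.

```python
from email import message_from_string

RAW = """\
Subject: example of a message in Spanish and English
MIME-Version: 1.0
Content-Type: multipart/multilingual; boundary=01189998819991197253

--01189998819991197253
Content-Type: text/plain

This is a message in two languages: English and Spanish.
--01189998819991197253
Content-Language: en
Content-Type: text/plain
Subject: example of a message in Spanish and English

Hello, this message content is provided in your language.
--01189998819991197253
Content-Language: es
Content-Type: text/plain
Subject: ejemplo de mensaje

Hola, el contenido de este mensaje esta disponible en su idioma.
--01189998819991197253
Content-Type: image/gif

..GIF image showing iconic or language-independent content here..
--01189998819991197253--
"""


def language_tags(part):
    """Return the lower-cased tags of a part's Content-Language field."""
    value = part.get("Content-Language", "")
    return [tag.strip().lower() for tag in value.split(",") if tag.strip()]


def select_part(msg, preferred):
    """Pick the part to render for an ordered list of preferred tags."""
    if not msg.is_multipart():
        return msg
    parts = msg.get_payload()
    language_parts = [p for p in parts if language_tags(p)]
    # The first part is the preface; an unmatched part, when present,
    # is the last part and carries no Content-Language field.
    unmatched = None
    if len(parts) > 1 and not language_tags(parts[-1]):
        unmatched = parts[-1]

    for wanted in (tag.lower() for tag in preferred):
        for part in language_parts:
            for tag in language_tags(part):
                # Assumed rule: exact match or same primary subtag.
                if tag == wanted or tag.split("-")[0] == wanted.split("-")[0]:
                    return part
    if unmatched is not None:
        return unmatched
    return language_parts[0] if language_parts else None


def select_subject(msg, part):
    """Use the chosen part's Subject, else the message's own Subject."""
    return part.get("Subject") or msg.get("Subject")


msg = message_from_string(RAW)
chosen = select_part(msg, ["es-MX", "en"])
print(chosen["Content-Language"], "/", select_subject(msg, chosen))
fallback = select_part(msg, ["de"])
print(fallback.get_content_type(), "/", select_subject(msg, fallback))
```

   A user preferring "es-MX" is given the Spanish part, while a user
   preferring "de" falls through to the unmatched part and the subject
   of the enclosing message.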

   The Content-Language MUST comply with RFC 3282 [RFC3282] (which
   defines the Content-Language field) and BCP 47/RFC 5646 [RFC5646]
   (which defines the structure and semantics for the language code
   values). While RFC 5646 provides a mechanism accommodating
   increasingly fine-grained distinctions, in the interest of maximum
   interoperability, each Content-Language value SHOULD be restricted to
   the largest granularity of language tags; in other words, it is
   RECOMMENDED to specify only a Primary-subtag and NOT to include
   subtags (e.g., for region or dialect) unless the languages might be
   mutually incomprehensible without them.

   Examples of this field for English, German and an instruction manual
   in Spanish and French could look like the following:

   Content-Language: en

   Content-Language: de

   Content-Language: es, fr

6.  The Subject Field in the Language Message Parts

   On receipt of the message, conforming email clients will need to
   render the subject in the correct language for the recipient. To
   enable this, the Subject field SHOULD be provided in each language
   message part. The value for this field should be a translation of
   the email subject.

   US-ASCII and 'encoded-word' examples of this field may look like
   this:

   Subject: A really simple email subject

   Subject: =?iso-8859-1?Q?un_asunto_de_correo_electr=F3nico_sencillo?=

   See RFC 2047 [RFC2047] for the specification of 'encoded-word'.

7.  Examples

7.1.  An Example of a Simple Multiple Language Email Message

   Below is an example of a simple multiple language email message
   formatted using the method detailed in this document.

   From: Nik
   To: Nathaniel
   Subject: example of a message in Spanish and English
   Content-type: multipart/multilingual; boundary=01189998819991197253

   --01189998819991197253

   This is a message in two languages: English and Spanish.
   It says the same thing in each language.
   If you can read it in one language, you can ignore the other
   translations. The other translations may be presented as attachments
   or grouped together.

   Este es un mensaje en dos idiomas: Ingles y Espanol.
   Dice lo mismo en cada idioma.
   Si puede leerlo en un idioma, puede ignorar las otras traducciones.
   Las otras traducciones pueden presentes como archivos adjuntos o
   agrupados.

   --01189998819991197253
   Content-Language: en
   Content-Type: text/plain
   Subject: example of a message in Spanish and English

   Hello, this message content is provided in your language.

   --01189998819991197253
   Content-Language: es
   Content-Type: text/plain
   Subject: =?iso-8859-1?Q?ejemplo_pr=E1ctico_de_mensaje_
    en_espa=F1ol_e_ingl=E9s?=

   Hola, el contenido de este mensaje esta disponible en su idioma.

   --01189998819991197253
   Content-Type: image/gif

   ..GIF image showing iconic or language-independent content here..

   --01189998819991197253--

7.2.  An Example of a Complex Multiple Language Email Message

   Below is an example of a more complex multiple language email message
   formatted using the method detailed in this document. Note that the
   language message parts have multipart contents and would therefore
   require further processing to determine the content to display.

   From: Nik
   To: Nathaniel
   Subject: example of a message in Spanish and English
   Content-type: multipart/multilingual; boundary=01189998819991197253

   --01189998819991197253

   This is a message in two languages: English and Spanish.
   It says the same thing in each language.
   If you can read it in one language, you can ignore the other
   translations. The other translations may be presented as attachments
   or grouped together.

   Este es un mensaje en dos idiomas: Ingles y Espanol.
   Dice lo mismo en cada idioma.
   Si puede leerlo en un idioma, puede ignorar las otras traducciones.
   Las otras traducciones pueden presentes como archivos adjuntos o
   agrupados.

   --01189998819991197253
   Content-Language: en
   Content-Type: multipart/alternative; boundary=multipartaltboundary
   Subject: example of a message in Spanish and English

   --multipartaltboundary
   Content-Type: text/plain

   Hello, this message content is provided in your language.

   --multipartaltboundary
   Content-Type: text/html

   Hello, this message content is provided in your language.

   --multipartaltboundary--
   --01189998819991197253
   Content-Language: es
   Content-Type: multipart/mixed; boundary=multipartmixboundary
   Subject: =?iso-8859-1?Q?ejemplo_pr=E1ctico_de_mensaje_
    en_espa=F1ol_e_ingl=E9s?=

   --multipartmixboundary
   Content-Type: application/pdf

   ..PDF file in Spanish here..

   --multipartmixboundary
   Content-Type: image/jpeg

   ..JPEG image showing Spanish content here..

   --multipartmixboundary--
   --01189998819991197253
   Content-Type: image/gif

   ..GIF image showing iconic or language-independent content here..

   --01189998819991197253--

8.  Changes from Previous Versions

8.1.  Changes from draft-tomkinson-multilangcontent-01 to
      draft-tomkinson-slim-multilangcontent-00

   o  File name and version number changed to reflect the proposed WG
      name SLIM (Selection of Language for Internet Media).

   o  Replaced the Subject-Translation field in the language message
      parts with Subject and provided US-ASCII and non-US-ASCII
      examples.

   o  Introduced the language-independent unmatched message part.

   o  Many wording improvements and clarifications throughout the
      document.

9.  Acknowledgements

   The authors are grateful for the helpful input received from many
   people but would especially like to acknowledge the help of Harald
   Alvestrand, Mark Davis, Doug Ewell, Randall Gellens, Alexey Melnikov,
   Fiona Tomkinson, Simon Tyler and Daniel Vargha. The authors would
   also like to thank Luis de Pablo for his work on the Spanish
   translations.

10.  IANA Considerations

   The multipart/multilingual MIME type will be registered with IANA.

11.  Security Considerations

   This document has no additional security considerations beyond those
   that apply to the standards and procedures on which it is built.

12.  References

12.1.  Normative References

   [RFC2046]  Freed, N. and N.
              Borenstein, "Multipurpose Internet Mail Extensions (MIME)
              Part Two: Media Types", RFC 2046, November 1996.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII Text",
              RFC 2047, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3282]  Alvestrand, H., "Content Language Headers", RFC 3282,
              May 2002.

   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

12.2.  Informational References

   [I-D.gellens-slim-negotiating-human-language]
              Gellens, R., "Negotiating Human Language in Real-Time
              Communications",
              draft-gellens-slim-negotiating-human-language-00 (work in
              progress), October 2014.

Authors' Addresses

   Nik Tomkinson
   Mimecast Ltd
   CityPoint, One Ropemaker Street
   London  EC2Y 9AW
   United Kingdom

   Email: rfc.nik.tomkinson@gmail.com


   Nathaniel Borenstein
   Mimecast Ltd
   480 Pleasant Street
   Watertown  MA 02472
   North America

   Email: nsb@mimecast.com