INTERNET-DRAFT                                   Donald E. Eastlake 3rd
                                                               Motorola
Expires July 2001                                          January 2001


              Mapping Between Content-Types and URIs
              ------- ------- ------- ----- --- ----

                       Donald E. Eastlake 3rd


Status of This Document

   Distribution of this document is unlimited. Comments should be sent
   to the author.

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026. Internet-Drafts are
   working documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups. Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   Multipurpose Internet Mail Extensions (MIME) Content-Type headers
   and Uniform Resource Identifiers (URIs) are both being used, in
   different contexts, to label entities. A mapping is specified such
   that the union of their meaning can be expressed in either syntax.


Table of Contents

      Status of This Document
      Abstract
      Table of Contents

      1. Introduction
      1.1 Introduction to URIs and MIME Type/Content-Type
      1.2 Definitions and Conventions
      1.3 Additional Features
      1.4 Overview of Remaining Sections

      2. Simple Mapping
      2.1 Simple Mapping of Content-Type to URI
      2.1.1 The Basic Case
      2.1.2 More Complete Rules
      2.2 Simple Mapping of URI to Content-Type, The Basic Case
      2.3 Content-Type Mapping Special Case for Basic Closure
      2.4 URI Mapping Special Case for Basic Closure

      3. Controlled Mapping
      4. Troublesome Characters
      5. IANA Considerations and Potential Conflicts
      6. Security Considerations

      References
      Author's Address
      Expiration and File Name


1. Introduction

   Both MIME types and URIs have come to be used for type labeling and
   similar information. In most protocols where there are provisions
   for a general "type label", the label is restricted to the syntax of
   a URI or the syntax of a Content-Type.

   In some cases, it will be useful to be able to express labels of the
   "other" syntax. That is, it may be useful in a URI syntax slot to
   also be able to express a MIME type or Content-Type and, conversely,
   it may be useful in a Content-Type syntax slot to also be able to
   express a URI. This document specifies how.


1.1 Introduction to URIs and MIME Type/Content-Type

   The IETF Multipurpose Internet Mail Extensions (MIME) message body
   standards have developed into a general tagging and bagging
   mechanism. This mechanism has spread from SMTP mail to USENET, HTTP,
   and other protocols. In MIME, the type of an object is given in a
   "Content-Type" header line [RFC 2045, 2046, 2048]. Such a line
   consists of a MIME type and optionally additional parameters.
   A MIME type consists of a MIME top level type, a slash, and a MIME
   subtype.

   The original Uniform Resource Locator (URL [RFC 1738]), used to
   point to World Wide Web (WWW) resources, has grown into the more
   general Uniform Resource Identifier (URI [RFC 2396]). Increasingly,
   URIs are used as general labels for algorithms, XML namespaces
   [XML NAME], web based protocol data types, etc. (In some of these
   label uses, URIs are considered opaque, while in other cases they
   are assumed to reference something which explicates their meaning.)


1.2 Definitions and Conventions

   Concerning URIs, please note the following:

   (1) In this document, the term URI is used to include URI Reference.
       That is, it includes the case where an octothorpe ("#") followed
       by a fragment identifier is suffixed to a pure URI.

   (2) Only absolute URIs are mappable. Relative URIs, with just a
       hierarchical part, are not included in URI as used in this
       document. They must first be converted to absolute URIs as
       described in [RFC 2396].

   (3) For presentation purposes, URIs are shown inside angle brackets
       ("<...>") but these angle brackets are not actually a part of
       the URI.

   Concerning Content-Types, please note the following:

       Content-Type values are shown preceded by "Content-Type: " and,
       when long, they are line folded as per [RFC 822]. This prefix
       and line folding are for presentation purposes and are not
       actually a part of the Content-Type.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC 2119].


1.3 Additional Features

   Note that a URI or Content-Type could get converted back and forth
   multiple times between these two syntaxes.
   To stop this from resulting in ever longer and more complex tags, a
   check is specified so that if a conversion is of a previously
   converted syntax, the previous conversion is reversed, in so far as
   practical.

   To improve the repeatability of the results from single or multiple
   steps of syntax conversion, capitalization and punctuation
   recommendations are made where tokens are case insensitive or
   variable punctuation is allowed.

   Finally, in cases where the default conversion does not provide for
   sufficient control, optional elements are defined for inclusion in
   URIs and Content-Types that provide substantial control over the
   mapping output.


1.4 Overview of Remaining Sections

   Sections 2 and 3 below give an explanation of the mapping, specified
   more or less in English. The material is organized to start with the
   simplest and most common rules and then add exceptions for special
   cases and additional user control.

   Section 4 lists characters that must be URI ("%") encoded when
   mapping from a URI to a Content-Type.

   Section 5 covers IANA Considerations and potential conflicts.

   Section 6 gives Security Considerations.


2. Simple Mapping

   This section describes simple mappings such that any MIME Type or
   Content-Type can be mapped into a URI and any URI can be mapped into
   a Content-Type.

   Other than checks for multiple conversions, the mapping is simple.
   It can produce only a special scheme URI for the mapping of a
   Content-Type and only a special sub-type tree in the "application"
   top level type for the mapping of a URI. Section 3 below describes
   additional features optionally allowing much greater control over
   the result of the mapping.


2.1 Simple Mapping of Content-Type to URI

   Section 2.1.1 below describes the most basic case of converting a
   simple MIME type to a URI. Section 2.1.2 extends this to converting
   a general Content-Type to a URI.
   Section 2.3 adds the check necessary to recognize where the MIME
   type being converted is of the form indicating it was previously
   converted from a URI using basic mapping and is being converted
   back.


2.1.1 The Basic Case

   For the simplest case of a Content-Type consisting of just a MIME
   type, create a URI with scheme "ContentType" and a scheme dependent
   part consisting of the MIME type. For example

      Content-Type: image/JPEG

   simply converts to

   White space is not allowed in URIs so it must be removed. Scheme
   names (the part before the first ":" in a URI) are case insensitive
   but, for readability and repeatability, the capitalization
   "ContentType" SHOULD be used. Similarly, MIME top level types and
   subtypes (the fields before and after the "/" in a MIME type field,
   respectively) are case insensitive but SHOULD be all lower cased
   when mapped to the URI form.

   Note: There is no "//" after the "ContentType:" scheme as used
   herein. Such a "//" would imply a specific structuring of the scheme
   dependent part appearing in the URI after the "ContentType:" as
   defined in [RFC 2396]. Since that full structuring is not used, "//"
   is not used. The meaning of URIs starting with "ContentType://" is
   reserved for future definition.

   Note: "Content-Type", with hyphen, is syntactically allowed as a
   scheme name. However, [RFC 2717] reserves embedded hyphens in scheme
   names to indicate the prefix of an alternate tree of scheme names.
   Therefore, the un-hyphenated ContentType is used.


2.1.2 More Complete Rules

   A Content-Type header frequently includes more than just the
   mandatory MIME type.
   It can also have type dependent parameters, including private
   parameters, such as

      Content-Type: text/plain; charset="us-ascii";
         x-mac-type="54455854"; x-mac-creator="4D4F5353"

      Content-Type: image/tiff; application=faxbw

   Content-Type parameters are mapped into a "query portion" suffix of
   the URI in much the same way that HTML form fields [HTML] are. That
   is, they are concatenated to the MIME type after a "?" and, if there
   is more than one parameter, separated by "&". Thus the above
   Content-Types would be mapped into the following URIs:

   Parameter values in the mapped URI MUST always be enclosed in double
   quotes ('"'). If the Content-Type has a trailing ";" but no
   parameters, then "?" SHOULD NOT be added to the URI.


2.2 Simple Mapping of URI to Content-Type, The Basic Case

   This section describes the basic case of mapping a URI to a
   Content-Type. Section 2.4 adds the check to see if the URI appears
   to be the result of a previous conversion from a Content-Type and,
   if so, undoes that conversion in so far as practical.

   In the basic case, a URI maps to a Content-Type with a top level
   MIME type of "application" and a MIME sub-type in the "uri." tree.
   In addition, any "query" parameters in the URI are mapped to
   Content-Type parameters and, if the URI ends with a fragment
   identifier, it is mapped to the special Content-Type parameter
   "URI-Fragment". Any special characters in the URI that might be
   troublesome (see section 4) are encoded by replacing them with a "%"
   followed by two hex digits for the character code.

   Note: Current URI syntax permits scheme dependent parts in which "?"
   does not indicate a query section; however, no such syntaxes have
   been publicly defined.
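
   The basic case described above can be sketched in Python as follows.
   This is an illustrative sketch only, not part of the specification:
   the function names are hypothetical, the sample URI is an assumed
   input, and applying the "%" encoding to parameter names and values
   as well as to the subtype is an assumption read from the description
   above. The section 2.4 check for "ContentType:" URIs is deliberately
   left out.

```python
# Sketch of the basic URI -> Content-Type mapping (section 2.2).
# Names are hypothetical; the closure check of section 2.4 is omitted.

# Punctuation listed as troublesome in section 4.
TROUBLESOME = set('()<>@,;:\\/[]?%=')


def encode_troublesome(text: str) -> str:
    """Replace each troublesome character with '%' plus two upper-case hex digits."""
    out = []
    for ch in text:
        code = ord(ch)
        if code <= 32 or code == 127 or ch in TROUBLESOME:
            out.append('%{:02X}'.format(code))
        else:
            out.append(ch)
    return ''.join(out)


def uri_to_content_type(uri: str) -> str:
    """Map an absolute URI to an "application/uri." Content-Type value."""
    # Split off the fragment first, then the query portion.
    body, _, fragment = uri.partition('#')
    body, _, query = body.partition('?')

    parts = ['application/uri.' + encode_troublesome(body)]
    if query:
        for pair in query.split('&'):
            name, _, value = pair.partition('=')
            # Parameter values are always surrounded with double quotes.
            parts.append('{}="{}"'.format(encode_troublesome(name),
                                          encode_troublesome(value)))
    if fragment:
        parts.append('URI-Fragment="{}"'.format(encode_troublesome(fragment)))
    return '; '.join(parts)


if __name__ == '__main__':
    # Assumed sample input, chosen only for illustration.
    print('Content-Type: ' + uri_to_content_type('xyz://abc.text/def?h=ijk#lmn'))
```

   Splitting on "#" before "?" keeps a question mark inside a fragment
   from being mistaken for the start of the query portion.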

   Some examples of the basic case follow:

   convert to

      Content-Type: application/uri.http%3A%2F%2Fexample.com%2Ftag42

      Content-Type: application/uri.mailto%3Aexample.net;
         subject="misc"; body="line1%250D%250Aline2"

      Content-Type: application/uri.xyz%3A%2F%2Fabc.text%2Fdef;
         h="ijk"; URI-Fragment="lmn"

   Content-Type parameter values extracted from the query portion of a
   URI MUST be surrounded with double quotes ('"'). When URI encoding,
   if the hex value has any letters (a-f) in it, they SHOULD be upper
   cased.


2.3 Content-Type Mapping Special Case for Basic Closure

   A URI may have been converted to a Content-Type and then get
   converted back. To stop this from resulting in an ever more complex
   syntax, a check MUST be made to see if the MIME subtype of a
   Content-Type being converted is in the "uri." subtype tree (see
   section 2.2 above). If so, the URI is computed from the subtype by
   stripping the "uri." prefix and undoing one level of URI encoding.
   (Note: The top level MIME type is ignored in this case.) In
   addition, Content-Type parameters, if any, are added as a "query
   portion" and a "URI-Fragment" parameter is added as a fragment. For
   example:

      Content-Type: application/uri.mailto%3Auser%40host.example

      Content-Type: application/uri.http%3A%2F%2Fx.test;
         foo="123"; bar="abcd"

      Content-Type:
         application/uri.http%3A%2F%2Fa%3Ab%40c.text%2Fx%2Fy;
         URI-Fragment="z"

   convert to

   Note: If a Content-Type or MIME Type is being written by a user and
   they know that there is a URI which is a more natural expression of
   the labeling desired, they can simply use an "application/uri." MIME
   Type to start with.


2.4 URI Mapping Special Case for Basic Closure

   It is desirable that an arbitrary Content-Type be recovered
   semantically intact when it is mapped to a URI and that URI is then
   mapped back to a Content-Type. To achieve this, the following
   special case is added to the simple case described in section 2.2
   above.
   If the URI scheme is "ContentType:", then the Content-Type is
   computed from the remaining part of the URI (the scheme specific
   part) by replacing the first question mark ("?") and all query
   section ampersands ("&") with semi-colon space ("; "), and then
   undoing one level of URI encoding, i.e., replacing each percent sign
   ("%") followed by two hex digits with the character having that hex
   value. For example

   map to

      Content-Type: model/vnd.example.longish.subtype.name

      Content-Type: text/plain; charset="US-ASCII"; x-obscure="value"

   Note: A URI produced by simple mapping from a normal Content-Type
   will never have a fragment suffix.

   Note: If a URI is being written by a user and they know that there
   is a Content-Type which is a more natural expression of the labeling
   desired, they can simply use a "ContentType:" scheme to start with.


3. Controlled Mapping

   [Is this controlled mapping stuff below too complex? Would it be
   better to just have sections 2 and 3 above and drop controlled
   conversion?]

   As an additional feature, there may be cases where a URI is designed
   knowing that it might be converted to a Content-Type and it is
   desired to control the MIME type so that it has a more appropriate
   top level than "application" or a more appropriate subtype than one
   in the "uri." tree. To accomplish this, a special URI query part
   parameter "MIME-Type" is defined. If a URI is not of scheme
   ContentType and this special parameter is found, then the MIME type
   is set to the parameter value and the URI body (all of the URI
   except "query" parameters and any fragment identifier) is preserved
   in a "URI-Body" Content-Type parameter.
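
   The "MIME-Type" rule above can be sketched in Python as follows.
   This is a hedged illustration, not a definitive implementation: the
   function name is hypothetical, the sample URI is an assumed input,
   and several details the text leaves open are filled in by
   assumption and marked as such in the comments (case-insensitive
   matching of the parameter name, an unquoted parameter value, and
   carrying the URI body and fragment over without "%" encoding).

```python
# Sketch of controlled URI -> Content-Type mapping via "MIME-Type".
# The function name and the details flagged below are assumptions.
from typing import Optional


def controlled_uri_to_content_type(uri: str) -> Optional[str]:
    """Return the controlled Content-Type value, or None if the rule does not apply."""
    body, _, fragment = uri.partition('#')
    body, _, query = body.partition('?')

    # The rule only applies to URIs that are not of scheme ContentType.
    scheme = body.partition(':')[0]
    if scheme.lower() == 'contenttype':
        return None

    mime_type = None
    other_params = []
    for pair in (query.split('&') if query else []):
        name, _, value = pair.partition('=')
        # Assumption: the parameter name is matched case-insensitively and
        # any surrounding double quotes on its value are dropped.
        if name.lower() == 'mime-type' and mime_type is None:
            mime_type = value.strip('"')
        else:
            other_params.append((name, value.strip('"')))

    if mime_type is None:
        return None  # no special parameter; the simple mapping applies instead

    # Assumption: the URI body is preserved verbatim inside the quotes.
    parts = [mime_type, 'URI-Body="{}"'.format(body)]
    parts += ['{}="{}"'.format(n, v) for n, v in other_params]
    if fragment:
        parts.append('URI-Fragment="{}"'.format(fragment))
    return '; '.join(parts)


if __name__ == '__main__':
    # Assumed sample input, chosen only for illustration.
    uri = 'mailto:joe@blow.text?MIME-Type=message/rfc822#123'
    print('Content-Type: ' + str(controlled_uri_to_content_type(uri)))
```

   Returning None when the rule does not apply leaves the caller free
   to fall back to the simple mapping of section 2.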

   Similarly, there may be cases where a Content-Type is designed
   knowing that it might be converted to a URI and it is desired to
   control the URI scheme and non-query scheme dependent parts so that
   it is not necessary to have a scheme of "ContentType:" or a scheme
   dependent part calculated as indicated in section 2.1. To accomplish
   this, a special Content-Type parameter "URI-Body" is defined. If a
   Content-Type does not have a MIME subtype in the "uri." tree and
   this parameter is present, it controls the non-query portion of the
   URI mapped to and the original MIME type is preserved in a URI query
   parameter called "MIME-Type".

   For example

      Content-Type: application/xml; URI-Body="http://xml.example"

   would map to

   and

   would map to

      Content-Type: message/rfc822; URI-Body="mailto:joe@blow.text";
         URI-Fragment="123"


4. Troublesome Characters

   Troublesome characters are defined as those not permitted in a token
   in [RFC 2045] with the addition of percent sign and the deletion of
   double quote. That is, any character code from 0 through 32
   inclusive, character code 127, or any of "(", ")", "<", ">", "@",
   ",", ";", ":", "\", "/", "[", "]", "?", "%", or "=" is a troublesome
   character.


5. IANA Considerations and Potential Conflicts

   This document allocates and specifies the following:

   (1) The "ContentType" URI scheme.

   (2) The "uri." MIME subtype tree. Since this subtree is totally
       delegated to the URI specification, there are no independent
       publication or review requirements for it. Any valid URI can be
       used after the "uri." in any MIME top level type, after
       troublesome characters (see section 4) in the URI are % escaped.

   (3) In the context of automatic URI to Content-Type conversion, a
       meaning is specified for the "MIME-Type" URI query section
       parameter.

   (4) In the context of automatic Content-Type to URI conversion, a
       meaning is specified for the "URI-Body" and "URI-Fragment"
       Content-Type parameters.
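
   The troublesome character set of section 4, on which item (2) above
   relies, can be derived mechanically. The Python sketch below starts
   from the "tspecials" punctuation that [RFC 2045] excludes from a
   token, adds the percent sign, and removes the double quote. The
   names are hypothetical and the code is an illustration only.

```python
# Sketch: derive the section 4 troublesome set from RFC 2045 tspecials.
# Names are hypothetical; this is an illustration, not normative text.

# The tspecials of RFC 2045: ( ) < > @ , ; : \ " / [ ] ? =
TSPECIALS = set('()<>@,;:\\"/[]?=')

# Add percent sign, delete double quote.
TROUBLESOME_PUNCTUATION = (TSPECIALS | {'%'}) - {'"'}


def is_troublesome(ch: str) -> bool:
    """True for codes 0 through 32, code 127, and the troublesome punctuation."""
    code = ord(ch)
    return code <= 32 or code == 127 or ch in TROUBLESOME_PUNCTUATION


if __name__ == '__main__':
    # List the troublesome punctuation in code order.
    print(' '.join(sorted(TROUBLESOME_PUNCTUATION)))
```

   Deriving the set this way makes it easy to check that the list in
   section 4 agrees with its one-sentence definition.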

   Because this document specifies the "ContentType" URI scheme and the
   "uri." MIME subtype tree, no conflict can arise due to other uses of
   them.

   However, there is no precedent for the specification of Content-Type
   parameters valid across all MIME types, such as URI-Body and
   URI-Fragment, and in fact [RFC 2046] denies their possibility. Nor
   is there any precedent for the specification of a universal URI
   query parameter such as MIME-Type. The probability that any
   different use is currently being made, or will in the future have to
   be made, of these seems low enough that it can be ignored.

   It is possible that some processing systems are sensitive to the
   presence of parameters they do not understand and will indicate
   errors when presented with controlled mapping URIs or Content-Types.
   However, Content-Type parameters and URI query parameters are
   usually handled on receipt by such mechanisms as storing the
   name-value pair in an associative array or as "environment
   variables" and ignoring extra parameters. In fact, Content-Type
   processors are required by [RFC 2046] to ignore any parameters they
   do not understand and to ignore parameter order.


6. Security Considerations

   In some sense, the security considerations for MIME and content
   types [RFC 2046], URIs [RFC 2396], and for every individual MIME
   type and URI scheme can apply.

   In addition, the deployment of mapping aware software may enable the
   introduction into or transmission through MIME or content type
   contexts of URI semantics, including possibly dangerous action
   schemes such as "mailto", and the introduction into or transmission
   through URI contexts of MIME and content type semantics, including
   possibly dangerous executable data types or the like.
   Finally, implementation of controlled mapping may enable a malicious
   user, by adding one of the special parameters specified herein, to
   cause a surprising change in the semantics of a URI or Content-Type
   produced by the mapping from an apparently innocuous Content-Type or
   URI.


References

   [HTML] - Dave Raggett, Arnaud Le Hors, Ian Jacobs, "HTML 4.01
   Specification", December 1999.

   [RFC 822] - D. Crocker, "Standard for the format of ARPA Internet
   text messages", Aug-13-1982.

   [RFC 1738] - T. Berners-Lee, L. Masinter, M. McCahill, "Uniform
   Resource Locators (URL)", December 1994.

   [RFC 2045] - N. Freed & N. Borenstein, "Multipurpose Internet Mail
   Extensions (MIME) Part One: Format of Internet Message Bodies",
   November 1996.

   [RFC 2046] - N. Freed & N. Borenstein, "Multipurpose Internet Mail
   Extensions (MIME) Part Two: Media Types", November 1996.

   [RFC 2048] - N. Freed, J. Klensin & J. Postel, "Multipurpose
   Internet Mail Extensions (MIME) Part Four: Registration Procedures",
   November 1996.

   [RFC 2119] - S. Bradner, "Key words for use in RFCs to Indicate
   Requirement Levels", March 1997.

   [RFC 2396] - T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
   Resource Identifiers (URI): Generic Syntax", August 1998.

   [RFC 2717] - R. Petke, I. King, "Registration Procedures for URL
   Scheme Names", November 1999.

   [RFC 2718] - L. Masinter, H. Alvestrand, D. Zigmond, R. Petke,
   "Guidelines for new URL Schemes", November 1999.

   [XML NAME] - Tim Bray, Dave Hollander, Andrew Layman, "Namespaces in
   XML", 14 January 1999.


Author's Address

      Donald E. Eastlake 3rd
      Motorola
      155 Beaver Street
      Milford, MA 01757 USA

      Telephone:   +1 508-261-5434 (w)
                   +1 508-634-2066 (h)
      FAX:         +1 508-261-4447 (w)
      EMail:       Donald.Eastlake@motorola.com


Expiration and File Name

   This draft expires July 2001.

   Its file name is draft-eastlake-cturi-01.txt.