Internet Draft                                             A. Kristensen
                                                                 HP Labs
Expires in six months                                   18 November 1998

                        XML Encoded Form Values

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This document proposes an XML encoding for sets of named values. The primary application is as a transmission format for form values being submitted to a processing agent over the Web. The main advantage over other form value encodings is that it allows field names to be associated with structured values without resorting to non-XML encodings. The multipart/related MIME type is used for carrying non-XML media.

1. Introduction

Online forms are an important mechanism for user interaction on the Web and in other applications. A form consists of a number of fields, each of which is identified by a name and can have a value. This document defines a language based on XML for the encoding of form values [XML]. Such an encoding is necessary when form values are to be saved to a file or to be transmitted over a network to a processing agent such as a Web server.
Using XML for encoding form values means that XML machinery can readily be reused to allow structured and typed field values, and to allow hyperlinking to values and from within values. These benefits can to a high degree be achieved with judicious use of MIME [RFC-2388, MIME], but as XML increasingly looks set to become the transfer encoding and even native data format of choice for many media types on the Internet, it is potentially simpler and more elegant to stay within XML for functions such as form processing as well.

There is nothing form- or Web-specific about the proposal. It can be used with a wide variety of applications and transport protocols. It consists of a simple XML language for encoding name-value pairs and a specification for how such data sets are carried over MIME-like transport protocols. The multipart/related MIME type is used for non-XML encoded data.

2. Name-Value Pair XML Document Type Definition

A form value is a set of named values. In the following XML DTD the form value is represented by a "map" element which consists of a set of named "item" elements, each of which represents a single form field:

The name of the form field is given by the mandatory "name" attribute of item elements. The value of a form field is either available "inline" as the contents of the corresponding item element or, if the "href" attribute is defined, "out-of-line" as the contents of the resource identified by that URI. The resource may be transported as part of the same data unit, e.g. as a separate MIME bodypart, or may be remote.

Map items are considered unordered and there is no requirement that they have unique names.

Data items which are themselves XML elements can be typed by associating them with an XML namespace [XML-NS]. Within this document the "map" element itself is assumed to have the XML namespace identifier "http://www.ietf.org/XML/NS/map".
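As an illustration of the element and attribute names described above, the following is a minimal Python sketch that builds a "map" element with inline and out-of-line items. It is not part of the proposal: the helper name build_map, the triple-based input, and the choice to put "item" elements in the same namespace as "map" (via a default namespace declaration) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace identifier the draft assumes for the "map" element.
MAP_NS = "http://www.ietf.org/XML/NS/map"


def build_map(fields):
    """Build a "map" element from (name, value, href) triples.

    Hypothetical helper: a triple with href set becomes an out-of-line
    item, otherwise the value is inlined as the item's text content.
    Duplicate names are accepted, since names need not be unique.
    """
    # Serialize MAP_NS as the default namespace (no prefix).
    ET.register_namespace("", MAP_NS)
    root = ET.Element(f"{{{MAP_NS}}}map")
    for name, value, href in fields:
        item = ET.SubElement(root, f"{{{MAP_NS}}}item", {"name": name})
        if href is not None:
            item.set("href", href)   # out-of-line value
        else:
            item.text = value        # inline value
    return root


if __name__ == "__main__":
    form = build_map([
        ("name", "Joe Bloggs", None),
        ("photo", None, "cid:photo@example.com"),  # illustrative cid URL
    ])
    print(ET.tostring(form, encoding="unicode"))
```

A list of triples is used as input rather than a dictionary so that repeated field names survive.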
User agents are RECOMMENDED to provide this namespace identifier along with form values.

As an alternative to XML namespace typing, a "type" attribute can be given to map items. This is used to give the MIME type of the corresponding data field when no namespace identifier is available or when the resource has a known MIME type. It allows a processing agent to skip an item whose MIME type it knows it cannot handle.

The following example could be the XML encoding of a data set resulting from a form asking for name and contact information:

      Joe Bloggs
      +1 22 333 4444
      bloggs@example.com

As XML was designed with internationalization in mind, field names and values can contain any characters whatsoever, although they may need to be escaped according to the XML specification. For example, had the value of the email field been "Joe Bloggs <bloggs@example.com>" it could be represented as:

      Joe Bloggs &lt;bloggs@example.com&gt;

User and processing agents MUST encode and parse "map" data sets according to the XML specification.

3. Structured Values

Field values are not limited to being simple text strings but can be any arbitrarily complex XML structure. The following data set contains an XML encoded digital business card [vCard-XML] together with other data items, all of which possibly originate from an online form:

      Joe Bloggs BloggsJoe +1-22-333-4444 bloggs@example.com yes Software Development From a friend.

When a map associates a name with a value which is itself an XML element, this element MAY be typed by giving it an XML namespace attribute. The vCard namespace identifier used in this example is fictional; none is currently specified.
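The handling of escaped text, the "type" attribute, and structured values can be sketched from the processing agent's side. The Python fragment below is an illustration only, not part of the proposal: the function name read_map, the sample field names, and the vCard namespace "http://example.org/vcard" are hypothetical.

```python
import xml.etree.ElementTree as ET

MAP_NS = "http://www.ietf.org/XML/NS/map"


def read_map(xml_text):
    """Return a list of (name, mime_type, value) tuples from a "map" document.

    value is a str for plain text items and an Element for structured
    items. A list is returned (not a dict) because item names need not
    be unique.
    """
    root = ET.fromstring(xml_text)
    result = []
    # Only direct children of "map" are form fields.
    for item in root.findall(f"{{{MAP_NS}}}item"):
        children = list(item)
        value = children[0] if children else (item.text or "")
        result.append((item.get("name"), item.get("type"), value))
    return result


# Hypothetical input: field names and the vCard namespace are illustrative.
SAMPLE = """<map xmlns="http://www.ietf.org/XML/NS/map">
  <item name="email">Joe Bloggs &lt;bloggs@example.com&gt;</item>
  <item name="card"><vCard xmlns="http://example.org/vcard"><fn>Joe Bloggs</fn></vCard></item>
  <item name="notes" type="text/plain">From a friend.</item>
</map>"""

if __name__ == "__main__":
    for name, mime_type, value in read_map(SAMPLE):
        print(name, mime_type, value)
```

The XML parser undoes the escaping, so the "email" value comes back with its literal angle brackets, while the "card" value stays an element whose namespace-qualified tag can be used to decide how to process it.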
Namespaces are more suitable as a typing mechanism than DTDs or formal public identifiers as they were designed to apply to individual elements of a larger XML document, not necessarily to the document as a whole.

4. Use of MIME

When transported over HTTP, Internet email, or other MIME-like transports, XML "map" entities are carried with a media type of either "text/xml" or "application/xml" as defined in [RFC-2376]. Two Internet media types were defined because the definition of the "text" top-level MIME type does not allow XML documents in arbitrary character encodings, whereas the "application" top-level type does.

The following is an example of a form being submitted using HTTP:

      POST /cgi-bin/order-pizza HTTP/1.0
      Content-Type: text/xml; charset="utf-8"

      deep-pan vegetarian

4.1 XML type information in MIME Content-Type header

It has been proposed to add optional XML typing information to the text/xml and application/xml media types, e.g. in the form of a namespace parameter. The example above might then become

      POST /cgi-bin/order-pizza HTTP/1.0
      Content-Type: text/xml; charset="utf-8";
                    ns="http://www.ietf.org/XML/NS/map"

      deep-pan vegetarian

Being able to put XML typing information, be it a DTD or namespace URL or a formal public identifier, in the MIME header allows processing agents to look only at the header to decide how to handle a particular bodypart.

4.2 Including Non-XML Data in XML Maps

There are situations in which it is useful to be able to include non-XML encoded data in a form data set, e.g. as the result of submitting a file. If the data is textual in nature it may be convenient simply to escape reserved characters or to use a CDATA section, as in

      <![CDATA[Now is the time for all good men...]]>

However, it is usually more desirable to treat non-XML data as separate MIME entities so that Content-* descriptors other than Content-Type can be associated with the resource.
For this reason an XML map can be represented as the root bodypart of a multipart/related aggregate MIME object with other body parts being referenced from the root using URLs [MULREL].

This is useful both for transporting non-XML map items and for non-XML objects which are logically part of map items and which are referenced from within them. In the following example an XML encoded document is submitted as the value of a form along with a contained image and a non-XML encoded vCard:

      POST /cgi-bin/submit HTTP/1.0
      Content-Type: multipart/related; boundary="example-mulrel";
                    type="text/xml"

      --example-mulrel
      Content-Type: text/xml; charset="utf-8"

      Life, the Universe, and Everything Life, the Universe, and Everything ... ... ...

      --example-mulrel
      Content-Type: image/gif
      Content-Transfer-Encoding: base64
      Content-Location: image1.gif

      ASggFXEAqaMd/JAJAcARcsEPN5AJmRAf/uAPN5AX
      uzIc/kAU/MARHpEJyZEd/MAPxNIPAdAPA4AGIOMP
      ...

      --example-mulrel
      Content-Type: text/x-vCard
      Content-ID: <3571.19981117.162336@hplb.hp.com>

      BEGIN:vCard
      VERSION:3.0
      FN:Joe Bloggs
      N:Bloggs;Joe
      EMAIL;TYPE=INTERNET:bloggs@example.com
      TEL;TYPE=VOICE:+1-22-333-4444
      END:vCard
      --example-mulrel--

This is very much like the way in which MIME is used to encapsulate HTML documents for the purpose of transporting them in email messages [MHTML].

In this example there are three bodyparts. The first is the root and represents the form value. There are three fields. The "title" field is a simple text string and is inlined as the content of the "item" element. The value of the "vcard" field is transported in its own bodypart as a "text/x-vCard" object. This bodypart is identified by the value of its "Content-ID" header [MIME] and is referenced from the form value using a "cid" URL [CID]. The value of the third form field, named "doc", is a complicated, inlined structure (presumably representing a marked-up document).
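The lookup of out-of-line values in a multipart/related aggregate can be sketched with Python's standard email package. This is an illustration under stated assumptions, not a definitive implementation: the function name resolve_map and the XML in SAMPLE are hypothetical reconstructions, the first bodypart is taken to be the root (no "start" parameter is handled), "href" values are matched literally against "cid:" URLs and Content-Location values without percent-decoding, and a dict is used for brevity even though field names need not be unique.

```python
import xml.etree.ElementTree as ET
from email import message_from_string

MAP_NS = "http://www.ietf.org/XML/NS/map"


def resolve_map(mime_text):
    """Return {field name: value} for a multipart/related form submission."""
    msg = message_from_string(mime_text)
    parts = msg.get_payload()              # list of bodyparts for multipart
    root_part, others = parts[0], parts[1:]

    # Index the non-root bodyparts by the URLs that may reference them.
    by_ref = {}
    for part in others:
        cid = part.get("Content-ID")
        if cid:
            by_ref["cid:" + cid.strip().strip("<>")] = part
        loc = part.get("Content-Location")
        if loc:
            by_ref[loc.strip()] = part

    values = {}
    root = ET.fromstring(root_part.get_payload())
    for item in root.findall(f"{{{MAP_NS}}}item"):
        href = item.get("href")
        if href is None:
            values[item.get("name")] = item.text or ""        # inline value
        elif href in by_ref:
            values[item.get("name")] = by_ref[href].get_payload()
        else:
            values[item.get("name")] = None   # remote or unknown resource
    return values


# Hypothetical message: the XML root part is illustrative markup.
SAMPLE = (
    'Content-Type: multipart/related; boundary="example-mulrel"; type="text/xml"\n'
    "\n"
    "--example-mulrel\n"
    'Content-Type: text/xml; charset="utf-8"\n'
    "\n"
    '<map xmlns="http://www.ietf.org/XML/NS/map">'
    '<item name="title">Life, the Universe, and Everything</item>'
    '<item name="vcard" href="cid:3571.19981117.162336@hplb.hp.com"/>'
    "</map>\n"
    "--example-mulrel\n"
    "Content-Type: text/x-vCard\n"
    "Content-ID: <3571.19981117.162336@hplb.hp.com>\n"
    "\n"
    "BEGIN:vCard\nFN:Joe Bloggs\nEND:vCard\n"
    "--example-mulrel--\n"
)

if __name__ == "__main__":
    print(resolve_map(SAMPLE))
```

Indexing bodyparts by both Content-ID and Content-Location lets the same lookup serve "cid" URLs and relative URLs.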
A descendant element of this value has a URL reference to an image. The image is transported in a separate bodypart and is identified by a "Content-Location" header and referenced using a relative URL. This may be preferred in some cases over using Content-IDs so as to avoid having to rewrite references embedded in existing resources. A processing agent may not be able to recognize the contained "image" element as a reference to a bodypart but it can still treat it semi-intelligently, e.g. pass it to some other application or save it to a file.

5. User Agent Issues

Having a single well-specified mechanism for associating structured values with form fields is clearly desirable. However, it is only useful if user-agents are capable of generating such structured values. HTML doesn't currently define a mechanism for doing this, but one could easily imagine defining a new HTML grouping mechanism, or extending an existing one, so as to make forms generate structured form values. Doing so is beyond the scope of this specification.

HTML does, however, allow user-agents to submit files using the multipart/form-data MIME type defined in [RFC-2388]. When using the map XML form encoding, a user-agent may use whatever knowledge is available to it from the underlying platform to determine whether a file is XML encoded, and if so may choose to inline the file as the contents of the field value.

Form applications other than HTML browsers may use data sources other than GUI input fields and files, and so may have other means for constructing and submitting structured XML field values. The encodings discussed in this document are not in any way dependent on features of current Web browsers and should thus be readily usable for such applications.
As an example, some browsers are configurable with business card information and so could easily generate XML encoded vCards if a typing mechanism for HTML form fields existed.

6. Security Considerations

This proposal is not believed to introduce security considerations beyond those already present with the multipart/form-data encoding of form data sets [RFC-2388] and MHTML [MHTML].

7. References

[CID] E. Levinson, "Content-ID and Message-ID Uniform Resource Locators", RFC 2392, August 1998.

[HTML40] D. Raggett, A. Le Hors, and I. Jacobs, "HTML 4.0 Specification", World Wide Web Consortium Recommendation, 24 April 1998, http://www.w3.org/TR/REC-html40.

[MHTML] J. Palme and A. Hopmann, "MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML)", RFC 2110, March 1997.

[MIME] N. Borenstein and N. Freed, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[MULREL] E. Levinson, "The MIME Multipart/Related Content-type", RFC 2387, August 1998.

[XML] T. Bray, J. Paoli, and C. M. Sperberg-McQueen, "Extensible Markup Language (XML) 1.0", World Wide Web Consortium Recommendation, 10 February 1998, http://www.w3.org/TR/REC-xml.

[XML-NS] T. Bray, D. Hollander, and A. Layman, "Namespaces in XML", World Wide Web Consortium Working Draft, 16 September 1998, http://www.w3.org/TR/WD-xml-names.

[RFC-2376] E. Whitehead and M. Murata, "XML Media Types", RFC 2376, July 1998.

[RFC-2388] L. Masinter, "Returning Values from Forms: multipart/form-data", RFC 2388, August 1998.

[vCard-XML] F. Dawson and P. Hoffman, "The vCard v3.0 XML DTD", Internet Draft, http://www.internic.net/internet-drafts/draft-dawson-vcard-xml-dtd-01.txt, October 1998.

8. Author's Address

Anders Kristensen
Hewlett-Packard Laboratories
Filton Road, Stoke Gifford
Bristol BS34 8QZ
United Kingdom

E-mail: ak@hplb.hpl.hp.com