INTERNET-DRAFT                                            Ari Luotonen
Expires: May 8, 1996               Netscape Communications Corporation
                                                           John Franks
                                               Northwestern University
                                                      November 8, 1995


                      Byte Ranges With HTTP URLs


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Overview

There are a number of Web applications that would benefit from being
able to ask the server for a byte range of a document. As an example,
an Adobe PDF viewer needs to be able to access individual pages by
byte range; the table that defines those ranges is located at the end
of the PDF file.

It may be argued that this should be left as a server-specific
feature in the opaque URL, as the "parameters" used in URLs that may
be available or useful can vary from server to server. However, there
are reasons why standardizing the byte range feature would be
beneficial.

One of the primary reasons is to be able to support byte ranges in
proxy servers. Without a standard, proxy servers will have to treat
each different byte range of a given document as a separate document.
Should the notion of a byte range be standard, not only would it
prevent portions of documents from being cached multiple times, but
it would make it possible for the proxy to generate range responses
directly from its cache, and reassemble the entire document from the
pieces.

This specification defines only the ;bytes= parameter; it shows other
possible schemes as examples (lines, chapters), but doesn't define
them. Byte range is the single most applicable approach to various
document types.

This specification is simple enough to be adopted quickly by server
authors and vendors, and to be exploited quickly and easily on the
client side. Existing proxies don't break this, and once this
specification becomes official it will actually be possible to
support this in a smart way on proxies.

This specification can be applied to document types for which byte
ranges make sense; there are types for which they don't, and this
specification is not trying to enforce semantics for byte ranges for
them. In a case when byte ranges aren't meaningful, the client doesn't
request them (why would it?), and the server should tag them as
invalid requests (in case someone wrote them by hand).

As an example, it makes sense to get a portion of a large flat text
file, a page from a PDF file, or the remaining part of a truncated
GIF image. It does not (necessarily) make sense to take a portion of
a database query result. Byte range requests are typically generated
by software, not written by humans.

The Allow-Ranges HTTP Header

The server needs to let the client know that it can support byte
ranges. This is done through the Allow-Ranges HTTP header when a
server is returning a document that supports byte ranges:

    Allow-Ranges: bytes

The server will send this header only for documents for which it
will be able to satisfy the byte range request, e.g.
for PDF documents, or images, which can be partially reloaded if the
user interrupts the page load, and the image gets only partially
cached.

Syntax of the "bytes" URL Parameter

The name of the byte range parameter is bytes. It is attached to the
end of the path part of the URL, separated by a semicolon and
followed by an equal sign and the byte range specification. Any
possible query string or fragment id is placed after the bytes
parameter. On the server side, if the URL points to a CGI
application, the byte range parameter is included in the PATH_INFO
variable.

The range specification obeys the following rules:

  * The range consists of two non-negative integers, separated by a
    hyphen.

  * The first integer must always be less than or equal to the
    second one.

  * The range is inclusive; as an example, the range 500-1000
    includes bytes from 500 to 1000, including 500 and 1000.

  * The first byte in a document is byte number 0.

  * One of the numbers may be missing, but not both at the same
    time.

  * If the first number is missing, it means to return the n last
    bytes of the document, where n is the second number. If n is
    equal to, or larger than, the size of the document, then the
    entire document is returned.

  * If the second number is missing, it means the end of document.
    That is, all the bytes starting from byte n until the end of the
    document, where n is the first number.

  * If the second number is larger than the size of the document
    minus one, it is taken to mean the size of the document minus
    one (that is, the end of the document).

  * In the case that the second integer is smaller than the first
    one, an empty range is returned.

  * There may be multiple ranges, separated by a comma. The order of
    the ranges is the preferred order in which the ranges should be
    returned.

  * The byte ranges refer to ranges in data as they are transferred
    over the network (and retrieved by the client). E.g.
    if in an imaginary system the server stores all lines terminated
    by CR LF, but turns them into a single LF before sending the
    data, then byte ranges refer to ranges inside this modified data
    (the one with single LF line separators). That is, the ranges
    refer to the data that the client would see.

  * The byte ranges apply to the "raw" data, that is, the data
    encoded by Content-encoding; but not to the "armored" data, that
    is, the data encoded by content-transfer-encoding.

Examples of the "bytes" URL parameter

The first 500 bytes:

    http://host/dir/foo;bytes=0-499

The second 500 bytes:

    http://host/dir/foo;bytes=500-999

All bytes except for the first 500 until the end of document:

    http://host/dir/foo;bytes=500-

The last 500 bytes of the document:

    http://host/dir/foo;bytes=-500

Two separate ranges:

    http://host/dir/foo;bytes=50-99,200-249

The first 100 bytes, 1000 bytes starting from the byte number 500,
and the remainder of the document starting from byte number 4000
(byte numbering starts from zero):

    http://host/dir/foo;bytes=0-99,500-1499,4000-

The first 100 bytes, 1000 bytes starting from the byte number 500,
and the last 200 bytes of the document:

    http://host/dir/foo;bytes=0-99,500-1499,-200

Byte Range HTTP Response

If the request includes multiple ranges, the response is sent back
as a multipart MIME message, with content-type
multipart/x-byteranges. A server may also send a single byte range as
a multipart message.

If there are overlapping ranges, the behaviour for each range doesn't
change. That is, a range will not be truncated, merged, or left out,
just because there is an overlap.
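
The syntax rules and examples above can be sketched in code roughly
as follows. This is only an illustration in Python, not part of the
specification: the function names parse_bytes_param and resolve are
hypothetical, the input is assumed to be the path part of the URL with
any query string or fragment id already removed, and a suffix range
longer than the document is assumed to yield the whole document.
Overlapping ranges are kept exactly as requested.

```python
import re

# One range is "first-last", "first-", or "-last" (ASCII digits only).
_RANGE = re.compile(r"([0-9]*)-([0-9]*)")


def parse_bytes_param(path):
    """Split 'path;bytes=spec' into (path, [(first, last), ...]).

    A missing number is reported as None. The query string and
    fragment id are assumed to have been removed already.
    """
    base, found, spec = path.partition(";bytes=")
    if not found:
        return base, []
    ranges = []
    for part in spec.split(","):
        m = _RANGE.fullmatch(part)
        if m is None or part == "-":      # malformed, or both numbers missing
            raise ValueError("invalid byte range: %r" % part)
        first, last = m.groups()
        ranges.append((int(first) if first else None,
                       int(last) if last else None))
    return base, ranges


def resolve(rng, size):
    """Map one parsed range onto a document of `size` bytes.

    Returns inclusive (first, last) offsets, or None for an empty range.
    """
    first, last = rng
    if first is None:                     # "-n": the last n bytes
        first, last = max(0, size - last), size - 1
    elif last is None:                    # "n-": from byte n to the end
        last = size - 1
    else:                                 # "a-b": clamp b to the end
        last = min(last, size - 1)
    return (first, last) if first <= last else None
```

With this sketch, the path /dir/foo;bytes=-500 parses to the single
range (None, 500), and resolving it against a 1234 byte document gives
bytes 734 through 1233.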

The following HTTP response header is sent back to provide
verification and information about the range and total size of the
document:

    Range: bytes X-Y/Z

where:

    X   is the number of the first byte returned (the first byte is
        byte number zero).

    Y   is the number of the last byte returned (in case of the end
        of the document this is one smaller than the size of the
        document in bytes).

    Z   is the total size of the document in bytes.

Examples of the Range: HTTP Response Header

The first 500 bytes of a 1234 byte document:

    Range: bytes 0-499/1234

The second 500 bytes of the same document:

    Range: bytes 500-999/1234

All bytes until the end of document, except for the first 500 bytes:

    Range: bytes 500-1233/1234

The last 500 bytes of the same document:

    Range: bytes 734-1233/1234

Multipart MIME messages

Multipart MIME is defined in [RFC-1521]. With byteranges, the
multipart MIME message uses content-type multipart/x-byteranges,
with a boundary parameter. Example:

    Content-type: multipart/x-byteranges; boundary=THIS_STRING_SEPARATES

    --THIS_STRING_SEPARATES
    Content-type: application/x-pdf
    Range: bytes 500-999/8000

    ...the first range...
    --THIS_STRING_SEPARATES
    Content-type: application/x-pdf
    Range: bytes 7000-7999/8000

    ...the second range...
    --THIS_STRING_SEPARATES--

Caching and Proxies

The server must give Last-modified headers for each range request
whenever possible, and the client side must take care of having all
the fragments in sync. Conditional GET (the GET request with the
If-modified-since header) works as expected with byte ranges.

Ranges can be cached, and if the Last-modified header matches they
can be combined. In fact, existing proxies will cache ranges but they
won't know that they are part of a larger document. This may cause
parts of a document to be cached more than once, but not otherwise
incorrect behaviour.
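
As a rough illustration of combining cached ranges, the Python sketch
below rebuilds a document from fragments only when every fragment
carries the same Last-modified value and the fragments together cover
the whole document. It is not part of the specification: the names
parse_range_header and assemble, and the representation of a cached
fragment as a (Range value, Last-modified value, data) tuple, are
assumptions made for the example.

```python
import re

# Value of the response header, e.g. "bytes 500-999/1234".
_RANGE_VALUE = re.compile(r"bytes ([0-9]+)-([0-9]+)/([0-9]+)")


def parse_range_header(value):
    """Parse 'bytes X-Y/Z' into the integers (first, last, total)."""
    m = _RANGE_VALUE.fullmatch(value.strip())
    if m is None:
        raise ValueError("invalid Range value: %r" % value)
    first, last, total = (int(g) for g in m.groups())
    return first, last, total


def assemble(fragments):
    """Rebuild the whole document from cached fragments, or return None.

    `fragments` is an iterable of (range_value, last_modified, data)
    tuples. None means the fragments are out of sync or incomplete.
    """
    fragments = list(fragments)
    if not fragments:
        return None
    # Every fragment must carry the same Last-modified value.
    if len({lm for _, lm, _ in fragments}) != 1:
        return None
    parsed = [(parse_range_header(rv), data) for rv, _, data in fragments]
    totals = {total for (_, _, total), _ in parsed}
    if len(totals) != 1:
        return None
    total = totals.pop()
    doc = bytearray(total)
    covered = 0                           # bytes 0..covered-1 are filled in
    for (first, last, _), data in sorted(parsed, key=lambda p: p[0][0]):
        if last >= total or len(data) != last - first + 1:
            raise ValueError("fragment does not match its Range value")
        if first > covered:               # a gap: some bytes are missing
            return None
        doc[first:last + 1] = data
        covered = max(covered, last + 1)
    return bytes(doc) if covered == total else None
```

Overlapping fragments are accepted here, since matching Last-modified
values are taken to mean that they hold identical bytes where they
overlap.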
The purpose of this document is to specify a standard way to
represent byte ranges so that proxies can benefit from this
information.

The client side should monitor the Last-modified header value
returned by the server, and make sure that all of its individual
fragments are in sync. If there are older ones, they should be
immediately discarded and re-retrieved. The re-retrieval request
should have the line:

    Pragma: no-cache

to force intermediate proxies to reload the out-of-date fragment even
if the proxy is not configured to do the check every time.

Other than that, proxies are already designed to solve problems with
out-of-date documents, so that issue need not be covered here in
greater detail. Suffice it to say that proxies should make sure that
each individual byte range is in sync with respect to its
Last-modified time.

Future Considerations

Multiple URL Parameters

If at some point there are multiple simultaneous URL parameters, they
should be separated by the semicolon character. Example:

    http://host/dir/foo;param1=bar;param2=xyzzy

This specification doesn't define semantics for cases with multiple
URL parameters. Future specifications should define semantics for
these. Until then, multiple URL parameters should be treated as error
conditions (501 Not implemented), just as a single URL parameter is
treated as invalid by servers without support for the feature
specified in this document.

Other Possible Ranges

There are other kinds of ranges that can be addressed in a similar
fashion; this document does not define them, but both the URL
parameter scheme described here and the Range: HTTP header are
defined so that it is possible to extend them.

As an example, there might be a lines URL parameter, with the same
kind of range specification, and the Range: header would then specify
the numbers in lines.
Example:

    http://host/dir/foo;lines=20-30

The response from a 123 line document would be:

    Range: lines 20-30/123

This could be useful for such things as structured text files like
address lists or digests of mail and news, but isn't meaningful to
such document types as GIF or PDF.

Other examples might be document format specific ranges, such as
chapters:

    http://host/dir/foo;chapters=6-9

    Range: chapters 6-9/12

URL Encoding

The semicolon(s), the "bytes" keyword, hyphen(s), and digits must not
be encoded using the URL encoding mechanism. If they are encoded, it
is done to prevent them from being understood as a byterange request.

References

[RFC-1521]  N. Borenstein, N. Freed, "MIME (Multipurpose Internet
            Mail Extensions), Part One: Mechanisms for Specifying and
            Describing the Format of Internet Message Bodies",
            RFC 1521, Bellcore, Innosoft, September 1993.

[HTTP]      T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext
            Transfer Protocol -- HTTP/1.0",
            draft-ietf-http-v10-spec-04.html, October 14, 1995.

Authors' Addresses:

    Ari Luotonen
    Netscape Communications Corporation
    501 E. Middlefield Road
    Mountain View, CA 94043
    USA

    John Franks
    Department of Mathematics
    Northwestern University
    Evanston, IL 60208-2730