Network Working Group                                          M. Kerwin
Internet-Draft                                                       QUT
Intended status: Standards Track                           July 29, 2013
Expires: January 30, 2014


                         The 'file' URI Scheme
                      draft-kerwin-file-scheme-06

Abstract

   This document specifies the file Uniform Resource Identifier (URI)
   scheme that was originally specified in [RFC1738].  The purpose of
   this document is to keep the information about the scheme on
   standards track, since [RFC1738] has been made obsolete, and to
   promote interoperability by resolving disagreements between various
   implementations.

Note to Readers

   This draft should be discussed on its github project page [github].

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 19, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Kerwin                  Expires January 19, 2014                [Page 1]

Internet-Draft                 'file' URI                      July 2013

Table of Contents

   1.  Introduction
     1.1.  History
     1.2.  Conventions and Terminology
   2.  Scheme Definition
     2.1.  Syntax
   3.  Implementation Notes
     3.1.  Hierarchical Structure
     3.2.  Relative file paths
     3.3.  Drives, drive letters, mount points, file system root
     3.4.  UNC File Paths
     3.5.  Namespaces
   4.  Encoding and Character Set Considerations
   5.  Security Considerations
   6.  IANA Considerations
     6.1.  URI Scheme Name
     6.2.  Status
     6.3.  URI Scheme Syntax
     6.4.  URI Scheme Semantics
     6.5.  Encoding Considerations
     6.6.  Applications/Protocols That Use This URI Scheme Name
     6.7.  Interoperability Considerations
     6.8.  Security Considerations
     6.9.  Contact
     6.10. Author/Change Controller
     6.11. References
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1.  Introduction

   The 'file' URI scheme has historically had little or no
   interoperability between platforms.
   Further, implementers on a single platform have often disagreed on
   the syntax to use for a particular filesystem.  This document
   attempts to resolve those problems, and define a standard scheme
   which is interoperable between different extant and future
   implementations.  Additionally, it aims to ease implementation by
   conforming to a general syntax that allows existing URI parsing
   machinery to parse 'file' URIs.

   URIs were previously defined in [RFC1738], which was updated by
   [RFC3986].  Those documents also specify how to define schemes for
   URIs.

   The first definition for many URI schemes appeared in [RFC1738].
   Because that document has been made obsolete, this document copies
   the 'file' URI scheme from it to allow that material to remain on
   standards track.

1.1.  History

   This section is non-normative.

   The 'file' URI scheme was first defined in [RFC1630], which, being an
   informational RFC, does not specify an Internet standard.  The
   definition was standardised in [RFC1738], and the scheme was
   registered with the Internet Assigned Numbers Authority (IANA)
   [IANA-URI-Schemes]; however, that definition omitted certain language
   included by the former that clarified aspects such as:

   o  the use of slashes to denote boundaries between directory levels
      of a hierarchical file system; and

   o  the requirement that client software convert the 'file' URI into a
      file name in the local file name conventions.

   The Internet draft [I-D.draft-hoffman-file-uri] was written in an
   effort to keep the 'file' URI scheme on standards track when
   [RFC1738] was made obsolete, but that draft expired in 2005.  It
   enumerated concerns arising from the various, often conflicting
   implementations of the scheme.  It serves as the basis of this
   document.

   The 'file' URI scheme defined in [RFC1738] is referenced three times
   in the current URI Generic Syntax standard [RFC3986], despite the
   former's obsoletion:

   1.  RFC 3986, Section 1.1 uses "file:///etc/hosts" as an example for
       identifying a resource in relation to the end-user's local
       context.

   2.  RFC 3986, Section 1.2.3 mentions the "file" scheme regarding
       relative references.

   3.  RFC 3986, Section 3.2.2 says that '...the "file" URI scheme is
       defined so that no authority, an empty host, and "localhost" all
       mean the end-user's machine...'.

   Finally, the WHATWG defines a living URL standard [WHATWG-URL], which
   includes algorithms for interpreting file URIs (as URLs).

1.2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Scheme Definition

   The 'file' URI scheme is used to identify and retrieve files
   accessible on a particular host computer, where a "file" is a named
   resource which can be accessed through the computer's filesystem
   interface.

   The mechanism for retrieving a representation of a dereferenced
   'file' URI is through the computer's filesystem interface; for
   example using the POSIX "open", "read" and "close" functions [POSIX].

   This scheme, unlike most other URI schemes, does not identify a
   resource that is universally accessible over the Internet.

   Note well that "file" refers to filesystem names from the perspective
   of the user of a reference, rather than in relation to a
   globally-defined naming authority, so care ought to be taken when
   distributing 'file' URIs to ensure that such references are actually
   intended to be interpreted in relation to the user's filesystem
   interface.

   Also note that 'file' and 'ftp' URIs are not the same, even when the
   target of the 'ftp' URI is the local host.

2.1.  Syntax

   The syntax of a 'file' URI conforms with the generic syntax presented
   in [RFC3986], with the following components:

   scheme name
      The literal value "file"

   authority
      If present, either the fully qualified domain name of the system
      on which the file is accessible; or one of the special values
      "localhost" or the empty string ("").

      If the authority component is omitted, or has either of the
      special values "localhost" or the empty string (""), it is
      interpreted as "the machine from which the URI is being
      interpreted".

      A host name, if supplied, is typically intended to inform a client
      on a remote machine that it cannot access the file system, or
      perhaps to use some other mechanism to access the file.  It does
      not imply that the file ought to be accessible over a network
      connection.  No retrieval mechanism for files stored on a remote
      machine is defined by this specification.

   path
      The hierarchical directory path to the file, using the slash ("/")
      character to separate directories.  Implementations SHOULD
      translate between the slash-separated URI syntax and the local
      system's conventions for specifying file paths, where they differ.
      (See: Section 3.1)

      Some systems allow 'file' URIs to refer to directories.  In this
      case, the path usually (but not always) includes a terminating
      slash character, such as in:

         file:///usr/local/bin/

      The presence of a terminating slash character always indicates
      that the 'file' URI refers to a directory, but the absence of a
      slash does not necessarily indicate that it refers to a file.
      Implementations ought to use other mechanisms to detect
      directories, if and when such detection is required.

   query
      The query component contains non-hierarchical data that, along
      with data in the path component, serves to identify a file.  For
      example, in a versioning file system, the query component might be
      used to refer to a specific version of a file.
      Few implementations are known to use or support query components.

   fragment
      The semantics of a fragment component are undefined by this
      specification.  A protocol that employs 'file' URIs MAY define its
      own semantics for fragment components in the context of that
      protocol.

   Systems exhibit different levels of case-sensitivity.
   Implementations SHOULD maintain the case of file and directory names
   when translating 'file' URIs to and from the local system's
   representation of file paths, and any systems or devices that
   transport 'file' URIs SHOULD NOT alter the case of 'file' URIs they
   transport.

   See Section 6.3 for a formal syntax definition.

   Previous definitions of the 'file' URI scheme required two slashes
   between the scheme and path, so implementations that wish to remain
   interoperable with older implementations ought to include an
   authority component in any 'file' URIs they generate.

3.  Implementation Notes

3.1.  Hierarchical Structure

   Most implementations of the 'file' URI scheme do a reasonable job of
   mapping the hierarchical part of a directory structure into the slash
   ("/") delimited hierarchy of the URI syntax, independent of the
   native platform's delimiter.

   For example, on Microsoft Windows platforms, it is typical that the
   file system presents backslash ("\") as the file delimiter for file
   names, yet the URI's forward slash ("/") can be used in 'file' URIs
   interpreted on those platforms.  Similarly, on (some) Macintosh OS
   versions, at least in some contexts, the colon (":") is used as the
   delimiter in the native presentation of file path names.  Unix
   systems natively use the same forward slash ("/") delimiter for
   hierarchy, so there is a closer mapping between 'file' URI paths and
   native path names.

   In accordance with Section 3.3 of [RFC3986], the path segments "."
   and "..", also known as dot-segments, are only interpreted within the
   URI path hierarchy and are removed as part of the resolution process
   ([RFC3986], Section 5.2).  Implementations operating on or
   interacting with systems that allow dot-segments in their native path
   representation may be required to escape those segments using some
   other means when translating to and from 'file' URIs.

3.2.  Relative file paths

   As relative references are resolved into their respective (absolute)
   target URIs according to Section 5 of [RFC3986], this document does
   not describe that resolution.

   However, a fully resolved URI may contain a non-absolute path.  For
   example, using a generic parser, the URI:

      file:a/b/c

   would be parsed and interpreted as: file 'c', in directory 'b', in
   directory 'a', on the machine on which the URI is being interpreted
   (i.e. localhost); however, there is no indication of the location of
   the directory 'a' on that machine.  By convention an absolute file
   path would begin with a slash ("/") character on a Unix-based system,
   or a drive letter (e.g. "c:\") on a Microsoft Windows system, etc.

   Resolution of relative file paths is undefined by this specification.
   A protocol that employs 'file' URIs MAY define its own rules for
   resolution of relative file paths in the context of that protocol.

3.3.  Drives, drive letters, mount points, file system root

   Historically there has been considerable difference, in practice, for
   handling of the syntax for the "top" of the hierarchy.  The 'file'
   URI syntax provides one simple place for designating the root of the
   file hierarchy, and implementations have diverged, even on the same
   platform, sometimes even within a single application.

   For example, Microsoft DOS- and Windows-based systems support the
   notion of a "drive letter", a single character which represents a
   (virtual) drive, mount point, or device.  Native representations of
   file paths start with the drive letter, a colon, and then the path;
   e.g., "c:\TMP\test.txt".
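   This note is non-normative.  As a minimal sketch of how a native path
   of this form might be translated into a 'file' URI, the following
   Python fragment replaces the backslash delimiters with slashes,
   percent-encodes each segment as UTF-8, and marks the drive letter
   with a colon.  The function name windows_path_to_file_uri and the
   regular expression are illustrative assumptions, not part of this
   specification; the sketch handles only drive-letter paths and rejects
   everything else.

```python
import re
from urllib.parse import quote

# Hypothetical helper pattern: one drive letter, a colon, a backslash,
# then the rest of the native path (possibly empty).
_DRIVE_PATH = re.compile(r"^([A-Za-z]):\\(.*)$")


def windows_path_to_file_uri(native_path: str) -> str:
    """Translate a native drive-letter path into a 'file' URI (sketch)."""
    match = _DRIVE_PATH.match(native_path)
    if match is None:
        raise ValueError("not a drive-letter path: %r" % native_path)
    drive, rest = match.groups()
    # Each directory level becomes one URI path segment; quote() encodes
    # the segment as UTF-8 and percent-encodes anything not unreserved.
    segments = [quote(segment, safe="") for segment in rest.split("\\")]
    # An empty authority ("file:///") means the local machine.
    return "file:///" + drive + ":/" + "/".join(segments)


if __name__ == "__main__":
    print(windows_path_to_file_uri(r"c:\TMP\test.txt"))
    # prints file:///c:/TMP/test.txt
```

   The sketch always emits an (empty) authority component, so the result
   starts with "file:///", and it leaves the case of the drive letter
   and of the file names untouched.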
   Drive letters are mapped into the top of a 'file' URI in various
   ways.  On systems running some versions of Microsoft Windows, the
   drive letter may be specified with a colon (":") character; however,
   sometimes the colon is replaced with a pipe ("|") character, and in
   some implementations the colon is omitted entirely.  The three
   representations MAY be considered equivalent, and any implementation
   which could interact with a Microsoft Windows environment SHOULD
   interpret a single letter, optionally followed by a colon or pipe
   character, in the first segment of the path as a drive letter.

   For example, the following URIs:

      file:///c:/TMP/test.txt
      file:///c|/TMP/test.txt
      file:///c/TMP/test.txt

   when interpreted on the same machine, would refer to the same file:

      c:\TMP\test.txt

   Implementations SHOULD use a colon (":") character to specify drive
   letters when generating URIs for Windows- and DOS-based systems.

   Note that some systems running some versions of Microsoft Windows are
   known to omit the slash before the drive letter, effectively
   replacing the authority component with the drive specification; for
   example, "file://c:/TMP/test.txt".  In line with Postel's robustness
   principle ("an implementation must be conservative in its sending
   behavior, and liberal in its receiving behavior" [RFC791])
   implementations that are likely to encounter such a URI MAY interpret
   it as a drive letter, but SHOULD NOT generate such URIs.

3.4.  UNC File Paths

   The Microsoft Windows Universal Naming Convention (UNC) [MS-DTYP]
   defines a convention for specifying the location of resources such as
   shared files or devices, for example Windows shares accessed via the
   SMB/CIFS protocol [MS-SMB2].
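   This note is non-normative and looks back at the drive-letter
   handling of Section 3.3 rather than at UNC paths.  As a minimal
   sketch, assuming an implementation that expects to meet Windows-style
   URIs, the Python fragment below treats a single letter in the first
   path segment, optionally followed by a colon or pipe character, as a
   drive letter and produces the native path.  The function name
   file_uri_to_windows_path and the regular expression are hypothetical.
   The sketch only accepts an empty authority and does not attempt the
   "file://c:/..." form that Section 3.3 allows implementations to
   tolerate.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Hypothetical pattern: empty authority, then a first path segment made
# of exactly one letter plus an optional ":" or "|" drive marker.
_DRIVE_URI = re.compile(r"^file:///([A-Za-z])[:|]?(/.*)?$", re.IGNORECASE)


def file_uri_to_windows_path(uri: str) -> Optional[str]:
    """Return the native drive-letter path, or None if none is present."""
    match = _DRIVE_URI.match(uri)
    if match is None:
        return None
    drive = match.group(1)
    rest = match.group(2) or "/"
    # Undo percent-encoding, then switch to the native backslash
    # delimiter.  The case of names is deliberately left unchanged.
    native_rest = unquote(rest).replace("/", "\\")
    return drive + ":" + native_rest


if __name__ == "__main__":
    for uri in ("file:///c:/TMP/test.txt",
                "file:///c|/TMP/test.txt",
                "file:///c/TMP/test.txt"):
        print(file_uri_to_windows_path(uri))
    # each line prints c:\TMP\test.txt
```

   A URI such as "file:///etc/hosts" yields None here because its first
   segment is longer than one letter.  On a system where a single-letter
   top-level directory is legitimate, this interpretation is ambiguous,
   so a caller would apply it only when a Windows environment is
   expected.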
   The general structure of a UNC file path, given in Augmented
   Backus-Naur Form (ABNF) [RFC5234], is:

      UNC        = "\\" hostname "\" sharename *( "\" objectname )
      hostname   =
      sharename  =
      objectname =

   Note that this syntax description is non-normative.

   The canonical representation of a UNC file path as a 'file' URI
   copies the UNC 'hostname' into the URI 'host' field, and the UNC
   'sharename' and 'objectname's, concatenated with forward slash ("/")
   characters, into the 'path'.

   For example, the following UNC path:

      \\server.example.com\Share\path\to\file.doc

   is represented as a 'file' URI canonically as:

      file://server.example.com/Share/path/to/file.doc
             \________________/\_____________________/
                  hostname       sharename+objectnames

   Historically some implementations have translated UNC file paths
   entirely into the 'path' segment of a 'file' URI, including both
   leading slashes, and the Firefox web browser even prefixes the UNC
   file path with another slash.  For example, the UNC path given above
   might be translated as:

      file:////server.example.com/Share/path/to/file.doc
             \_________________________________________/
                         translated UNC path

   or even:

      file://///server.example.com/Share/path/to/file.doc
              \_________________________________________/
                          translated UNC path

   An implementation receiving such a URI SHOULD convert it to the
   canonical representation before processing it.

   The 'file' URI scheme is unusual in that it does not specify an
   Internet protocol or access method for shared files; as such, its
   utility in network protocols between hosts is limited.  Examples of
   file server protocols that do define such access methods include
   SMB/CIFS [MS-SMB2], NFS [RFC3530], and NCP [NOVELL].

3.5.  Namespaces

   The Microsoft Windows API defines Win32 Namespaces [Win32-Namespaces]
   for interacting with files and devices using Windows API functions.
   These namespaced paths are prefixed by "\\?\" for Win32 File
   Namespaces and "\\.\" for Win32 Device Namespaces.  There is also a
   special case for UNC file paths [MS-DTYP] in Win32 File Namespaces,
   referred to as "Long UNC", using the prefix "\\?\UNC\".

   This specification does not define a mechanism for translating
   namespaced file paths into 'file' URIs.

4.  Encoding and Character Set Considerations

   As specified in [RFC3986], the 'file' URI scheme allows any character
   from the Universal Character Set (UCS, [ISO10646]) encoded as UTF-8
   [RFC3629] and then percent-encoded in valid ASCII [RFC20].

   If the local file system uses a known non-Unicode character encoding,
   the file path SHOULD be converted to a sequence of Unicode characters
   normalized according to Normalization Form C (NFC, [UTR15]).

   Before applying any percent-encoding, an application MUST ensure the
   following about the string that is used as input to the
   URI-construction process:

   o  The host, if any, consists only of Unicode code points that
      conform to the rules specified in [RFC5892].

   o  Internationalized domain name (IDN) labels are encoded as A-labels
      [RFC5890].

5.  Security Considerations

   There are many security considerations for URI schemes discussed in
   [RFC3986].

   File access and the granting of privileges for specific operations
   are complex topics, and the use of 'file' URIs can complicate the
   security model in effect for file privileges.  Under no circumstance
   should software using 'file' URIs grant greater access than would be
   available for other file access methods.

6.  IANA Considerations

   In accordance with the guidelines and registration procedures for new
   URI schemes [RFC4395], this section provides the information needed
   to update the registration of the 'file' URI scheme.

6.1.  URI Scheme Name

   file

6.2.  Status

   permanent

6.3.  URI Scheme Syntax

   The 'file' URI syntax is defined here in Augmented Backus-Naur Form
   (ABNF) [RFC5234], including the core ABNF syntax rule 'ALPHA' defined
   by that specification, and borrowing the 'host', 'path-absolute' and
   'segment' rules from [RFC3986] (updated by [RFC6874]).

      fileURI       = "file" ":" ( auth-file / local-file )

      auth-file     = "//" ( host-file / no-host-file )

      host-file     = hostpart path-absolute   ; file:///
                                               ; file://localhost/

      no-host-file  = path-abs                 ; begins with "/"
                    / path-abs-win             ; begins with drive-letter
                                               ; file:///
                                               ; file:////
                                               ; file://c:/ *

      local-file    = path-absolute            ; file:/
                    / path-abs-win             ; file:c:/

      hostpart      = "localhost" / host

      path-abs      = 1*( "/" segment )

      path-abs-win  = drive-letter path-absolute

      drive-letter  = ALPHA [ drive-marker ]

      drive-marker  = ":" / "|"

   *  The 'no-host-file' rule allows for dubious URIs that encode a
      Windows drive letter as the authority component.  See Section 3.3
      of RFC XXXX.  [Note to RFC Editor: please replace XXXX with the
      number issued to this document.]

6.4.  URI Scheme Semantics

   See Section 2 of RFC XXXX.  [Note to RFC Editor: please replace XXXX
   with the number issued to this document.]

6.5.  Encoding Considerations

   See Section 4 of RFC XXXX.  [Note to RFC Editor: please replace XXXX
   with the number issued to this document.]

6.6.  Applications/Protocols That Use This URI Scheme Name

   Web browsers:

   o  Firefox

         Note: Firefox has a unique interpretation of RFC 1738 (See:
         Bugzilla#107540 [1]), which affects UNC paths.

   o  Chromium

   o  Internet Explorer

   o  Opera

   Other applications/protocols:

   o  Windows API [2,3]

   o  Perl LWP

6.7.  Interoperability Considerations

   Due to the convoluted history of the 'file' URI scheme there are
   many, varied implementations in existence.
   Many have converged over time, forming a few kernels of
   closely-related functionality, and RFC XXXX attempts to accommodate
   such common functionality.  [Note to RFC Editor: please replace XXXX
   with the number issued to this document.]  However, there will always
   be exceptions, and this fact is recognised.

6.8.  Security Considerations

   See Section 5 of RFC XXXX.  [Note to RFC Editor: please replace XXXX
   with the number issued to this document.]

6.9.  Contact

   Matthew Kerwin, matthew@kerwin.net.au

6.10.  Author/Change Controller

   This scheme is registered under the IETF tree.  As such, the IETF
   maintains change control.

6.11.  References

   [1]  "Bug 107540", Bugzilla@Mozilla, October 2007.

   [2]  "PathCreateFromUrl function", MSDN, June 2013.

   [3]  "UrlCreateFromPath function", MSDN, June 2013.

7.  Acknowledgements

   This specification is derived from RFC 1738 [RFC1738], RFC 3986
   [RFC3986], and I-D draft-hoffman-file-uri (expired)
   [I-D.draft-hoffman-file-uri]; the acknowledgements in those documents
   still apply.

8.  References

8.1.  Normative References

   [ISO10646]  International Organization for Standardization, "ISO/IEC
               10646:2003: Information Technology - Universal
               Multiple-Octet Coded Character Set (UCS)", ISO Standard
               10646, December 2003.

   [RFC20]     Cerf, V., "ASCII format for Network Interchange", RFC 20,
               October 1969.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3629]   Yergeau, F., "UTF-8, a transformation format of ISO
               10646", STD 63, RFC 3629, November 2003.

   [RFC3986]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
               Resource Identifier (URI): Generic Syntax", STD 66, RFC
               3986, January 2005.

   [RFC5890]   Klensin, J., "Internationalized Domain Names for
               Applications (IDNA): Definitions and Document Framework",
               RFC 5890, August 2010.
   [RFC5892]   Faltstrom, P., "The Unicode Code Points and
               Internationalized Domain Names for Applications (IDNA)",
               RFC 5892, August 2010.

   [RFC6874]   Carpenter, B., Cheshire, S., and R. Hinden, "Representing
               IPv6 Zone Identifiers in Address Literals and Uniform
               Resource Identifiers", RFC 6874, February 2013.

   [UTR15]     Davis, M. and K. Whistler, "Unicode Normalization Forms",
               August 2012.

8.2.  Informative References

   [I-D.draft-hoffman-file-uri]
               Hoffman, P., "The file URI Scheme",
               draft-hoffman-file-uri-03 (work in progress), January
               2005.

   [IANA-URI-Schemes]
               Internet Assigned Numbers Authority, "Uniform Resource
               Identifier (URI) Schemes registry", June 2013.

   [MS-DTYP]   Microsoft Open Specifications, "Windows Data Types,
               2.2.56 UNC", January 2013.

   [MS-SMB2]   Microsoft Open Specifications, "Server Message Block
               (SMB) Protocol Versions 2 and 3", January 2013.

   [NOVELL]    Novell, "NetWare Core Protocols", 2013.

   [POSIX]     IEEE, "IEEE Std 1003.1, 2013 Edition", 2013.

   [RFC1630]   Berners-Lee, T., "Universal Resource Identifiers in WWW:
               A Unifying Syntax for the Expression of Names and
               Addresses of Objects on the Network as used in the
               World-Wide Web", RFC 1630, June 1994.

   [RFC1738]   Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
               Resource Locators (URL)", RFC 1738, December 1994.

   [RFC3530]   Shepler, S., Callaghan, B., Robinson, D., Thurlow, R.,
               Beame, C., Eisler, M., and D. Noveck, "Network File
               System (NFS) version 4 Protocol", RFC 3530, April 2003.

   [RFC4395]   Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
               Registration Procedures for New URI Schemes", BCP 35, RFC
               4395, February 2006.

   [RFC5234]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
               Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC791]    Postel, J., "Internet Protocol - DARPA Internet Program,
               Protocol Specification", RFC 791, September 1981.
   [WHATWG-URL]
               WHATWG, "URL Living Standard", May 2013.

   [Win32-Namespaces]
               Microsoft Developer Network, "Naming Files, Paths, and
               Namespaces", June 2013.

   [github]    Kerwin, M., "file-uri-scheme GitHub repository", n.d.

Author's Address

   Matthew Kerwin
   Queensland University of Technology
   Victoria Park Rd
   Kelvin Grove, QLD  4059
   Australia

   EMail: matthew.kerwin@qut.edu.au