Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Intended Status: Informational                        September 25, 2015
Expires: March 28, 2016


                   URI Schemes for SHA-1 and SHA-256
                       draft-seantek-sha-uris-02

Abstract

This document registers Uniform Resource Identifier schemes for use with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and SHA-256. The purpose is to identify data streams and content in a simple, "drop-in" way within the URI infrastructure.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on March 28, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Leonard                       Informational                     [Page 1]

Internet-Draft                   SHA URIs                 September 2015

1.  Introduction

This document registers Uniform Resource Identifier schemes for use with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and SHA-256. The purpose is to identify binary data streams in a simple, "drop-in" way within the URI infrastructure. This document also provides parallel means to identify Internet content or messages.

Internet-facing applications frequently need to store or transmit identifiers for a wide range of content types, including security structures (certificates, public keys, authorization tokens), executable code, and resource manifests (well-defined data formats that structure other data streams, which may be significantly larger and are not self-identifying). These applications achieve greater interoperability by using a common syntax for these identifiers; URIs [RFC3986] suit this purpose well. Two of the most important properties of URIs are that humans can easily recognize them and that they can be created with simple "copy-and-paste" operations.

   sha1:2FD4E1C6:7A2D28FC:ED849EE1:BB76E739:1B93EB12;43
   sha1:2FD4E1C6-7A2D28FC~ED849EE1_BB76E739.1B93EB12
   sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

                       Figure 1: Example SHA URIs

1.1.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2.  Data Stream Description, Syntax, and Semantics

The syntax of SHA-1 and SHA-256 URIs consists of the scheme ("sha1:" or "sha256:") followed by the hexadecimal encoding of the hash value. The following delimiters are permitted and do not affect equivalence: space ("%20"), tab ("%09"), carriage return ("%0D"), line feed ("%0A"), colon (":"), period ("."), tilde ("~"), hyphen ("-"), and underscore ("_"). The hexadecimal characters are case-insensitive.
Despite these leniencies, it is RECOMMENDED that generators emit lowercase hexadecimal digits and no delimiters.

The identifier identifies any data stream that, when hashed, produces the same hash value. Optionally, the URI may end with ";" followed by the length of the data stream in octets (8-bit units). If the identified data is a bit stream whose length is not a multiple of 8 bits, the suffix "b" followed by a digit "1" through "7" (the number of trailing bits) is appended. When the length suffix is present, the identifier identifies any data stream of the specified length that, when hashed, produces the same hash value.

[SHS] defines SHA-1 and SHA-256 only for messages (data streams) shorter than 2^64 bits (2^61 octets). Thus, the largest length production that an application would expect to encounter is ";2305843009213693951b7" (2^61 - 1 octets plus 7 bits, that is, 2^64 - 1 bits). Behavior for greater lengths is undefined.

There is no way to distinguish between "bit streams" (whose lengths happen to be multiples of 8 bits) and "octet streams" as categories; this ambiguity is intentional because [SHS] does not distinguish between them. Furthermore, the hash value cannot be truncated: a parser that receives too few (or too many) hexadecimal digits MUST NOT accept the hash value.

The URI schemes defined in this document are not intended specifically for human readability or speakability. They are, however, designed for human "copy-and-paste-ability": the ability of a human operator (or an automated text-based operator, such as a simple script) to feed such URIs into processes that compare them for exact equality. Based on the labeled input, a process can generate appropriate representations for human comparison, such as visual schemes [VISHASH].
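The following Python sketch illustrates the parsing, generation, and length rules described above. It is not part of this specification. The names parse_sha_uri, format_sha_uri, and matches are hypothetical. The sketch assumes that only whole-octet streams are verified, because Python's hashlib hashes bytes and cannot hash partial octets. It strips the permitted delimiters, requires exactly 40 or 64 hexadecimal digits, decodes the optional length, and stops reading a stream as soon as the stream exceeds the declared length.

```python
import hashlib
import io
import re
from typing import BinaryIO, NamedTuple, Optional

# Exact number of hex digits required by each scheme (Section 2).
HEX_DIGITS = {"sha1": 40, "sha256": 64}
# Delimiters that Section 2 permits; they do not affect equivalence.
DELIMITERS = re.compile(r"%20|%09|%0D|%0A|[:.~_-]", re.IGNORECASE)
# length = ";" wholenum [ bits ]; "b" is case-insensitive, bits are 1-7.
LENGTH = re.compile(r"(0|[1-9][0-9]*)(?:[bB]([1-7]))?")
# [SHS] is defined only for inputs shorter than 2^64 bits.
MAX_BITS = 2**64


class ShaUri(NamedTuple):
    scheme: str                 # "sha1" or "sha256"
    digest: str                 # lowercase hex, delimiters removed
    length_bits: Optional[int]  # None when there is no ";length" suffix


def parse_sha_uri(uri: str) -> ShaUri:
    """Parse a sha1:/sha256: data URI (hypothetical helper)."""
    scheme, colon, rest = uri.partition(":")
    scheme = scheme.lower()  # URI scheme names are case-insensitive
    if not colon or scheme not in HEX_DIGITS:
        raise ValueError("not a sha1: or sha256: URI")
    hash_part, semicolon, length_part = rest.partition(";")
    digest = DELIMITERS.sub("", hash_part).lower()
    # No truncation: exactly 40 or 64 hex digits must remain.
    if len(digest) != HEX_DIGITS[scheme] or not re.fullmatch(r"[0-9a-f]+", digest):
        raise ValueError("wrong number of hexadecimal digits")
    length_bits = None
    if semicolon:
        m = LENGTH.fullmatch(length_part)
        if m is None:
            raise ValueError("malformed length suffix")
        length_bits = int(m.group(1)) * 8 + int(m.group(2) or 0)
        # Behavior beyond the [SHS] limit is undefined; rejecting is one choice.
        if length_bits >= MAX_BITS:
            raise ValueError("length exceeds the [SHS] input limit")
    return ShaUri(scheme, digest, length_bits)


def format_sha_uri(u: ShaUri) -> str:
    """Emit the RECOMMENDED form: lowercase hex and no delimiters."""
    text = f"{u.scheme}:{u.digest}"
    if u.length_bits is not None:
        octets, bits = divmod(u.length_bits, 8)
        text += f";{octets}" + (f"b{bits}" if bits else "")
    return text


def matches(u: ShaUri, stream: BinaryIO, chunk_size: int = 65536) -> bool:
    """Hash an octet stream, giving up early if it exceeds the declared length."""
    if u.length_bits is not None and u.length_bits % 8:
        raise NotImplementedError("hashlib cannot hash partial octets")
    limit = None if u.length_bits is None else u.length_bits // 8
    h = hashlib.new(u.scheme)
    total = 0
    while chunk := stream.read(chunk_size):
        total += len(chunk)
        if limit is not None and total > limit:
            return False  # longer than declared: stop reading now
        h.update(chunk)
    if limit is not None and total != limit:
        return False      # shorter than declared
    return h.hexdigest() == u.digest


if __name__ == "__main__":
    uri = parse_sha_uri("sha1:2FD4E1C6:7A2D28FC:ED849EE1:BB76E739:1B93EB12;43")
    print(format_sha_uri(uri))  # sha1:2fd4e1c67a2d28fced849ee1bb76e7391b93eb12;43
    fox = io.BytesIO(b"The quick brown fox jumps over the lazy dog")
    print(matches(uri, fox))    # True
```

Comparing canonical forms is one simple way to test two URIs for equivalence. Note that a URI with a length suffix identifies a narrower set of data streams than the same hash value without one, so an application may choose to treat the two differently.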
The ABNF [RFC5234] is:

   sha1uri   = "sha1:" ( 40HEXDIG / 40*relaxed ) [ length ]
   sha256uri = "sha256:" ( 64HEXDIG / 64*relaxed ) [ length ]
               ; HEXDIG is from [RFC5234]
               ; in the relaxed productions, there must be exactly
               ; 40 HEXDIG for sha1uri, and 64 HEXDIG for sha256uri

   relaxed   = HEXDIG / ":" / "." / "~" / "-" / "_" /
               "%20" / "%0D" / "%0A" / "%09"

   length    = ";" wholenum [ bits ]
   wholenum  = "0" / ( %x31-39 *DIGIT )
               ; DIGIT is from [RFC5234]
   bits      = "b" ( %x31-37 )  ; 1 - 7
               ; "b" is not case-sensitive, but lowercase SHOULD be used

                    Figure 2: ABNF of SHA Data URIs

Other URI schemes, such as ni: [RFC6920], include or incorporate hashes. Software may be written to convert URIs between such schemes. Although the bits of the resource may be equivalent, the URI semantics may differ. The SHA-1 and SHA-256 URIs in this document identify binary data streams (i.e., ordered sequences of bits).

3.  Message (Content) Description, Syntax, and Semantics

The URI schemes of Section 2 identify data streams; by design, they lack any means to include metadata such as filenames or Internet media types. This section defines two additional URI schemes, "sha1msg:" and "sha256msg:", to identify content that includes Internet message headers.

The syntax of the message schemes is similar, but not identical, to that of the data schemes. Length is optional; if present, it MUST be an integral number of octets. Similar to mid: URIs [RFC2392], a specific content part may be identified by appending a slash ("/") and the (encoded) Content-ID of the part.

Implementations MUST generate and parse productions that conform to [RFC5322] (Internet Message Format), [RFC6532] (Internationalized Email Headers), [RFC3030] (Binary MIME), and [RFC2045], [RFC2046], [RFC2047], and [RFC2231] (collectively, MIME). Technically, the identified messages are octet streams that include Internet message headers, specifically MIME-conformant headers.
The default Content-Transfer-Encoding is "binary" (as the channel is binary-clean), and the default MIME-Version is "1.0". These headers MAY be omitted from the production unless they differ from these defaults.

Although it would not be inaccurate to label the Content-Type of such a production as "message/global", when a SHA Message URI is dereferenced, the resource takes on the media type given in the production's Content-Type header, along with all other header metadata. Any message type (such as HTTP response messages [RFC7230], Netnews articles [RFC5536], or email messages [RFC5322]) is therefore suitable for more-or-less direct identification by SHA Message URIs.

Implementers need to note that headers are encoded in UTF-8 [RFC6532]. HTTP response header fields [RFC7230], however, have historically used ISO-8859-1. If an HTTP response message is used as the message, header characters outside the ASCII range MUST be re-encoded; they SHOULD be encoded directly to UTF-8, but MAY be encoded by other means, such as [RFC2047]. If the message is intended to convey an enclosed HTTP response (as opposed to the content of an HTTP response), it is appropriate to label the content with Content-Type: message/http or application/http [RFC7230], where the full HTTP response, including its headers, is the content of the message. In that case, re-encoding SHALL NOT occur.

SHA Message URIs are similar to mid: and cid: URIs [RFC2392] in that they uniquely identify a message or content. However, whereas Message-IDs are assigned to and embedded in the content, SHA hash values are intrinsic properties of the message (octet stream); attempting to embed the hash value in the message would alter the message. Furthermore, the semantics of Message-ID are not defined when the top-level Content-Type is not "message". In contrast, Content-ID is defined for all types of content.
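As an illustration of the message schemes described above, the Python sketch below serializes a message as UTF-8 headers, an empty line, and a body, and then computes a sha256msg URI. It can optionally append a percent-encoded Content-ID. The helpers build_message and sha256msg_uri are hypothetical. The sketch assumes CRLF line endings and omits the default MIME-Version and Content-Transfer-Encoding headers. It percent-encodes every Content-ID character outside the pchar-atext set of the ABNF below (plus "." and "@"), and it does not validate header syntax.

```python
import hashlib
from typing import Mapping, Optional
from urllib.parse import quote

# Characters left unencoded in a Content-ID: the pchar-atext set of the
# message ABNF plus "." (dot-atom separator) and "@". quote() never
# encodes ASCII letters, digits, or "_.-~"; all other characters are
# percent-encoded from their UTF-8 bytes.
CID_SAFE = "!$&'*+/=@"


def build_message(headers: Mapping[str, str], body: bytes) -> bytes:
    """Serialize UTF-8 headers [RFC6532], an empty line, and the body.

    Hypothetical helper: it omits MIME-Version and Content-Transfer-Encoding
    because "1.0" and "binary" are the defaults, and it does no validation.
    """
    head = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    return head.encode("utf-8") + b"\r\n" + body


def sha256msg_uri(message: bytes, content_id: Optional[str] = None,
                  with_length: bool = True) -> str:
    """Build a sha256msg: URI, optionally naming one part by its Content-ID."""
    uri = "sha256msg:" + hashlib.sha256(message).hexdigest()
    if with_length:
        uri += f";{len(message)}"  # msglength counts whole octets only
    if content_id is not None:
        # Like cid:/mid: URLs [RFC2392], drop the enclosing angle brackets.
        uri += "/" + quote(content_id.strip("<>"), safe=CID_SAFE)
    return uri


if __name__ == "__main__":
    msg = build_message({"Content-Type": "text/plain; charset=utf-8"},
                        "Grüße\r\n".encode("utf-8"))
    print(sha256msg_uri(msg))
    print(sha256msg_uri(msg, "<part1.a b@example.com>", with_length=False))
```

The sketch builds the octet stream by hand rather than with Python's email package. This keeps the exact hashed bytes under the caller's control, because any re-serialization of the headers changes the hash value.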
The ABNF [RFC5234] is:

   ; assumes the productions in Figure 2 are defined
   sha1msguri   = "sha1msg:" ( 40HEXDIG / 40*relaxed )
                  [ msglength ] [ "/" content-id ]
   sha256msguri = "sha256msg:" ( 64HEXDIG / 64*relaxed )
                  [ msglength ] [ "/" content-id ]

   msglength    = ";" wholenum

   content-id   = id-left "@" id-right

   id-left      = pchar-dot-atom-text

   pchar-dot-atom-text = 1*pchar-atext *("." 1*pchar-atext)

   ; omits # % ? ^ ` { | } from RFC 5322 atext
   ; pct-encoded is from RFC 3986
   ; TODO: RFC 6532 extends atext to include UTF-8!
   pchar-atext  = ALPHA / DIGIT / "!" / "$" / "&" / "'" / "*" /
                  "+" / "-" / "/" / "=" / "_" / "~" / pct-encoded

   id-right     = pchar-dot-atom-text / pchar-no-fold-literal
                  ; does not include @ but this is just because it would
                  ; look weird in the production

   ; TODO: should [ *dtext ] include %5B ... %5D,
   ; or should the dtext be interpreted (and de-quoted-paired)?
   ; TODO: RFC 6532 extends dtext to include UTF-8!
   pchar-no-fold-literal = unreserved / sub-delims / ":" / pct-encoded

                  Figure 3: ABNF of SHA Message URIs

4.  Encoding and Interoperability

SHA-1 and SHA-256 URIs conform to [RFC3986]. Specifically, the syntaxes described in this document conform to the path-rootless production of the hier-part production of the URI production in Section 3 of [RFC3986]. The characters representing the binary hash values in these URIs are limited to hexadecimal digits (and the permitted delimiters), so the identified content raises no further encoding issues. Note, however, that the characters representing a Content-ID in Message URIs reflect UTF-8 encoding [RFC6532].

Future revisions may cover the semantics of other URI-conformant productions.

5.  Security Considerations

The basic sha1 and sha256 URI schemes identify data streams, not content in the Internet message sense. Supplementary information about the data stream is expected to be provided by context.
If an application designer wishes to affix metadata (such as an Internet media type or file modification date) permanently to a data stream, the metadata and data stream should be concatenated into some format, and the result hashed.

Section 3 provides the sha1msg and sha256msg URI schemes, which identify Internet message content. Separate URI schemes are defined for message content (rather than parameters on the data URIs that indicate the presence of parsable headers) for several reasons:

1.  The data URIs are intended to be very lightweight, leaving minimal room for implementation errors. They are usable without any network access, such as when querying a local data store. Parsing and interpreting Internet message headers carries a host of security and interoperability hazards that are described in the relevant standards and elsewhere; sha1 and sha256 URI implementations do not need to worry about these hazards.

2.  While the sha1 and sha256 URI schemes can (and are routinely expected to) identify data streams that are canonicalized, the sha1msg and sha256msg URI schemes are not designed with canonicalization in mind: Internet message headers do not have canonical forms.

Cryptographic hash functions map variable-length data streams to fixed-length outputs, with four additional properties:

1.  Random distribution: A change, addition, or deletion of any bit of the input message (data stream) changes each bit of the output (hash value) with probability one half.

2.  Preimage resistance: Given a hash value, finding a corresponding message is computationally infeasible.

3.  Second-preimage resistance: Given a first message (and hence its hash value), finding a second message that has the same hash value is computationally infeasible.

4.  Collision resistance: Finding two messages that have the same hash value is computationally infeasible.
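The "random distribution" property can be observed empirically. The Python sketch below flips each input bit in turn and counts how many of the 256 SHA-256 output bits change. For a well-behaved hash function, the average should be close to 128. The helper names bit_difference and avalanche are hypothetical. A test like this can illustrate the property but cannot prove it, and it says nothing about preimage, second-preimage, or collision resistance.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche(message: bytes, bit_index: int) -> int:
    """Flip one input bit; return how many SHA-256 output bits change."""
    flipped = bytearray(message)
    flipped[bit_index // 8] ^= 0x80 >> (bit_index % 8)  # MSB-first bit order
    return bit_difference(hashlib.sha256(message).digest(),
                          hashlib.sha256(bytes(flipped)).digest())


if __name__ == "__main__":
    msg = b"The quick brown fox jumps over the lazy dog"
    changes = [avalanche(msg, i) for i in range(len(msg) * 8)]
    # Ideally each output bit flips with probability 1/2, i.e. about 128 of 256.
    print(f"mean {sum(changes) / len(changes):.1f}, "
          f"min {min(changes)}, max {max(changes)}")
```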
As of the time of this document, SHA-1 has been the subject of significant cryptanalysis [OPTJLOC], prompting the security community to migrate to newer hash algorithms [RFC6194]. No practical attacks against the full SHA-256 algorithm are publicly known.

The length qualifier is not intended to provide or augment the basic security properties of the SHA-1 or SHA-256 hash algorithms. However, an application SHOULD include the length qualifier when it knows the length in advance, because the qualifier constrains the set of possible data streams.

Collisions exist for any hash algorithm. Consider the 2^161 data streams of 20 octets and 1 bit (161 bits) under the SHA-1 algorithm. By the pigeonhole principle, a collision (two messages that produce the same hash value) must exist. Even in the best case, in which 2^160 of these streams map to distinct hash values, the (2^160 + 1)st stream must map to a hash value already produced by at least one prior stream. It is virtually certain that collisions also exist among the data streams of exactly 20 octets, as well as among data streams appreciably shorter than 20 octets (birthday paradox). However, the probability that the particular data stream of interest has the same hash value as another (malicious) data stream is harder to calculate. A trivial change to the message might result in the same hash value; it is also possible that no other message of the same length has the same hash value. The probability of the latter clearly diminishes, however, as the set of candidate messages expands without bound.

The purpose of the optional length qualifier is not simply to reduce the message space, since for all non-trivial messages of n bits, the message space of 2^n vastly exceeds the collision certainty threshold.
(For illustrative purposes, the author searched for SHA-1 collisions in 0-, 1-, 2-, 3-, and 4-octet data streams using a brute-force algorithm; [none were found] [note: the search is ongoing and is expected to take about 85 days].) Rather, the purpose is to detect malicious underflow or overflow conditions before they happen. If an attacker is feeding data to the recipient, the recipient may accept data without bound while waiting for the hash computation to complete. In so doing, the attacker causes the recipient to waste resources in addition to computation time, which may cause latent security errors (such as low-memory or low-disk conditions) to manifest.

Consider one application of these URI schemes: identifying security-related objects such as PKIX certificates [RFC5280]. Although [RFC5280] does not limit the size of a certificate, most certificates are not appreciably larger than 16 kilobytes, and protocol-related pressures push certificates to be much smaller (for example, less than 4 kilobytes) so that they fit in fewer TCP segments. A certificate-using application that accepts the URI schemes in this document might reduce its attack surface by rejecting URIs of unspecified length once the candidate certificate data exceeds a threshold (e.g., 64 kilobytes); for larger certificates, the application would require that the length be specified.

A secondary purpose of specifying the length is to resist chosen-prefix collision attacks [CHOSEN]. In such an attack, the attacker picks two separate message prefixes and then appends a different value to each, so that the concatenated messages have the same hash value. [CHOSEN] shows that Merkle-Damgard hash functions are susceptible to this class of attacks. Chosen-prefix collision attacks were successfully used against the MD5 hash algorithm [HCLASH]; both SHA-1 and SHA-256 are Merkle-Damgard constructions.

6.  IANA Considerations

IANA is requested to register the "sha1", "sha256", "sha1msg", and "sha256msg" URI schemes in the Uniform Resource Identifier (URI) Schemes registry using the templates below, which conform to the June 2015 URI Scheme Guidelines [RFC7595].

6.1.  Assignment of sha1 URI Scheme

   URI scheme name: sha1

   Status: Permanent

   Applications/protocols that use this URI scheme name: General
   applicability. Some examples include security applications and
   systems, database and forensic lookup tools, and distributed
   peer-to-peer protocols.

   Contact: Sean Leonard

   Change controller: IETF

   References: This document.

6.2.  Assignment of sha256 URI Scheme

   URI scheme name: sha256

   Status: Permanent

   Applications/protocols that use this URI scheme name: General
   applicability. Some examples include security applications and
   systems, database and forensic lookup tools, and distributed
   peer-to-peer protocols.

   Contact: Sean Leonard

   Change controller: IETF

   References: This document.

6.3.  Assignment of sha1msg URI Scheme

   URI scheme name: sha1msg

   Status: Permanent

   Applications/protocols that use this URI scheme name: General
   applicability. Some examples include security applications and
   systems, database and forensic lookup tools, and distributed
   peer-to-peer protocols.

   Contact: Sean Leonard

   Change controller: IETF

   References: This document.

6.4.  Assignment of sha256msg URI Scheme

   URI scheme name: sha256msg

   Status: Permanent

   Applications/protocols that use this URI scheme name: General
   applicability. Some examples include security applications and
   systems, database and forensic lookup tools, and distributed
   peer-to-peer protocols.

   Contact: Sean Leonard

   Change controller: IETF

   References: This document.

7.  References

7.1.  Normative References

[SHS] National Institute of Standards and Technology, "Secure Hash Standard", Federal Information Processing Standard (FIPS) 180-4, March 2012, .

[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", RFC 2231, November 1997.

[RFC3030] Vaudreuil, G., "SMTP Service Extensions for Transmission of Large and Binary MIME Messages", RFC 3030, December 2000.

[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.

[RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322, October 2008.

[RFC5536] Murchison, K., Ed., Lindsey, C., and D. Kohn, "Netnews Article Format", RFC 5536, November 2009.

[RFC6532] Yang, A., Steele, S., and N. Freed, "Internationalized Email Headers", RFC 6532, February 2012.

[RFC7230] Fielding, R., Ed., and J. Reschke, Ed., "Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing", RFC 7230, June 2014.

[RFC7595] Thaler, D., Hansen, T., and T. Hardie, "Guidelines and Registration Procedures for URI Schemes", BCP 35, RFC 7595, June 2015.

7.2.  Informative References

[CHOSEN] Stevens, M., Lenstra, A., and B.
de Weger, "Chosen-Prefix Collisions for MD5 and Applications", International Journal of Applied Cryptography Vol. 2, No. 4, 322-359, 2012, , doi:10.1504/IJACT.2012.048084.

[HCLASH] Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities", IACR EUROCRYPT 2007, Lecture Notes in Computer Science 4515 1-22, 2007, , doi:10.1007/978-3-540-72540-4_1.

[OPTJLOC] Stevens, M., "New Collision Attacks on SHA-1 Based on Optimal Joint Local-Collision Analysis", EUROCRYPT 2013, Lecture Notes in Computer Science 7881 245-261, 2013, , doi:10.1007/978-3-642-38348-9_15.

[VISHASH] Hsiao, H., Lin, Y., Studer, A., Studer, C., Wang, K., Kikuchi, H., Perrig, A., Sun, H., and B. Yang, "A Study of User-Friendly Hash Comparison Schemes", Computer Security Applications Conference, 105-114, 2009, , doi:10.1109/ACSAC.2009.20.

[RFC2392] Levinson, E., "Content-ID and Message-ID Uniform Resource Locators", RFC 2392, August 1998.

[RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., and W. Polk, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 5280, May 2008.

[RFC6194] Polk, T., Chen, L., Turner, S., and P. Hoffman, "Security Considerations for the SHA-0 and SHA-1 Message-Digest Algorithms", RFC 6194, March 2011.

[RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keranen, A., and P. Hallam-Baker, "Naming Things with Hashes", RFC 6920, April 2013.

Appendix A.  Hash-Generating Implementations and Case Choice

Although the URI schemes in this document are case-insensitive in the hexadecimal hash component and permit many arbitrary delimiters, Section 2 recommends generating lowercase hexadecimal digits with no delimiters.
This recommendation balances the need for flexible parsing against the need for consistent output that is easy to inspect.

The following implementations emit lowercase hexadecimal by default:

   shasum (Perl)
   sha1sum / sha256sum (GNU coreutils)
   OpenSSL dgst
   Python hashlib sha1().hexdigest()
   Ruby
   Bouncy Castle (Java)
   Node.js
   Microsoft CryptoAPI applications (including CertUtil and dialogs)
   [[TODO: find more lowercase emission]]

The following implementations emit uppercase hexadecimal by default:

   Mozilla NSS applications (including Toolkit applications)
   Apple Mac OS X Keychain Access
   [[TODO: find more uppercase emission]]

Author's Address

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA 90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/