Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Intended Status: Informational                         September 9, 2015
Expires: March 12, 2016

                   URI Schemes for SHA-1 and SHA-256
                       draft-seantek-sha-uris-01

Abstract

This document registers Uniform Resource Identifier (URI) schemes for use with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and SHA-256. The purpose is to identify data streams in a simple, "drop-in" way within the URI infrastructure.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on March 12, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Leonard Informational [Page 1]
Internet-Draft SHA URIs September 2015

1. Introduction

This document registers Uniform Resource Identifier (URI) schemes for use with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and SHA-256. The purpose is to identify binary data streams in a simple, "drop-in" way within the URI infrastructure.

Internet-facing applications frequently need to store or transmit identifiers for wide-ranging types of content, including security structures (certificates, public keys, authorization tokens), executable code, and resource manifests (well-defined data formats that serve to structure data streams, which may be significantly larger and which are not self-identifiable). These applications achieve greater interoperability by using a common syntax for these identifiers; using URIs [RFC3986] suits their purposes well. Some of the most important properties of URIs are that they are easy for humans to recognize, and that they can be created using simple "copy-and-paste" operations.

   sha1:2FD4E1C6:7A2D28FC:ED849EE1:BB76E739:1B93EB12;43
   sha1:2FD4E1C6-7A2D28FC~ED849EE1_BB76E739.1B93EB12
   sha256:E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855

                      Figure 1: Example SHA URIs

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Description, Syntax, and Semantics

The syntax of SHA-1 and SHA-256 URIs consists of the scheme ("sha1:" or "sha256:"), followed by the hexadecimal encoding of the hash value. Certain delimiters such as space ("%20"), tab ("%09"), newline ("%0D" or "%0A"), colon ":", period ".", tilde "~", hyphen "-", and underscore "_" are permitted and do not affect equivalence. The hexadecimal characters are case-insensitive. In spite of these leniencies, this specification RECOMMENDS that generators emit uppercase hexadecimal and use no delimiters.
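
As an illustration of the equivalence rules above, the following Python sketch reduces a SHA URI to the recommended form (uppercase hexadecimal, no delimiters). It is not part of this specification. The function name canonical_sha_uri is hypothetical; the expected digit counts (40 and 64) follow from the 160-bit and 256-bit hash sizes; treating the scheme name as case-insensitive is an assumption taken from [RFC3986]; and anything from the first ";" onward is carried over unchanged rather than interpreted.

```python
import re

# Delimiters permitted inside the hash portion; they do not affect equivalence.
# Percent-escapes are matched case-insensitively (an assumption of this sketch).
_DELIMS = re.compile(r"%(?:20|09|0D|0A)|[:.~_-]", re.IGNORECASE)

# Number of hexadecimal digits expected for each scheme (160 and 256 bits).
_HEX_LEN = {"sha1": 40, "sha256": 64}


def canonical_sha_uri(uri: str) -> str:
    """Return the URI with uppercase hex digits and no delimiters."""
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()  # assumption: scheme names compare case-insensitively
    if not sep or scheme not in _HEX_LEN:
        raise ValueError("not a sha1: or sha256: URI")
    # Set aside anything from the first ";" onward; it is copied through as-is.
    body, semi, suffix = rest.partition(";")
    digits = _DELIMS.sub("", body)
    if not re.fullmatch(r"[0-9A-Fa-f]*", digits):
        raise ValueError("unexpected character in hash value")
    if len(digits) != _HEX_LEN[scheme]:
        raise ValueError("wrong number of hexadecimal digits")
    return f"{scheme}:{digits.upper()}{semi}{suffix}"
```

Under these assumptions, the first two examples in Figure 1 reduce to the same hash portion, so equivalence can be tested by comparing the canonical strings directly.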
The identifier identifies any data stream that, when hashed, produces the same hash value.

Optionally, the hash value may be followed by ";" and the length of the data stream in octets (8-bit units). If the data being identified is a bit stream that is not a multiple of 8 bits, the extra suffix "b" followed by "1" through "7" is to be appended. When the length suffix is present, the identifier identifies any data stream of the specified length that, when hashed, produces the same hash value.

[SHS] is only defined for messages (data streams) less than 2^64 bits (2^61 octets). Thus, the largest length production that an application would expect to encounter is ";2305843009213693951b7". Behaviors beyond this length are undefined.

There is no way to distinguish between "bit streams" (that are multiples of 8 bits) and "octet streams" as categories; this ambiguity is intentional because [SHS] does not distinguish between them. Furthermore, there is no way to truncate the hash value: a parser that receives too few (or too many) hexadecimal digits MUST NOT accept the hash value.

The URI schemes defined in this document are not intended specifically for human readability or speakability. They are, however, designed for human "copy-and-paste-ability": the ability of a human operator (or automated text-based operator, such as a simple script) to feed such URIs into processes that can compare them for exact equality. Based on the labeled input, a process can generate appropriate representations for human comparison, such as visual schemes [VISHASH].

The ABNF [RFC5234] is:

   sha1uri   = "sha1:" ( 40HEXDIG / 40*relaxed ) [ length ]
   sha256uri = "sha256:" ( 64HEXDIG / 64*relaxed ) [ length ]
   ; HEXDIG is from [RFC5234]
   ; in the relaxed productions, there must be exactly
   ; 40 HEXDIG for sha1uri, and 64 HEXDIG for sha256uri

   relaxed   = HEXDIG / ":" / "." / "~" / "-" / "_" /
               "%20" / "%0D" / "%0A" / "%09"

   length    = ";" wholenum [ bits ]

   wholenum  = "0" / ( %x31-39 *DIGIT )
   ; DIGIT is from [RFC5234]

   bits      = "b" ( %x31-37 ) ; 1 - 7
   ; "b" is not case-sensitive, but lowercase SHOULD be used

                            Figure 2: ABNF

Other URI schemes such as ni: [RFC6920] include or incorporate hashes. Software may be written to convert URIs between such schemes. Although the bits of the resource may be equivalent, the URI semantics may differ. The SHA-1 and SHA-256 URIs in this document identify binary data streams (i.e., an ordered sequence of bits).

3. Encoding and Interoperability

SHA-1 and SHA-256 URIs conform to [RFC3986]. The syntaxes described in this document specifically conform to the path-rootless production of the hier-part production of the URI production of Section 3 of [RFC3986]. The characters in such URIs are limited to the hexadecimal representation of the binary hash values, so no further encoding issues are raised based on the identified content. Future revisions may cover semantics of other URI-conformant productions.

4. Security Considerations

The sha1 and sha256 URI schemes identify data streams, not content in the Internet message sense. Supplementary information about the data stream is expected to be provided by context. If an application designer wishes to affix metadata (such as an Internet media type or file modification date) permanently to a data stream, the metadata and data stream should be concatenated into some format, and hashed. For example, the data stream can be an Internet message of some kind, such as [RFC5322], [RFC6532], or [RFC7230].

Cryptographic hashes are designed to map variable length data streams to fixed length outputs, with four additional properties:

1. Random distribution: A change, addition, or deletion of any bit to the input message (data stream) will affect each and every bit of the output (hash value) with equal probability.
2. Preimage resistance: Given a hash value, finding a corresponding message is computationally infeasible.

3. Second-preimage resistance: Given a hash value and a first message, finding a second message that has the same hash value is computationally infeasible.

4. Collision resistance: Finding two messages that have the same hash value is computationally infeasible.

As of the time of this document, reduced rounds of SHA-1 have been cryptanalyzed [OPTJLOC], prompting the security community to migrate to new hash algorithms [RFC6194]. SHA-256 has not yet been cryptanalyzed.

The length qualifier is not intended to provide or augment the basic security properties of the SHA-1 or SHA-256 hash algorithms. However, an application SHOULD employ the length qualifier when it knows the length in advance, because this qualifier constrains the set of possible data streams.

Collisions exist with any hash algorithm. Consider an arbitrary data stream of 20 octets and 1 bit (161 bits) and the SHA-1 algorithm. By the pigeonhole principle, a collision (where two messages produce the same hash value) must exist, because in the best case, each data stream of 20 octets (160 bits) maps separately to each one of the 2^160 possible hash values. Therefore, the 2^160 + 1st message must map to one of the hash values corresponding to at least one prior message. A collision is virtually certain to exist among all data streams of 20 octets, and also among data streams appreciably shorter than 20 octets (birthday paradox). However, the probability that the particular data stream of interest has the same hash as another (malicious) data stream is harder to calculate. It is possible that a trivial change to the message will result in the same hash; it is also possible that no messages of the same length have the same hash.
The probability of the latter clearly diminishes, however, as the set of candidate messages expands without bound.

The purpose of the optional length qualifier is not simply to reduce the message space, since for all non-trivial messages of n bits the message space of 2^n vastly exceeds the collision certainty threshold. (For illustrative purposes, the author searched for SHA-1 collisions in 0-, 1-, 2-, 3-, and 4-octet data streams using a brute force algorithm; [none were found] [actually the search is ongoing; it looks like it will take 85 days]). Rather, the purpose is to detect malicious underflow or overflow conditions before they happen. If an attacker is feeding data to the recipient, the recipient may accept data without bound while waiting for the hash computation to complete. In so doing, the attacker wastes the recipient's resources in addition to computation time, which may cause latent security errors (such as low memory or disk conditions) to manifest.

Consider one application of these URI schemes: identifying security-related objects such as PKIX certificates [RFC5280]. Although [RFC5280] does not limit the size of a certificate, most certificates are not appreciably greater than 16 kilobytes, and there are protocol-related pressures to make certificates much smaller (such as less than 4 kilobytes) to fit in fewer TCP segments. A certificate-using application that accepts the URI schemes in this document might reduce its attack surface by rejecting URIs of unspecified length once the candidate certificate data exceeds a threshold (e.g., 64 kilobytes); for larger certificates, the application would require that the length be specified.
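
The threshold policy described above might be sketched as follows in Python. This is an illustration under stated assumptions, not a normative procedure: the function name stream_matches and its parameters are hypothetical, the 64-kilobyte default is the example threshold from the text, and only whole-octet lengths are handled (the "b" bit suffix is not supported).

```python
import hashlib
from typing import BinaryIO, Optional


def stream_matches(stream: BinaryIO, algorithm: str, expected_hex: str,
                   declared_length: Optional[int] = None,
                   max_unspecified: int = 64 * 1024,
                   chunk_size: int = 8192) -> bool:
    """Hash a stream, giving up as soon as it exceeds the permitted size.

    algorithm is "sha1" or "sha256"; expected_hex is the hash portion of
    the URI with delimiters already removed.
    """
    # With a declared length, that length is the bound; otherwise fall back
    # to the application's threshold for URIs of unspecified length.
    limit = declared_length if declared_length is not None else max_unspecified
    hasher = hashlib.new(algorithm)
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            return False  # overflow: stop before consuming more resources
        hasher.update(chunk)
    if declared_length is not None and total != declared_length:
        return False  # underflow: stream ended before the declared length
    # Hexadecimal digits are case-insensitive, so compare in one case.
    return hasher.hexdigest().upper() == expected_hex.upper()
```

The design choice illustrated here is that the size check happens per chunk, so an oversized stream is rejected after at most one chunk beyond the limit rather than after the whole stream has been read and hashed.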
A secondary purpose of specifying the length is to resist chosen-prefix collision attacks [CHOSEN], which are attacks in which an attacker picks two separate messages and then appends different values that result in the concatenated messages having the same hash value. [CHOSEN] shows that Merkle-Damgard hash functions are susceptible to this class of attacks. Chosen-prefix collision attacks were successfully used against the MD5 hash algorithm [HCLASH]; both SHA-1 and SHA-256 are Merkle-Damgard constructions.

5. IANA Considerations

IANA is requested to register the "sha1" and "sha256" URI schemes in the Uniform Resource Identifier (URI) Schemes registry using the templates below, which conform to the June 2015 URI Scheme Guidelines [RFC7595].

5.1. Assignment of sha1 URI Scheme

   URI scheme name: sha1

   Status: Permanent

   Applications/protocols that use this URI scheme name: General applicability. Some examples include security applications and systems, database and forensic lookup tools, and distributed peer-to-peer protocols.

   Contact: Sean Leonard

   Change controller: IETF

   References: This document.

5.2. Assignment of sha256 URI Scheme

   URI scheme name: sha256

   Status: Permanent

   Applications/protocols that use this URI scheme name: General applicability. Some examples include security applications and systems, database and forensic lookup tools, and distributed peer-to-peer protocols.

   Contact: Sean Leonard

   Change controller: IETF

   References: This document.

9. References

9.1. Normative References

[SHS] National Institute of Standards and Technology, "Secure Hash Standard", Federal Information Processing Standard (FIPS) 180-4, March 2012.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.

[RFC7595] Thaler, D., Hansen, T., and T. Hardie, "Guidelines and Registration Procedures for URI Schemes", BCP 35, RFC 7595, June 2015.

9.2. Informative References

[CHOSEN] Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix Collisions for MD5 and Applications", International Journal of Applied Cryptography Vol. 2, No. 4, 322-359, 2012, doi:10.1504/IJACT.2012.048084.

[HCLASH] Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities", IACR EUROCRYPT 2007, Lecture Notes in Computer Science 4515 1-22, 2007, doi:10.1007/978-3-540-72540-4_1.

[OPTJLOC] Stevens, M., "New Collision Attacks on SHA-1 Based on Optimal Joint Local-Collision Analysis", EUROCRYPT 2013, Lecture Notes in Computer Science 7881 245-261, 2013, doi:10.1007/978-3-642-38348-9_15.

[VISHASH] Hsiao, H., Lin, Y., Studer, A., Studer, C., Wang, K., Kikuchi, H., Perrig, A., Sun, H., and B. Yang, "A Study of User-Friendly Hash Comparison Schemes", Computer Security Applications Conference, 105-114, 2009, doi:10.1109/ACSAC.2009.20.

[RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., and W. Polk, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 5280, May 2008.

[RFC6194] Polk, T., Chen, L., Turner, S., and P. Hoffman, "Security Considerations for the SHA-0 and SHA-1 Message-Digest Algorithms", RFC 6194, March 2011.

[RFC6920] Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B., Keranen, A., and P. Hallam-Baker, "Naming Things with Hashes", RFC 6920, April 2013.

Author's Address

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA 90036
   USA

   EMail: dev+ietf@seantek.com
   URI: http://www.penango.com/