Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Intended Status: Informational                         September 9, 2015
Expires: March 12, 2016                                                 


                   URI Schemes for SHA-1 and SHA-256
                       draft-seantek-sha-uris-01
                                    
Abstract

   This document registers Uniform Resource Identifier schemes for use
   with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and
   SHA-256. The purpose is to identify data streams in a simple, "drop-
   in" way within the URI infrastructure.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute working
   documents as Internet-Drafts. The list of current Internet-Drafts is
   at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 12, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

 


Leonard                      Informational                      [Page 1]

Internet-Draft                  SHA URIs                  September 2015


1.  Introduction

   This document registers Uniform Resource Identifier schemes for use
   with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and
   SHA-256. The purpose is to identify binary data streams in a simple,
   "drop-in" way within the URI infrastructure.

   Internet-facing applications frequently need to store or transmit
   identifiers for wide-ranging types of content, including security
   structures (certificates, public keys, authorization tokens),
   executable code, and resource manifests (well-defined data formats
   that serve to structure data streams, which may be significantly
   larger and which are not self-identifiable). These applications
   achieve greater interoperability by using a common syntax for these
   identifiers; URIs [RFC3986] suit this purpose well. Some of the most
   important properties of URIs are that they are easy for humans to
   recognize and that they can be created using simple "copy-and-paste"
   operations.

 sha1:2FD4E1C6:7A2D28FC:ED849EE1:BB76E739:1B93EB12;43

 sha1:2FD4E1C6-7A2D28FC~ED849EE1_BB76E739.1B93EB12

 sha256:E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855

                       Figure 1: Example SHA URIs

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Description, Syntax, and Semantics

   The syntax of SHA-1 and SHA-256 URIs consists of the scheme ("sha1:"
   or "sha256:"), followed by the hexadecimal encoding of the hash
   value. Certain delimiters such as space ("%20"), tab ("%09"), newline
   ("%0D" or "%0A"), colon ":", period ".", tilde "~", hyphen "-", and
   underscore "_" are permitted and do not affect equivalence. The
   hexadecimal characters are case-insensitive. In spite of these
   leniencies, this specification RECOMMENDS that generators emit
   uppercase hexadecimal and use no delimiters. The identifier
   identifies any data stream that, when hashed, produces the same hash
   value.
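
   The equivalence rule above can be sketched in Python as follows.
   This is a minimal illustration, not part of the specification: the
   helper names canonical_hash and same_hash are hypothetical, and the
   sketch assumes that any ";length" suffix is ignored when comparing
   hash values.

```python
import re

# Delimiters that Section 2 permits inside the hash value; they carry no meaning.
_DELIMITERS = re.compile(r"[:.~_-]|%20|%09|%0D|%0A", re.IGNORECASE)


def canonical_hash(uri: str) -> tuple:
    """Return (scheme, uppercase hex digits) with all delimiters removed.

    Any ";length" suffix is dropped; this sketch compares hash values only.
    """
    scheme, _, rest = uri.partition(":")
    hash_part = rest.split(";", 1)[0]
    digits = _DELIMITERS.sub("", hash_part).upper()
    return scheme.lower(), digits


def same_hash(a: str, b: str) -> bool:
    """True when two SHA URIs carry the same scheme and the same hash value."""
    return canonical_hash(a) == canonical_hash(b)
```

   Under these assumptions, the first two URIs in Figure 1 compare as
   equal even though they use different delimiters.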

   Optionally, the hash value may be followed by ";" and the length of
   the data stream in octets (8-bit units). If the data being identified
   is a bit stream whose length is not a multiple of 8 bits, the extra
   suffix "b" followed by a digit "1" through "7", giving the number of
   bits beyond the last whole octet, is to be appended. When the length
   suffix is present, the identifier identifies any data stream of the
   specified length that, when hashed, produces the same hash value.
   [SHS] is only defined for messages (data streams) less than 2^64 bits
   (2^61 octets). Thus, the largest length production that an
   application would expect to encounter is ";2305843009213693951b7".
   Behaviors beyond this length are undefined.
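
   As a non-normative sketch, a generator that follows the
   recommendations above (uppercase hexadecimal, no delimiters, optional
   length) might look like the following Python. The function names
   length_suffix and make_sha_uri are hypothetical. Python's hashlib
   hashes whole octets only, so make_sha_uri is limited to octet
   streams, while length_suffix also covers bit lengths.

```python
import hashlib


def length_suffix(bit_length: int) -> str:
    """Format the optional length: ";" octets, then "b1".."b7" for leftover bits."""
    if not 0 <= bit_length < 2**64:
        raise ValueError("[SHS] covers only messages shorter than 2^64 bits")
    octets, bits = divmod(bit_length, 8)
    return f";{octets}" + (f"b{bits}" if bits else "")


def make_sha_uri(data: bytes, scheme: str = "sha256", with_length: bool = True) -> str:
    """Build a SHA URI for an octet stream in the recommended form."""
    if scheme not in ("sha1", "sha256"):
        raise ValueError("unsupported scheme")
    digest = hashlib.new(scheme, data).hexdigest().upper()
    suffix = length_suffix(len(data) * 8) if with_length else ""
    return f"{scheme}:{digest}{suffix}"
```

   For the largest defined message, length_suffix(2**64 - 1) yields the
   ";2305843009213693951b7" production mentioned above.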

   There is no way to distinguish between "bit streams" (that are
   multiples of 8 bits) and "octet streams" as categories; this
   ambiguity is intentional because [SHS] does not distinguish between
   them. Furthermore, there is no way to truncate the hash value: a
   parser that receives too few (or too many) hexadecimal digits MUST
   NOT accept the hash value.

   The URI schemes defined in this document are not intended
   specifically for human readability or speakability. They are,
   however, designed for human "copy-and-paste-ability": the ability of
   a human operator (or automated text-based operator, such as a simple
   script) to feed such URIs into processes that can compare them for
   exact equality. Based on the labeled input, a process can generate
   appropriate representations for human comparison, such as visual
   schemes [VISHASH].

   The ABNF [RFC5234] is:

         sha1uri   = "sha1:" ( 40HEXDIG / 40*relaxed ) [ length ]

         sha256uri = "sha256:" ( 64HEXDIG / 64*relaxed ) [ length ]

         ; HEXDIG is from [RFC5234]
         ; in the relaxed productions, there must be exactly
         ; 40 HEXDIG for sha1uri, and 64 HEXDIG for sha256uri

         relaxed   = HEXDIG / ":" / "." / "~" / "-" / "_" /
                     "%20" / "%0D" / "%0A" / "%09"

         length    = ";" wholenum [ bits ]

         wholenum  = "0" / ( %x31-39 *DIGIT )

         ; DIGIT is from [RFC5234]

         bits      = "b" ( %x31-37 )   ; 1 - 7

         ; "b" is not case-sensitive, but lowercase SHOULD be used

                             Figure 2: ABNF
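
   The grammar in Figure 2 can be approximated by a regular expression
   plus an explicit count of hexadecimal digits. The Python below is a
   sketch under that assumption, not a reference parser; parse_sha_uri
   is a hypothetical name, and the returned bit length (octets times 8
   plus any "b" digit) is a convenience of this example.

```python
import re

_URI = re.compile(
    r"(?P<scheme>sha1|sha256):"
    r"(?P<hash>(?:[0-9A-F:.~_-]|%20|%0D|%0A|%09)+)"
    r"(?:;(?P<octets>0|[1-9][0-9]*)(?:b(?P<bits>[1-7]))?)?",
    re.IGNORECASE,
)
_DELIMS = re.compile(r"[:.~_-]|%20|%0D|%0A|%09", re.IGNORECASE)
_HEX_COUNT = {"sha1": 40, "sha256": 64}


def parse_sha_uri(uri: str):
    """Return (scheme, hex digits, bit length or None); raise ValueError if invalid."""
    m = _URI.fullmatch(uri)
    if m is None:
        raise ValueError("not a sha1/sha256 URI")
    scheme = m.group("scheme").lower()
    digits = _DELIMS.sub("", m.group("hash")).upper()
    # Truncated or over-long hash values must not be accepted.
    if len(digits) != _HEX_COUNT[scheme]:
        raise ValueError("wrong number of hexadecimal digits")
    if m.group("octets") is None:
        return scheme, digits, None
    bit_length = int(m.group("octets")) * 8 + int(m.group("bits") or 0)
    return scheme, digits, bit_length
```

   The digit count is checked after delimiters are removed because the
   ABNF repetition "40*relaxed" alone cannot express "exactly 40
   HEXDIG"; the comment in the figure carries that constraint.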

   Other URI schemes such as ni: [RFC6920] include or incorporate
   hashes. Software may be written to convert URIs between such schemes.
   Although the bits of the resource may be equivalent, the URI
   semantics may differ. The SHA-1 and SHA-256 URIs in this document
   identify binary data streams (i.e., an ordered sequence of bits).
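
   One possible conversion, sketched in Python, maps an undelimited
   "sha256:" URI to the ni: form of [RFC6920], which carries the hash as
   unpadded base64url after "sha-256;". The function name
   sha256_uri_to_ni is hypothetical, and the sketch assumes that the
   optional length suffix is simply discarded, since it has no direct
   counterpart in the basic ni: syntax.

```python
import base64


def sha256_uri_to_ni(uri: str) -> str:
    """Convert an undelimited "sha256:" URI into an RFC 6920 "ni:" URI."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "sha256":
        raise ValueError("only sha256 is handled in this sketch")
    hex_digits = rest.split(";", 1)[0]      # drop any ";length" suffix
    digest = bytes.fromhex(hex_digits)      # raises ValueError on bad hex
    if len(digest) != 32:
        raise ValueError("SHA-256 hash value must be 32 octets")
    # RFC 6920 uses base64url without "=" padding; the authority is left empty.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{value}"
```

   The converted URI names the same bits, but, as noted above, a
   consumer of the ni: form may attach different semantics to it.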

3.  Encoding and Interoperability

   SHA-1 and SHA-256 URIs conform to [RFC3986]. The syntaxes described
   in this document specifically conform to the path-rootless production
   of the hier-part production of the URI production of Section 3 of
   [RFC3986]. The characters in such URIs are limited to the hexadecimal
   representation of the binary hash values, so no further encoding
   issues are raised based on the identified content. Future revisions
   may cover semantics of other URI-conformant productions.

4.  Security Considerations

   The sha1 and sha256 URI schemes identify data streams, not content in
   the Internet message sense. Supplementary information about the data
   stream is expected to be provided by context. If an application
   designer wishes to affix metadata (such as an Internet media type or
   file modification date) permanently to a data stream, the metadata
   and data stream should be concatenated into some format, and hashed.
   For example, the data stream can be an Internet message of some kind,
   such as [RFC5322], [RFC6532], or [RFC7230].
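
   To illustrate the point about metadata, the Python sketch below
   hashes a body twice: once bare, and once wrapped in a minimal
   header-plus-body framing. The framing (a single Content-Type header
   and a blank line) and the names sha256_uri and wrap_with_media_type
   are assumptions made for this example; a real application would
   choose a complete message format such as those cited above.

```python
import hashlib


def sha256_uri(data: bytes) -> str:
    """Return the sha256 URI (no length suffix) for an octet stream."""
    return "sha256:" + hashlib.sha256(data).hexdigest().upper()


def wrap_with_media_type(body: bytes, media_type: str) -> bytes:
    """Prefix the body with one MIME-style header line and a blank line.

    Illustrative framing only; it is not a full Internet message.
    """
    header = f"Content-Type: {media_type}\r\n\r\n".encode("ascii")
    return header + body


body = b"hello\n"
bare = sha256_uri(body)                                       # identifies the bits only
bound = sha256_uri(wrap_with_media_type(body, "text/plain"))  # bits plus media type
```

   The two URIs differ, and changing the media type changes the second
   URI again, which is what binds the metadata to the data stream.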

   Cryptographic hashes are designed to map variable length data streams
   to fixed length outputs, with four additional properties:

   1. Random distribution: A change, addition, or deletion of any bit to
      the input message (data stream) will affect each and every bit of
      the output (hash value) with equal probability.

   2. Preimage resistance: Given a hash value, finding a corresponding
      message is computationally infeasible.

   3. Second-preimage resistance: Given a hash value and a first
      message, finding a second message that has the same hash value is
      computationally infeasible.

   4. Collision resistance: Finding two messages that have the same hash
      value is computationally infeasible.

   As of the time of this document, reduced-round variants of SHA-1 have
   been cryptanalyzed [OPTJLOC], prompting the security community to
   migrate to newer hash algorithms [RFC6194]. No practical attack on
   the full SHA-256 function has been published.

   The length qualifier is not intended to provide or augment the basic
   security properties of the SHA-1 or SHA-256 hash algorithms. However,
   an application SHOULD employ the length qualifier when it knows the
   length in advance, because this qualifier constrains the set of
   possible data streams.

   Collisions exist with any hash algorithm. Consider an arbitrary data
   stream of 20 octets and 1 bit (161 bits) and the SHA-1 algorithm. By
   the pigeonhole principle, a collision (where two messages produce the
   same hash value) must exist: even in the best case, where each data
   stream of 20 octets (160 bits) maps to a distinct one of the 2^160
   possible hash values, every hash value is then already taken.
   Therefore, the (2^160 + 1)th message must map to a hash value that
   corresponds to at least one prior message.
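
   The pigeonhole argument can be observed at small scale. The Python
   sketch below uses a deliberately tiny toy hash, the first octet of
   SHA-1, which has only 256 possible values; this truncation is an
   assumption made purely for illustration and is not something the URI
   schemes permit. The names toy_hash and first_collision are
   hypothetical.

```python
import hashlib


def toy_hash(message: bytes) -> int:
    """First octet of SHA-1: a deliberately tiny 8-bit hash (256 values)."""
    return hashlib.sha1(message).digest()[0]


def first_collision(messages):
    """Return the first pair of distinct messages sharing a toy hash value."""
    seen = {}
    for m in messages:
        h = toy_hash(m)
        if h in seen:
            return seen[h], m
        seen[h] = m
    return None


# 257 distinct two-octet messages: one more than the 256 possible hash values.
candidates = [i.to_bytes(2, "big") for i in range(257)]
pair = first_collision(candidates)
```

   Because there are more candidate messages than hash values, pair is
   never None; the same counting argument applies to SHA-1 with 2^160
   values in place of 256.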

   It is virtually certain that a collision exists among the data
   streams of 20 octets, and likewise among data streams appreciably
   shorter than 20 octets (birthday paradox). However, the probability
   that the particular data stream of interest has the same hash as
   another (malicious) data stream is harder to calculate. It is
   possible that a trivial change to the message will result in the same
   hash; it is also possible that no other message of the same length
   has the same hash. The probability of the latter clearly diminishes,
   however, as the set of candidate messages expands without bound.
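
   The birthday estimate can be made concrete with the usual
   approximation p ~= 1 - exp(-k(k-1)/(2N)) for k messages and N
   possible hash values. The Python sketch below assumes that the hash
   behaves like a uniformly random function, which is an idealization;
   collision_probability is a hypothetical helper name.

```python
import math


def collision_probability(num_messages: int, hash_bits: int) -> float:
    """Birthday approximation of the chance that any two messages collide."""
    n = 2 ** hash_bits
    exponent = num_messages * (num_messages - 1) / (2 * n)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    return -math.expm1(-exponent)


p_10_octets = collision_probability(2 ** 80, 160)    # all 10-octet streams
p_11_octets = collision_probability(2 ** 88, 160)    # all 11-octet streams
p_20_octets = collision_probability(2 ** 160, 160)   # all 20-octet streams
```

   Under this idealization the estimate is already substantial for the
   set of all 10-octet streams and is indistinguishable from certainty
   for 11 octets and above, well short of 20 octets.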

   The purpose of the optional length qualifier is not simply to reduce
   the message space, since for all non-trivial messages of n bits the
   message space of 2^n vastly exceeds the collision certainty
   threshold. (For illustrative purposes, the author searched for SHA-1
   collisions in 0-, 1-, 2-, 3-, and 4-octet data streams using a brute
   force algorithm; at the time of writing the search was still in
   progress and was estimated to take 85 days.) Rather, the purpose is
   to detect malicious underflow or overflow conditions before they
   happen. If an attacker is feeding data to a recipient, the recipient
   may accept data without bound while waiting for the hash computation
   to complete. In so doing, the attacker wastes the recipient's
   resources in addition to computation time, which may cause latent
   security errors (such as low memory or disk conditions) to manifest.
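
   A recipient that knows the declared length can stop reading as soon
   as the stream overruns it. The Python below is a minimal sketch of
   that idea for octet streams; verify_stream and its parameters are
   hypothetical names, and the chunked iterable stands in for whatever
   transport the application uses.

```python
import hashlib


def verify_stream(chunks, scheme: str, expected_hex: str, declared_octets: int) -> bool:
    """Hash an incoming stream, aborting once it exceeds the declared length."""
    hasher = hashlib.new(scheme)          # scheme is "sha1" or "sha256"
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if received > declared_octets:
            return False                  # overflow: stop before hashing more
        hasher.update(chunk)
    if received != declared_octets:
        return False                      # underflow: stream ended early
    return hasher.hexdigest().upper() == expected_hex.upper()
```

   Because the overflow check precedes the hash update, an endless
   stream is rejected after at most one chunk beyond the declared
   length rather than being consumed without bound.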

   Consider one application of these URI schemes: identifying security-
   related objects such as PKIX certificates [RFC5280]. Although
   [RFC5280] does not limit the size of a certificate, most certificates
   are not appreciably greater than 16 kilobytes, and there are
   protocol-related pressures to make certificates much smaller (such as
   less than 4 kilobytes) to fit in fewer TCP segments. A certificate-
   using application that accepts the URI schemes in this document might
   reduce its attack surface by rejecting URIs of unspecified length
   once the candidate certificate data exceeds a threshold (e.g., 64
   kilobytes); for larger certificates, the application would require
   that the length be specified.
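
   The certificate policy described above might be expressed as follows.
   This Python sketch is one possible reading, not a requirement of
   this document: the names DEFAULT_LIMIT, octet_limit, and should_abort
   are hypothetical, and 64 kilobytes is taken to mean 65536 octets.

```python
DEFAULT_LIMIT = 64 * 1024  # octets; the example threshold from the text


def octet_limit(declared_octets, default_limit: int = DEFAULT_LIMIT) -> int:
    """Maximum number of octets to read for a candidate certificate.

    declared_octets is the length taken from the URI, or None when the
    URI carries no length suffix.
    """
    if declared_octets is None:
        return default_limit       # unspecified length: cap at the threshold
    return declared_octets         # specified length: read exactly this much


def should_abort(received_octets: int, declared_octets) -> bool:
    """True once the received data exceeds what the policy allows."""
    return received_octets > octet_limit(declared_octets)
```

   With this policy a URI without a length can never cause more than
   the threshold to be buffered, while a larger certificate remains
   acceptable when its URI states the length.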

   A secondary purpose of specifying the length is to resist chosen-
   prefix collision attacks [CHOSEN], in which an attacker picks two
   separate messages and then appends different values that result in
   the concatenated messages having the same hash value. [CHOSEN] shows
   that Merkle-Damgard hash functions are susceptible to this class of
   attacks. Chosen-prefix collision attacks were successfully used
   against the MD5 hash algorithm [HCLASH]; both SHA-1 and SHA-256 are
   Merkle-Damgard constructions.

5.  IANA Considerations

   IANA is requested to register the "sha1" and "sha256" URI schemes in
   the Uniform Resource Identifier (URI) Schemes registry using the
   templates below, which conform to the June 2015 URI Scheme Guidelines
   [RFC7595].

5.1.  Assignment of sha1 URI Scheme

      URI scheme name: sha1

      Status: Permanent

      Applications/protocols that use this URI scheme name:
        General applicability. Some examples include security
        applications and systems, database and forensic lookup
        tools, and distributed peer-to-peer protocols.

      Contact: Sean Leonard <dev+ietf@seantek.com>

      Change controller: IETF

      References: This document.

5.2.  Assignment of sha256 URI Scheme

      URI scheme name: sha256

      Status: Permanent

      Applications/protocols that use this URI scheme name:
        General applicability. Some examples include security
        applications and systems, database and forensic lookup
        tools, and distributed peer-to-peer protocols.

      Contact: Sean Leonard <dev+ietf@seantek.com>

      Change controller: IETF

      References: This document.

9.  References

9.1.  Normative References

   [SHS]      National Institute of Standards and Technology, "Secure
              Hash Standard", Federal Information Processing Standard
              (FIPS) 180-4, March 2012, <http://csrc.nist.gov/
              publications/fips/fips180-4/fips-180-4.pdf>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66, RFC
              3986, January 2005.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC7595]  Thaler, D., Hansen, T., and T. Hardie, "Guidelines and
              Registration Procedures for URI Schemes", BCP 35, RFC
              7595, June 2015.

9.2.  Informative References

   [CHOSEN]   Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix
              Collisions for MD5 and Applications", International
              Journal of Applied Cryptography Vol. 2, No. 4, 322-359,
              2012,
              <http://www.win.tue.nl/hashclash/ChosenPrefixCollisions/>,
              doi:10.1504/IJACT.2012.048084.

   [HCLASH]   Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix
              Collisions for MD5 and Colliding X.509 Certificates for
              Different Identities", IACR EUROCRYPT 2007, Lecture Notes
              in Computer Science 4515 1-22, 2007,
              <http://www.win.tue.nl/hashclash/ChosenPrefixCollisions/>,
              doi:10.1007/978-3-540-72540-4_1.

   [OPTJLOC]  Stevens, M., "New Collision Attacks on SHA-1 Based on
              Optimal Joint Local-Collision Analysis", EUROCRYPT 2013,
              Lecture Notes in Computer Science 7881 245-261, 2013,
              <http://marc-stevens.nl/research/papers/EC13-S.pdf>,
              doi:10.1007/978-3-642-38348-9_15.

   [VISHASH]  Hsiao, H., Lin, Y., Studer, A., Studer, C., Wang, K.,
              Kikuchi, H., Perrig, A., Sun, H., and B. Yang, "A Study of
              User-Friendly Hash Comparison Schemes", Computer Security
              Applications Conference, 105-114, 2009,
              <http://dl.acm.org/citation.cfm?id=1723224>,
              doi:10.1109/ACSAC.2009.20.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, May 2008.

   [RFC6194]  Polk, T., Chen, L., Turner, S., and P. Hoffman, "Security
              Considerations for the SHA-0 and SHA-1 Message-Digest
              Algorithms", RFC 6194, March 2011.

   [RFC6920]  Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
              Keranen, A., and P. Hallam-Baker, "Naming Things with
              Hashes", RFC 6920, April 2013.

Author's Address

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA  90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/