Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Intended Status: Informational                        September 25, 2015
Expires: March 28, 2016                                                 


                   URI Schemes for SHA-1 and SHA-256
                       draft-seantek-sha-uris-02
                                    
Abstract

   This document registers Uniform Resource Identifier schemes for use
   with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and
   SHA-256. The purpose is to identify data streams and content in a
   simple, "drop-in" way within the URI infrastructure.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute working
   documents as Internet-Drafts. The list of current Internet-Drafts is
   at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 28, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

 


Leonard                      Informational                      [Page 1]

Internet-Draft                  SHA URIs                  September 2015


1.  Introduction

   This document registers Uniform Resource Identifier schemes for use
   with certain Secure Hash Algorithm (SHA) functions, namely SHA-1 and
   SHA-256. The purpose is to identify binary data streams in a simple,
   "drop-in" way within the URI infrastructure. This document also
   provides parallel means to identify Internet content or messages.

   Internet-facing applications frequently need to store or transmit
   identifiers for wide-ranging types of content, including security
   structures (certificates, public keys, authorization tokens),
   executable code, and resource manifests (well-defined data formats
   that serve to structure data streams, which may be significantly
   larger and which are not self-identifiable). These applications
   achieve greater interoperability by using a common syntax for these
   identifiers; using URIs [RFC3986] suits their purposes well. Some of
   the most important properties of URIs are that they are easy for
   humans to recognize, and that they can be created using simple
   "copy-and-paste" operations.

 sha1:2FD4E1C6:7A2D28FC:ED849EE1:BB76E739:1B93EB12;43

 sha1:2FD4E1C6-7A2D28FC~ED849EE1_BB76E739.1B93EB12

 sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

                       Figure 1: Example SHA URIs

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Data Stream Description, Syntax, and Semantics

   The syntax of SHA-1 and SHA-256 URIs consists of the scheme ("sha1:"
   or "sha256:"), followed by the hexadecimal encoding of the hash
   value. Certain delimiters, namely space ("%20"), tab ("%09"),
   carriage return ("%0D"), line feed ("%0A"), colon ":", period ".",
   tilde "~", hyphen "-", and underscore "_", are permitted and do not
   affect equivalence. The hexadecimal characters are case-insensitive.
   Despite this leniency, it is RECOMMENDED that generators emit
   lowercase hexadecimal and use no delimiters. The URI identifies any
   data stream that, when hashed, produces the same hash value.
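
   As an illustration of the recommended generator behavior (lowercase
   hexadecimal, no delimiters), the following Python sketch builds a
   URI from an octet stream. The function name make_sha_uri and its
   parameters are hypothetical and are not defined by this document;
   the optional ";" suffix carries the stream length in whole octets.

```python
import hashlib

def make_sha_uri(data: bytes, scheme: str = "sha256",
                 with_length: bool = False) -> str:
    """Build a sha1: or sha256: URI in the recommended form
    (lowercase hexadecimal, no delimiters) for an octet stream."""
    if scheme not in ("sha1", "sha256"):
        raise ValueError("unsupported scheme: " + scheme)
    # hexdigest() returns lowercase hexadecimal without delimiters.
    digest = hashlib.new(scheme, data).hexdigest()
    uri = scheme + ":" + digest
    if with_length:
        uri += ";" + str(len(data))  # length in whole octets
    return uri
```

   Applied to the empty octet stream with the "sha256" scheme, this
   sketch yields the third URI of Figure 1.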

   Optionally, the hash value may be suffixed with ";" and the length of
   the data stream in octets (8-bit units). If the data being identified
   is a bit stream whose length is not a multiple of 8 bits, the extra
   suffix "b" followed by the number of remaining bits ("1" through "7")
   is appended. When the length suffix is present, the URI identifies
   any data stream of the specified length that, when hashed, produces
   the same hash value. [SHS] defines SHA-1 and SHA-256 only for
   messages (data streams) of less than 2^64 bits (2^61 octets). Thus,
   the largest length production that an application would expect to
   encounter is ";2305843009213693951b7" (2^61 - 1 octets plus 7 bits,
   or 2^64 - 1 bits). Behaviors beyond this length are undefined.
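
   The arithmetic behind the length suffix can be checked with a short
   Python sketch. The helper name length_suffix and the constant
   MAX_BITS are hypothetical; the sketch assumes the suffix is formed
   from the count of whole octets followed by the remaining 1 to 7 bits.

```python
MAX_BITS = 2**64 - 1  # [SHS] covers messages of less than 2^64 bits

def length_suffix(bit_length: int) -> str:
    """Format the optional length suffix for a stream of bit_length bits."""
    if not 0 <= bit_length <= MAX_BITS:
        raise ValueError("length outside the range defined by [SHS]")
    octets, extra_bits = divmod(bit_length, 8)
    suffix = ";" + str(octets)
    if extra_bits:  # only bit streams that are not whole octets get "b"
        suffix += "b" + str(extra_bits)
    return suffix
```

   For example, length_suffix(2**64 - 1) produces the largest
   production, ";2305843009213693951b7".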

   There is no way to distinguish between "bit streams" (that are
   multiples of 8 bits) and "octet streams" as categories; this
   ambiguity is intentional because [SHS] does not distinguish between
   them. Furthermore, there is no way to truncate the hash value: a
   parser that receives too few (or too many) hexadecimal digits MUST
   NOT accept the hash value.

   The URI schemes defined in this document are not intended
   specifically for human readability or speakability. They are,
   however, designed for human "copy-and-paste-ability": the ability of
   a human operator (or automated text-based operator, such as a simple
   script) to feed such URIs into processes that can compare them for
   exact equality. Based on the labeled input, a process can generate
   appropriate representations for human comparison, such as visual
   schemes [VISHASH].

   The ABNF [RFC5234] is:

         sha1uri   = "sha1:" ( 40HEXDIG / 40*relaxed ) [ length ]

         sha256uri = "sha256:" ( 64HEXDIG / 64*relaxed ) [ length ]

         ; HEXDIG is from [RFC5234]
         ; in the relaxed productions, there must be exactly
         ; 40 HEXDIG for sha1uri, and 64 HEXDIG for sha256uri

         relaxed   = HEXDIG / ":" / "." / "~" / "-" / "_" /
                     "%20" / "%0D" / "%0A" / "%09"

         length    = ";" wholenum [ bits ]

         wholenum  = "0" / ( %x31-39 *DIGIT )

         ; DIGIT is from [RFC5234]

         bits      = "b" ( %x31-37 )   ; 1 - 7

         ; "b" is not case-sensitive, but lowercase SHOULD be used

                    Figure 2: ABNF of SHA Data URIs
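
   A lenient parser for the grammar above might look like the following
   Python sketch. It is one possible reading of the ABNF, not a
   reference implementation: the name parse_sha_uri and the choice to
   report the length in bits are assumptions made for illustration.
   The sketch strips the permitted delimiters, folds case, rejects
   truncated or overlong hash values, and decodes the optional length.

```python
import re

# Delimiters of the "relaxed" production; removing them does not
# change which hash value the URI denotes.
_DELIMITERS = re.compile(r"[:.~_-]|%20|%09|%0[Dd]|%0[Aa]")
_LENGTH = re.compile(r"(0|[1-9][0-9]*)(?:[bB]([1-7]))?")
_HEX_DIGITS = {"sha1": 40, "sha256": 64}

def parse_sha_uri(uri: str):
    """Return (scheme, lowercase_hex, length_in_bits or None)."""
    scheme, colon, rest = uri.partition(":")
    scheme = scheme.lower()
    if not colon or scheme not in _HEX_DIGITS:
        raise ValueError("not a sha1: or sha256: URI")
    body, semicolon, length = rest.partition(";")
    digits = _DELIMITERS.sub("", body).lower()
    # Too few or too many hexadecimal digits MUST NOT be accepted.
    if (len(digits) != _HEX_DIGITS[scheme]
            or re.fullmatch(r"[0-9a-f]+", digits) is None):
        raise ValueError("wrong number of hexadecimal digits")
    bits = None
    if semicolon:
        match = _LENGTH.fullmatch(length)
        if match is None:
            raise ValueError("malformed length")
        bits = int(match.group(1)) * 8 + int(match.group(2) or 0)
    return scheme, digits, bits
```

   Under this sketch, the first two URIs of Figure 1 parse to the same
   scheme and hash value, so a comparison of the parsed results treats
   them as identifying the same hash.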

   Other URI schemes such as ni: [RFC6920] include or incorporate
   hashes. Software may be written to convert URIs between such schemes.
   Although the bits of the resource may be equivalent, the URI
   semantics may differ. The SHA-1 and SHA-256 URIs in this document
   identify binary data streams (i.e., an ordered sequence of bits).
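
   As a sketch of such a conversion, the following Python function maps
   a sha256 URI in the recommended form (no delimiters, no length
   suffix) to an ni: URI. It assumes the ni: form with an empty
   authority and an unpadded base64url digest, as the editor reads
   [RFC6920]; the function name is hypothetical. Only the bits of the
   hash carry over; the URI semantics may still differ.

```python
import base64

def sha256_uri_to_ni(uri: str) -> str:
    """Convert "sha256:<64 hex digits>" to "ni:///sha-256;<base64url>"."""
    scheme, _, hex_digest = uri.partition(":")
    if scheme.lower() != "sha256":
        raise ValueError("only sha256: URIs are handled in this sketch")
    digest = bytes.fromhex(hex_digest)  # raises ValueError on non-hex input
    if len(digest) != 32:
        raise ValueError("a SHA-256 hash value is exactly 32 octets")
    # base64url without trailing "=" padding (assumed ni: convention)
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "ni:///sha-256;" + value
```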

3.  Message (Content) Description, Syntax, and Semantics

   The URI schemes of Section 2 identify data streams; by design, they
   lack means to include metadata such as filenames or Internet media
   types. This section defines two additional URI schemes, "sha1msg:"
   and "sha256msg:", to identify content that includes Internet message
   headers.

   The syntax of the message schemes is similar, but not identical, to
   that of the data schemes. Length is optional; if present, it MUST be
   an integral number of octets. Similar to mid: URIs [RFC2392],
   specific content-parts may be identified by adding a slash "/" and
   the (encoded) Content-ID of the part.
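
   The overall shape of a SHA Message URI can be illustrated with a
   small Python sketch that separates the hash text, the optional
   octet length, and the optional Content-ID. The function name
   split_sha_msg_uri is hypothetical, the hash text is returned
   unvalidated, and the Content-ID is simply percent-decoded; the
   sketch does not settle the open Content-ID grammar questions.

```python
import re
from urllib.parse import unquote

# scheme ":" hash-text [ ";" octets ] [ "/" content-id ]
_MSG_URI = re.compile(
    r"(?P<scheme>sha1msg|sha256msg):"
    r"(?P<hash>[^;/]+)"
    r"(?:;(?P<length>0|[1-9][0-9]*))?"
    r"(?:/(?P<cid>.+))?",
    re.IGNORECASE,
)

def split_sha_msg_uri(uri: str):
    """Return (scheme, hash_text, octets or None, content_id or None)."""
    m = _MSG_URI.fullmatch(uri)
    if m is None:
        raise ValueError("not a SHA Message URI")
    octets = int(m.group("length")) if m.group("length") else None
    cid = unquote(m.group("cid")) if m.group("cid") else None
    return m.group("scheme").lower(), m.group("hash"), octets, cid
```

   Because the message length is always a whole number of octets, a
   bit suffix such as ";43b7" is rejected by this sketch.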

   Implementations MUST generate and parse productions that conform to
   [RFC5322] (Internet message), [RFC6532] (Internationalized Email
   Headers), [RFC3030] (Binary MIME), and [RFC2045], [RFC2046],
   [RFC2047], and [RFC2231] (collectively, MIME). Technically, these
   productions are octet streams that include Internet message headers,
   specifically MIME-conformant headers. The default
   Content-Transfer-Encoding is "binary" (as the channel is
   binary-clean) and the default MIME-Version is "1.0". These headers
   MAY be omitted from the production unless they vary from these
   defaults.
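
   To make the notion of a hashed message concrete, the following
   Python sketch serializes a few headers and a body into one octet
   stream and derives a sha256msg URI from it. The serialization shown
   (UTF-8 "Name: value" lines ending in CRLF, an empty line, then the
   unmodified body) is an assumption for illustration, as is the
   function name; the defaulted Content-Transfer-Encoding and
   MIME-Version headers are simply left out.

```python
import hashlib

def make_sha256msg_uri(headers, body: bytes,
                       with_length: bool = True) -> str:
    """Hash headers plus body as one octet stream (illustrative only)."""
    # Assumed serialization: header lines in UTF-8 with CRLF line
    # endings, an empty line, then the body octets unchanged (binary).
    head = "".join(f"{name}: {value}\r\n" for name, value in headers)
    message = head.encode("utf-8") + b"\r\n" + body
    uri = "sha256msg:" + hashlib.sha256(message).hexdigest()
    if with_length:
        uri += ";" + str(len(message))  # whole octets of the message
    return uri
```

   Note that any change to a header, including its order or spacing,
   changes the octet stream and therefore the resulting URI.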

   While it would not be inaccurate to label the Content-Type of such a
   production as "message/global", when a SHA Message URI is
   dereferenced, the resource takes on the Content-Type of the Content-
   Type header of the production, along with all other header metadata.
   Any message type (such as HTTP response messages [RFC7230], Netnews
   articles [RFC5536], or email messages [RFC5322]) is therefore
   suitable for more-or-less direct identification by SHA Message URIs.

   Implementers need to note that the encoding of headers is UTF-8
   [RFC6532]. HTTP response messages [RFC7230], however, have
   historically used ISO-8859-1 in header fields. If an HTTP response
   message is used as the message, characters outside the ASCII range
   MUST be re-encoded; they SHOULD be encoded directly to UTF-8, but MAY
   be encoded via other means, such as [RFC2047]. If the message is
   intended to convey an enclosed HTTP response (as opposed to the
   content of an HTTP response), it is appropriate to label the content
   with Content-Type: message/http or application/http [RFC7230], where
   the full HTTP response, including headers, is the content of the
   message. In such a case, re-encoding SHALL NOT occur.

   SHA Message URIs are similar to mid: and cid: URIs [RFC2392] in that
   they uniquely identify a message or content. However, while Message-
   IDs are assigned to and embedded into the content, SHA hashes are
   intrinsic properties of the message (octet stream). Attempts to embed
   the hash into the message would alter it. Furthermore, the semantics
   of Message-ID are not defined when the top-level Content-Type is not
   "message". In contrast, Content-ID is defined for all types of
   content.

   The ABNF [RFC5234] is:

         ; assumes productions in Figure 2 are defined

         sha1msguri   = "sha1msg:" ( 40HEXDIG / 40*relaxed )
                                   [ msglength ] [ "/" content-id ]

         sha256msguri = "sha256msg:" ( 64HEXDIG / 64*relaxed )
                                     [ msglength ] [ "/" content-id ]

         msglength    = ";" wholenum

         content-id   = id-left "@" id-right

         id-left = pchar-dot-atom-text

         pchar-dot-atom-text = 1*pchar-atext *("." 1*pchar-atext)

         ; omits # % ? ^ ` { | } from RFC 5322 atext
         ; pct-encoded is from RFC 3986
         ; TODO: Argh, RFC 6532 extends atext to include UTF-8!
         pchar-atext = ALPHA / DIGIT / "!" / "$" / "&" / "'" /
                       "*" / "+" / "-" / "/" / "=" / "_" / "~" /
                       pct-encoded

         id-right = pchar-dot-atom-text / pchar-no-fold-literal

         ; does not include @ but this is just because it would
         ; look weird in the production
         ; TODO: Argh, should [ *dtext ] include %5B ... %5D,
         ; or should the dtext be interpreted (and de-quoted-paired)?
         ; TODO: Argh, RFC 6532 extends dtext to include UTF-8!
         pchar-no-fold-literal = unreserved / sub-delims / ":" /
                                 pct-encoded

                   Figure 3: ABNF of SHA Message URIs

4.  Encoding and Interoperability

   SHA-1 and SHA-256 URIs conform to [RFC3986]. The syntaxes described
   in this document specifically conform to the path-rootless production
   of the hier-part production of the URI production of Section 3 of
   [RFC3986].

   The characters representing the binary hash values in such URIs are
   limited to hexadecimal, so no further encoding issues are raised
   based on the identified content. Beware that the characters
   representing a Content-ID in Message URIs reflect UTF-8 encoding
   [RFC6532].

   Future revisions may cover semantics of other URI-conformant
   productions.

5.  Security Considerations

   The basic sha1 and sha256 URI schemes identify data streams, not
   content in the Internet message sense. Supplementary information
   about the data stream is expected to be provided by context.

   If an application designer wishes to affix metadata (such as an
   Internet media type or file modification date) permanently to a data
   stream, the metadata and data stream should be concatenated into some
   format, and hashed. Section 3 provides sha1msg and sha256msg URI
   schemes that identify Internet message content.

   Additional URI schemes are proposed for message content (as opposed
   to using disambiguating parameters in the data URIs to indicate the
   presence of parsable headers) for several reasons:

   1. The data URIs are intended to be very lightweight, with minimal
      room for implementation errors. The data URIs are usable without
      any network access, such as when querying a local data store.
      Parsing and interpreting Internet message headers carries a host
      of security and interoperability ailments that are described in
      the relevant standards and elsewhere; sha1 and sha256 URI
      implementations do not need to worry about these hazards.

   2. While the sha1 and sha256 URI schemes can (and are routinely
      expected to) identify data streams that are canonicalized, the
      sha1msg and sha256msg URI schemes are not designed with
      canonicalization in mind. Internet message headers do not have
      canonical forms.

   Cryptographic hashes are designed to map variable length data streams
   to fixed length outputs, with four additional properties:

   1. Random distribution: A change, addition, or deletion of any bit to
      the input message (data stream) will affect each and every bit of
      the output (hash value) with equal probability.

   2. Preimage resistance: Given a hash value, finding a corresponding
      message is computationally infeasible.

   3. Second-preimage resistance: Given a hash value and a first
      message, finding a second message that has the same hash value is
      computationally infeasible.

   4. Collision resistance: Finding two messages that have the same hash
      value is computationally infeasible.

   As of the time of this document, reduced rounds of SHA-1 have been
   cryptanalyzed [OPTJLOC], prompting the security community to migrate
   to new hash algorithms [RFC6194]. No practical attack on the full
   SHA-256 function is publicly known.

   The length qualifier is not intended to provide or augment the basic
   security properties of the SHA-1 or SHA-256 hash algorithms. However,
   an application SHOULD employ the length qualifier when it knows the
   length in advance, because this qualifier constrains the set of
   possible data streams.

   Collisions exist with any hash algorithm. Consider an arbitrary data
   stream of 20 octets and 1 bit (161 bits) and the SHA-1 algorithm. By
   the pigeonhole principle, a collision (where two messages produce the
   same hash value) must exist, because in the best case, each data
   stream of 20 octets (160 bits) maps separately to each one of the
   2^160 possible hash values. Therefore, the 2^160 + 1st message must
   map to one of the hash values corresponding to at least one prior
   message.

   A collision is virtually certain to exist among the set of all data
   streams of 20 octets, and likewise among data streams appreciably
   shorter than 20 octets (birthday paradox). However, the probability
   that the particular data stream of interest has the same hash as
   another (malicious) data stream is harder to calculate. It is
   possible that a trivial change to the message will result in the same
   hash; it is also possible that no messages of the same length have
   the same hash. The probability of the latter clearly diminishes,
   however, as the set of candidate messages expands without bound.
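
   The birthday-paradox estimate can be made concrete with the usual
   approximation p ~= 1 - exp(-n^2 / 2^(b+1)) for n messages and a
   b-bit hash. The Python sketch below assumes that hash values behave
   like independent, uniformly random b-bit strings, which is a
   modeling assumption and not a property proven for SHA-1 or SHA-256;
   the function name is hypothetical.

```python
import math

def birthday_collision_probability(log2_messages: int,
                                   hash_bits: int) -> float:
    """Approximate the chance of at least one collision among
    2**log2_messages uniformly random hash_bits-bit values."""
    # n^2 / 2^(b+1) with n = 2^log2_messages, kept as a power of two
    exponent = 2 * log2_messages - hash_bits - 1
    if exponent > 60:
        return 1.0  # exp() of such a large negative number is 0
    # -expm1(-x) computes 1 - exp(-x) accurately for tiny x
    return -math.expm1(-(2.0 ** exponent))
```

   Under this model, 2^80 messages give a SHA-1 collision probability
   of roughly 0.39, all 2^88 streams of 11 octets give a probability
   indistinguishable from 1, and the 2^32 streams of 4 octets give a
   probability on the order of 2^-97.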

   The purpose of the optional length qualifier is not simply to reduce
   the message space, since for all non-trivial messages of n bits the
   message space of 2^n vastly exceeds the collision certainty
   threshold. (For illustrative purposes, the author searched for SHA-1
   collisions in 0-, 1-, 2-, 3-, and 4-octet data streams using a brute
   force algorithm; at the time of writing the search is ongoing and is
   estimated to take 85 days.) Rather, the purpose is to detect
   malicious underflow or overflow conditions before they happen.
   If an attacker is feeding the recipient, the recipient may accept
   data without bounds, waiting for the hash computation to complete. In
   so doing, the attacker wastes the recipient's resources in addition
   to computation time, which may cause latent security errors (such as
   low memory or disk conditions) to manifest.

   Consider one application of these URI schemes: identifying security-
   related objects such as PKIX certificates [RFC5280]. Although
   [RFC5280] does not limit the size of a certificate, most certificates
   are not appreciably greater than 16 kilobytes, and there are
   protocol-related pressures to make certificates much smaller (such as
   less than 4 kilobytes) to fit in fewer TCP segments. A certificate-
   using application that accepts the URI schemes in this document might
   reduce its attack surface by rejecting URIs of unspecified length
   once the candidate certificate data exceeds a threshold (e.g., 64
   kilobytes); for larger certificates, the application would require
   that the length be specified.
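
   The receiving behavior described above can be sketched in Python as
   follows. The function name, its parameters, and the 64-kilobyte
   default are illustrative assumptions; the sketch stops reading as
   soon as the declared length (or, when no length is declared, the
   application's threshold) is exceeded, and it reports an underflow
   when the stream ends early.

```python
import hashlib

def verify_stream(chunks, scheme: str, expected_hex: str,
                  expected_octets=None,
                  max_unspecified: int = 64 * 1024) -> bool:
    """Return True if the chunks hash to expected_hex within the limit."""
    if expected_octets is not None:
        limit = expected_octets
    else:
        limit = max_unspecified  # application-chosen cap (assumption)
    hasher = hashlib.new(scheme)
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if received > limit:
            return False  # overflow: stop before hashing more data
        hasher.update(chunk)
    if expected_octets is not None and received != expected_octets:
        return False  # underflow: the stream ended early
    return hasher.hexdigest() == expected_hex.lower()
```

   A caller would typically obtain the scheme, hash value, and length
   by parsing the URI first, and would pass chunks as they arrive from
   the network rather than buffering the whole stream.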

   A secondary purpose of specifying the length is to resist
   chosen-prefix collision attacks [CHOSEN], in which an attacker picks
   two separate messages and then appends different values that result
   in the concatenated messages having the same hash value. [CHOSEN]
   shows that Merkle-Damgard hash functions are susceptible to this
   class of attacks. Chosen-prefix collision attacks were successfully
   used against the MD5 hash algorithm [HCLASH]; both SHA-1 and SHA-256
   are Merkle-Damgard constructions.

6.  IANA Considerations

   IANA is requested to register the "sha1", "sha256", "sha1msg", and
   "sha256msg" URI schemes in the Uniform Resource Identifier (URI)
   Schemes registry using the templates below, which conform to the June
   2015 URI Scheme Guidelines [RFC7595].

6.1.  Assignment of sha1 URI Scheme

      URI scheme name: sha1

      Status: Permanent

      Applications/protocols that use this URI scheme name:
        General applicability. Some examples include security
        applications and systems, database and forensic lookup
        tools, and distributed peer-to-peer protocols.

      Contact: Sean Leonard <dev+ietf@seantek.com>

      Change controller: IETF

      References: This document.

6.2.  Assignment of sha256 URI Scheme

      URI scheme name: sha256

      Status: Permanent

      Applications/protocols that use this URI scheme name:
        General applicability. Some examples include security
        applications and systems, database and forensic lookup
        tools, and distributed peer-to-peer protocols.

      Contact: Sean Leonard <dev+ietf@seantek.com>

      Change controller: IETF

      References: This document.

6.3.  Assignment of sha1msg URI Scheme

      URI scheme name: sha1msg

      Status: Permanent

      Applications/protocols that use this URI scheme name:
        General applicability. Some examples include security
        applications and systems, database and forensic lookup
        tools, and distributed peer-to-peer protocols.

      Contact: Sean Leonard <dev+ietf@seantek.com>

      Change controller: IETF

      References: This document.

6.4.  Assignment of sha256msg URI Scheme

      URI scheme name: sha256msg

      Status: Permanent

      Applications/protocols that use this URI scheme name:
        General applicability. Some examples include security
        applications and systems, database and forensic lookup
        tools, and distributed peer-to-peer protocols.

      Contact: Sean Leonard <dev+ietf@seantek.com>

      Change controller: IETF

      References: This document.

9.  References

9.1.  Normative References

   [SHS]      National Institute of Standards and Technology, "Secure
              Hash Standard", Federal Information Processing Standard
              (FIPS) 180-4, March 2012, <http://csrc.nist.gov/
              publications/fips/fips180-4/fips-180-4.pdf>.

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2046]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part Two: Media Types", RFC 2046,
              November 1996.

   [RFC2047]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
              Part Three: Message Header Extensions for Non-ASCII Text",
              RFC 2047, November 1996.

   [RFC2231]  Freed, N. and K. Moore, "MIME Parameter Value and Encoded
              Word Extensions: Character Sets, Languages, and
              Continuations", RFC 2231, November 1997.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3030]  Vaudreuil, G., "SMTP Service Extensions for Transmission
              of Large and Binary MIME Messages", RFC 3030, December
              2000.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66, RFC
              3986, January 2005.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [RFC5536]  Murchison, K., Ed., Lindsey, C., and D. Kohn, "Netnews
              Article Format", RFC 5536, November 2009.

   [RFC6532]  Yang, A., Steele, S., and N. Freed, "Internationalized
              Email Headers", RFC 6532, February 2012.

   [RFC7230]  Fielding, R., Ed., and J. Reschke, Ed., "Hypertext
              Transfer Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, June 2014.

   [RFC7595]  Thaler, D., Hansen, T., and T. Hardie, "Guidelines and
              Registration Procedures for URI Schemes", BCP 35, RFC
              7595, June 2015.

9.2.  Informative References

   [CHOSEN]   Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix
              Collisions for MD5 and Applications", International
              Journal of Applied Cryptography Vol. 2, No. 4, 322-359,
              2012,
              <http://www.win.tue.nl/hashclash/ChosenPrefixCollisions/>,
              doi:10.1504/IJACT.2012.048084.

   [HCLASH]   Stevens, M., Lenstra, A., and B. de Weger, "Chosen-Prefix
              Collisions for MD5 and Colliding X.509 Certificates for
              Different Identities", IACR EUROCRYPT 2007, Lecture Notes
              in Computer Science 4515 1-22, 2007,
              <http://www.win.tue.nl/hashclash/ChosenPrefixCollisions/>,
              doi:10.1007/978-3-540-72540-4_1.

   [OPTJLOC]  Stevens, M., "New Collision Attacks on SHA-1 Based on
              Optimal Joint Local-Collision Analysis", EUROCRYPT 2013,
              Lecture Notes in Computer Science 7881 245-261, 2013,
              <http://marc-stevens.nl/research/papers/EC13-S.pdf>,
              doi:10.1007/978-3-642-38348-9_15.

   [VISHASH]  Hsiao, H., Lin, Y., Studer, A., Studer, C., Wang, K.,
              Kikuchi, H., Perrig, A., Sun, H., and B. Yang, "A Study of
              User-Friendly Hash Comparison Schemes", Computer Security
              Applications Conference, 105-114, 2009,
              <http://dl.acm.org/citation.cfm?id=1723224>,
              doi:10.1109/ACSAC.2009.20.

   [RFC2392]  Levinson, E., "Content-ID and Message-ID Uniform Resource
              Locators", RFC 2392, August 1998.

   [RFC5280]  Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
              Housley, R., and W. Polk, "Internet X.509 Public Key
              Infrastructure Certificate and Certificate Revocation List
              (CRL) Profile", RFC 5280, May 2008.

   [RFC6194]  Polk, T., Chen, L., Turner, S., and P. Hoffman, "Security
              Considerations for the SHA-0 and SHA-1 Message-Digest
              Algorithms", RFC 6194, March 2011.

   [RFC6920]  Farrell, S., Kutscher, D., Dannewitz, C., Ohlman, B.,
              Keranen, A., and P. Hallam-Baker, "Naming Things with
              Hashes", RFC 6920, April 2013.

Appendix A.  Hash-Generating Implementations and Case Choice

   Although the URI schemes in this document are not case-sensitive for
   the hexadecimal hash component, and permit many arbitrary delimiters,
   Section 2 recommends generating lowercase with no delimiters. This
   recommendation balances the need for flexible parsing with the need
   for consistent output for intuitive inspection.

   The following implementations emit lowercase hexadecimal by default:

      shasum (Perl)
      sha1sum / sha256sum (GNU coreutils)
      OpenSSL dgst
      Python sha1().hexdigest()
      Ruby
      Bouncy Castle (Java)
      Node.js
      Microsoft CryptoAPI applications (including CertUtil and dialogs)
      [[TODO: find more lowercase emission]]

   The following implementations emit uppercase hexadecimal by default:

      Mozilla NSS applications (including Toolkit applications) 
      Apple Mac OS X Keychain Access
      [[TODO: find more uppercase emission]]




Author's Address

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA  90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/