Internet DRAFT - draft-hallambaker-compressedcrlset

draft-hallambaker-compressedcrlset



Internet Engineering Task Force (IETF)              Phillip Hallam-Baker
Internet-Draft                                         Comodo Group Inc.
Intended Status: Standards Track                           Rob Stradling
Expires: April 10, 2015                                   Comodo CA Ltd.
                                                         October 7, 2014


                          Compressed CRL Sets
                 draft-hallambaker-compressedcrlset-00

Abstract

   An efficient encoding for Certificate Revocation Lists is described 
   based on a Disjoint Set encoding technique. When applied to the 
   population of currently issued certificates enroled in the 
   Certificate Transparency Notary network, the data set was reduced by 
   97% without introduction of false positives or false negatives. 

Status of This Memo

   This Internet-Draft is submitted in full conformance with the 
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF).  Note that other groups may also distribute 
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the 
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal 
   Provisions Relating to IETF Documents 
   (http://trustee.ietf.org/license-info) in effect on the date of 
   publication of this document. Please review these documents 
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must 
   include Simplified BSD License text as described in Section 4.e of 
   the Trust Legal Provisions and are provided without warranty as 
   described in the Simplified BSD License.









Hallam-Baker                 April 10, 2015                     [Page 1]

Internet-Draft             Compressed CRLSets               October 2014

Table of Contents

   1.  Definitions  . . . . . . . . . . . . . . . . . . . . . . . . .  3
      1.1.  Requirements Language . . . . . . . . . . . . . . . . . .  3
   2.  The Revocation Problem . . . . . . . . . . . . . . . . . . . .  3
      2.1.  PKIX Certificate Status Mechanisms  . . . . . . . . . . .  3
      2.2.  Short Lived Certificates  . . . . . . . . . . . . . . . .  4
      2.3.  CRL Sets  . . . . . . . . . . . . . . . . . . . . . . . .  4
      2.4.  Compressed CRL Sets . . . . . . . . . . . . . . . . . . .  5
   3.  Efficient Encoding of Disjoint Sets  . . . . . . . . . . . . .  5
      3.1.  Problem Statement . . . . . . . . . . . . . . . . . . . .  5
      3.2.  Standard Hash Function. . . . . . . . . . . . . . . . . .  6
      3.3.  Parameterized Hash Function.  . . . . . . . . . . . . . .  6
      3.4.  Shortest Discrimination Prefix  . . . . . . . . . . . . .  6
         3.4.1.  Sorted List of Hashes  . . . . . . . . . . . . . . .  7
         3.4.2.  Difference Encoding  . . . . . . . . . . . . . . . .  7
         3.4.3.  Further optimizations. . . . . . . . . . . . . . . .  7
   4.  Compressed CRL Set Encoding  . . . . . . . . . . . . . . . . .  8
   5.  Experimental Results . . . . . . . . . . . . . . . . . . . . .  8
   6.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  8
      6.1.  Normative References  . . . . . . . . . . . . . . . . . .  8
      6.2.  Informative References  . . . . . . . . . . . . . . . . .  8
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . .  8































Hallam-Baker                 April 10, 2015                     [Page 2]

Internet-Draft             Compressed CRLSets               October 2014

1. Definitions

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
   document are to be interpreted as described in [RFC2119]. 

2. The Revocation Problem

   Certificate revocation is a critical element of the Web PKI design. A
   certificate issuer may revoke a certificate for many reasons 
   including:

      *  The certificate was issued in error or due to a CA breach.

      *  The certificate holder has breached some condition set by the 
         issuer.

      *  The certificate is being used for or in connection with illegal
         purposes.

      *  The certificate holder has requested the certificate be 
         revoked.

      *  The private key corresponding to the public key in the 
         certificate has been compromised.

   While certificates are occasionally issued in error or due to an 
   issuer breach, these cases are very rare. Impersonating another 
   company is much more difficult than stealing a private key from a 
   legitimate company. The ability to revoke certificates is therefore a
   critical element in the WebPKI risk mitigation strategy. 

   At present, there are approximately TBS certificates unexpired 
   certificates issued in the WebPKI of which TBS (approximately 10%) 
   have been revoked. 

2.1. PKIX Certificate Status Mechanisms

   PKIX [RFC5280] describes two mechanisms for communicating certificate
   status.

      *  Certificate Revocation Lists (CRLs) list the serial numbers of 
         certificates that a CA has issued that are not currently valid.

      *  Online Certificate Status Protocol (OCSP) describes an online 
         service that a relying party can query to request the current 
         status of a certificate.





Hallam-Baker                 April 10, 2015                     [Page 3]

Internet-Draft             Compressed CRLSets               October 2014

   The scaling problems inherent in the use of CRLs quickly became 
   apparent as the size of the WebPKI grew. The length of the CRLs 
   issued by the large CAs quickly grew to several Mb. While techniques 
   such as delta encoding and partitioning had the protential to provide
   temporary relief, the scaling issue was only mitigated, not solved.

   OCSP [RFC6960] provides a scalable solution but in its original 
   conception for TLS introduces a new party to every communication. 
   Each time a client attempts to establish a TLS connection it looks 
   for an unexpired OCSP token to establish the validity of the 
   certificate presented. If no unexpired cached token is available, a 
   fresh one is obtained from the OCSP distribution. This introduces a 
   risk of traffic analysis by either the OCSP service provider itself 
   or any party that can view the traffic to the OCSP service.

   OCSP Stapling [RFC6066] was introduced to the TLS protocol as a 
   mechanism to avoid the need to obtain OCSP tokens from a third party 
   by allowing them to be passed in band in the TLS conversation.

2.2. Short Lived Certificates

   While OCSP stapling removes the privacy, reliability and performance 
   concerns of the original OCSP design, the certificate issuer is still
   required to issue the OCSP token and the server using the certificate
   is still required to fetch it. Since issuing an OCSP token on a daily
   basis is tantamount to issuing a new certificate, why not reissue the
   certificate on a daily basis and do away with the OCSP token 
   entirely?

   In principle, use of short lived certificates is simpler for all the 
   parties involved: CAs, subjects and relying parties. But before short
   lived certificates can be used in practice, new infrastructure is 
   required to support their use. In particular client software has to 
   be rewritten so that it recognizes a certificate with a short 
   validity interval (e.g. 36 hours) as carrying an implicit assertion 
   of validity and a mechanism by which the server obtains fresh 
   certificates must be specified and implemented. For work on the 
   latter problem, see [I-D.hallambaker-omnipublish]. 

2.3. CRL Sets

   CRL Sets are curated CRLs, typically issued by an application 
   provider that list the revoked certificates regarded to pose the 
   greatest concern of abuse.

   Use of CRL Sets has the advantage that they may allow an application 
   provider to react to a security incident more quickly than the 
   issuing CA. But they only provide revocation information for a 
   limited number of certificates. 





Hallam-Baker                 April 10, 2015                     [Page 4]

Internet-Draft             Compressed CRLSets               October 2014

   As with traditional CRLs, various techniques such as delta encoding 
   may be used to limit the size of individual CRLs but these do not 
   reduce the total volume of data that must be exchanged. 

2.4. Compressed CRL Sets

   Compressed CRL Sets are an alternative encoding for CRLs that permit 
   very much higher encoding density than traditional CRLs. Rather than 
   encoding the set of revoked certificates, Compressed CRLs encode the 
   difference between the set of valid unexpired certificates and the 
   set of invalid unexpired certificates. This allows the average number
   of bits required to encode the set to be reduced from 96 bits per 
   certificate (using the certificate serial number) to approximately 6.

   While Compressed CRL Sets currently offer a practical answer to the 
   problem of revocation in the WebPKI, short lived certificates 
   actually scale better. 

   Although Compressed CRL Sets provide much better scaling properties 
   than traditional CRLs, the size of the CRL Set still increases 
   linearly with the product of the number of revoked certificates and 
   the log of the revocation ratio. Thus while use of Compressed CRL 
   Sets will remain practical as long as the growth in the number of 
   WebPKI certificates issued grows at a slower rate than additional 
   communications bandwidth becomes available, this cannot be 
   guaranteed. 

   Short lived certificates in contrast do not require any additional 
   revocation data and may be regarded as offering perfect scaling. The 
   chief drawback to relying on short lived certificates is that the 
   minimum practical validity interval is 48 hours which is also the 
   time period in which the vast majority of the harm caused in a 
   criminal attack occurs. Also deployment of short lived certificates 
   relies on new certificate lifecycle management infrastructure that is
   not yet developed. 

   Use of Compressed CRLs in combination with Short Lived certificates 
   offers advantages over both. In the short term Compressed CRLs 
   provide a mechanism for reporting status on the population of long 
   expiry certificates until the infrastructure is deployed to enable 
   use of the short lived approach. In the long term, Compressed CRLs 
   provide an efficient mechanism to narrow the residual vulnerability 
   window from days to minutes. 

3. Efficient Encoding of Disjoint Sets

   CRL Sets are compressed using a selected variant of the approach 
   described in [[HBS2014 - To be specified]]. 






Hallam-Baker                 April 10, 2015                     [Page 5]

Internet-Draft             Compressed CRLSets               October 2014

3.1. Problem Statement

   Given two disjoint sets of data entries A and B in which |A| >= |B|, 
   provide a boolean function f (x) that is efficient in both memory and
   time such that:

      *  f(x) is true if x is a member of A

      *  f(x) is false if x is a member of B

      *  f(x) can return true or false otherwise

   For ease of comparison we will consider an example in which set A has
   10 million members and set B has 1 million. This corresponds to a 
   revocation use case in which there are 11 million unexpired 
   certificates and 10% have been revoked. 

   Note that while it is in theory possible for the revocation ratio to 
   exceed 50% and thus for |A| < |B|, this case is easily handled with a
   binary flag to indicate that Set A represents the revoked 
   certificates and B those that are valid. 

3.2. Standard Hash Function.

   One approach to this problem is to first reduce the data entries 
   using a hash function and compile a list of the hash values 
   corresponding to the entries of set B. This has the advantage of 
   simplicity but the corresponding data set is very large. For the 
   example case, the implementation of f(x) requires 32MB of data if a 
   256 bit hash is used. 

3.3. Parameterized Hash Function.

   While using a 256 bit hash leaves an infintesimal probability of 
   collisions, the same is true of 128 bit or even shorter hashes. If a 
   parameterized hash function (e.g. MAC-SHA1) is used we can repeat the
   hash construction process multiple times with different parameters 
   and check to see the shortest truncation that allows the sets to be 
   distinguished. This allows the example case to be addressed using 
   only 6MB of data. 

3.4. Shortest Discrimination Prefix

   The parameterized hash function encodes enough information to 
   distinguish any member of Set A from any member of set B, but this is
   more information than is necessary. It is sufficient to distinguish a
   member of set B from the two members of set A that are adjacent to 
   it.






Hallam-Baker                 April 10, 2015                     [Page 6]

Internet-Draft             Compressed CRLSets               October 2014

   We begin by calculating a hash value H(x) of each member of A and B 
   using a hash function that returns a unique value for each entry to 
   create the corresponding hash sets AH, BH. 

   The hash function used for this purpose does not need to meet any 
   special cryptographic properties such as first or second pre-image 
   resistance. 

   Let P(x,y) be a function of two bit strings, x, y such that P(x,y) is
   true if and only if y is a prefix of x. That if y is n bits long, the
   first n bits of x and y are identical. 

   Let M(b, AH) be a function that returns the shortest prefix of b that
   is not a prefix of any member of AH. 

   We apply the function M to each member of BH and remove duplicates to
   create the set BP. 

   The function f(A,B) can now be implemented as follows: f(x) = false 
   if there is a member p of BP such that p is a prefix of the hash 
   value of x (i/e/ P(H(x),p) is true)f(x) = true otherwise

   Note that the value of f(x) returned if x is not a member of A or B 
   is undefined. 

   A space and time efficient implementation of the function f(x) is 
   arrived at by an efficient encoding of the set BP. 

3.4.1. Sorted List of Hashes

   The set BP may be encoded as a simple list. 

   Sorting the list permits efficient access using standard techniques 
   applicable to searching a hash table. 

   For the example case, the average shortest prefix is 24?? bits giving
   a total data set size of 3Mb. 

3.4.2. Difference Encoding

   In a sorted list of hashes, adjacent hashes will typically only vary 
   in the last few bits. Since it is only the difference between the 
   hashes that contains information, we can encode this information 
   without loss of information overall. 

   For the example case, the average number of bits required for a 
   difference encoding is 6 bits and the total data volume required is 
   750KB. 






Hallam-Baker                 April 10, 2015                     [Page 7]

Internet-Draft             Compressed CRLSets               October 2014

3.4.3. Further optimizations.

   Further optimizations have been developed permitting an additional 
   reduction in the data volume by half over the difference technique. 

4. Compressed CRL Set Encoding

   TBS specify an encoding of CRL data based on the above.

   Following the general trend for JSON and the need for compactness, 
   the use of JSON-B is ideal.

5. Experimental Results

   TBS here extract the lab data from the prototype implementations 
   using the encoding described above.

6. References

6.1. Normative References

   [I-D.hallambaker-omnipublish]  Hallam-Baker, P, "OmniBroker 
              Publication Protocol", Internet-Draft draft-hallambaker-
              omnipublish-00, 22 May 2014.

   [RFC6960]  Santesson, S.,Myers, M.,Ankney, R.,Malpani, A.,Galperin, 
              S.,Adams, C., "X.509 Internet Public Key Infrastructure 
              Online Certificate Status Protocol - OCSP", RFC 6960, June
              2013.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate 
              Requirement Levels", BCP 14, RFC 2119, March 1997.

6.2. Informative References

   [RFC6066]  Eastlake, D., "Transport Layer Security (TLS) Extensions: 
              Extension Definitions", RFC 6066, January 2011.

   [RFC5280]  Cooper, D.,Santesson, S.,Farrell, S.,Boeyen, S.,Housley, 
              R.,Polk, W., "Internet X.509 Public Key Infrastructure 
              Certificate and Certificate Revocation List (CRL) 
              Profile", RFC 5280, May 2008.

Authors' Addresses

   Phillip Hallam-Baker
   Comodo Group Inc.

   philliph@comodo.com





Hallam-Baker                 April 10, 2015                     [Page 8]

Internet-Draft             Compressed CRLSets               October 2014

   Rob Stradling
   Comodo CA Ltd.

   rob.stradling@comodo.com


















































Hallam-Baker                 April 10, 2015                     [Page 9]