CFRG S. Halevi (IBM) Internet-Draft H. Krawczyk (IBM) Expires: November 12, 2005 12 May, 2005 Strengthening Digital Signatures via Randomized Hashing draft-irtf-cfrg-rhash-00.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. The list of current Internet-Drafts can be accessed at http:// www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on November 12, 2005. Copyright (C) The Internet Society (2005). All Rights Reserved. Abstract We propose to adopt randomized hashing as a mode of operation for existing and future cryptographic hash functions. The idea is to strengthen hash functions for use in the context of digital signatures without requiring a change to the actual hashing and signing algorithms or to their existing implementations. We suggest that randomization can be achieved via the processing of the input to the function, even if the hash function itself is not randomized. Effective use of such mode of operation requires changes to the standardization of the encoding and processing of digital signatures (e.g., PKCS#1, FIPS186) but has no impact on existing signature and hashing algorithms. We urge the standards community to plan a transition towards these new mechanisms for which we outline specific instantiations. Halevi and Krawczyk [Page 1] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 1 Introduction Recent cryptanalytical advances in the area of collision-resistant hash functions (CRHF), specifically the attacks against MD5 and SHA-1, have not only shaken our confidence in the security of specific constructions but, more fundamentally, have cast doubts on our ability to design collision-resistant hash functions that will withstand attacks over a long period of time. These attacks remind us that cryptography is founded on heuristic constructions whose security may be endangered unexpectedly. In particular, this highlights the importance of following two fundamental cryptographic design principles: (i) design protocols and applications such that the underlying cryptographic pieces (e.g., hash functions) are easy to replace when need arises (in particular, avoid hard-wiring of any specific construction into the application), and (ii) design as general as possible mechanisms with as little as possible requirements from the basic cryptographic building blocks. The present proposal is intended to address these points, especially the second one. Although many existing applications that use hash functions do not actually require full collision resistance, and although the current attack on SHA-1 is not quite practical yet, it is clear that we cannot dismiss the recent attacks as theoretical only. Indeed there are important applications today that do rely on full collision resistance, in particular those that use standard signature schemes to provide non-repudiation or certification services. And with the expected cryptanalytical improvements in the near future, ignoring these new attacks would be irresponsible. Some of the options contemplated in the applied cryptography world for responding to the recent attacks on MD5 and SHA-1 are the following: (1) Modify applications that rely on collision resistance such that the particular use of CRHF in these applications will be less vulnerable to collision attacks. (2) Upgrade the systems using SHA-1 and MD5 to use stronger hash functions such as the SHA2 family (256- and 512-bit versions). The hope is that these functions will provide for more robust CRHFs. Option (1) could be applied to different settings, but it is very application specific. In particular, note that even if one could set precise assumptions on the way specific applications are used today, these assumptions are likely to change or become obsolete over time. To illustrate this point, consider modifying applications that use signatures so that the messages to be signed are structured in a way that is unpredictable to the attacker. This approach relies heavily on the understanding of the semantics and structure of messages used in the application. Therefore, while it may be viable for specific applications (such as choosing unpredictable serial numbers in Halevi and Krawczyk [Page 2] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 certificates) it is insufficient as a general measure. Option (2) is more robust but its cost and complexity are significant: it requires a multitude of applications, protocols and implementations to instrument the transition to new functions. Hopefully, the current attacks will improve at a mild enough pace so that a relatively orderly transition can be implemented. And even if we manage such a gradual transition, we must contemplate the possibility that by the time the transition is completed the new adopted functions will appear as weak as SHA-1 appears to be now. The approach that we propose here takes elements from the above two options. We suggest that we must plan for a transition to more secure mechanisms, that this has to be done in an orderly way (i.e., not as an uncontrolled panicking reaction), and that rather than patching individual applications we re-engineer general mechanisms in a way that provides for more robust cryptography, specifically more secure signature schemes. To accomplish the latter we propose to re-define the way hash functions are used in the context of digital signatures so as not to rely so heavily on the full collision resistance of our hash functions. This is likely to result in a significantly longer useful life for SHA-1 itself and, even more importantly, will result in significantly weaker requirements from any hash family to be adopted or designed in the future. While this solution is not for free (see below), we show that it can be done without having to change the basic signature algorithms in use (e.g., RSA, DSS), without even changing the existing hash functions (e.g., SHA-1, SHA2), and without the need to understand the semantics of particular applications or messages. What needs to be changed is the interface to the signing and hash algorithms. The main tool for achieving all of the above, in particular lowering the requirements on the security of hash functions, is the use of randomized hashing, a well-studied notion in the cryptographic literature (but seldom used in practice). Since our proposal requires no change to the hashing algorithms themselves it can be seen as a proposal for a "mode of operation" for existing and future hash functions. We end this introduction by noting that randomized hashing has applications beyond the context of digital signatures. On the other hand, it is important to also realize that randomized hashing is NOT a replacement for CRHF in ALL possible applications. For example, randomized hashing may not be appropriate in applications where commitment is required or implied, e.g., a bidder committing to her bid in an auction. So while we do not advocate abandoning CRHF as a useful cryptographic tool, the important message we wish to convey is that having a randomized mode of operation for CRHF for use in digital signature applications, such as those requiring non-repudiation, provides a substantial security gain and significantly raises the bar against existing and future cryptanalytical attacks on the underlying hash functions. Halevi and Krawczyk [Page 3] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 2. Randomized Hashing and Signature Schemes The idea behind the use of randomized hashing is simple. Instead of using a single deterministic hash function such as SHA-1 one uses a keyed hash function (equivalently, a family of hash functions where each function in the family is determined by a key or index). For example, SHA-1 itself can be converted into a family of hash functions indexed through a variable IV (we mention this as an illustration, not necessarily as the best way to transform SHA-1 into an indexed hash family for our purposes here). Let us denote by SIG a secure signing algorithm (such as RSA or DSS), by H a family of hash functions, and by H_r the function from this family indexed by the value r. Now, for signing a message m the signer chooses a random value r and computes SIG(r,H_r(m)). Here, the pair (r,H_r(m)) represents a (standard) encoding of the concatenation of the values r and H_r(m). The signature on message m now consists of the pair (r,SIG(r,H_r(m)). Before discussing implementation issues (such as the choice of the family H, the index r, and the encoding function) let's see why this method reduces the reliance on collision resistance of the hash function. Consider an attacker, Alice, against a honest signer Bob that signs using the scheme from above. Alice provides a message m to be signed by Bob and she gets back the pair (r,SIG(r,H_r(m)) from Bob, where r is a value chosen at random (or pseudo-randomly) by Bob anew with each signature. How can Alice attack this scheme (short of breaking the signature algorithm, say RSA, itself)? What Alice needs to do is to find a message m that Bob is willing to sign and hope that when she receives the pair (r,H_r(m)), for random r chosen by Bob, she will be able to find another message m' for which H_r(m)=H_r(m') (with the same index r chosen and signed by Bob). If she could do that then the signature string SIG(r,H_r(m)) would also be a valid signature for m'. We remark that Alice could do a bit better by asking to sign many messages m1, m2,..., getting back many pairs (r1, SIG(r1,H_r1(m1)), (r2,SIG(r2, H_r2(m2)),..., and then finding another m' such that for some i it holds that H_ri(mi)=H_ri(m'). But note that the number of pairs is limited by the number of signatures that Bob is willing to generate, and that Alice needs to engage in an on-line interaction with Bob for every such pair. It is therefore likely that in most applications the number of pairs available to Alice would be quite small (say, not more than 2^30 or 2^40). Below we analyze only the case of a single pair, while keeping in mind this additional factor when dealing with concrete parameters. Returning to the single pair condition, we see that Alice can produce a forged signature if she can do one of the following: Halevi and Krawczyk [Page 4] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 (i) Cryptanalyze the family H to the point that for a random pair (r,v) she can find m' such that H_r(m')=v. (ii) Achieve (i) when in addition to the pair (r,v) Alice also knows another value m for which H_r(m)=v. (iii) Achieve (ii) when the value m is chosen by Alice herself BEFORE learning r. In other words, the collision finding task for Alice is not against a fixed, known in advance, function as it is the case today with the use of a fixed hash function, but against a random function in the hash family H whose index r is revealed to Alice only after she committed to the message m. In particular, being able to find collisions against a fixed member of the family is useless; Alice needs to be able to do so for a reasonably large fraction of hash functions in the family. Before we continue we note that the resistance to each of the above forms of attacks is called, respectively: (i) one-wayness (OW) (ii) second-preimage resistance (SPR) (iii) target-collision resistant (TCR) The precise difference between SPR and TCR is that in the former the first message m is chosen at random while in TCR the attacker gets to choose m (but before learning r). We also remark that TCR functions were first defined by Naor and Yung [NY89] where they were called universal one-way hash function (UOWHF); the term TCR that we use here is from [BR97]. Obviously, these tasks are harder to perform than a regular collision-finding attack against a single CRHF function H (i.e. the finding of two messages m,m' such that H(m)=H(m')). More specifically, one can point to two essential differences between a regular collision attack and any one of the above tasks. First, a regular collision attack can be performed in a complete off-line manner (i.e. ahead of the time when a signature is to be issued) while each of (i)-(iii) depends on the choice of r and therefore needs to be completed only after r is determined and communicated to the adversary. Second, while collisions against a single hash function that outputs k bits can be found by brute force in time 2^{k/2}, a brute force TCR attack will take 2^k time. And even if we recall the additional factor of 2^n pairs that Alice can achieve via on-line interaction with Bob, a brute force attack would still take her 2^{k-n} time (in the case of SHA-1 k=160, while n would be no more than 40 in most reasonable applications). Of course, none of the above says that SHA-1 (or any other specific hash function) is sure to resist TCR attacks (or even SPR attacks). But it clearly indicates that if an application uses a hash function in a way that can only be broken under a successful TCR attack, then the application is much more likely to remain secure in face of cryptanalytical improvements than one that relies on full collision resistance. This is true whether the hash function in use is a Halevi and Krawczyk [Page 5] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 partially broken CRHF such as SHA-1, a (hopefully) better collision-resistant family such as SHA2, or any hash function to be designed in the future. While the above indicates general relations between the strengths and vulnerabilities of different hashing tasks it does not tell us how to instantiate a TCR function. We discuss this in the next section. Later, in section 4, we explain how to integrate randomized hashing into signatures (specifically, how to sign and transport the index r). 3. A TCR Construction for Iterated Hash Functions We propose a specific way to convert a single hash function H (e.g SHA-1 or SHA2) into a TCR function family. The design principles that we follow are: (1) Do not change H: randomization is applied to the hash input before the hash function is called. (2) Minimize performance impact. (3) Increase (heuristically) the likelihood of resistance of the family to TCR attacks. In 3.1 we present a basic construction (with some heuristic rationale in Appendix A). In 3.2 we list some variants which take into account some further trade-offs between performance and plausible security. We stress that these methods, although plausible, need to be scrutinized further before they can be adopted. 3.1 A simple randomized hash construction Let H be a hash function that processes the message to be hashed in 512-bit blocks. For example, if H is an integrated hash function a-la-Merkle-Damgard then the underlying compression function has as inputs an IV and a 512-bit data input. (We use 512 bits as the typical block size but other values are possible.) Let XOR denote the bit-wise exclusive-or operation. Given a message m to be hashed, the signer (or "hasher") chooses a 512-bit random value r, and XORs each 512-bit block of m with r. (If m is not an exact multiple of 512-bit blocks then the shorter last block is XORed with an appropriately truncated r.) In other words, we concatenate r to itself until we get a string r* of the same length of m, and then compute m XOR r*. We define H_r(m) to be H(m XOR r*). Note: By our definition the result of (m XOR r*) is of the same length as m; therefore, the length padding defined by Merkle-Damgard functions such as SHA-1 is applied to (m XOR r*). In other words, the length padding is not subject to the XOR with r*. Halevi and Krawczyk [Page 6] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 In Appendix A we provide some rationale on the choice of this particular way of converting iterated hash functions into TCR. Variants of this method are presented next. 3.2 Some Randomized Hash Variants A possible strengthening of our construction from Sec 3.1 can be obtained if, in addition to XORing each block of input with the value r, one also prepends r to the input to H, i.e., the input to H consists of the block r concatenated with (m XOR r*). This provides a randomizing effect to the initial IV of H (in the spirit of the HMAC construction). An even more conservative variant could interleave the block r between any two blocks of the original message, thus providing an IV randomization feature for each application of the compression function. The obvious drawback is the added computation (double the cost of the original hash function). Another natural idea is to add a layer of security by XORing a different random pad to each block of the message. Clearly, this adds a non-trivial computational cost (one would need to generate a pad of the length of the message via some PRG). A midway strategy could be to start with a pad of the length of a single block and slightly (and inexpensively) change this pad for each new block of input, for example by applying circular byte rotation to the previous block pad. A similar idea would be to derive the pad from a byte-oriented LFSR whose initial value is the key r. Finally, if the generation of a 512-bit random (or pseudo-random) quantity r for each signature is regarded as expensive (possibly true for low-power devices, smart cards, etc.) then it is possible to define r as the concatenation of a shorter pad. For example, in order to define r one could choose a random 128-bit string and concatenate it four times to create r. Given the heuristic nature of our constructions this may be considered a reasonable trade-off. 4. TCR Hashing and Signature Encoding Recall how randomized hashing is to be used in the context of digital signatures. For signing a message m, the signer chooses at random a value r and computes SIG(r,H_r(m)) where SIG represents a signing algorithm (such as RSA or DSS). More precisely, the signer will use a well-defined standard encoding of the concatenation of the values r and H_r(m) and then apply algorithm SIG to this encoding. The signature on message m consists of the pair (r,SIG(r,H_r(m)). The above requires changing current signature schemes in four ways: (1) Choosing a random (unpredictable) index r for each signature, (2) Replacing the current hashing of a message m from H(m) to H_r(m), (3) Signing r, and (4) Transporting r as part of the signature. Halevi and Krawczyk [Page 7] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 Here we discuss the required changes to existing message encodings for implementing the last 3 points. We focus on the two main algorithms in use: RSA and DSS. We note that while changing existing encoding standards may be one of the possible obstacles to adopting randomized hashing, this change is instrumental in allowing for more secure and robust signature schemes not only in the short term but in the farther future as well. We suggest that this change to the standards be specified and adopted as soon as possible. As we see below, these changes can be specified in a way that is independent of the specific randomized hash function to be used. We start with RSA. The most common encoding in use with RSA signatures is PKCS#1 v1.5. It specifies that given a message m to be signed, the input to the RSA signature function is a string composed of the hash value H(m) (computed on the message m using a deterministic hash function such as SHA-1) which is padded to the length of the RSA modulus with a standard deterministic padding (this padding contains information to identify the hash algorithm in use). This encoding can be extended to deal with randomized hashing as follows. First, the value H(m) is replaced with H_r(m) for r chosen by the signer. Second, part of the deterministic padding (which is currently filled with repeated 0xff octets) is replaced with the value of r. In this way, r is signed and, at the same time, it is made available to the verifier of the signature without any increase in the size of signatures (r is recovered by the verifier by inverting the signature operation). Another RSA encoding, called EMSA-PSS, is standardized by PKCS#1 v2.1 and is based on the randomized signature scheme of Bellare and Rogaway [BR96]. Unfortunately, the standard defines an encoding in which the first step is to apply a deterministic hash function (say, SHA-1) to the message m. Only then the randomized encoding scheme of PSS is applied. As a result, the signature scheme that uses EMSA-PSS is broken if the hash function is not fully collision resistant. In order to use this scheme with randomized hashing, one would replace the current H(m) value in the encoding with H_r(m) and the value r would be encoded in a way that the verifier of a signature can recover it before applying the randomized hashing. The original PSS scheme from [BR96] can be used, or adapted, to achieve such an encoding. Two points to remark regarding the applicability of PSS here are: first, the original PSS scheme is patented -- see US Patent 6266771 (which may or may not be an obstacle for adoption). Second, the main analytical benefit of PSS is its provability based on the so called "random oracle model". While this provides a good heuristic backing to the construction, one has to take into account that here we are dealing explicitly with lowering the security requirements from the hash function, so it is questionable how random-like these functions may be required to be. Formal proofs aside, the PSS scheme offers good heuristic advantages over the PKCS#1 v1.5 in that it better randomizes the input to the RSA signing algorithm. Halevi and Krawczyk [Page 8] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 Regarding the DSS (or DSA) signature algorithm, the first thing to note is that this is already a randomized signature scheme. A DSS signature is composed of a pair of elements (R,S) where R is a random element in the DSS group and S is a value computed as a function of the private key of the signer, the discrete logarithm of R (denoted k), and the value H(m) (where m is the message to be signed and H a deterministic hash function). In order to convert this scheme to use randomized hashing one can use R itself as the index to the hash family, i.e., r=R (or to derive r from R in some deterministic way). Then one would replace H(m) with H_r(m). In this way the size of signatures is unchanged and no further processing is required to generate r. Also note that while the signature component R is not strictly "signed", the attacker cannot control or choose this value (indeed, an attack that finds values R and S for which (R,S) are a valid signature of H_R(m), for a value H_R(m) not signed by the legitimate signer, would contradict the basic security of DSS). One may argue that the use of H_r(m) instead of H(m) can be viewed as an "implementation" of the random-oracle version of DSS as analyzed by Pointcheval and Stern [PS96]; the same caveats expressed in the case of PSS in relation to the use of the random oracle model apply here as well. One consideration in regards to using the component R of DSS signatures as the index to the randomized hash family is that, in order to ensure the TCR property, this index needs to be unknown (unpredictable) to the attacker when the latter chooses the message m to be signed. If the value of R is computed off-line by the signer (which is possible in the case of DSS) and is leaked before the attacker choses m then the benefit of randomized hashing is lost. Hence, R=g^k should be kept secret together with k until the signature is issued. This is not a fundamental limitation to the practice of DSS since the DSS scheme already requires (in an essential way) that k be kept secret, even if computed off-line, since its discovery by the attacker is equivalent to finding the secret private key of the signer! 5. Security Considerations This document presents mechanisms that, if adopted by standard bodies such as the IETF, will result in significant improvements to our current and future digital signature systems. While this document focuses on randomized modes of operation of hash functions that provide randomized hashing without changing existing algorithms, it is advisable that future hash families will be designed with randomized hashing and TCR requirements in mind. For example, new schemes that follow the Merkle-Damgard approach may consider allowing for the masking of intermediate values with optional user provided inputs (that is, such a mask could be set to a default value, say 0, for deterministic uses of the hash function, and to user-provided values when randomization is desired). The important point is that implementations of the function will be ready to accept such masks without having to change the function. Halevi and Krawczyk [Page 9] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 We note that all references to "randomness" in this document should be interpreted as "pseudo-randomness" provided one uses a cryptographically strong pseudorandom generator (or pseudo-random function) initialized with a strong unpredictable seed. We also mention that using TCR hashing may mean that the legitimate signer can find two messages with the same signature (since it is the legitimate signer that is choosing the randomness r). One should note, however, that this has no bearing on non-repudiation (as the signer is still bound to both messages). Moreover, as shown in [SPMS02], even if one uses CRHF some secure signature schemes (such as ECDSA) may allow a signer to find two different messages whose signature string is the same. Still, as mentioned at the end of the Introduction, there may be OTHER applications of CRHF that cannot be replaced with a TCR family. Finally, the general approaches to randomized hashing and digital signatures discussed here do not depend on the specifics of the concrete constructions that we proposed here. Other forms of randomized hashing and TCR schemes may be superior to the ones proposed here and further proposals are encouraged. ACKNOWLEDGMENT. We thank Ran Canetti for useful discussions and for badgering us to write this document. REFERENCES [BR96] M. Bellare and P. Rogaway, "The Exact Security of Digital Signatures -- How to Sign with RSA and Rabin", Eurocrypt'96, LNCS 1070, 1996. [BR97] M. Bellare and P. Rogaway, "Collision-Resistant Hashing: Towards Making UOWHFs Practical", Crypto'97, LNCS 1294, 1997 [HPL04] D. Hong, B. Preneel, and S. Lee, "Higher Order Universal One-Way Hash Functions", Asiacrypt'04, LNCS 3329, 2004. [NY89] M. Naor and M. Yung, "Universal One-Way Hash Functions and Their Cryptographic Applications", STOC'89, 1989. [PS96] D. Pointcheval and J. Stern, "Security Arguments for Digital Signatures and Blind Signatures", J.Cryptology, 13:361-396, 2000. [S00] V. Shoup, "A Composite Theorem for Universal One-Way Hash Functions", Eurocrypt'00, LNCS 1807, 2000. [SPMS02] Jacques Stern, David Pointcheval, John Malone-Lee, and Nigel P. Smart, "Flaws in Applying Proof Methodologies to Signature Schemes", CRYPTO '2002, LNCS 2442, 2002. Halevi and Krawczyk [Page 10] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 Appendix A - Rationale for the Proposed TCR Construction(s) Our TCR proposals follow the following principles: (1) Allow the use of existing functions such as SHA-1 and SHA2 (in particular, iterated hash functions a la Merkle-Damgard). (2) Do not change the hash function but only the interface to it (e.g., in our proposal randomization is achieved via the input to the function and therefore implemented hash functions, in either software or hardware, can be used without modification). (3) Use as weak as possible properties of the compression function underlying the hash construction. Our construction is general enough to be used with any hash function that processes the incoming data as blocks. Yet, we focus in our discussion here on Merkle-Damgard (M-D) type of hash functions since these are the most common schemes in practice. While (1) and (2) are obvious properties of our suggested construction we elaborate here on point (3). Ideally, we would have liked to provide a mathematical theorem proving the security of our construction using only relatively weak requirements from the underlying compression function. While such theorems exist for some specific constructions (e.g., [BR97,S00]), they all include operations that violate the principle of using the existing hash functions without any change (e.g., masking the intermediate value of the compression function with each call to this function). We thus settle for a heuristic rationale that should be scrutinized in light of the evolving ideas in the area of hash function cryptanalysis. Let H be a M-D function (the reader can think of SHA-1 for concreteness) and h be the corresponding compression function. That is, h acts on two inputs, a and b, where a represents an intermediate value (IV) and b is a 512-bit block, and the output of h is of the length of the IV (IV lengths vary with different constructions, e.g., 160, 256, etc.). The function H itself is defined for arbitrary inputs by iterating h over the successive blocks of the input with each iteration using the IV computed by the previous application of h (the first IV is set to some constant defined by the specification of H). Consider now a family of compression functions derived from h as follows: for each 512-bit index r, define h_r(a,b)=h(a,b XOR r). It is easy to see that iterating h_r as in a M-D construction one obtains the function H_r that we defined in 3.1. Merkle and Damgard showed that if a compression function h is collision resistant with respect to fixed-length inputs, then the function H obtained by iterating h is collision resistant on arbitrary inputs. We would like to claim the same with respect to the property of target-collision resistance (TCR), namely, that if h is TCR so is H. This, however, is not necessarily the case. Halevi and Krawczyk [Page 11] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 Yet, an "approximation" to this result was recently shown by Hong, Preneel and Lee [HPL04]. They show that if the construction h_r has a property called "n-order TCR" then the iterated family H_r is TCR for messages of up to n blocks. The property of n-order TCR is defined by the following game between Alice (the Attacker) and a "hasher" Bob. (1) Bob chooses an index r and keeps it secret. (2) For i=1,...,n: Alice chooses a pair (a_i,b_i) and receives from Bob the value h_r(a_i,b_i). (3) Alice chooses a pair (a,b). (4) Bob reveals r to Alice Alice wins the game if she can find (a',b') different from (a,b) such that h_r(a,b)=h_r(a',b'). In other words, Alice needs to carry a TCR attack but she is allowed to query h_r on n inputs of her choice before committing to the first colliding message and before learning the value of r. Intuitively, the difference with a regular TCR attack is that Alice has now an advantage in choosing (a,b) since she can first learn something about r from the first n queries. A family h is called n-order TCR if any efficient attacker (Alice) can only find (a',b') as above with insignificant probability. Before we continue it is important to clarify that the above game defining n-order TCR functions is not a game that reflects an actual interaction between an attacker and a victim in real life but it is only a virtual game used to define the security of a function. How much does the extra phase (2) in the game from above help Alice to find collisions? This of course depends on the specific function, and to some extent also on the value of n. Note that if one lets n to be huge (say 2^80 in the case of SHA-1) then Alice can use this "learning phase" to find colliding pair (a_i,b_i) and (a_j,b_j) that she can then use as (a,b) and (a',b') respectively. But recall that n represents the length in blocks of the messages to be hashed with the iterated construction, so it will typically be quite small. (I.e., n=4 or so in the case of certificates, and n < 2^30 even for huge documents.) Hence one may hope that the learning phase will not be sufficiently useful for Alice to choose the colliding pair. In other words, while in order to break a collision-resistant hash function an attacker can spend a HUGE amount of OFF-LINE computation for finding collisions, for breaking an n-order TCR function the attacker is limited to only MODERATE ON-LINE interaction with the hasher after which it needs to commit to a first colliding value x. Only then the attacker receives the actual value r for which it needs to find x' such that h_r(x)=h_r(x'). We also comment that the common view of the compression function h(a,b) as a block cipher with key b and input a gives rise to another heuristic argument supporting the view of h_r as n-order TCR. Viewing h(a,b) as a block cipher, phase (2) of the attack from above Halevi and Krawczyk [Page 12] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 on h_r is just a chosen-plaintext related-key attack on the block cipher h. If h resists such attacks with a moderate number of queries, then phase (2) does not help the attacker learn much about r. Hence, if h is both TCR and a sufficiently robust block cipher, then it is also an n-order TCR. As said, [HPL04] show that if a compression family h={h_r} is n-order TCR then the family H={H_r} is TCR on n block inputs (here H_r is a Merkle-Damgard iteration of h_r). Applying this result to our case, we obtain that if the construction h_r(a,b) = h(a, b XOR r) is an n-order TCR then the family H_r described in 3.1 is TCR for n-block inputs. In other words, any TCR attack against the family H_r that uses n-block messages, translates into an n-order TCR attack against the compression function family h_r with only n initial oracle queries. This provides some foundation to the belief that even the existing hash functions are significantly more secure in the sense of TCR than for collision resistance when used as specified here. In addition, one should examine the current attacks and see to what extent they apply to the defined functions. In particular, we note that the XORing of input blocks with a random block, while it preserves differentials, it also destroys the ability of the attacker to set some of the bits of the colliding messages to values of its choice. It seems that an attack that takes advantage of differentials in this setting would need to rely on universal differentials that depend only on the hash function and for which most pairs of messages with that difference would collide. Finally, we point out to another "motivating" element in our design. Remember that SPR (second pre-image resistant) functions are a weaker (i.e., easier to accomplish) version of TCR functions where the attacker cannot choose the first colliding value but rather this value is determined at random. A straightforward way to transform an SPR compression function h into a TCR family [S00] is to choose a pair r=(s1,s2), where s1,s2 are random values of the length of the IV and block size, respectively, and define h_r(a,b)=h(a XOR s1, b XOR s2). Unfortunately, iterating such an h_r is impractical as it requires modifying H such that the IV can be XORed with S1 in each iteration of h. Therefore, instead of using this full transformation of SPR into TCR we carry the randomization only in the second input of h, namely, in our construction in 3.1 we use h_r(a,b)=h(a,b XOR r) (when viewing h as a block cipher, as mentioned before, the XORing with r provides for randomization of the cipher key). Halevi and Krawczyk [Page 13] Internet Draft draft-irtf-cfrg-rhash-00.txt 12 May, 2005 Authors' Addresses Shai Halevi shaih@alum.mit.edu Hugo Krawczyk hugo@ee.technion.ac.il IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Full Copyright Statement Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Halevi and Krawczyk [Page 14]