SIP                                                          C. Jennings
Internet-Draft                                             Cisco Systems
Expires: January 16, 2006                                  July 15, 2005

            Computational Puzzles for SPAM Reduction in SIP
                     draft-jennings-sip-hashcash-02

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 16, 2006.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

One of the techniques used in SPAM prevention and various solutions for denial of service attacks is to force the SIP client requesting a service to perform a calculation that limits the rate and increases the cost of the request. This draft defines a way to allow a UAS to ask the UAC to compute a computationally expensive hash based function and present the result to the UAS. Although the computation is expensive for the UAC to compute, it is cheap for the UAS to verify. The solution also allows for proxies to compute and check the puzzle on behalf of the UAC or UAS.

Jennings                Expires January 16, 2006                [Page 1]

Internet-Draft          SIP Puzzles Against SPAM               July 2005

1.  Introduction

The SPAM prevention problem is complex and will require many techniques working in combination to balance reducing SPAM to acceptable levels while still fostering efficient communication. The overall problem and various approaches are in [7].

Clearly white lists are a critical part of dealing with SPAM. Any system would first check whether an incoming request for communications was from someone on the white list. The Identity [6] mechanisms are critical for understanding who the caller is and to check whether the caller is on the white list. As well, there still needs to be a way for callers not on the white list to communicate with the user. It is here that this specification becomes relevant.

The problem is how to permit contacts from people with no previous relationship to us without receiving undesirable contacts. This draft uses the idea that it may be possible to make undesirable contacts more expensive than desirable ones.

Different undesirables are willing to spend different amounts of time and money on contacting their markets. Founders of acquired startups are often contacted by random financial companies offering to help manage the new riches. These companies will send people from New York to San Jose and spend hours talking to this very narrow target market. Clothing retailers will mail glossy catalogues worth $1 apiece to houses within the right demographic zip codes. Emails advertising Viagra are sent to random email addresses. As the costs go down, the volume of unsolicited contact goes up.

Often people whose contact is desirable are willing to spend much less than some of the undesirables. The student in Fiji who wants to ask about this draft will send an email but probably will not fly here to talk to me. I would like to receive that email.

Increasing the cost of contact will reduce both desirable and undesirable contact.
My assumption is that the cost should be set very low, so that even a person with a pathetic CPU could still make contact in, say, 10 seconds. Key to this draft is that the receiver can set this cost. This low cost will not stop the financial advisers or the telemarketers, but it might stop the Viagra ads. It would also probably stop a single user from ringing every phone of some residential service provider in a five-second window, before any operator or system can react. Deciding what cost to set constitutes a classic type I/type II error problem, and the receiver gets to choose how to balance these two errors.

As is clearly stated in [7], whitelists are the best thing. After that, this is one of the multiple other options that need consideration.

In general there are two arguments about why the computation puzzles in this specification will not work. The first is that the bad guys have the most powerful CPUs. This issue was addressed above. The other argument is that bad guys have infinite CPU time through using armies of zombie PCs. The problem with this argument is that the goal is not to block particular bad guys but to reduce the overall number of undesirable messages.

This second argument is, however, more worrisome than the first. Assume that some percentage of the world's machines each year get owned and used as zombies. Let's say that a given machine has a 1% chance of this happening to it in a year, that it sends zombie traffic for 24 hours before getting shut down, and that the mechanism described here limits it to ten messages per second: each machine on the internet would then receive an average of about one undesirable message per hour. If you assume there are more users than machines, this looks appealing. If message sending technology detects users that are sending lots of messages and shuts them down in less than 24 hours, it gets better.
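
The estimate above can be checked with a few lines of arithmetic. The following is only a back-of-envelope sketch: it assumes that every zombie message reaches exactly one machine and that messages are spread evenly over all machines, neither of which is stated in this draft. The function and constant names are illustrative.

```python
# Back-of-envelope check of the zombie estimate.
# Assumptions (not from the draft): one recipient per message, and
# messages spread uniformly over every machine on the internet.

HOURS_PER_YEAR = 365 * 24


def undesirable_per_machine_per_hour(zombie_fraction=0.01,
                                     active_hours=24,
                                     messages_per_second=10):
    """Average undesirable messages each machine receives per hour."""
    # Messages one zombie sends before it is shut down.
    sent_per_zombie = active_hours * 3600 * messages_per_second
    # With N machines there are zombie_fraction * N zombies per year, so
    # the per-machine yearly total does not depend on N.
    received_per_machine_per_year = zombie_fraction * sent_per_zombie
    return received_per_machine_per_year / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"{undesirable_per_machine_per_hour():.2f}")  # prints 0.99
```

Under these assumptions the result scales linearly with how long a zombie stays active, so shutting zombies down after 12 hours instead of 24 would roughly halve the figure.
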
It gets better still if you hope for improvements in operating systems or for users to choose them more carefully. The next assumption is hard to model statistically but it is true: the people with the best financial incentives to send undesirable messages do not want to be subject to the legal and reputation problems of using zombies to get their message across.

The zombie problem basically comes down to this. If only a small percentage of the machines in the world are zombies, they do not render this computation puzzle approach useless. If 10% of the machines in the world are zombies, this approach will be useless. This specification does not attempt to deal with how to make the world such that only a small percentage of computers are zombies; that is a problem for other work, and that work needs to happen for SPAM to be reduced to a reasonable level. This specification does assume that the zombie problem is solved to the level where only a small percentage of the world's computers are zombies.

So in summary, white listing is the first and best defense. But for dealing with messages from people with whom we have no previous direct or indirect relationship, another approach is necessary. Hashcash cannot stop all bad messages - that is not the goal - but it can raise the cost of messages and thus decrease the number of times it makes economic sense to send undesirable ones. This approach does assume that bad guys will have more CPU power than good guys and that zombies will still send lots of messages. This approach will simply reduce the number of undesirable messages by some amount that cannot be measured.

No one knows if this approach would reduce SPAM noticeably. Right now the only thing that limits the rate at which I can call every SIP phone in the world is proxies getting overloaded. And of course, most SIP phones are not connected to the public internet.
Of course, the SPAM problem is one reason why many SIP phones are not connected to the public internet.

There are some other approaches outlined in [7]. They have different pros and cons, and it is probably necessary to use most of them to ensure SPAM stays at an acceptable level.

2.  Overview

This specification extends RFC 3261 [3] and defines a mechanism for a proxy or UAS to request that a UAC compute the solution to a puzzle. The puzzle is based on finding a value called the pre-image that, when hashed with SHA1 [4], results in a specific value referred to as the image. The goal is for the UAC to find a pre-image that will SHA1 hash to the correct image. The UAS provides a partial pre-image with some of the low order bits set to zero, together with the number of bits in the pre-image that have been set to zero. The UAS provides the puzzle information using a 419 response, and the UAC resubmits the request along with the solution to the puzzle. The high level flow of information is shown below.

     UAC                        UAS
      |  Request                 |
      |------------------------->|
      |                          |
      |  419 with Puzzle         |
      |<-------------------------|
      |                          |
      |  Request with Solution   |
      |------------------------->|
      |                          |

This specification defines the 419 response code along with a new header, called Puzzle, to carry the puzzle and solution.

3.  Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [2].

4.  Puzzles

The normative definition of a puzzle is as follows. A puzzle is four values: an integer number referred to as work, a pre-image string, an image string, and an integer number referred to as value.
There MUST exist a value X such that all but the "work" number of low order bits of X match the pre-image string, and the SHA1 hash of the string formed by the concatenation of "z9hG4bK" and X results in a value Y, where the "value" number of low order bits of Y are the same as those bits in the image string. The SHA1 hash is computed as described in RFC 3174 [4]. The value X is the solution to the puzzle. The 'work' number of low order bits of the pre-image MUST be zero.

This can all be described more mathematically. The notation low(v,x) returns the v low order bits of x, and zero(v,x) returns x with the low v bits set to zero. The | operator signifies string concatenation. The solution to the puzzle can be considered finding an X such that both the following are true:

   low( value, image ) = low( value, sha1( "z9hG4bK" | X ) )

   zero( work, X ) = zero( work, pre-image )

The pre-image forms a constraint on X. X is the same as the pre-image, other than in the low 'work' bits, which are set to zero in the pre-image. The 'value' is the number of bits that match in the solution and is typically set to 160, which is the full size of the SHA1 hash result.

The following is a non-normative way for a UAS or proxy to construct a puzzle. The following strings are concatenated:

1.  a secret that only this device knows. This would typically be a cryptographically random string of bits;

2.  the current time, rounded to the nearest minute;

3.  the URI of the request, the Call-ID, the From tags, and the branch tag for a proxy or the To tag for a UAS.

The string is hashed with SHA1 to form the pre-image. The pre-image is appended to the string "z9hG4bK", and the SHA1 hash of this is computed to get the value of the image. A value 'work' indicates how many low order bits of the pre-image are to be removed, that is, set to zero. The value 'work' could be a configurable parameter, or it could be dynamically discovered by the software based on how long a hash should take and the speed of the computer it was running on.
In the latter case, the resulting software would automatically choose larger values of 'work' as computers get faster. The low order 'work' bits of the pre-image are set to zero. The puzzle consists of the chosen value of 'work', the pre-image (with the low order bits set to zero), the image, and the 'value'. The 'value' would typically be set to 160 as this is the size of the SHA1 hash.

5.  Semantics

5.1  UAS Creating Puzzle

When a UAS wishes to challenge a request, it MAY create a puzzle, encode this puzzle in a Puzzle header field value, and return the puzzle in a 419 response.

5.2  UAC Receiving Puzzle

When a UAC receives a 419 response, it needs to look at the 'work' and 'value' requested and decide whether or not to try to solve this puzzle. This decision can be made based on the programmed policy and possibly human input. The UAC should not tackle a puzzle that will take longer than the age of the universe to solve. If the UAC chooses to try to solve the puzzle then it proceeds along the following steps:

1.  Check that the 'work' bottom bits of the pre-image are all zero. If they are not, this is an invalid puzzle and the 419 response MUST be considered an error response.

2.  Set Y to low( value, image ).

3.  Create a loop where X ranges from the value of the pre-image up to, but not including, the value of the pre-image plus 2 raised to the power of 'work'.

4.  For each iteration of the loop, check if low( value, sha1( "z9hG4bK" | X )) equals Y. If it does, a solution X has been found and the loop can terminate.

If the loop terminates without a solution being found, the puzzle was bad and the 419 response MUST be considered an error response.

Once the solution to the puzzle, X, is found, a new request is formed by copying the old request and adding an additional Puzzle header field value.
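
The non-normative construction in Section 4 and the solving steps above can be sketched as follows. This is a minimal illustration and not a reference implementation. The draft does not say how X is turned into a string before hashing, so the sketch assumes that the pre-image, the image, and X are 160-bit values serialized as 20 big-endian bytes. The names make_puzzle, solve, verify, and sha1_of are hypothetical, and the seed argument stands in for the concatenated secret, time, and request fields.

```python
import hashlib

PREFIX = b"z9hG4bK"   # fixed string concatenated before X
DIGEST_BYTES = 20     # SHA-1 output size (160 bits)


def low(v: int, x: int) -> int:
    """Return the v low order bits of x."""
    return x & ((1 << v) - 1)


def zero(v: int, x: int) -> int:
    """Return x with its v low order bits set to zero."""
    return x & ~((1 << v) - 1)


def sha1_of(x: int) -> int:
    """SHA-1 of "z9hG4bK" | X as an integer.

    Assumption: X is serialized as 20 big-endian bytes.
    """
    data = PREFIX + x.to_bytes(DIGEST_BYTES, "big")
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def make_puzzle(seed: bytes, work: int, value: int = 160):
    """Hypothetical helper: build (work, pre_image, image, value)."""
    full_pre_image = int.from_bytes(hashlib.sha1(seed).digest(), "big")
    image = sha1_of(full_pre_image)
    return work, zero(work, full_pre_image), image, value


def solve(work: int, pre_image: int, image: int, value: int) -> int:
    """Search for X following steps 1 to 4."""
    # Step 1: the low 'work' bits of the pre-image must all be zero.
    if low(work, pre_image) != 0:
        raise ValueError("invalid puzzle: low 'work' bits are not zero")
    # Step 2: the target is the low 'value' bits of the image.
    target = low(value, image)
    # Steps 3 and 4: try each of the 2**work candidates for X.
    for x in range(pre_image, pre_image + (1 << work)):
        if low(value, sha1_of(x)) == target:
            return x
    raise ValueError("bad puzzle: no solution found")


def verify(x: int, image: int, value: int) -> bool:
    """Single-hash check of a submitted solution against the image."""
    return low(value, sha1_of(x)) == low(value, image)


if __name__ == "__main__":
    puzzle = make_puzzle(b"secret|2005-07-15T12:00|sip:bob@example.com", 12)
    x = solve(*puzzle)
    print(verify(x, puzzle[2], puzzle[3]))
```

The solver performs up to 2**work hashes while the check needs only one, which is the asymmetry the Abstract relies on. A real verifier would also need to confirm that X agrees with the pre-image it issued outside the low 'work' bits.
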
The new Puzzle header field value MUST have the 'work' set to 0, the pre-image set to the value X, the image set to the value of the image in the original puzzle, and the value parameter set to the same as the value parameter in the original puzzle.

Note that if a request was challenged by one proxy and a new request was generated with a solution, and then this request was challenged by a second proxy, a third request would be generated that had two Puzzle header field values.

If a UAC, through some out of band mechanism, knows that it will be challenged and what the puzzle will be, it MAY include the appropriate Puzzle header field value in the initial request.

5.3  Proxy Behavior

SIP allows proxies to act as UASs when generating 4xx responses. This same mechanism can be used to allow a proxy to generate the challenge on behalf of a UAS in its domain.

Proxies may also act on behalf of the UAC and compute the solution to a puzzle on behalf of the UAC in either a request or a response that passes through the proxy. Typically a proxy would only do this for a UAC that had authenticated to the proxy and with which the proxy had a service relationship.

6.  Example

TBD

7.  Syntax

The Puzzle header field carries the puzzle and solution information. It has a parameter called 'work' that has the number of bits of the pre-image that have been set to zero for this puzzle. It has a parameter called 'pre' that carries the pre-image string base64 encoded, and a parameter called 'image' that carries the image string base64 encoded. In addition there is a parameter called 'value' that indicates how many bits of the resulting hash will match the 'image' string. The base64 encoding is done as described in RFC 3548 [1]. When the header field value is carrying a solution to a puzzle, the work parameter will be set to zero.
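
The parameters described above could be serialized and parsed roughly as shown below. This is a sketch under stated assumptions: the pre-image and image are treated as 160-bit integers encoded as 20 big-endian bytes before base64 encoding, and the parser accepts only a single header field value with the four parameters in the order work, pre, image, value. The function names are hypothetical, and generic-params and comma-separated lists are not handled.

```python
import base64
import re

DIGEST_BYTES = 20  # assumed size of the pre-image and image (SHA-1 output)


def b64(n: int) -> str:
    """Base64-encode an integer as 20 big-endian bytes (assumption)."""
    return base64.b64encode(n.to_bytes(DIGEST_BYTES, "big")).decode("ascii")


def format_puzzle(work: int, pre: int, image: int, value: int = 160) -> str:
    """Build one Puzzle header field value carrying a puzzle."""
    return f'work={work}; pre="{b64(pre)}"; image="{b64(image)}"; value={value}'


def format_solution(x: int, image: int, value: int = 160) -> str:
    """A solution carries work=0 and the solved X in the 'pre' parameter."""
    return format_puzzle(0, x, image, value)


# Simplified pattern: fixed parameter order, optional whitespace around ';'.
_PUZZLE_RE = re.compile(
    r'^\s*work=(\d+)\s*;\s*pre="([^"]*)"\s*;'
    r'\s*image="([^"]*)"\s*;\s*value=(\d+)\s*$'
)


def parse_puzzle(field_value: str):
    """Parse one Puzzle header field value into (work, pre, image, value)."""
    match = _PUZZLE_RE.match(field_value)
    if match is None:
        raise ValueError("malformed Puzzle header field value")
    work, pre, image, value = match.groups()
    pre_int = int.from_bytes(base64.b64decode(pre), "big")
    image_int = int.from_bytes(base64.b64decode(image), "big")
    return int(work), pre_int, image_int, int(value)


if __name__ == "__main__":
    print("Puzzle: " + format_puzzle(10, 1 << 10, 12345))
```

A UAC that has solved a puzzle would call format_solution with X and the original image, which yields a field value whose work parameter is zero as required above.
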
Example:

   Puzzle: work=10; pre="XPokF1n0+NG6iwRcYzeXuETrtDo=";
           image="XPokF1n0+NG6iwRcYzeXuETrtDo="; value=160

The ABNF for the header is:

   Puzzle        =  "Puzzle" HCOLON puzzle-param *(COMMA puzzle-param)
   puzzle-param  =  puzzle-work SEMI puzzle-pre SEMI puzzle-image
                    SEMI puzzle-value *( SEMI generic-param )
   puzzle-work   =  "work=" 1*DIGIT
   puzzle-value  =  "value=" 1*DIGIT
   puzzle-pre    =  "pre=" quoted-string
   puzzle-image  =  "image=" quoted-string

This document updates the dreaded Table 2 of RFC 3261 to be:

   Header field   where  proxy  ACK  BYE  CAN  INV  OPT  REG
   ------------   -----  -----  ---  ---  ---  ---  ---  ---
   Puzzle                 amr    o    o    -    o    o    o

                                SUB  NOT  REF  INF  UPD  PRA
                                ---  ---  ---  ---  ---  ---
                                 o    o    o    o    o    o

8.  Open Issues and To Do Items

The current mechanism has poor interaction with the HERFP forking problem. If several endpoints sent a 419, the proxy would need to aggregate the results and add something like the realm to the challenges to keep them sorted out. Need to add this in next revision. In many cases the solution would work out better if the proxy that was doing the forking applied the policy and did the 419 before forking.

This approach has the usual HERFP problem that if some UAs do a 419, and some UAs don't, the request will only reach the UAs that don't do the 419.

9.  Security Considerations

Still TBD.

The concatenation with "z9hG4bK" is done so that this mechanism cannot be used as a distributed computation to reverse arbitrary hash values, as that would present a security risk for other hash based security schemes.

TODO - Advice on selecting the size of 'work'.

10.  IANA

This specification registers a new header and a new response code. IANA is requested to make the following updates in the registry at: http://www.iana.org/assignments/sip-parameters

10.1  Puzzle Header

Add the following entry to the header sub-registry.

   Header Name        compact    Reference
   -----------------  -------    ---------
   Puzzle                        [RFCXXXX]

10.2  419 Response

Add the following entry to the response code sub-registry under the "Request Failure 4xx" heading.

   419  Puzzle Required  [RFCXXXX]

11.  Acknowledgments

This approach was motivated by [5].

12.  References

12.1  Normative References

[1]  Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 3548, July 2003.

[2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[3]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[4]  Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, September 2001.

12.2  Informational References

[5]  Back, A., "http://www.hashcash.org/", February 2005.

[6]  Peterson, J. and C. Jennings, "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)", draft-ietf-sip-identity-05 (work in progress), May 2005.

[7]  Rosenberg, J., "The Session Initiation Protocol (SIP) and Spam", draft-ietf-sipping-spam-00 (work in progress), February 2005.

Author's Address

   Cullen Jennings
   Cisco Systems
   170 West Tasman Drive
   MS: SJC-21/2
   San Jose, CA  95134
   USA

   Phone: +1 408 421 9990
   Email: fluffy@cisco.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2005). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.