Internet Draft                                               Matt Curtin
draft-ietf-usefor-message-id-01.txt            The Ohio State University
Category-to-be: Informational                             Jamie Zawinski
                                                 Netscape Communications
                                                               July 1998
Expires: Six Months from above date

              Recommendations for generating Message IDs

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as ``work in
   progress.''

   To view the entire list of current Internet-Drafts, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
   Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific
   Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

   This draft provides recommendations on how to generate globally
   unique Message IDs in client software.

Table of Contents

   1.  Introduction
   2.  Message-ID formatting
   3.  Message-ID generation
   3.1  "Domain part"
   3.2  "Local part"
   3.2.1  Sequence number
   3.2.2  Using a pseudorandom number generator
   3.2.3  Using a hash
   3.3  Bringing it all together
   4.  Acknowledgments
   5.  References
   6.  Authors' addresses

1.  Introduction

   Message-ID headers are used to uniquely identify Internet messages.
   Having a unique identifier for each message has many benefits,
   including easier following of threads and intelligent scoring of
   messages based on the threads to which they belong.

   It has been suggested that client software cannot generate globally
   unique Message-IDs.  We believe this to be incorrect, and herein
   offer suggestions for generating unique Message-IDs.

2.  Message-ID formatting

   As defined in [NEWS], a message ID consists of two parts, a local
   part and a domain, separated by an at-sign and enclosed in angle
   brackets:

      message-id = "<" local-part "@" domain ">"

   Practically, news message IDs are a restricted subset of mail
   message IDs.  In particular, no existing news software copes
   properly with mail quoting conventions within the local part, so
   software generating a Message-ID would be well-advised to avoid this
   pitfall.

   It is also noted that some buggy software considers message IDs
   completely case-insensitive, in violation of the standards.  It is
   therefore advised that one not generate IDs such that two IDs so
   generated can differ only in character case.

3.  Message-ID generation

   As shown above, the Message-ID is made up of two sections.  We'll
   consider each separately.

3.1.  "Domain part"

   On many client systems, it is not always possible to get the
   fully-qualified domain name (FQDN) of the local host.  In that
   situation, a reasonable fallback course of action would be to use
   the domain-part of the user's return address.  (Use of an
   unqualified hostname for the domain part of the Message-ID header
   would be foolish, and should never be done.)

   Using the domain-part of the user's return address makes the
   generation of the "local part" more important, because many hosts
   may then share the same domain part; in particular, it means that a
   process ID is probably not sufficient.

3.2.  "Local part"

   The most popular method of generating local parts is to use the date
   and time, plus some way of distinguishing between simultaneous
   postings on the same host (e.g. a process number), and encode them
   in a suitably-restricted alphabet.

   A number of approaches here are possible.  Each has its advantages
   and drawbacks.  The importance of the local part's uniqueness
   increases with the frequency of messages being generated in a given
   domain.  Using several of these methods together will produce a
   Message-ID that is longer, but significantly less likely to collide.

3.2.1.  Sequence number

   An older but now less-popular alternative is to use a sequence
   number, incremented each time the host generates a new message ID;
   this is workable for servers, but requires careful design to cope
   properly with simultaneous posting attempts, and is not as robust in
   the presence of crashes and other malfunctions.

   For client Message-ID generation, particularly on hosts where the
   exact FQDN cannot be obtained, or is subject to change, this might
   not even be workable.

3.2.2.  Using a pseudorandom number generator

   One could take 64 bits from a good, well-seeded pseudorandom number
   generator [PRNG] in order to significantly increase the uniqueness
   of the Message-ID.

   The advantage of this method is that it is fast and generally
   effective.  The disadvantage is that a random value carries no
   guarantee of uniqueness: in a perfect random number generation
   scheme, drawing the same number twice in a row is exactly as
   probable as drawing any other particular pair of numbers.

3.2.3.  Using a hash

   Another approach would be to generate a hash of the message and use
   that after the timestamp.  If this is done well, this can also
   significantly reduce the opportunity for collision, and will
   generate a value that is very unlikely to be repeated.

   Note that, in practice, this is more difficult than it sounds.  It
   is recommended that a cryptographically secure hash function [SHA1,
   MD5] be used, as others, such as CRC, are likely to have higher
   instances of collision.

3.3.  Bringing it all together

   In summary, the approach to generating a Message-ID that we
   recommend here is as follows:

   1  Append "<".

   2  Get the current time in the highest resolution to which you have
      access (at least seconds, though most systems will give you
      milliseconds) and generate a timestamp in the format
      yyyymmddHHMMSS.ss.

   3  Generate additional data to prevent Message-ID collision on two
      messages processed by the same host at precisely the same moment.
      (See section 3.2.)

   4  Convert these two numbers to base 36 (0-9 and A-Z).  Write the
      timestamp first, then the additional parts, separating the
      sections with ".", and then append "@".

   5  Append the FQDN of the local host, or the host name in the user's
      return address.

   6  Append ">".

4.  Acknowledgments

   This document is partially derived from an earlier, unrelated draft
   by Henry Spencer.

5.  References

   Ref.    Author, title                         IETF status (June 1998)
   ------  ------------------------------------  -----------------------
   [NEWS]  M.R. Horton, R. Adams: "Standard      Non-standard (but still
           for interchange of USENET             widely used as a
           messages", RFC 1036, December 1987.   de-facto standard).

   [SHA1]  National Institute of Standards and
           Technology (NIST), "Announcement of
           Weakness in the Secure Hash
           Standard", May 1994.  (Update of
           FIPS 180: "Secure Hash Standard".)

   [MD5]   R. Rivest: "The MD5 Message-Digest    Informational (but
           Algorithm", RFC 1321, April 1992.     widely used as a
                                                 de-facto standard).

   [PRNG]  D. Eastlake, 3rd, S. Crocker,         Informational.
           J. Schiller: "Randomness
           Recommendations for Security",
           RFC 1750, December 1994.

6.  Authors' Addresses

   Matt Curtin
   The Ohio State University
   791 Dreese Laboratories
   2015 Neil Ave
   Columbus OH 43210
   +1 614 292 7352
   cmcurtin@cis.ohio-state.edu

   Jamie Zawinski
   Netscape Communications Corporation
   501 East Middlefield Road
   Mountain View, CA 94043
   (650) 937-2620
   jwz@netscape.com
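
   As an illustration only, the six steps summarized in section 3.3
   might be sketched in Python as follows.  The names make_message_id
   and to_base36 are hypothetical and do not come from this draft.  The
   sketch also makes three assumptions the draft does not state: the
   clock is read in UTC, the "." in the yyyymmddHHMMSS.ss timestamp is
   dropped so that the timestamp can be converted as a single integer,
   and the additional data of step 3 is the 64 random bits suggested in
   section 3.2.2.

```python
import secrets
import time
from typing import Optional

# Restricted alphabet from step 4 of section 3.3: digits and upper-case
# letters only, so two local parts never differ only in character case.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n: int) -> str:
    """Encode a non-negative integer in base 36 (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def make_message_id(domain: str,
                    now: Optional[float] = None,
                    extra: Optional[int] = None) -> str:
    """Build "<timestamp.extra@domain>" (hypothetical helper)."""
    # Section 3.1: never use an unqualified host name as the domain part.
    if "." not in domain:
        raise ValueError("domain must be fully qualified")
    if now is None:
        now = time.time()
    if extra is None:
        # Section 3.2.2: 64 bits from a well-seeded random source.
        extra = secrets.randbits(64)
    # Step 2: yyyymmddHHMMSS plus hundredths of a second, in UTC
    # (UTC is an assumption of this sketch).
    hundredths = int(now * 100) % 100
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(now))
    stamp += "%02d" % hundredths
    # Step 4: drop the "." so the timestamp is one integer (assumption),
    # then write both numbers in base 36, separated by ".".
    local = to_base36(int(stamp)) + "." + to_base36(extra)
    # Steps 1, 5 and 6: angle brackets around local-part "@" domain.
    return "<" + local + "@" + domain + ">"
```

   A call such as make_message_id("example.com") returns an ID of the
   form <TIMESTAMP.RANDOM@example.com>.  A client that cannot learn its
   FQDN would pass the domain-part of the user's return address
   instead, as described in section 3.1.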