Network Working Group                                          C. Newman
Internet Draft: Application Protocol Design Principles          Innosoft
Document: draft-newman-protocol-design-01.txt                  July 1997
                                                   Expires in six months

Application Protocol Design Principles

Status of this memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

There are a number of design principles which come into play over and over again when designing application protocols. Many of these are entrenched in IETF lore and spread by word of mouth. Most have been learned the hard way many times. This is an attempt to codify some of these principles so they can be referenced rather than spread by word of mouth.

The author has not invented any of these ideas and while the exercise of finding the originator of the ideas would be interesting, it is not deemed necessary for this project.

Many of these principles have a much wider scope than application protocol design. However, the author's primary experience is with application protocols and examples provided usually involve application protocols or elements.

[Disclaimer: this is a preliminary draft. Some of the case studies and exceptions need tuning. Suggestions welcome.]

Table of Contents

   Status of this memo
   Abstract
   1.  K.I.S.S.
   2.  Make the Common Case Simple & Uncommon Case Possible
   3.  0, 1, N Principle
   4.  Be Liberal/Conservative
   5.  Avoid Silly States
   6.  Text Not Numbers
   7.  Avoid Alternative Representations
   8.  Announce Features, Not Version
   9.  Avoid Unnecessary Layers
   10. Fully Qualify Data
   11. Extensibility
   12. Conclusions Based on Design Principles
   13. Security Considerations
   14. References
   15. Author's Address

1. K.I.S.S.

The "Keep It Simple, Stupid" principle or "KISS" principle is well known. The basic idea is not to add complexity if there is any way to avoid it. Sometimes this also involves a decision of where the complexity should live (e.g. client implementation, server implementation, protocol itself, external layers). This is a very difficult principle to follow in practice.

Consequences of Violation: design errors, implementation bugs, poor deployment, poor maintainability, interoperability problems, poor usability, less peer review, protocol has to be "profiled" to interoperate.

Case Study: X.400 vs. SMTP/MIME. X.400 is very complex and is losing ground steadily in the marketplace.
SMTP/MIME is much simpler and is gaining ground in the marketplace.

Case Study: The OSI Virtual Terminal (VT) is much more complex than telnet and has had little success in the marketplace by comparison. Note that telnet is unnecessarily complex by itself.

2. Make the Common Case Simple & Uncommon Case Possible

This is largely a corollary of the KISS principle. Sometimes phrased as "design for the common case." The idea is to make the common case very simple without disallowing the useful uncommon cases. This requires identifying the common case, which is a good idea in itself.

Consequences of Violation: Same as KISS. If a useful uncommon case is not possible, then a potentially complex protocol extension is necessary, which results in more complexity than if the uncommon case had been considered from the start.

Case Study (common case too complex): ASN.1 makes the common case far too complex. While it does provide for unlimited extensibility, in practice implementations can't deal with many legal structures due to the complexity.

Case Study (uncommon case not possible): Internet Mail originally didn't allow non-text data. MIME is more complex than it would have been if designed in from the beginning.

3. 0, 1, N Principle

The "0, 1, N principle" is not an obvious principle but is true surprisingly often. In general, any protocol element or object should come in quantities of 0, 1, or N where N is an arbitrary number. If a limit is picked, it is likely to be too small. This is especially true of hierarchy, and often true of names.

Consequences of Violation: System has to be extended to allow larger values. This causes a transition with severe interoperability problems or a semantic overload of existing structure which adds complexity and confusion.

Case Study: 640K

Case Study: 32-bit IPv4 address space

Case Study: MIME media types.
Two levels of hierarchy were defined in the initial MIME specification. This proved inadequate, so a new hierarchy delimiter had to be introduced to allow more naming hierarchy.

Case Study: SMTP error codes have 3 levels of hierarchy with 10 settings each. This has proved to be insufficient and inflexible, requiring the addition of ESMTP ENHANCEDSTATUSCODES [ESMTP-STATUS].

Exception: A quantity of two is permitted for clearly binary situations.

Exception: Has to be balanced with the KISS principle. For example, current practice limits most numbers in protocols to 32-bit values so they can be represented as native integers on 32-bit machines.

4. Be Liberal/Conservative

The "Be Liberal in What You Accept, and Conservative in What You Generate" principle is well known in the IETF, but controversial in some cases. The intention is to maximize interoperability. The basic idea is that on generation one should follow the standard strictly, as that will work with all other compliant software. On acceptance, if one tolerates minor protocol or format violations, it helps work around known bugs in other software. This principle would work great if everyone followed it. However, when there are mixtures of systems which follow this rule and others which don't, the exceptions below need to be considered.

Consequences of Violation: decreased interoperability, customers blaming the violator of this principle for bugs in other vendors' software, reduced extensibility, loss of functionality.

Case Study (conservative accept): The X.400 OIW documents specify that implementations reject messages which have inconsistent global-domain-identifiers in their message-ids and trace information fields.
The result of this recommendation to be "conservative in what you accept" is that lots of mail is rejected unnecessarily, and other implementations now have to force those fields to be equal, which may damage the functionality they are supposed to provide.

Case Study (liberal generate): Vendor extensions to HTML have caused serious interoperability problems in the field.

Exception: Don't accept ambiguous interactive input. If the user ends up seeing the data in an indecipherable context, severe consequences result. It's often better to reject the data so the problem can be fixed at the source.

Exception: Interactive servers. If client vendors notice their illegal behavior before deploying, it gets fixed before it's deployed, and overall interoperability is increased.

Exception Case Study: When netnews was initially deployed, a number of clients generated date headers in a variety of illegal formats. Fairly early in the deployment, a major implementation was modified to discard news messages which had missing or improperly formatted date headers. Very soon after this was deployed, all date headers in news were interoperable.

5. Avoid Silly States

Whenever possible, design the system so no silly states are possible. A silly state is a combination of options or values which contradict each other or are nonsensical. A common occurrence is use of function bits to indicate non-binary values (e.g. the Marked and Unmarked mailbox flags in IMAP4).

Consequences of Violation: Increased complexity to deal with the possibility of the silly states occurring.

Case Study: POP3 [POP3] includes both an octet count in the LIST command and an end of text mark in the RETR command. This results in the possibility of the actual octet count differing from the LIST octet count and has caused bugs in practice.

Case Study: The Internet mail format [IMAIL] permits the same header to appear multiple times even when this doesn't make sense.
This has resulted in the generation of such broken messages, which are inconsistently interpreted. The DRUMS (detailed revision and update of messaging standards) working group is addressing this problem.

6. Text Not Numbers

Whenever possible, text should be used instead of numbers. Numbers almost always have to be looked up in order for humans to interpret them. Text can be read and debugged by a mere mortal. One common counter argument is that numbers are more compact, but if size is a serious concern, a general purpose compression layer is usually a better solution. Another counter argument is that the mapping tables and parsers to convert to internal numbers add complexity. In practice the complexity of debugging a non-text protocol is usually greater than the complexity of the parser and tables.

Consequences of Violation: Protocol is difficult to debug, protocol is difficult to understand, examples are hard to provide. Previous three consequences make this equivalent to a KISS violation. Results in poor user interface. Endian problems.

Case Study: Problems in binary protocols such as X.400 are very hard to diagnose. The protocol trace has to be recorded and run through an interpreter to debug. SMTP can be debugged by observing the original protocol trace.

Case Study: Whenever numeric error codes are used unqualified by text, humans are invariably presented with these error codes, resulting in a poor user interface and debugging difficulties. There is also the problem of poor correlation of numeric codes and actual errors -- in the X.400 case, this has resulted in the EMA publishing correlation tables between implementations, error numbers, and what the errors actually mean.

Case Study: The telnet ENVIRON option had to be replaced with NEW-ENVIRON due to endian problems.

Exceptions: Compression or encryption layers (which make things unreadable anyway).
Low-level protocols with high performance requirements. Encapsulated non-text objects.

7. Avoid Alternative Representations

Having several ways to represent the same thing results in interoperability problems. In general, implementors will only test the representation format they use. The less often used representations will fail to work. In a worst case scenario, two or more representations are widely used, but systems which use one often can't talk to systems which use another.

Consequences of Violation: Serious interoperability problems, more bugs, conversion support necessary to interoperate.

Case Study: The TIFF image format permits both a "big endian" version and a "little endian" version. Some implementations can only read one or the other. Many TIFF applications now have a "Macintosh format TIFF" vs "IBM format TIFF" option when saving TIFF files.

Case Study: ASN.1 provides many ways of representing the same thing. This has caused numerous interoperability problems as not all systems support all representations of a given field. Profiles of ASN.1 are usually necessary to interoperate at all.

Case Study: Internet addresses [IMAIL] allow several different ways to quote the same address. The useless ones like:

     "foo"."bar"@do.main

rarely work.

Exceptions: An alternative representation may be necessary for a more expressive case. For example, quoted strings and literals in IMAP. Rare alternative representations should be avoided.

8. Announce Features, Not Version

While version numbers are fine to inform the user of what implementation or conformance level they are at, they are usually a bad idea in protocols. A system where the server announces available features and the client activates the features it wants results in a far better protocol.
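
As an illustration only, feature announcement can be sketched roughly as follows. The "FEATURES" greeting line, the feature names, and both function names are hypothetical and are not taken from any particular protocol; the point is merely that the client activates the intersection of what it wants and what the server announced, and silently ignores announcements it does not recognize.

```python
def parse_announcement(line):
    """Split a hypothetical 'FEATURES a b c' greeting into a set of names."""
    keyword, _, rest = line.partition(" ")
    if keyword.upper() != "FEATURES":
        return set()  # server announced nothing we can interpret
    return {name.upper() for name in rest.split()}


def choose_features(announced, wanted):
    """Return the wanted features the server announced, in client order."""
    return [name for name in wanted if name.upper() in announced]


# The server announces a feature this client has never heard of
# (FUTURE-EXT); the client simply does not activate it.
announced = parse_announcement(
    "FEATURES STARTTLS PIPELINING 8BITMIME FUTURE-EXT")
active = choose_features(announced, ["8BITMIME", "CHUNKING", "STARTTLS"])
print(active)  # ['8BITMIME', 'STARTTLS']
```

In this sketch neither side compares version numbers: an old client keeps working against a newer server because unknown announcements cost it nothing.
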
If a protocol needs to be redesigned from scratch, use of a different port number for the new version will allow a parallel transition period -- otherwise when a major version number is increased on the server, the old clients cease to interoperate with it.

Consequences of Violation: Useless version number fields, painful version transition, complexity due to need to support older versions, meaning of version number sometimes ambiguous.

Case Study: MIME [MIME-IMB] has the MIME-Version header. Since MIME also has feature announcement via headers, the version number is useless and will never change.

Case Study: X.400:1988 fails to interoperate with X.400:1993 due to certain body part types. [XXX: need to confirm]

9. Avoid Unnecessary Layers

Whenever two layered services can be combined into a single service without a significant increase in complexity, it should be done. Unnecessary layers result in implementor confusion and more complexity.

Consequences of Violation: same as KISS violations

Case Study: RFC 822 has a multi-layer parsing model which includes unfolding lines, lexing, removal of linear-white-space, and parsing. This has resulted in endless confusion and serious interoperability problems. The DRUMS WG is folding these into a single formal syntax and the result looks promising.

Case Study: ASN.1 requires additional unnecessary layers -- BER and DER. This results in encoding mistakes when the DER layer is forgotten and also makes translating an ASN.1 protocol from its formal definition into code much more complex than necessary.

10. Fully Qualify Data

Whenever possible, make sure the data sent is fully qualified either explicitly or by context.

Consequences of Violation: serious interoperability problems, a painful and expensive transition to fix the problem.

Case Study: Year 2000 problem. Failure to fully qualify years (by using 4 digits) has resulted in very severe problems.
This problem was solved for Internet email in 1989 [HOST-REQ].

Case Study: Characters outside the US-ASCII [US-ASCII] repertoire have been used in email without a label. This results in incorrect display of those characters when the email is received on a system with a different localized character set. MIME solved the problem by requiring character set labels. A better solution would be to require the use of an international charset such as UTF-8 [UTF8] since charset labels violate the alternate representations principle.

11. Extensibility

This is a corollary of making the uncommon case possible. Protocol syntax has to be designed to permit extensibility so the protocol can evolve with the times. This includes having servers announce features, and leaving simple rules in the formal grammar to skip parameters (especially in server responses) which will be defined in the future. A common solution is to use arbitrary attribute/value pairs with unknown attributes ignored by older clients/servers. This worked remarkably well in the Internet mail format [IMAIL]. PNG [PNG] is also well designed for this aspect and distinguishes between extensions which can be ignored, and extensions which can't.

Consequences of Violation: Painful transition to add extensibility. Duplicate functionality in extended and un-extended form (resulting in alternate representations problem). Complex feature probing. In the worst case, a new protocol version is necessary.

Case Study: The IMAP4 "BODY" fetch item [IMAP4] was originally non-extensible and had to be replaced with the extensible "BODYSTRUCTURE" fetch item.

Case Study: SMTP [SMTP] required the addition of the EHLO ESMTP command [ESMTP] which was a fairly painful transition.

Case Study: X.400 has numerous formats for attachments because initial attempts were not sufficiently extensible.

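The attribute/value approach mentioned in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not a parser for any real format: the KNOWN set, the parse_attributes name, and the sample "X-Future-Feature" attribute are all hypothetical. What it shows is that an older implementation can skip attributes it does not understand without any change to its code.

```python
# Attributes this hypothetical older client understands.
KNOWN = {"subject", "from", "date"}


def parse_attributes(lines):
    """Parse 'name: value' lines; keep known attributes, skip the rest."""
    understood, skipped = {}, []
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not an attribute/value line at all
        key = name.strip().lower()
        if key in KNOWN:
            understood[key] = value.strip()
        else:
            skipped.append(key)  # unknown extension: ignored, not an error
    return understood, skipped


understood, skipped = parse_attributes([
    "From: a@example.com",
    "Subject: hello",
    "X-Future-Feature: enabled",
])
print(understood)  # {'from': 'a@example.com', 'subject': 'hello'}
print(skipped)     # ['x-future-feature']
```
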
Case Study: ASN.1 provides formats for extensibility, but there is no simple way to skip unfamiliar extensions in most cases. Thus implementations of ASN.1 protocols usually have to be revised to ignore new extensions.

12. Conclusions Based on Design Principles

Note that these are the conclusions of the author and do not represent any sort of formal IETF position.

KISS: Every protocol should go through a "feature cut review" before going on the standards track.

KISS/Text not Binary/Alternate Representations/Avoid Unnecessary Layers/Extensibility: Use of ASN.1 in new protocols should be strongly discouraged.

0,1,N Principle/Text not Numbers: Use of 3 digit SMTP-style error codes in new protocols should be forbidden.

Announce Features, Not Version: Server feature announcement should be required in most standards track protocols.

Alternate Representations: CRLF line separators should be required. Big endian should be required in new binary protocols and formats. Use of UTF-8 should be preferred over labelled character sets in new protocols.

13. Security Considerations

Many of these can have profound security implications. Violation of KISS makes a security bug more likely. Alternate Representations makes a security bug more likely in a less frequently used representation. A silly state could introduce a security bug if special handling isn't included. Failure to follow the 0,1,N principle when implementing makes buffer overrun problems more likely. Extensibility adds the potential for new security holes in the extensions.

While it's harder to fix a security bug in a binary protocol due to the debugging complexity, text protocols tend to be more susceptible to buffer overrun security problems. These two factors probably offset each other.

14.
References

[ESMTP] Klensin, Freed, Rose, Stefferud, Crocker, "SMTP Service Extensions", RFC 1869, MCI, Innosoft, Dover Beach Consulting, Network Management Associates, Brandenburg Consulting, November 1995.

[ESMTP-STATUS] Freed, "SMTP Service Extension for Returning Enhanced Error Codes", RFC 2034, Innosoft, October 1996.

[HOST-REQ] Braden, R., "Requirements for Internet Hosts -- Application and Support", RFC 1123, Internet Engineering Task Force, October 1989.

[IMAIL] Crocker, D., "Standard for the Format of Arpa Internet Text Messages", RFC 822, University of Delaware, August 1982.

[IMAP4] Crispin, M., "Internet Message Access Protocol - Version 4rev1", RFC 2060, University of Washington, December 1996.

[MIME-IMB] Freed, Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, Innosoft, First Virtual, November 1996.

[MIME-IMT] Freed, Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, Innosoft, First Virtual, November 1996.

[PNG] Boutell, "PNG (Portable Network Graphics) Specification -- Version 1.0", RFC 2083, Boutell.com, March 1997.

[POP3] Myers, J., Rose, M., "Post Office Protocol - Version 3", RFC 1939, Carnegie Mellon, Dover Beach Consulting, Inc., May 1996.

[SMTP] Postel, "Simple Mail Transfer Protocol", RFC 821, Information Sciences Institute, August 1982.

[US-ASCII] "Coded Character Set--7-bit American Standard Code for Information Interchange", ANSI X3.4-1986.

[UTF8] Yergeau, F., "UTF-8, a transformation format of Unicode and ISO 10646", RFC 2044, Alis Technologies, October 1996.

15. Author's Address

Chris Newman
Innosoft International, Inc.
1050 Lakes Drive
West Covina, CA 91790 USA

Email: chris.newman@innosoft.com