Network Working Group                                            A. Main
Internet-Draft: draft-main-typo-wcard-02                   Black Ops Ltd
Category: Informational                                   September 2003
Expires: March 2004

               Typo-Catching Wildcard Considered Harmful

Status of this Memo

This document is an Internet-Draft and is subject to all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Abstract

Many Internet protocols have as an essential element a mapping from a textual name to some resource. There is a technique, popular in some circles, of providing a default resource, which is returned in response to a lookup that does not match any known name, rather than returning an indication that there is no resource of that name. This technique is dangerous and is to be discouraged in all contexts. This memo discusses the issues.

Change Log

Note that version numbers are a little screwy around -01 and -02.

Changes from -01.1 to -02:

   o Changed title phrase from "typo wildcard" to the clearer "typo-catching wildcard". Similar rephrasing in the text.

   o Added a reference to the IAB's statement on DNS wildcards.

(-01.1 identified itself as -02 internally but was published in the internet-drafts directory as -01.)

Changes from -01.0 to -01.1:

   o Added section 7, "Conclusion", since the main recommendation of this memo was otherwise explicated only in the abstract (oops).

(-01.0 identified itself as -01 internally but never appeared in the internet-drafts directory.)

Changes from -00 to -01.0:

   o Expanded the paragraph on legitimate wildcard use into section 2.1, "Legitimate Use of Wildcards".

   o Removed a dangling reference.

   o Added STD number for RFC 1034 in reference.

   o Several style/wording changes.

   o Acknowledgements section added.

1 Introduction

Internet protocols define many namespaces that map (usually textual) names to resources. Examples include the domain name space of the DNS protocol [DNS-CONC], local parts within a mail domain [SMTP], and HTTP pathnames within a single HTTP authority [HTTP]. Each namespace has an associated resolution mechanism, which determines the resource associated with a particular name. In each case, the resolution protocol provides a way to unambiguously indicate that there is no resource with the name in question: DNS has the NXDOMAIN (Name Error) RCODE, SMTP has the 550 response code, and HTTP the 404 response code.

From time to time, organisations controlling particular namespaces have decided that, rather than return such an error code, they would prefer to serve a default, or algorithmically-generated, resource in response to any unrecognised name. The thinking goes something like this:

   "This user was obviously trying to reach *some* host/user/document here, but we're not sure which because they mistyped the name. We'd better do something to try to direct them towards an appropriate resource, because if they got an error response they might not notice their typo and think we don't have what they're looking for, and then we'd lose their business."

(As the final clause of that hypothetical monologue suggests, this thinking is typical of commercial entities. It is seen almost nowhere else.)

Tempting though such defaulting is for some, it is counterproductive, and even dangerous. It subverts the protocol in question, preventing other error-handling mechanisms from operating. It imposes behaviour that is, unavoidably, appropriate for only a very narrow range of circumstances. It causes confusion and security problems by making naming behave in unexpected ways.

The following sections consider the problems of wildcard resources in more detail. Most of the discussion is applicable regardless of the particular protocol, so particular protocols will be mentioned only as examples or where there are real differences.

1.1 Note on a Protocol Feature

Of the three protocols mentioned so far, HTTP alone does not suffer from widespread misuse of wildcards. This can be ascribed to the fact that this protocol, unlike the others, allows a resource to be returned to the client *in addition to* a negative response. A popular way to handle an HTTP negative (404) response is to display the resource sent by the server, and that seems to satisfy companies' desire to customise error handling.

1.2 See Also

The IAB has published a statement concerning the use of wildcards in DNS [IAB-DNS-WC]. That statement goes into much more detail than this memo concerning the problems caused specifically by DNS wildcards.

2 Abstract Semantics

At a conceptual level, the results of name lookups via the appropriate protocol are what define the name-to-resource mapping that is a particular namespace. Usually this is a sparse mapping: most names don't refer to anything, indicated by the negative responses they elicit.

With a wildcard resource being returned for each bad name, the name-to-resource mapping has all names referring to something, with most names referring to the default resource. This is hardly ever the mapping that the manager of the namespace really intends to present.

Usually the intent is that the bad name actually does not refer to any particular resource of the expected type, and the wildcard resource is an implementation technique used in order to control what happens when there is a name error. In this case the name-to-resource mapping that is presented via the protocol is misleading.

Any client that takes protocol conversations literally will see the namespace differently from how it was intended to be. Computers are renowned for taking things literally, and Internet protocols generally are intended to be interpreted literally. There is no limit to the ways in which the erroneous view of the namespace will be used.

2.1 Legitimate Use of Wildcards

Very occasionally one does want a namespace that behaves in the manner described above. For example, some people have a mail domain in which all addresses are valid and lead to the same mailbox, which allows them to use a variety of addresses within that domain without having to explicitly configure them.

This is quite different from the use of a wildcard to provide default behaviour. With a legitimate wildcard the various names that are handled by the wildcard are conceptually intended to refer to the default resource, whereas with a typo-catching wildcard the names are conceptually meaningless. Where the protocol permits, a legitimate wildcard often affects only a limited set of names, whereas a typo-catching wildcard is usually as broad as possible.

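The distinction drawn above can be sketched in a few lines of Python. This is an illustration only, not part of any protocol: the mailbox table, the three resolver functions, and the use of None to stand for a negative response are all assumptions invented for the example.

```python
from typing import Callable, Optional

# Hypothetical mail-domain data, invented purely for illustration.
MAILBOXES = {"alice": "mbox-alice", "bob": "mbox-bob"}

# A resolver maps a name to a resource, or to None for a negative response.
Resolver = Callable[[str], Optional[str]]


def sparse(local_part: str) -> Optional[str]:
    """Ordinary sparse namespace: unknown names get a negative response."""
    return MAILBOXES.get(local_part)


def typo_catching(local_part: str) -> Optional[str]:
    """Typo-catching wildcard: unknown names are answered with a default
    resource, although the manager does not mean them to name anything."""
    return MAILBOXES.get(local_part, "mbox-helpdesk")


def catch_all(local_part: str) -> Optional[str]:
    """Legitimate wildcard: every name is intended to mean the one mailbox."""
    return "mbox-owner"


def literal_view(resolve: Resolver, names: list[str]) -> dict[str, bool]:
    """What a client that takes responses literally concludes: a name
    exists exactly when the lookup returns a resource."""
    return {name: resolve(name) is not None for name in names}


if __name__ == "__main__":
    probes = ["alice", "alcie", "zzz-made-up"]
    for resolver in (sparse, typo_catching, catch_all):
        print(resolver.__name__, literal_view(resolver, probes))
```

To the literal-minded client the typo-catching wildcard and the legitimate catch-all look alike: every probe appears to name a mailbox. The difference lies only in whether that is the mapping the namespace manager meant to publish.
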
With a legitimate wildcard it is rare to add new names to the namespace in a way that changes a wildcard-handled name to instead refer to an individual resource. Such a change often is viewed as, or is the result of, a change to the architecture of the namespace. With typo-catching wildcards it is common to change wildcard-handled names into ordinary names: it is seen merely as adding a name, despite the more extensive changes to the protocol's view of the namespace. See section 5 for more discussion of this issue.

3 Proper Position of Error Handling

The fundamental difference between returning negative responses and having a wildcard resource is which party to the protocol decides how to handle the name error. In the former case it is the client, and in the latter case the server, which provides its error-handling behaviour in the form of a default resource.

When a name error is indicated by a negative response, the client who requested the name lookup is responsible for determining how to handle the condition. The client is fully aware of the name error, and also knows the purpose for which the name resolution was required. It can therefore tailor its response to the precise situation. Common behaviours in this situation include:

   o pass the error to a higher software layer

   o report the error to a human user

   o try another name lookup with a different name

   o try looking up a name in some other namespace

   o try doing something else instead (e.g., use a different protocol, or do a fuzzy search in some database using some form of the name)

   o use a default resource

In some cases the name lookup might have been speculative, expecting a high likelihood of a negative response, rather than expecting to find a resource.
Name error handling strategies that involve trying a number of possible names (perhaps variations on an incompletely-specified name) expect negative responses as a matter of course, and need those responses in order to function properly. In such cases, a negative response is not viewed as an error condition; whether it is an error condition depends entirely on the client.

Having a wildcard resource pre-empts the client, denying it the opportunity to employ its name error handling strategy of choice. The choice is instead made by the server, which is not aware of the reason for the name lookup. The server cannot know what is an appropriate response to the condition. Unavoidably, therefore, the wildcard resource is appropriate only for a very limited set of cases. In any other situation it causes inappropriate behaviour, behaviour other than what the client wanted.

An error response that only works correctly in one situation would be as bad as an SMTP server that ignored its input and always produced a fixed sequence of responses: it would work in the one situation it was designed to expect, but cause chaos whenever presented with any other situation.

4 Limitations of Wildcard Resources

The previous section gave a list of common client behaviours when faced with a name error. That list is open-ended: the client has complete freedom. When a server returns a wildcard resource instead of a negative response, not only does it pick one particular behaviour from that list, it is limited to always picking the last one: a default resource.

Its error-handling behaviour is therefore limited to what can be packaged into a resource served by the protocol. This rules out most of the interesting possibilities. How limited the possibilities are depends on the particular protocol.
Where the expected resource is an IP address, for example, it is possible to return the address of a host that will do interesting things, but looking for an X.25 address instead is not possible.

The default resource is also limited to contain only information available to the server. Error-handling behaviours such as falling back to a client-held copy of the expected resource are impossible.

5 Name Stability

One variant of the `default resource' technique attempts to find an extant resource with a name similar to the bad name that was looked up. This has the effect of making slightly mistyped names result in the resource the user was seeking, which at first glance may seem desirable, at least when one is in fact dealing with a human user.

A major problem with this approach is that the user who gives an incorrect name and sees it work is likely to consider it a correct name. It will in fact work just as well as the truly correct name, but it has a second-class status, and can cease to work in a surprising way.

If new names are added to the namespace in question, it may happen that an incorrect name that used to be resolved to one resource is now more similar to the name of a new resource, and so gets resolved to the new resource. From the user's point of view, the name has changed meaning: what used to be a `correct' name for one resource is now a `correct' (possibly even truly correct) name for a different resource.

This type of name instability is problematic in many situations; any long-lived reference that uses the incorrect-but-working name can lose its validity. The name confusion can be particularly embarrassing in the case where the resources are private mailboxes.

6 Security Considerations

Security mechanisms, like all other software, are built with certain assumptions and expectations.
If a security mechanism expects that a particular namespace will be sparse, then a default positive response to name lookups can confuse it just as well as it can confuse non-security software.

For example, it is common to make a lightweight check for validity of the return path of a mail message by checking that there is somewhere to route mail for the domain part of that address. This is based on the expectation that routing will fail for a made-up non-existent domain. If all non-existent domains look like they have an MX or A record, it appears that mail is routable to all such domains, and the return path validity check achieves nothing.

The infinite nature of the namespaces constructed by wildcards can in some cases cause problems by itself. Any attempt to enumerate all or some subset of the namespace will result in an arbitrarily large amount of data, leading to issues of resource exhaustion and consequent denial of service. These issues can also arise with legitimate wildcard use, but often legitimate wildcards are to some extent expected; serious problems occur where the infinitely-large namespace is unexpected.

7 Conclusion

Typo-catching wildcards are dangerous. They result in publishing data other than what was intended, which misleads any client that believes what it sees. They break reasonable expectations about the behaviour of names. They prevent error-handling, searching, and other mechanisms from operating as intended. The resources that can be served by the technique are necessarily limited in applicability and in capability, causing inappropriate and inadequate behaviour.

The typo-catching wildcard is a fundamentally broken idea. It cannot be fixed; it is impossible to patch it to the point of acceptability. The technique should be eschewed in all contexts.
The best that can be done with it is to hold it up as an example of how things go wrong when protocols are abused.

8 Acknowledgements

Useful commentary on this memo was received from Bill Weinman.

9 Informative References

[DNS-CONC]    P.V. Mockapetris, "Domain names - concepts and facilities", STD 13, RFC 1034, November 1987.

[HTTP]        R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[IAB-DNS-WC]  Internet Architecture Board, "Architectural Concerns on the use of DNS Wildcards", 19 September 2003.

[SMTP]        J. Klensin, Ed., "Simple Mail Transfer Protocol", RFC 2821, April 2001.

10 Author's Address

Andrew Main
Black Ops Ltd
Flat 2
84 Isledon Road
London N7 7JS
United Kingdom

Phone: +44 7887 945779
EMail: zefram@fysh.org