Network Working Group                                            A. Main
Internet-Draft: draft-main-typo-wcard-03                   Black Ops Ltd
Category: Informational                                       2003-10-02
Expires: 2004-04-02


               Typo-Catching Wildcard Considered Harmful

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   Many Internet protocols have as an essential element a mapping from a
   textual name to some resource.  There is a technique, popular in some
   circles, of providing a default resource, which is returned in
   response to a lookup that does not match any known name, rather than
   returning an indication that there is no resource of that name.  This
   technique is dangerous and is to be discouraged in all contexts.
   This memo discusses the issues.

Change Log

   Note that version numbers are a little screwy around -01 and -02.

   Changes from -02 to -03:

   o  Renumbered section 2.1 to 2.3, and turned the text directly under
      section 2 into sections 2.1, "Semantics of Wildcards", and 2.2,
      "Unintentional Semantics of Typo-Catching Wildcards".  Some


Main                       expires 2004-04-02                   [Page 1]

Internet-Draft  Typo-Catching Wildcard Considered Harmful     2003-10-02


      rewording where the sections are split.

   o  Added section 2.4, "Who Has the Right to Create a Wildcard".

   o  Renamed section 2 to "Semantics of Namespaces".

   Changes from -01.1 to -02:

   o  Changed title phrase from "typo wildcard" to the clearer "typo-
      catching wildcard".  Similar rephrasing in the text.

   o  Added a reference to the IAB's statement on DNS wildcards.

   (-01.1 identified itself as -02 internally but was published in the
   internet-drafts directory as -01.)

   Changes from -01.0 to -01.1:

   o  Added section 7, "Conclusion", since the main recommendation of
      this memo was otherwise explicated only in the abstract (oops).

   (-01.0 identified itself as -01 internally but never appeared in the
   internet-drafts directory.)

   Changes from -00 to -01.0:

   o  Expanded the paragraph on legitimate wildcard use into section
      2.1, "Legitimate Use of Wildcards".

   o  Removed a dangling reference.

   o  Added STD number for RFC 1034 in reference.

   o  Several style/wording changes.

   o  Acknowledgements section added.

1 Introduction

   Internet protocols define many namespaces that map (usually textual)
   names to resources.  Examples include the domain name space of the
   DNS protocol [DNS-CONC], local parts within a mail domain [SMTP], and
   HTTP pathnames within a single HTTP authority [HTTP].  Each namespace
   has an associated resolution mechanism, which determines the resource
   associated with a particular name.  In each case, the resolution
   protocol provides a way to unambiguously indicate that there is no
   resource with the name in question: DNS has the NXDOMAIN (Name Error)
   RCODE, SMTP has the 550 response code, and HTTP the 404 response
   code.

   From time to time, organisations controlling particular namespaces
   have decided that, rather than return such an error code, they would
   prefer to serve a default, or algorithmically-generated, resource in
   response to any unrecognised name.  The thinking goes something like
   this:

        "This user was obviously trying to reach *some* host/user/
        document here, but we're not sure which because they mistyped
        the name.  We'd better do something to try to direct them
        towards an appropriate resource, because if they got an error
        response they might not notice their typo and think we don't
        have what they're looking for, and then we'd lose their
        business."

   (As the final clause of that hypothetical monologue suggests, this
   thinking is typical of commercial entities.  It is seen almost
   nowhere else.)

   Tempting though such defaulting is for some, it is counterproductive,
   and even dangerous.  It subverts the protocol in question, preventing
   other error-handling mechanisms from operating.  It imposes behaviour
   that is, unavoidably, appropriate for only a very narrow range of
   circumstances.  It causes confusion and security problems by making
   naming behave in unexpected ways.

   The following sections consider the problems of wildcard resources in
   more detail.  Most of the discussion is applicable regardless of the
   particular protocol, so particular protocols will be mentioned only
   as examples or where there are real differences.

1.1 Note on a Protocol Feature

   Of the three protocols mentioned so far, HTTP alone does not suffer
   from widespread misuse of wildcards.  This can be ascribed to the
   fact that this protocol, unlike the others, allows a resource to be
   returned to the client *in addition to* a negative response.  A
   popular way to handle an HTTP negative (404) response is to display
   the resource sent by the server, and that seems to satisfy companies'
   desire to customise error handling.

1.2 See Also

   The IAB has published a statement concerning the use of wildcards in
   DNS [IAB-DNS-WC].  That statement goes into much more detail than
   this memo concerning the problems caused specifically by DNS
   wildcards.


2 Semantics of Namespaces

2.1 Semantics of Wildcards

   At a conceptual level, the results of name lookups via the
   appropriate protocol are what define the name-to-resource mapping
   that is a particular namespace.  Usually this is a sparse mapping:
   most names don't refer to anything, indicated by the negative
   responses they elicit.

   With a wildcard resource being returned for each bad name, the name-
   to-resource mapping has all names referring to something, with most
   names referring to the default resource.  This is hardly ever the
   mapping that the manager of the namespace really intends to present.
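
   As a minimal sketch of the difference, assuming a namespace modelled
   as a Python dictionary, the two mappings can be compared from the
   client's side.  The names, addresses and function names below are
   hypothetical and are not taken from any real protocol or library.

```python
from typing import Callable, List, Optional

# Hypothetical namespace contents, for illustration only.
KNOWN = {"www": "192.0.2.10", "mail": "192.0.2.25"}
DEFAULT = "192.0.2.99"  # hypothetical wildcard resource


def lookup_sparse(name: str) -> Optional[str]:
    """Return the resource for name, or None as the negative response."""
    return KNOWN.get(name)


def lookup_wildcard(name: str) -> Optional[str]:
    """Return the default resource for every unknown name."""
    return KNOWN.get(name, DEFAULT)


def names_that_exist(
    lookup: Callable[[str], Optional[str]], candidates: List[str]
) -> List[str]:
    """The namespace as a client sees it: names with a positive answer."""
    return [name for name in candidates if lookup(name) is not None]


candidates = ["www", "mail", "wwww", "qxzzy"]
sparse_view = names_that_exist(lookup_sparse, candidates)
wildcard_view = names_that_exist(lookup_wildcard, candidates)
```

   Under these assumptions the sparse lookup reports two of the four
   candidate names as existing, whereas the wildcard lookup reports all
   four.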

2.2 Unintentional Semantics of Typo-Catching Wildcards

   Usually it is intended that a bad name actually does not refer to any
   particular resource of the expected type.  If a wildcard resource is
   used, it is merely an implementation technique for controlling what
   happens when there is a name error.  In this case the name-to-
   resource mapping that is presented via the protocol is misleading.

   Any client that takes protocol conversations literally will see the
   namespace differently from how it was intended to be.  Computers are
   renowned for taking things literally, and Internet protocols
   generally are intended to be interpreted literally.  There is no
   limit to the ways in which the erroneous view of the namespace will
   be used.

2.3 Legitimate Use of Wildcards

   Very occasionally one does want a namespace that behaves in the
   manner described above.  For example, some people have a mail domain
   in which all addresses are valid and lead to the same mailbox, which
   allows them to use a variety of addresses within that domain without
   having to explicitly configure them.

   This is quite different from the use of a wildcard to provide default
   behaviour.  With a legitimate wildcard the various names that are
   handled by the wildcard are conceptually intended to refer to the
   default resource, whereas with a typo-catching wildcard the names are
   conceptually meaningless.  Where the protocol permits, a legitimate
   wildcard often affects only a limited set of names, whereas a typo-
   catching wildcard is usually as broad as possible.

   With a legitimate wildcard it is rare to add new names to the
   namespace in a way that changes a wildcard-handled name to instead
   refer to an individual resource.  Such a change often is viewed as,
   or is the result of, a change to the architecture of the namespace.
   With typo-catching wildcards it is common to change wildcard-handled
   names into ordinary names: it is seen merely as adding a name,
   despite the more extensive changes to the protocol's view of the
   namespace.  See section 5 for more discussion of this issue.

2.4 Who Has the Right to Create a Wildcard

   Clearly, from the semantics described in section 2.1, the right to
   create a wildcard in a particular namespace implies the right to use
   all the names that are synthesised by the wildcard.  In the usual
   case of a wildcard that affects almost all possible names in the
   namespace, it is rare for sufficient rights to belong to an entity
   that does not have rights to use all possible names.  That is, to a
   close approximation, a wildcard can only be legitimately created by
   an entity that has total rights to determine the use of names in the
   namespace in question.

   Note that having the right to create a wildcard is a distinct issue
   from the question of the wildcard being a good idea (for which see
   section 2.3).

3 Proper Position of Error Handling

   The fundamental difference between returning negative responses and
   having a wildcard resource is which party to the protocol decides how
   to handle the name error.  In the former case it is the client, and
   in the latter case the server, which provides its error-handling
   behaviour in the form of a default resource.

   When a name error is indicated by a negative response, the client who
   requested the name lookup is responsible for determining how to
   handle the condition.  The client is fully aware of the name error,
   and also knows the purpose for which the name resolution was
   required.  It can therefore tailor its response to the precise
   situation.  Common behaviours in this situation include:

   o  pass the error to a higher software layer

   o  report the error to a human user

   o  try another name lookup with a different name

   o  try looking up a name in some other namespace

   o  try doing something else instead (e.g., use a different protocol,
      or do a fuzzy search in some database using some form of the name)


   o  use a default resource
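
   A few of the behaviours listed above can be sketched as follows,
   assuming a resolver that signals a name error by raising an
   exception.  The exception class, the namespace contents and the
   three client functions are hypothetical; the point is only that each
   client chooses its own handling once it has seen the negative
   response.

```python
class NoSuchName(Exception):
    """Negative response: the namespace has no resource of this name."""


# Hypothetical namespace contents, for illustration only.
RESOURCES = {"index": "front page", "about": "about page"}


def resolve(name: str) -> str:
    """Server side: answer with the resource or a negative response."""
    try:
        return RESOURCES[name]
    except KeyError:
        raise NoSuchName(name) from None


def interactive_client(name: str) -> str:
    """Report the error to a human user."""
    try:
        return resolve(name)
    except NoSuchName as err:
        return f"error: no such name: {err}"


def retrying_client(name: str) -> str:
    """Try another lookup with a different name, then give up."""
    for candidate in (name, name.lower()):
        try:
            return resolve(candidate)
        except NoSuchName:
            continue
    raise NoSuchName(name)


def defaulting_client(name: str) -> str:
    """Use a default resource chosen by the client, not the server."""
    try:
        return resolve(name)
    except NoSuchName:
        return "client default"
```

   None of these clients could behave as written if resolve() returned
   a wildcard resource, because the exception on which they all depend
   would never be raised.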

   In some cases the name lookup might have been speculative, expecting
   a high likelihood of a negative response, rather than expecting to
   find a resource.  Name error handling strategies that involve trying
   a number of possible names (perhaps variations on an incompletely-
   specified name) expect negative responses as a matter of course, and
   need those responses in order to function properly.  In such cases, a
   negative response is not viewed as an error condition; whether it is
   an error condition depends entirely on the client.
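
   The dependence on negative responses can be illustrated with a
   sketch of a search-list resolver, which tries several candidate
   names in turn.  The host names, the suffix list and the wildcard
   address are hypothetical, and the lookup functions merely stand in
   for a real resolution protocol.

```python
from typing import Callable, Optional

# Hypothetical zone contents and search suffixes, for illustration only.
HOSTS = {
    "printer.lab.example": "192.0.2.7",
    "www.example": "192.0.2.10",
}
SEARCH_SUFFIXES = ["eng.example", "lab.example", "example"]
WILDCARD_ADDRESS = "192.0.2.99"


def sparse_lookup(name: str) -> Optional[str]:
    """Return None as the negative response for unknown names."""
    return HOSTS.get(name)


def wildcard_lookup(name: str) -> Optional[str]:
    """Answer every unknown name with the wildcard address."""
    return HOSTS.get(name, WILDCARD_ADDRESS)


def resolve_short_name(
    short: str, lookup: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Try each candidate in turn; stop at the first positive response."""
    for suffix in SEARCH_SUFFIXES:
        answer = lookup(f"{short}.{suffix}")
        if answer is not None:
            return answer
    return None
```

   With the sparse lookup, resolving "printer" skips the nonexistent
   "printer.eng.example" and finds the intended host at
   "printer.lab.example".  With the wildcard lookup the first candidate
   already yields a positive response, so the search stops at the
   wildcard address and never reaches the intended host.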

   Having a wildcard resource pre-empts the client, denying it the
   opportunity to employ its name error handling strategy of choice.
   The choice is instead made by the server, which is not aware of the
   reason for the name lookup.  The server cannot know what is an
   appropriate response to the condition.  Unavoidably, therefore, the
   wildcard resource is appropriate only for a very limited set of
   cases.  In any other situation it causes inappropriate behaviour,
   behaviour other than what the client wanted.

   An error response that only works correctly in one situation would be
   as bad as an SMTP server that ignored its input and always produced a
   fixed sequence of responses: it would work in the one situation it
   was designed to expect, but cause chaos whenever presented with any
   other situation.

4 Limitations of Wildcard Resources

   The previous section gave a list of common client behaviours when
   faced with a name error.  That list is open-ended: the client has
   complete freedom.  When a server returns a wildcard resource instead
   of a negative response, not only does it pick one particular
   behaviour from that list, it is limited to always picking the last
   one: a default resource.  Its error-handling behaviour is therefore
   limited to what can be packaged into a resource served by the
   protocol.  This rules out most of the interesting possibilities.

   How limited the possibilities are depends on the particular protocol.
   Where the expected resource is an IP address, for example, it is
   possible to return the address of a host that will do interesting
   things, but it is not possible to make the client look for an X.25
   address instead.

   The default resource is also limited to containing only information
   available to the server.  Error-handling behaviours such as falling
   back to a client-held copy of the expected resource are impossible.
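
   The last point can be sketched as follows, assuming a client that
   keeps its own copy of a previously fetched resource.  The cache
   contents, the resource name and both server functions are
   hypothetical.

```python
from typing import Callable, Optional

# Hypothetical client-side cache holding an older copy of a resource.
client_cache = {"report-2003": "cached copy of the report"}


def sparse_server(name: str) -> Optional[str]:
    """The resource has been removed: give the negative response."""
    return None


def wildcard_server(name: str) -> Optional[str]:
    """Answer every unknown name with a default resource."""
    return "generic landing page"


def fetch_with_fallback(
    name: str, lookup: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Prefer the server's answer; on a name error use the cached copy."""
    answer = lookup(name)
    if answer is None:
        return client_cache.get(name)
    return answer
```

   Against the sparse server the client recovers its cached copy.
   Against the wildcard server it receives the default resource and has
   no indication that it ought to fall back.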

5 Name Stability


   One variant of the "default resource" technique attempts to find an
   extant resource with a name similar to the bad name that was looked
   up.  This has the effect of making slightly mistyped names result in
   the resource the user was seeking, which at first glance may seem
   desirable, at least when one is in fact dealing with a human user.

   A major problem with this approach is that the user who gives an
   incorrect name and sees it work is likely to consider it a correct
   name.  It will in fact work just as well as the truly correct name,
   but it has a second-class status, and can cease to work in a
   surprising way.  If new names are added to the namespace in question,
   it may happen that an incorrect name that used to be resolved to one
   resource is now more similar to the name of a new resource, and so
   gets resolved to the new resource.  From the user's point of view,
   the name has changed meaning: what used to be a "correct" name for
   one resource is now a "correct" (possibly even truly correct) name
   for a different resource.
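
   This change of meaning can be demonstrated with a minimal sketch,
   assuming that similarity is judged by the get_close_matches function
   of Python's standard difflib module.  That choice of similarity
   measure, the cutoff value and the mailbox names are assumptions made
   for illustration; a real server might use any matching rule.

```python
import difflib
from typing import Dict, Optional


def fuzzy_lookup(name: str, mailboxes: Dict[str, str]) -> Optional[str]:
    """Return the exact match, else the most similarly named mailbox."""
    if name in mailboxes:
        return mailboxes[name]
    close = difflib.get_close_matches(
        name, list(mailboxes), n=1, cutoff=0.6
    )
    return mailboxes[close[0]] if close else None


# Hypothetical mailboxes, for illustration only.
mailboxes = {
    "jsmith": "John Smith's mailbox",
    "adavis": "Ann Davis's mailbox",
}

# The mistyped name initially reaches the mailbox that was intended.
before = fuzzy_lookup("jsmyth", mailboxes)

# A new, unrelated user is added to the namespace.
mailboxes["jsmythe"] = "Jane Smythe's mailbox"

# The same mistyped name now reaches a different mailbox.
after = fuzzy_lookup("jsmyth", mailboxes)
```

   Here "jsmyth" first resolves to John Smith's mailbox, and after the
   addition resolves to Jane Smythe's, although nothing about the
   original mailbox or the name used by the sender has changed.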

   This type of name instability is problematic in many situations; any
   long-lived reference that uses the incorrect-but-working name can
   lose its validity.  The name confusion can be particularly
   embarrassing in the case where the resources are private mailboxes.

6 Security Considerations

   Security mechanisms, like all other software, are built with certain
   assumptions and expectations.  If a security mechanism expects that a
   particular namespace will be sparse, then a default positive response
   to name lookups can confuse it just as easily as it can confuse
   non-security software.

   For example, it is common to make a lightweight check for validity of
   the return path of a mail message by checking that there is somewhere
   to route mail for the domain part of that address.  This is based on
   the expectation that routing will fail for a made-up non-existent
   domain.  If all non-existent domains look like they have an MX or A
   record, it appears that mail is routable to all such domains, and the
   return path validity check achieves nothing.
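
   A minimal sketch of such a check follows, with the mail routing
   lookup passed in as a function so that the sparse and wildcard cases
   can be compared.  The domain names, host names and lookup functions
   are hypothetical and do not perform any real DNS queries.

```python
from typing import Callable, List

# Hypothetical mail routing data, for illustration only.
KNOWN_MAIL_DOMAINS = {"example.org": ["mx1.example.org"]}


def sparse_mx(domain: str) -> List[str]:
    """Return the mail hosts for domain, or an empty list if none."""
    return KNOWN_MAIL_DOMAINS.get(domain, [])


def wildcard_mx(domain: str) -> List[str]:
    """Answer every unknown domain with a catch-all host."""
    return KNOWN_MAIL_DOMAINS.get(domain, ["catchall.example.net"])


def return_path_is_routable(
    return_path: str, mx_lookup: Callable[[str], List[str]]
) -> bool:
    """Lightweight check: is there anywhere to route mail for the domain?"""
    _local, separator, domain = return_path.rpartition("@")
    if not separator or not domain:
        return False
    return bool(mx_lookup(domain.lower()))
```

   With the sparse lookup, a return path in a made-up domain is
   rejected.  With the wildcard lookup the same return path is
   accepted, so the check no longer distinguishes anything.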

   The infinite nature of the namespaces constructed by wildcards can in
   some cases cause problems by itself.  Any attempt to enumerate all or
   some subset of the namespace will result in an arbitrarily large
   amount of data, leading to issues of resource exhaustion and
   consequent denial of service.  These issues can also arise with
   legitimate wildcard use, but often legitimate wildcards are to some
   extent expected; serious problems occur where the infinitely-large
   namespace is unexpected.
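
   The growth in data can be illustrated with a sketch that enumerates
   only the three-letter lowercase names of a namespace.  The namespace
   contents, the addresses and the restriction to three letters are
   assumptions chosen to keep the example small; a real enumeration
   would face a far larger candidate space.

```python
import itertools
import string
from typing import Callable, List, Optional

# Hypothetical namespace contents, for illustration only.
KNOWN = {"www": "192.0.2.10", "ftp": "192.0.2.21"}


def sparse_lookup(name: str) -> Optional[str]:
    """Return None as the negative response for unknown names."""
    return KNOWN.get(name)


def wildcard_lookup(name: str) -> Optional[str]:
    """Answer every unknown name with a default address."""
    return KNOWN.get(name, "192.0.2.99")


def enumerate_names(
    lookup: Callable[[str], Optional[str]], length: int = 3
) -> List[str]:
    """Collect every lowercase name of the given length that resolves."""
    found = []
    alphabet = string.ascii_lowercase
    for letters in itertools.product(alphabet, repeat=length):
        name = "".join(letters)
        if lookup(name) is not None:
            found.append(name)
    return found


sparse_names = enumerate_names(sparse_lookup)
wildcard_names = enumerate_names(wildcard_lookup)
```

   The sparse namespace yields two names, while the wildcard namespace
   yields all 17576 three-letter candidates, and the count grows by a
   factor of 26 for each additional letter.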


7 Conclusion

   Typo-catching wildcards are dangerous.  They result in publishing
   data other than what was intended, which misleads any client that
   believes what it sees.  They break reasonable expectations about the
   behaviour of names.  They prevent error-handling, searching, and
   other mechanisms from operating as intended.  The resources that can
   be served by the technique are necessarily limited in applicability
   and in capability, causing inappropriate and inadequate behaviour.

   The typo-catching wildcard is a fundamentally broken idea.  It cannot
   be fixed; it is impossible to patch it to the point of acceptability.
   The technique should be eschewed in all contexts.  The best that can
   be done with it is to hold it up as an example of how things go wrong
   when protocols are abused.

8 Acknowledgements

   Useful commentary on this memo was received from Bill Weinman.

9 Informative References

   [DNS-CONC]   P.V. Mockapetris, "Domain names - concepts and
                facilities", STD 13, RFC 1034, November 1987.

   [HTTP]       R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L.
                Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer
                Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [IAB-DNS-WC] Internet Architecture Board, "Architectural Concerns on
                the use of DNS Wildcards", 19 September 2003,
                <http://www.iab.org/documents/docs/2003-09-20-dns-
                wildcards.html>.

   [SMTP]       J. Klensin, Ed., "Simple Mail Transfer Protocol", RFC
                2821, April 2001.

10 Author's Address

   Andrew Main
   Black Ops Ltd
   Flat 2
   84 Isledon Road
   London
   N7 7JS
   United Kingdom

   Phone: +44 7887 945779
   EMail: zefram@fysh.org

