<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSPY v5 rel. 3 U (http://www.xmlspy.com)
     by Daniel M Kohn (private) -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
]>
<rfc category="std" ipr="trust200902" docName="draft-bellovin-hpw-00.txt">
    <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
    <?rfc symrefs="yes" ?>
    <?rfc sortrefs="yes"?>
    <?rfc iprnotified="no" ?>
    <?rfc strict="yes" ?>
    <front>
        <title>Hashed Password Exchange</title>
        <author initials="S.M." surname="Bellovin" fullname="S.M. Bellovin">
            <organization>
	      Columbia University
	    </organization>
                <address>
                    <postal>
                        <street>1214 Amsterdam Avenue</street>
                        <street>MC 0401</street>
                        <city>New York</city>
                        <region>NY</region>
                        <code>10027</code>
                        <country>US</country>
                    </postal>
                    <phone>+1 212 939 7149</phone>
                    <email>bellovin@acm.org</email>
                </address>
            </author>
            <date month="January" year="2012"/>
            <abstract>
                <t>
		    Many systems (e.g., cryptographic protocols relying
		    on symmetric cryptography) require that plaintext
		    passwords be stored.  Given how often people reuse
		    passwords on different systems, this poses a very
		    serious risk if a single machine is compromised.
		    We propose a scheme to derive passwords limited
		    to a single machine from a typed password, and
		    explain how a protocol definition can specify
		    this scheme.
		</t>
            </abstract>
        </front>
        <middle>
	    <section title="Introduction">
		<t>
		    Today, despite the lessons of more than 30 years
		    [[cite Morris and Thomson]], many systems store
		    plaintext passwords.  This is often done for
		    good reasons, such as authenticating cryptographic
		    exchanges or as a convenience to users with
		    many passwords; see, for example, the password
		    store in many browsers or the Keychain in MacOS.
		    That said, this practice does pose a security
		    risk to users, since their passwords are in
		    danger if the system is compromised.
		</t>
		<t>
		    The big problem is not compromise of the
		    actual password used on that system; while
		    regrettable, it is inherent in the service definition.
		    Rather, the problem is that users tend to
		    reuse passwords on different systems.  If a
		    password is compromised on one machine, the user
		    is at risk on many different systems.  Accordingly,
		    we describe a scheme for storing a single-site-only
		    password, derived from the user's typed password;
		    a compromise of a service
		    thus affects just that service.
		</t>
		<t>
		    To accomplish this,
		    we specify a "Hashed Password
		    Exchange" standard, or rather, a metastandard.
		    Rather than specifying a precise way to store
		    and use hashed passwords, we give rules for
		    specifying hashed passwords for use in a given
		    protocol or application.  We take advantage of
		    the fact that unlike 1979, when users used very
		    dumb terminals to transmit passwords directly
		    to the receiving applications, most passwords
		    these days are entered into user-controlled
		    software; these programs in turn transmit the
		    passwords to the verifying applications.  There
		    is thus intelligence on the user's side; we
		    will use this to irreversibly transform the
		    entered password into some other string.  By
		    the same token, the receiving system must apply
		    the same transform to the authenticator supplied
		    at user enrollment time or password change time.
		    Because two independent pieces of software must
		    apply the same transformation, the algorithm
		    must be precisely specified in standards
		    documents.
		</t>
		    <section title="Requirements Notation">
			<t>
				The key words "MUST", "MUST NOT", "REQUIRED",
				"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
				"RECOMMENDED", "MAY", and "OPTIONAL"
				in this document are to be interpreted
				as described in <xref target="RFC2119"/>.</t>
		    </section>
	    </section>
	    <section title="Definitions and Goals">
		<t>
		    We use the following definitions:
		    <list style="hanging">
			<t hangText="Username">
				An arbitrary string, the
				syntax of which
				is application-dependent,
				employed by both
				the user and the
				verifying system
				to uniquely identify
				a given user.
			</t>
			<t hangText="Entered Password">
				The authenticator typed by
				the user to his or
				her own software.
				The usual quality
				rules (length,
				special characters, etc.)
				can be applied; that is out
				of the scope of this standard.
			</t>
			<t hangText="Effective Password">
				The actual, over-the-wire,
				string transmitted
				by the user's
				software.
			</t>
			<t hangText="Service">
				A particular application
				on a particular machine or
				cluster of machines
				appearing as a
				single machine
			</t>
			<t hangText="Service URI">
				A URI <xref target="RFC3986"/> for which
				this effective password
				should be valid.
				Only the scheme
				name, userinfo,
				and host name portions are
				discussed here; use of path
				information is protocol-dependent.
				In the userinfo field, only
				the username is used.  An example
				is given below.
			</t>
		    </list>
		</t>
		<t>
		    Our scheme has the following goals:
		    <list style="numbers">
			<t>
				No two users of a given service
				should have the same
				effecive password, even if the
				entered passwords are the same.
			</t>
			<t>
				No two effective passwords for the
				same user should be the
				same for different services,
				even if the entered passwords
				are the same.
			</t>
			<t>
				It should be infeasible to invert
				the hashing function to
				retrieve the entered password
				from an effective password and
				service URI.
			</t>
			<t>
				It should be computationally expensive
				to mount dictionary
				attacks on compromised effective
				passwords.
			</t>
		    </list>
		</t>
	    </section>
	    <section title="The Hashed Password Scheme">
		<t>
		    Fundamentally, we calculate the effective
		    password by iterating HMAC <xref target="RFC2104" />,
		    using
		    the entered password as the key and the service
		    URI as the data.  This meets all four of our
		    goals: <list style="numbers">
			<t>
				Since the username is part of the
				service URI, different users will
				have different URIs, and hence
				different effective passwords.
			</t>
			<t>
				Since the hostname is part of the
				URI, different services for
				any given user will have different
				URIs, and hence different effective
				passwords.
			</t>
			<t>
				For any reasonable underlying hash
				function, it is believed
				to be infeasible to invert HMAC;
				see <xref target="RFC2104" /> for
				details.
			</t>
			<t>
				By iterating a sufficient number
				of times, dictionary attacks
				can be made arbitrarily expensive.
			</t>
		    </list>
		</t>
		<t>
		    We do not use a salt in this scheme.  The primary
		    purposes of a salt are to achieve our first and
		    second goals, which we achieve in other ways.
		    A salt also protects against precomputation of
		    possible passwords of known users in anticipation
		    of a later password file compromise; however,
		    since the salt must be used in calculating the
		    effective password, it would have to be known
		    to the user as well as the server, and users
		    typically have multiple devices on which they
		    enter passwords.  Using a salt would require
		    that users know it and reenter it, which we
		    regard as infeasible and of limited benefit.
		</t>
		<t>
		    Usernames and the hostname portions of service
		    URIs must be canonicalized before applying HMAC.
		    Legal characters in a username are upper and
		    lower case US-ASCII letters, period, hyphen,
		    underscore, and digits.  All other characters
		    MUST be percent-encoded, per section 2.1 of
		    <xref target="RFC3986" />.  Hostnames MUST be canonicalized
		    per <xref target="RFC5890" /><xref target="RFC5891" />
		    and converted to lower
		    case.  How usernames and hostnames are entered
		    is application- and implementation-dependent,
		    and not part of this specification.  The hostname
		    used is either the string users type or
		    unambiguously derivable from it per specified
		    rules.
		</t>
		<t>
		    The URI scheme name is given by the protocol
		    specification and MUST NOT be entered directly
		    by the user.
		</t>
		<t>
		    The iteration count is protocol- and use-dependent,
		    and given in the protocol specification.
		</t>
		<t>
		    The effective password, then, is calculated by
		    iterating HMAC some number of times over the
		    message
		</t>
		<t>
		    <list>
			<t>
			    scheme://username@hostname
			</t>
		    </list>
		</t>
		<t>
		    with the entered password as the key.
		</t>
		<section title="Examples">
			<t>
			    <list>
				<t>
				    ipsec://someuser@gw.example.net
				    <vspace blankLines="0" />
				    imap://someuser@mail.example.com
				    <vspace blankLines="0" />
				    submission://someuser@mail.example.com
				</t>
			    </list>
			    Note that although someuser can specify
			    the same entered password for both 'imap'
			    and 'submission' on mail.example.com, the
			    effective passwords will be different.
			</t>
		</section>
	    </section>
	    <section title="Specifying Hashed Password Exchange">
		<t>
		    The following elements must be in any protocol
		    specification that uses Hashed Password Exchange.
		    <list style="symbols">
			<t>
				The scheme name MUST be specified.
				Generally, this will be taken from
				the IANA name assigned to the port,
				but this is not required.  Thus, a
				mail submission URI (TCP port 587)
				might use the scheme name "submission".
			</t>
			<t>
				The rules for deriving the hostname
				from what users enter MUST be
				specified.  They may be as simple
				as "use the name the user specifies,
				e.g., imap.example.com", or they
				may account for common alternatives:
				"If the specified host name does
				not begin with 'www.', prepend it;
				thus, both 'example.com' and
				'www.example.com' would use the
				hostname 'www.example.com' in forming
				the URI.
			</t>
			<t>
				The iteration count MUST be specified.
				The value -- typically in the
				hundreds of thousands with today's
				technology -- SHOULD be different
				for different services, and MAY be
				adjusted based on the platforms on
				which the calculations are typically
				done.  Note that the iteration is
				done at password change time rather
				than run-time, so expense is not a
				major concern.  (Just how long the
				iterations should take will depend
				on the protocol designers' understanding
				of likely platforms and usage
				patterns.  Something that will be
				run exclusively on fast devices and
				with stored hashed passwords should
				use a higher count; something where
				run-time
				user password entry on a slow device
				is considered likely should use a
				lower count.)
			</t>
			<t>
				The rules, if any, for canonicalizing
				the entered password MUST be
				specified.  It is perfectly acceptable
				to accept an arbitrary octet string
				here, but that is not required.
			</t>
			<t>
				The hash function to be used with
				HMAC MUST be specified.  MD5
				<xref target="RFC1321"/> is
				more than sufficient; however, the
				tradeoff is likely to be between
				what code is likely to be available
				in implenetations versus the iteration
				count.  SHA-512
				<xref target="RFC6234"/> is
				much slower than
				MD5, but since the goal is constant
				time, this matters very little;
				thus, MD5 would have a higher
				iteration count than SHA-512 would
				for the same protocol.
			</t>
			<t>
				The encoding rules for sending the
				effective password over the wire.
				The output of HMAC is an arbitrary
				byte string.  Given the length of
				typical HMAC output and the infrequency
				with which they are sent, transmission
				efficiency is not a major concern,
				so a simple hexadecimal encoding
				is fine.  Implementations MAY specify
				truncation; however, they SHOULD
				NOT use effective passwords shorter
				than 16 octets before encoding.
			</t>
			<t>
				If the protocol permits negotiation
				of authentication methods, a separate
				code point MUST be assigned to this
				scheme.
			</t>
		    </list>
		</t>
		<t>
		    How passwords are changed -- that is,
		    how new effective passwords
		    are supplied to the
		    verifying machine -- is beyond the scope of this
		    specification.  If the entered password is sent
		    directly at password change time, quality checks
		    can be enforced; however, this exposes entered
		    passwords to attacks who have compromised the
		    verifying machine.  This is not a major risk,
		    since the rate of password change is low.
		    Conversely, client-side code (e.g., Javascript)
		    can make advisory recommendations on password
		    strength; while the server cannot enforce this,
		    since it will see only effective passwords,
		    very few users will have the will and the skill
		    to override this.
		</t>
		<t>
		    If effective passwords are used only for the
		    usual password verification and not for
		    cryptographic purposes, they should be treated
		    with the care used for ordinary password, i.e.,
		    salted, hashed, etc.  There is little need for
		    extra iterations, though, since the iteration
		    used in calculating them already provides strong
		    protection against dictionary attacks, and it
		    is unlikely that the extra server-side iterations
		    will be significantly larger than the iterations
		    already performed to comply with this specification.
		</t>
	    </section> <section title="Security Considerations">
		<t>To be written.</t>
	    </section>
	</middle> <back>
	    <references title="Normative References">
		<?rfc include="reference.RFC.1321.xml" ?>
		<?rfc include="reference.RFC.2104.xml" ?>
		<?rfc include="reference.RFC.2119.xml" ?>
		<?rfc include="reference.RFC.3986.xml" ?>
		<?rfc include="reference.RFC.5890.xml" ?>
		<?rfc include="reference.RFC.5891.xml" ?>
		<?rfc include="reference.RFC.6234.xml" ?>
	    </references>
	</back>
    </rfc>

