Network Working Group                                     A. Yourtchenko
Internet-Draft                                                   D. Wing
Intended status:  Standards Track                                  cisco
Expires:  February 26, 2011                              August 25, 2010


       NAT confessions: revealing the hosts behind the translator
                  draft-yourtchenko-nat-reveal-hash-00

Abstract

   When an IP address is shared among several subscribers, it is
   impossible to determine which subscriber has initiated that TCP
   connection.  This memo describes a technique to share the identity of
   a subscriber that initiated a TCP connection with the TCP server..
   The proposed method avoids altering the application-level payload and
   works well with SSL-protected connections.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 26, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as


Yourtchenko & Wing      Expires February 26, 2011               [Page 1]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


   described in the Simplified BSD License.


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Notational Conventions . . . . . . . . . . . . . . . . . . . .  4
   3.  Description  . . . . . . . . . . . . . . . . . . . . . . . . .  4
   4.  Calculating the Internal Address Mapping . . . . . . . . . . .  5
   5.  Calculating the Verifier . . . . . . . . . . . . . . . . . . .  6
   6.  Encoding of the VFY into the packet: IP ID encoding  . . . . .  6
   7.  Encoding of the VFY into the packet: TSval encoding  . . . . .  6
   8.  Operation of the mechanism . . . . . . . . . . . . . . . . . .  7
     8.1.  Translator Operation . . . . . . . . . . . . . . . . . . .  7
     8.2.  Server Operation . . . . . . . . . . . . . . . . . . . . .  7
   9.  Interaction with TCP SYN cookies . . . . . . . . . . . . . . .  8
   10. Other Mechanisms to Encode Client Identifier . . . . . . . . .  8
     10.1. Defining a new TCP option to store the address . . . . . .  8
     10.2. Using TSecr in TCP SYN . . . . . . . . . . . . . . . . . .  8
     10.3. Reserving the different port ranges per client . . . . . .  8
   11. Security Considerations  . . . . . . . . . . . . . . . . . . .  8
   12. IANA considerations  . . . . . . . . . . . . . . . . . . . . .  9
   13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
   14. References . . . . . . . . . . . . . . . . . . . . . . . . . .  9
     14.1. Normative References . . . . . . . . . . . . . . . . . . .  9
     14.2. Informative References . . . . . . . . . . . . . . . . . . 10
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10


Yourtchenko & Wing      Expires February 26, 2011               [Page 2]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


1.  Introduction

   There are several scenarios where it is valuable to know the identity
   of a TCP client, including geolocation, DoS blocking, and spam
   blacklists.  Today, this is done by equating IPv4 address with
   'identity'.  However, the identity of a TCP client is obscured when
   an IP address is shared I-D.ietf-intarea-shared-addressing-issues
   [I-D.ietf-intarea-shared-addressing-issues].  IP address sharing is
   done by both network address and port translators (NAPT) and by
   application-layer proxies (e.g., HTTP or FTP proxies).

   The current state of the art requires the address sharing alter the
   application-level payload and include the identity of the internal
   host -- usually the internal host's private IP address.  This incurs
   several drawbacks,

   o  adjustment of TCP sequence numbers and acknowledgement numbers for
      the duration of the TCP session

   o  risk of false-positive application matching (e.g., accidentally
      inserting an HTTP header into a non-HTTP payload).

   o  interference with application payload by increasing packet size
      (e.g., MTU)

    With SSL-protected applications the current state of the art
   requires breaking the end-to-end encrypted connection.  This results
   in several undesirable consequences:

   o  necessity for the translator to break the end-to-end encryption,
      typically by installing an addional Certificate Authority on the
      client's CA trust list

   o  noticeable increase in the processing power required on the
      address sharing device to decrypt and re-encrypt that application
      payload

   This specification avoids the problems described above, and defines
   the method of communicating the TCP client's identity to the TCP
   server by overloading the TCP timestamp field and IP Identifier field
   of the initial TCP SYN.

   This extension is necessary because IP address sharing, deployed by
   NAT64 devices, will allow malicious users to connect to IPv4-capable
   servers.  Thus, until a server is only accessible via IPv6 (and
   inaccessible via IPv4), the IPv4-capable server will suffer from an
   inability to identify individual TCP clients as discussed in
   I-D.ietf-intarea-shared-addressing-issues


Yourtchenko & Wing      Expires February 26, 2011               [Page 3]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


   [I-D.ietf-intarea-shared-addressing-issues].


2.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [RFC2119].


3.  Description

   This proposal leverages the common deployment of  TCP timestamps and
   that a timestamp-aware TCP server will echo the timestamp..

   The caveat with the above is that the remote peer must know in
   advance if the TCP client implements this technique or not -- the
   timestamp on the server side looks just the same.  This could be
   resolved by manual configuration but that is impractical, so an
   automatic detection mechanism is proposed.  The automatic mechanism
   calculates a hash over the values of interest and placing the result
   into another field.  The receiver can then perform the same operation
   and verify.  If the received and computed values match, then the TCP
   timestamp received does contain the encoded internal address.  The
   verifier value is computed as a hash function over the mapped value
   encoded into the timestamp, address after translation, and the TCP
   initial sequence number - i.e. the sequence number within the SYN
   segment.  The usage of the TCP initial sequence number allows to
   avoid the verifier value being almost always the same.  The reason
   for doing so is to satisfy the protocol constraints of the field that
   is used to convey this value.

   In order to find some place for storing this verification value, we
   make another observation:  TCP SYN segments are generally rather
   small, and the minimum MTU on IPv4 is 576.  Typical stacks send the
   TCP SYN with DF=1.  Therefore, they would never be fragmented.  This
   means we could use the 16-bit value of the IP ID to put the verifier
   value in.  The verifier is dependent on the initial sequence number
   (ISN) -- which is should have some randomness properties as described
   in RFC1948 [RFC1948], therefore the IP ID will be reasonably
   different to still serve its purpose even in the extremely unlikely
   case that the TCP SYN is fragmented.

   Using a 16-bit value as a verifier gives 1 in 65536 chances (or,
   0.0015%) probability of erroneously judging that the timestamp
   contains the encoded internal address.  This may be insufficient
   assurance for some of the scenarios.  Therefore, we calculate the
   verifier (referred to as VFY value) to be a 32-bit integer - and


Yourtchenko & Wing      Expires February 26, 2011               [Page 4]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


   store 16 or more bits of this value - at the expense of storing less
   bits of Internal Address Mapping (iAM).  However, we expect that the
   range of iAM for a single public translation would be relatively
   small - so, no information will be lost in this process.


4.  Calculating the Internal Address Mapping

   The main useful property of iAM is that it MUST stay the same for the
   same internal address unless the configuration on the translator has
   changed.  Since the goal is to provide the stable mapping, rather
   than fully reveal the internal address, any method that has this
   property is acceptable - and the choice of it is left to the
   implementors of the translator.  If the addresses to be translated
   are configured as a prefix, then the iAM can be obtained just by
   taking the host bits of the address within the prefix.  If the
   assignment of these addresses is on an individual basis, then the
   simple enumeration might be used.  If the internal addresses are
   assigned to the pool as set of subnets - then the combination of the
   two methods above (the host bits in the least significant part, and
   the enumeration in the most significant part) will give good results.
   This also stimulates allocation of the internal address in equal-
   sized chunks, which should make the maintenance of the network
   easier.

   As a result, the calculation of the iAM on the outgoing SYN segment
   MUST return two values:

   o  iAM = Internal Address Mapping:  a 32-bit unsigned integer

   o  siAM = Size of Internal Address Mapping, in bits:  integer,
      allowed range 9..24 - this is the number of significant bits
      within the iAM.

   The minimum value of siAM being 9 was chosen based on the following
   logic:

   o  having a room of 512 possible hosts allows to keep the property of
      iAM to not change during the smaller configuration changes, in
      case the pool is made up of individual hosts.

   o  the range 9..24 has exactly 16 possible values, which will be
      useful for encoding.

   By encoding only the significant bits of the internal address mapping
   the operator of the translator can minimize the probability of the
   error - all the unused bits are allocated for the value used to
   "fingerprint" the presence of the internal identifier.  The more bits


Yourtchenko & Wing      Expires February 26, 2011               [Page 5]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


   this "Verifier" value can contain - the less is the chance of
   accidental match - and erroneous record of the internal identifier
   when there is none.

   The range from 9 bits to 24 bits allows to encode between 512 and
   16777216 internal identifiers for a single public IP address.


5.  Calculating the Verifier

   The verifier is calculated as a 32-bit result of a hash function.
   This hash function is not expected to be cryptographically strong
   (the 'Security considerations' section explains why), however it
   should have good distribution, good collision resistance, good
   avalanche behavior and be fast and cheap to compute.  These
   properties are satisfied by Murmur hash [URL.Murmur-hash] function,
   therefore it is the hash that we will use.

   The calculation of the VFY is performed as follows:

   VFY = murmur(iAM | AddrPub | siAM, TCP-ISN)

   o  iAM is included into the calculation as a 32 bit word.

   o  siAM is included into the hash calculation as a single byte.
      (TBD:  the 'selector' referenced below might be a more natural
      number to check against, instead of siAM ?).


6.  Encoding of the VFY into the packet: IP ID encoding

   The low 16 bits of the VFY are encoded in network order into the IP
   ID of the packet after translation. the remaining 16 bits form the
   "VFYhi" value, which we attempt to fit into the TSval along with the
   other information.


7.  Encoding of the VFY into the packet: TSval encoding

   The TCP timestamp field encodes the iAM and VFYhi as follows:

    3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
    1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E E E E|S S S S| iAM MSB ... iAM LSB  | VFYhi MSB .. VFYhi LSB |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The range of siAM gives 16 possible ways to store iAM (along with the


Yourtchenko & Wing      Expires February 26, 2011               [Page 6]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


   same number of degrees of assurance for the detection).  In order to
   distinguish between those, we introduce the encoding selector (S)
   field, which will determine how the lower 24 bits are split between
   the iAM and the upper 16 bit of VFY.  Note that the smallest value of
   siAM being 9, we will never be able to store the most significant bit
   of VFY.

   The value of S is the number of zero-fill right-shift operations it
   would take on the low 24 bit in order to "normalize" the iAM - or, in
   other words, it is the number of bits of VFYhi stored within the
   timestamp.

   Best practices in I-D.ietf-tcpm-tcp-timestamps
   [I-D.ietf-tcpm-tcp-timestamps], mention that to reduce the TIME-WAIT
   state the timestamp value should be monotonously increasing across
   the connections with the same 5-tuple.  To give the translators an
   opportunity to achieve this property, we reserve several most
   significant bits within the timestamp to signify the "Epoch" (E).This
   would require storing some additional state per 5-tuple, and the
   implementation of such a mechanism is outside of scope for this
   document.  The implementations that do not implement the monotonously
   increasing timestamps, MUST keep the Epoch bits intact from the
   original value of the timestamp.


8.  Operation of the mechanism

   This section outlines the use of this mechanism by the translators
   and servers.

8.1.  Translator Operation

   The translator is involved into processing of the initial SYN segment
   (calculating the new version of the TCP timestamp and IP ID), as well
   as the SYN-ACK segments (restoring the original value of the TCP
   timestamp within the TSecr field).

8.2.  Server Operation

   The server would operate on every SYN that is of interest for the
   logging.  It would extract the candidate iAM, and calculate the VFY
   value based on the public address and TCP ISN within the received SYN
   segment.  Then it would compare the VFY against the corresponding
   bits in the TSval and IP ID fields.  If there is a match, it means
   (with a reasonable probability) that the iAM was a valid one
   calculated by the translator inbetween.  This information is stored
   for later access by the application listening on that socket (e.g.,
   stored in the TCB).


Yourtchenko & Wing      Expires February 26, 2011               [Page 7]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


9.  Interaction with TCP SYN cookies

   TCP SYN cookies are commonly deployed to mitigate TCP SYN attacks
   RFC4987 [RFC4987].  The mechanism described in this document requires
   the server store extra information which arrives on the TCP SYN,
   which increases the TCP server's attack surface.  To mitigate this,
   the translator should apply the similar algorithm to the timestamp of
   the ACK segment that is sent by the initiator of the connection in
   response to the server's SYN ACK.  The authors considered that
   serverside might use the TSval in its SYN ACK segment, however this
   would interfere with the Extended syncookies.  This section needs
   further discussion.


10.  Other Mechanisms to Encode Client Identifier

   This section outlines other mechanisms that we considered, and
   outlines the reasons we consider them not applicable.

10.1.  Defining a new TCP option to store the address

   This would be the cleanest and simplest approach, and is discussed in
   [ I-D.wing-reveal-address].

10.2.  Using TSecr in TCP SYN

   This value is set to zero, and is effectively unused - so it looks
   like a convenient place.  However this violates the RFC1323
   [RFC1323], and this would require much more thorough testing - and
   update to RFC1323 [RFC1323].

10.3.  Reserving the different port ranges per client

   This approach has an appeal due to its simplicity, but it would be
   specific to each NAPT device operated by each service provider.  That
   is, there is no way to identify the device or know the source port
   range assigned to an TCP client without contacting the administrator
   of the NAPT device.  Restricting clients to a specific range also
   exposes the clients to some security risk I-D.ietf-tsvwg-port-
   randomization [I-D.ietf-tsvwg-port-randomization].


11.  Security Considerations

   The connections that happen, today, without aNAPT necessarily reveal
   the source address of the TCP client -- so revealing the identity of
   the client this should not be a concern except for the installations
   that attempt to use NAPT for "privacy" reasons.  If such an


Yourtchenko & Wing      Expires February 26, 2011               [Page 8]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


   installation exists, it is easy to see that any 1:1 remapping of
   e.g., IP ID would cause the failure of the validation algorithm -
   therefore "protecting the identity".

   Therefore, if an organization has more than one level of NAPT and
   wants to ensure that the internal translators do not disclose the
   information about the internal addresses, it can alter any of the
   elements used for the calculations - e.g. randomize the ISN, or remap
   the IP ID.

   An attacker might might use this functionality to appear as if IP
   address sharing is occuring, in the hopes that a naive server will
   allow additional attack traffic.  TCP servers and applications SHOULD
   NOT assume the mere presence of the functionality described in this
   paper indicates there are other  (benign) users sharing the same IP
   address.

   The modification of the TSVal option value will break TCP-AO  RFC5925
   [RFC5925], which provides integrity protection of the  TCP SYN
   (including TCP options).  However, TCP-AO is already known to not
   survive address sharing (through a NAPT or through an application
   proxy).


12.  IANA considerations

   None.


13.  Acknowledgements

   Thanks to Nicholas Leavy for the review.


14.  References

14.1.  Normative References

   [RFC1323]  Jacobson, V., Braden, B., and D. Borman, "TCP Extensions
              for High Performance", RFC 1323, May 1992.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, June 2010.


Yourtchenko & Wing      Expires February 26, 2011               [Page 9]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


14.2.  Informative References

   [I-D.ietf-intarea-shared-addressing-issues]
              Ford, M., Boucadair, M., Durand, A., Levis, P., and P.
              Roberts, "Issues with IP Address Sharing",
              draft-ietf-intarea-shared-addressing-issues-01 (work in
              progress), June 2010.

   [I-D.ietf-tcpm-tcp-timestamps]
              Gont, F., "Reducing the TIME-WAIT state using TCP
              timestamps", draft-ietf-tcpm-tcp-timestamps-00 (work in
              progress), June 2010.

   [I-D.ietf-tsvwg-port-randomization]
              Larsen, M. and F. Gont, "Transport Protocol Port
              Randomization Recommendations",
              draft-ietf-tsvwg-port-randomization-09 (work in progress),
              August 2010.

   [RFC1948]  Bellovin, S., "Defending Against Sequence Number Attacks",
              RFC 1948, May 1996.

   [RFC4987]  Eddy, W., "TCP SYN Flooding Attacks and Common
              Mitigations", RFC 4987, August 2007.

   [URL.Murmur-hash]
              "Murmur hash", <http://sites.google.com/site/murmurhash/>.


Authors' Addresses

   Andrew Yourtchenko
   cisco
   6a de Kleetlaan
   Diegem  1831
   BE

   Phone:  +32 2 704 5494
   Email:  ayourtch@cisco.com


Yourtchenko & Wing      Expires February 26, 2011              [Page 10]

Internet-Draft       Revealing the hosts behind NAPT         August 2010


   Dan Wing
   cisco
   170 West Tasman Drive
   San Jose  CA 95134
   USA

   Email:  dwing@cisco.com


Yourtchenko & Wing      Expires February 26, 2011              [Page 11]