Internet Engineering Task Force J. Ubillos Internet-Draft Swedish Institute of Computer Intended status: Experimental Science Expires: January 13, 2011 Z. Ming Tsinghua University July 12, 2010 Name Based Sockets draft-ubillos-name-based-sockets-01 Abstract This memo defines the name based sockets. Name based sockets allow the application developer to refer to remote hosts (and it self) by name only, passing on all IP (locator) management to the operating system. Applications are thus relieved of re-implementing features such as multi-homing, mobility, NAT traversal, IPv6/IPv4 interoperability and address management in general. The operating system can in turn re-use the same solutions for a whole set of guest applications. Name based sockets aims to provide a whole set of features without adding new indirection layers, new delays or other dependences while maintaining transparent backwards compatibility. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on January 13, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents Ubillos & Ming Expires January 13, 2011 [Page 1] Internet-Draft Name Based Sockets July 2010 (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Conventions . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 3 3.2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 3 3.3. Name based sockets overview . . . . . . . . . . . . . . . 4 3.4. Protocol overview . . . . . . . . . . . . . . . . . . . . 4 4. Initial name exchange . . . . . . . . . . . . . . . . . . . . 5 4.1. Name format . . . . . . . . . . . . . . . . . . . . . . . 6 5. API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 6. Mobility support . . . . . . . . . . . . . . . . . . . . . . . 8 6.1. Shim6 . . . . . . . . . . . . . . . . . . . . . . . . . . 8 6.1.1. Brief overview of changes . . . . . . . . . . . . . . 8 6.1.2. Identity change . . . . . . . . . . . . . . . . . . . 9 6.1.3. The hand-shake with name exchange . . . . . . . . . . 10 6.1.4. Triggers of shim6 . . . . . . . . . . . . . . . . . . 10 6.1.5. Establishing Shim6 context . . . . . . . . . . . . . . 10 6.1.6. Problems for Shim6 to support mobility . . . . . . . . 11 6.1.7. Changes to REAP . . . . . . . . . . . . . . . . . . . 12 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 13 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 10.1. Normative References . . . . . . . . . . . . . . . . . . . 13 10.2. Informative References . . . . . . . . . . . . . . . . . . 13 10.3. URL References . . . . . . . . . . . . . . . . . . . . . . 13 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 13 Appendix B. Open Issues . . . . . . . . . . . . . . . . . . . . . 13 Ubillos & Ming Expires January 13, 2011 [Page 2] Internet-Draft Name Based Sockets July 2010 1. Conventions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 2. Terminology Locator - An IP address (v4 or v6) on which a host can be reached. Multi-home - A host which is reachable through multiple locators (on one interface or more) Name - A character string (max 255 chars long) on which a host can be identified. A name maps to zero or more locators. 3. Overview 3.1. Introduction Name based sockets provide a unified interface which caters to the application developers who wish to simply open up a communication to a remote host by its name and have the operating system perform the management of locators. It does so by providing a new address family (AF_NAME) which allows the developer to use a name instead of an IP. 3.2. Motivation Network communication has for very long been based on the assumption that applications should deal with the IP (locator) management. This is based on the legacy notion that an IP does not change during a session and that a session is a communication between two given IPs (locators). This situation has changed, locators change during a session, and a device/host might have multiple locators, hosts may be behind NATs or be on different networks (IPv4/IPv6). Today, this is mostly left to the individual applications to solve. This adds complexity to the applications making it a challenge in it self just to meet the networking demands of applications. Name based sockets aims to fix this by pushing the locator management to the operating system, without introducing any new limitations or delays. Note that Name based sockets do not aim to replace all socket() based communications. There are of course cases which are limited due to obvious boot-strapping problems. E.g. a DHCP client, or an DNS- Ubillos & Ming Expires January 13, 2011 [Page 3] Internet-Draft Name Based Sockets July 2010 querying client would do better in not using a name oriented architecture. 3.3. Name based sockets overview Name based sockets provide a new socket interface. It is implemented using a new address family (AF_NAME). This means that the sockets are only used by applications who explicitly invoke it. The result is that applications that do use name based sockets are very aware of the set of features they are provided with, hence they know not to re-implement it. Current implementations are implemented as kernel modules, however a user-space library implementation ought to work just as well. By using DNS as the ID/Locator mapping structure, we do not introduce any new indirections. Please note that the responding host does not need to do any DNS resolution (explained below). We part from the assumption that most application network calls start with a FQDN. The exchange of names is performed in-band, by piggy-backing the needed information on the first couple of packets exchanged. The consequence is: o No extra delays o No extra features until the name exchange has ended. o As a result of this, we also achieve backwards compatibility (a name exchange which never completes) 3.4. Protocol overview Name based sockets work by performing a name exchange in the beginning of a communication. It does this in an non-blocking way. In practice what happens is that the first few packets are exchanged in a normal legacy fashion. However, to these packets, extra information about the corresponding hosts names are piggy-backed. If a name exchange is successful, the extra features provided by name based sockets are enabled. If the exchange does not succeed, normal legacy communication continues unaffected. The name of each host is either its FQDN, its IP in IPv6.arpa format or an arbitrary nonce. It may or may not be authenticated. In the ordinary case, the name is not authenticated, thus the receiver does not need to perform a reverse or forward lookup, hence not adding any further delays to the first packet(s). The motivation for this is to avoid any additional "first-packet" delays. Ubillos & Ming Expires January 13, 2011 [Page 4] Internet-Draft Name Based Sockets July 2010 Once the name exchange has been performed successfully the complete feature set will be made available to the communication automatically. The expected API for a socket using AF_NAME is the same as for e.g. TCP (SOCK_STREAM). This is also the case for SOCK_DGRAM and similar protocols, for all practical purposes, the functionality remains unchanged, however, as a state is created in both ends, a connection oriented model is more intuitive. 4. Initial name exchange When the sender sends the first packet to the receiver it appends its own name as an IP-Option/IPv6-Extension header. It repeats this for a predefined amount of time or packets. On the receiving end, if the receiver supports name based sockets it appends its own name in the same fashion for a predefined amount of time or packets. Should the receiver not be able to interpret the name, the option/extension header is ignored and the legacy communication precedes as normal. This kind of name exchange has two consequences. First and foremost that there are no extra delays on the initial packets. Secondly that the complete feature set provided by name based sockets will not be available until a few packets have been exchanged. In figure 1 the exchange is depicted. The first packet from the sender has its own name appended to it. The next few packets sent will also have the name appended to it for a predefined number of packets (X in the figure) or until a reply including a name extension is received. On the receiving end, should an incoming packet have a name extension, the receiver begins to append its own name to the sent packets. It does so for a predefined number of packets (X in the figure). Ubillos & Ming Expires January 13, 2011 [Page 5] Internet-Draft Name Based Sockets July 2010 .--------. .----------. | Sender | | Receiver | '--------' '----------' | | | | | .-------------------. | | ------| Data packet |-----> | Packet received. | |+ name piggybacked | | The first response | '-------------------' | packet will have | ----------> | the receiver name | -------> | piggy backed to it. | ----> .-----------------------. | | -> | name piggybacked | | | | until reply or X pkts | | | '-----------------------' | | | | | | | | .-------------------. | | <------| Data packet |----- | | |+ name piggybacked | | | '-------------------' | | <---------- | | <------- | | .------------------. <---- | | | name piggybacked | <- | | | until X pkts | | | '------------------' | | | | | Figure 1 4.1. Name format Names can be provided in any of three ways. o FQDN. The Fully Qualified Domain Name of the host. This will allow e.g. DNSsec to provide authenticity of the name. o ip6.arpa. Using one of the hosts interfaces addresses as a name. o Nonce. A one-use only session identifier. Ubillos & Ming Expires January 13, 2011 [Page 6] Internet-Draft Name Based Sockets July 2010 5. API The API follows the connection oriented model, e.g. TCP. The available calls are: listen(): Prep for incoming session fd = listen( src_name, dst_name, local_port, transport ); open(): Initiate outgoing session fd = open( src_name, dst_name, remote_port, transport ); accept(): Receive incoming session accept( src_name, dst_name, fd ) = accept( fd ); read(): Receive data data = read( fd ); write(): Send data write( fd, data ); close(): Close session close( fd ); Ubillos & Ming Expires January 13, 2011 [Page 7] Internet-Draft Name Based Sockets July 2010 Example code snippet // Header files omitted int main (int argc, const char *argv[]) { int fd; char rec_buf[100], snd_buf[100]; struct sockaddr_name name_sock; fd = socket( AF_NAME, SOCK_STREAM, 0); name_sock.sname_family = AF_NAME; strcpy(name_sock.sname_addr.name, "my.host.name"); name_sock->sname_port = htons(5000); bind(fd, (struct sockaddr *)&name_sock, sizeof(name_sock)); connect(fd, (struct sockaddr *)&name_sock, sizeof(name_sock)); write(fd, snd_buf, sizeof(snd_buf)); read(fd, snd_buf, sizeof(snd_buf)); close(fd); } 6. Mobility support 6.1. Shim6 "... The Shim6 protocol, a layer 3 shim for providing locator agility below the transport protocols, so that multihoming can be provided for IPv6 with failover and load-sharing properties, without assuming that a multihomed site will have a provider-independent IPv6 address prefix announced in the global IPv6 routing table. The hosts in a site that has multiple provider- allocated IPv6 address prefixes will use the Shim6 protocol specified in this document to set up state with peer hosts so that the state can later be used to failover to a different locator pair, should the original one stop working. " RFC5533 [RFC5533] 6.1.1. Brief overview of changes To the upper layers, shim6 provides a stable IP address-like identifier (ULID) to identify the remote host and make the IP addresses (locators) transparent to the application. This way of providing a pseudo-address (ULID) does however invite confusion. The ULID selected by Shim6 is actually an IP address which is available for the application when the connection is being established. This address (ULID) may become invalid during the connection RFC5533 Ubillos & Ming Expires January 13, 2011 [Page 8] Internet-Draft Name Based Sockets July 2010 Section 1.5 [RFC5533]. ULID invalidation is beyond the control of the individual hosts, it is controlled by the network. This might cause confusion if the applications continues to use the ULIDs which are no longer valid. Shim6s solution to this problem to terminate the communication immediately when ever any ULID becomes invalid. This is definitely inappropriate in a mobile scenario as connections are expected to be preserved during the mobile period. Moving between two distinct networks, changing your complete locator set is the common scenario (e.g. entirely switching from one WiFi to another.) Name Based Sockets suggest using the name of a host as the identifier. This solves the above problems, as a name is valid for as long as a host wishes it to be. Also, as Name Based Sockets provide a new explicit interface (names rather than 'fake IP addresses'), applications that use it will be aware of the available features, and may make correct assessments of the underlying IP stack and its enhancements. This section describes a set of changes and improvements to shim6 that are to be incorporated with the Name Based Sockets. Briefly, the changes are: o Name is used as ULID rather than IP (xref target="IdentityChange" />). o Node inter-reachability resilience, for when both nodes are simultaneously mobile (Section 6.1.6.1). 6.1.2. Identity change Shim6 selects a locator (IP) in the initial contact with the remote peer and uses this locator as an upper-layer identifier (ULID). To support NBS, we use name (or some structure related to name) as ULID instead of IP addresses. Because the end point identifier is no longer a locator but a name, the initial name exchange is performed by NBS and Shim6 will further use this name to construct the ULID. Shim6 requires that any communication using a ULID MUST be terminated when the ULID becomes invalid. Using names as ULIDs instead of IP address is more in line with the transport semantic. Having names as ULIDs means that the session may still exist even if both communicating hosts' locator lists are empty at a given point of time. This is particularly important when one or both peer(s) are moving. Ubillos & Ming Expires January 13, 2011 [Page 9] Internet-Draft Name Based Sockets July 2010 Note that replacing a ULID with a name does not mean representing the ULID as a string or a string-like structure. In order to make least modification to both Shim6 protocol (where ULID is a 128-bit IPv6 address) and its current implementation, we propose to represent the name as a 128-bit MD5-hash and use this MD5-hash as the corresponding ULID. 6.1.3. The hand-shake with name exchange As is described in RFC5533 [RFC5533], Shim6 does not need to react immediately when connections start up. The initial name exchange is performed by NBS and it requires no help of Shim6. The name exchanged by NBS will be further used as ULID by Shim6. At some time during the communication, some heuristic may determine that it is appropriate to use shim6 to support mobility/multi-homing, so the communicating hosts initiate a 4-way, context-establishment exchange. As a result, both hosts get a locator list of each other. As an extension to Shim6, we do not change the operation sequence of the 4-way exchange, namely the order of I1, R1, I2, R2 will not be changed. What is changed is that the IP-based ULID is replaced by a name-based ULID and the hand shake no longer requires ULID negotiation because it has already been done by NBS. 6.1.4. Triggers of shim6 It is not necessarily worth paying the overhead of setting up a shim context when e.g. only a small number of packets are exchanged between two hosts. As a result, Shim6 functionality will not be started immediately as a new communication is initiated. NBS uses some heuristic for determining when to perform a deferred context establishment. This heuristic might be that more than 50 packets have been sent or received, or that there was a timer expiration while active packet exchange was in place RFC 5533 [RFC5533]. How the heuristic is designed is beyond the scope of this document. 6.1.5. Establishing Shim6 context At a certain time during the connection, some heuristic on host A or B (or both) determine that it is appropriate to pay the Shim6 overhead to improve host-to-host communication. This makes the Shim6 initiate the 4-way, context-establishment exchange RFC 5533 [RFC5533]. As a result, both A and B get a list of locators for each other. In the case of name-based Shim6, ULID is represented as a MD5-hash of Ubillos & Ming Expires January 13, 2011 [Page 10] Internet-Draft Name Based Sockets July 2010 name rather than IP. 6.1.6. Problems for Shim6 to support mobility When only one host moves to a new network, a REAP Update is triggered to prevent connection from being terminated. Under normal circumstances, connection will be smoothly preserved during the REAP Update process. However, REAP itself is not sufficient to support full mobility functionality, as when both hosts move simultaneously, neither of them will receive the update message, which will lead to a connection loss. To deal with this problem, DNS should be involved to provide address information. 6.1.6.1. DNS querying An effective solution for the mobility problem is to have a "stationary infrastructure" to provide address information for all mobile devices. We propose to use DNS as the stationary infrastructure as it associates addresses with names and has enough capability. How DNS incorporates with name-based Shim6 is described in the following part. 6.1.6.2. One peer moves In the case that only one host moves, the moving host starts a REAP Update process to re-establish Shim6 context with the correspondent host. At the same time, DNS should be updated by the moving host. This procedure is the normal REAP [RFC5534] procedure with the added update to DNS. The following sequence illustrates the details: 1. Two hosts, namely A and B are communicating using NBS and Shim6. 2. At certain moment, A moves to a new network and changes its IP address (locator). 3. A updates the authoritative DNS with its new IP address. In parallel, A starts the REAP Update process by sending B an Update Request and the REAP Update process is invoked. 4. New operational locator pair is found by REAP Update process. 5. Handover process is completed and connection is preserved during the process. Ubillos & Ming Expires January 13, 2011 [Page 11] Internet-Draft Name Based Sockets July 2010 6.1.6.3. Both peers move When both hosts moves simultaneously, neither host will receive the REAP Update Request, thus REAP will fail in finding the new operational locator pair. Under such circumstances, both hosts need to query DNS for the correspondent hosts addresses. When new address is retrieved, both hosts initiate REAP Update process as specified in RFC5534 [RFC5534]. The following sequence illustrates the details: 1. Two hosts, namely A and B are communicating using NBS and Shim6. 2. At certain moment, both A and B move simultaneously and both hosts change their respective IP addresses. 3. Both hosts update DNS with their new addresses and send REAP Update Request to their correspondent peer. 4. Due to concurrent move, Update Requests are lost for both directions. 5. Both hosts experience an Update Timeout and query DNS for correspondent hosts' locators using their names. 6. New addresses are returned by the respective DNS queries and REAP Update is now able to operate. A and B re-invoke a REAP Update process using the new addresses. 7. New operational locator pair is found by REAP Update process. 8. Handover process is completed and connection is preserved during the handover process. 6.1.7. Changes to REAP We extend REAP by adding DNS Querying into its Path Exploration. For the sake of backwards compatibility, DNS can be implemented as a separate module and has no impact on the other part of REAP which is specified in RFC 5534. The DNS functionality can be turned off for stationary hosts and be turned on for mobile devices. In the case of mobile scenario, DNS Query and REAP Path Exploration may work together to provide stronger reliability. DNS query might be lost due to link failure or timeout due to high network delay. Under such circumstances, REAP Path Exploration will be triggered because of a SEND TIMEOUT and tries to find an available path. This is meaningful when a host has multiple available interfaces (for Ubillos & Ming Expires January 13, 2011 [Page 12] Internet-Draft Name Based Sockets July 2010 instance Wi-Fi and 3G) and the address change for one interface does not lead to the change for others. 7. Security Considerations 8. IANA Considerations Name based sockets requires a new address family (AF_NAME) to be defined. 9. Contributors 10. References 10.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 10.2. Informative References [RFC5533] Nordmark, E. and M. Bagnulo, "Shim6: Level 3 Multihoming Shim Protocol for IPv6", RFC 5533, June 2009. [RFC5534] Arkko, J. and I. van Beijnum, "Failure Detection and Locator Pair Exploration Protocol for IPv6 Multihoming", RFC 5534, June 2009. 10.3. URL References Appendix A. Change Log Note to RFC Editor: if this document does not obsolete an existing RFC, please remove this appendix before publication as an RFC. Appendix B. Open Issues Note to RFC Editor: please remove this appendix before publication as an RFC. Ubillos & Ming Expires January 13, 2011 [Page 13] Internet-Draft Name Based Sockets July 2010 Authors' Addresses Javier Ubillos Swedish Institute of Computer Science Kistagangen 16 Kista Sweden Phone: +46767647588 EMail: jav@sics.se Zhongxing Ming Tsinghua University FIT Building 4-104, Tsinghua University Beijing China Phone: + EMail: mingzx@126.com Ubillos & Ming Expires January 13, 2011 [Page 14]