P2PSIP Working Group J. Peng Internet-Draft L. Le Intended status: Standards Track China Mobile Expires: May 2, 2012 K. Feng Beijing University of Posts and Telecommunications October 30, 2011 One Hop Lookups Algorithm Plugin for RELOAD draft-peng-p2psip-one-hop-plugin-01 Abstract This document proposes an implementation of overlay plugin algorithm which is called one hop lookups to provide examples and references for the research of one hop based RELOAD. With the development of the real time communications, there are high demands for the improvement of routing efficiency. In the one hop algorithm, each peer maintains complete membership information which can guarantee one hop lookups to improve the routing efficiency. For the RELOAD, using the one hop lookups algorithm to construct Topology Plugin with the same RELOAD core can have a better support for VoIP applications, and the implementation of one hop lookups algorithm plugin is based on the methods provided by RELAOD. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on May 2, 2012. Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. Peng, et al. Expires May 2, 2012 [Page 1] Internet-Draft One Hop Lookups Plugin October 2011 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Hash functions . . . . . . . . . . . . . . . . . . . . . . . . 4 4. Peer data structure . . . . . . . . . . . . . . . . . . . . . 4 5. Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 6. Joining . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 7. Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 7.1. Update messages definition . . . . . . . . . . . . . . . . 8 7.1.1. Update types . . . . . . . . . . . . . . . . . . . . . 9 7.1.2. Peer types . . . . . . . . . . . . . . . . . . . . . . 9 7.1.3. Event notification types . . . . . . . . . . . . . . . 9 7.1.4. PeerContactItem struct . . . . . . . . . . . . . . . . 9 7.1.5. EventNotification struct . . . . . . . . . . . . . . . 10 7.1.6. DataStructureContent struct . . . . . . . . . . . . . 13 7.1.7. OneHopUpdate message . . . . . . . . . . . . . . . . . 14 7.2. Different using strategies . . . . . . . . . . . . . . . . 15 7.2.1. One hop lookups . . . . . . . . . . . . . . . . . . . 15 7.2.2. D1HT . . . . . . . . . . . . . . . . . . . . . . . . . 15 7.2.3. 1h-Calot . . . . . . . . . . . . . . . . . . . . . . . 15 7.2.4. Two hop lookups . . . . . . . . . . . . . . . . . . . 16 8. Leaving . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 9. Replication . . . . . . . . . . . . . . . . . . . . . . . . . 17 10. Fault tolerance . . . . . . . . . . . . . . . . . . . . . . . 17 10.1. First hop fail . . . . . . . . . . . . . . . . . . . . . . 17 10.2. Leaders fail . . . . . . . . . . . . . . . . . . . . . . . 18 11. Security Considerations . . . . . . . . . . . . . . . . . . . 18 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 14.1. Normative References . . . . . . . . . . . . . . . . . . . 18 14.2. Informative References . . . . . . . . . . . . . . . . . . 19 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 Peng, et al. Expires May 2, 2012 [Page 2] Internet-Draft One Hop Lookups Plugin October 2011 1. Introduction RELOAD [I-D.ietf-p2psip-base] is a peer-to-peer (P2P) signaling protocol for use on the Internet. The Architecture of RELOAD has a routing layer which is called the Topology Plugin. It is responsible for implementing the specific overlay algorithm being used. In the RELOAD base protocol, the Topology Plugin is described by Chord, and we define another DHT algorithm in order to provide a high rouging efficiency which can be used in real time communications for a better performance. The real time communication has a high demand on the routing efficiency, for example, the VoIP usage on the application level of RELOAD, it is no doubt that the routing speed or efficiency must be very important, and it depends on the looking up efficiency of the Overlay Topology. There are many kinds of structured peer-to-peer overlay network algorithms like Chord, Pastry, CAN and so on. Most of them have one in common is that they have a routing table which can maintain a small amount of routing state. But all these algorithms need nearly O (log N) steps to find one peer or resource, if there are a large number of peers in the overlay, it costs a lot time to locate peer or resource which may have a bad influence on the application level of RELOAD. The one hop lookups algorithm gives one way to maintain complete membership information at each node in order to increase the routing efficiency. Being a kind of Topology Plugin, O (1) protocols achieve low latency lookups on small or low-churn networks because lookups take only a few hups, but incur high maintenance traffic on large or high-churn networks. On contrast, O (log N) protocols incur less maintenance traffic on large or high-churn networks but require more lookup hops in small networks. The experiment results show that the routing table maintenance bandwidth using is still acceptable when the number of overlay nodes equal 10^5 and 10^6 [SingleHopDHT]. So, weighing the advantages and disadvantages, in a relative small and stable network like a low-churn telecom core net with VoIP applications, the O (1) algorithms are very important as a Topology Plugin of RELOAD. In the one hop algorithm, each peer maintains a complete description of system memberships, and it presents techniques for the peers to maintain this information accurately with lower costs of communications. Different one hop algorithm implementations have different ways to keep the accuracy of routing states. For example, the one hop lookups for peer-to-peer overlay [One-Hop-Lookups] uses event notification mechanism without broadcast to update all the peers' routing table during several seconds; the SandStone [SandStone] gives a way to collect update information by SPM (Super Peer Maintenance) which is called SPM assisted one hop enhancement. Peng, et al. Expires May 2, 2012 [Page 3] Internet-Draft One Hop Lookups Plugin October 2011 The SandStone overlay is typically deployed as a two-layered DHT, including a global DHT and several regional DHT. There is at least one SPM node in each region, and once one peer's state change is detected by its neighboring peer, the neighbor will notify its local SPM, and then the SPM will disseminate the event to other SPMs, finally each SPM will broadcast this notification to all peers under the charge of it. The D1HT [D1HT] algorithm has an EDRA (Event Detection and Reporting Algorithm) to implemnt the event notification schema. In the D1HT, all the peers are ordinary, and all of them use the EDRA to make the goal of routing table maintenance. Like the D1HT, 1h-Calot [1h-Calot] is also a kind of one hop algorithms in a purely peer-to-peer environment, but it informs the event notifications by constructing a multicasting tree. Besides, there is also an algorithm called two hop lookups [TwoHopLookups] which is an improvement of the one hop lookups to support a larger size overlay. We use the one hop lookups algorithm combined with two hop lookups, D1HT and 1h-Calot to construct the RELOAD Topology Plugin because there is no SPM like peers and two layered topology in RELOAD base. In this draft, we first give a simple overview of one hop lookups algorithm with some definitions, then fill the overlay specific data structures into the RELOAD frame and implements this algorithm with topology messages defined in the RELOAD Topology Plugin; at last, the replication and fault tolerance strategies are stated. All of the definitions and implementations based on the requirements and methods provided by RELOAD. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 3. Hash functions In this one hop lookups algorithm topology plugin, the size of the node ID and resource ID is 128 bits. The hash of an ID can be computed by SHA-1 [RFC3174] and other hash functions. 4. Peer data structure A peer, or a node, is responsible for a particular Resource-ID which is less than or equal to its Node-ID and is greater than the Node-ID of the previous peer in the neighborhood. In order to have a high routing accuracy and efficiency, each peer must keep and maintain a Peng, et al. Expires May 2, 2012 [Page 4] Internet-Draft One Hop Lookups Plugin October 2011 series of data structure. In the one hop lookups algorithm, we divide the 128-bit circular identifier space into k equal contiguous intervals which are called slices. Each slice has a slice leader, which is chosen as the node that is the successor of the mid-point of the slice space (other choosing methods are list in Section X). Similarly, each slice also can be divided into u equal-sized intervals which are called units; each unit has a unit leader, which is chosen as the successor of the mid-point of the unit space. And of course every peer in the overlay should know its unit header and slice header. The slice leader and unit leader are both ordinary peers who are chosen dynamically, and they also can be configured at first. The leader choosing strategies will be stated in section 10. In the SandStone [SandStone], there is at least one SPM node in each region, so when one peer detects the state change of overlay in its neighborhood, it will notify its local SPM as soon as possible. The local SPM then disseminate this notification to other SPMs, at last, each SPM will broadcast this update to all peers under the charge of it. The SPM is a special peer which can collect update information and do some other management tasks. The SPM does not participate in the routing procedures and it is pre-configured. On the other hand, the SPM uses the broadcast to spread the notifications. Only the bootstrap peer in RELOAD has the broadcast function. Using the broadcast is a simple and fast way to inform information, but it costs a lot of bandwidths and also may cause the notifications redundancy. Using the event based notification mechanism defined in one hop lookups can avoid redundancy in the communications: one peer will get information only from its neighbor that is one step closer to its unit leader. This implies that within a unit, information is always flowing from the unit leader to the ends of the unit. But its disadvantage is obvious: the notifications are spread too slowly which may cause the delay of overlay stabilization. In the D1HT, all the peers are same; they use the EDRA mechanism to make sure the event can be disseminated to all peers. To disseminate the information about the events, each peer p sends up to m propagation messages at each time interval. The D1HT algorithm uses the EDRA to spread event notifications. The details of EDRA can be found in [D1HT]. In the 1h-Calot, a purely peer-to-peer architecture is preferable, in which nodes assume equal roles and share load evently. It uses overlay multicast to efficiently replicate complete O (n) routing tables on every node. And the peer in 1h-Calot also informs other peers of its arrival or departure by multicasting a notification through a tree. The details of multicasting tree construction can be Peng, et al. Expires May 2, 2012 [Page 5] Internet-Draft One Hop Lookups Plugin October 2011 found in [1h-Calot]. In the two hop algorithm, the architecture is same as one hop algorithm with slice leaders and unit leaders. In addition, every slice leader chooses a group of its own nodes for each other slice; the group may be chosen randomly. Then the slice leader sends routing information about one group to exactly one other slice leader. The infornation about the group is then disseminated to all members of that slice as in the one hop event notification schema. After this, every node not only keeps full routing information about nodes in its own slice, but also keeps a table of k-1 (k is the number of slice) nodes that are close to it, one from every other slice. From above, we can conclude the information peer should maintain as follows. Routing table: The routing table of each node keeps all of nodes' address, and then the source node can find the destination by just one hop at most if the items of routing table are right. So it is very important to keep the correctness of each node's full routing table. In order to achieve the goal, the notification of membership change events must reach every node in the overlay within a specified amount of time. Predecessor and successor information: The information especially Node-ID and address about peer's predecessor and successor must be kept in order to maintain the overlay circle. A peer will communicate with its predecessor and successor with keep-alive message (defined in Update message) every several seconds. Of course, this structure can be added as mark-ups or identifiers in the routing table. Unit leader information: The information about the unit leader who is responsible for the unit the peer belongs to. A peer can communicate with its unit leader to receive the Update message. Of course, this structure can be added as mark-ups or identifiers in the routing table. Slice leader information: The information about the slice leader who is responsible for the slice the peer belongs to. A peer can send event notification message (defined in Update message) to its slice leader to inform events. Of course, this structure can be added as mark-ups or identifiers in the routing table. The four kinds of information are maintained in all of peers, but there are still some kinds of other data structures should be kept in some special peers like unit leader and slice leader. Peng, et al. Expires May 2, 2012 [Page 6] Internet-Draft One Hop Lookups Plugin October 2011 Unit boundary identifier: This identifier does not keep in all the peers, it is stored on the peer who is the boundary of the unit in order to make sure the Update message just being transferred within unit. Unit boundary peer list: The unit leader should know the information about the unit's boundary peer in order to know its control area. Slice leader list: Every slice leader should know the information about all the slice leaders in the overlay in order to dispatch the event notification messages in time. Group member list: Every slice leader should know the information about all the group members in the slice it is responsible in order to exchange group messages with other slice leaders. This list will be used in the two hop lookups algorithm. Each peer with these data structures can compose a well running overlay topology based on the one hop lookups algorithm. 5. Routing The routing table of the peer has full information of all the nodes in the overlay. If the peer is not responsible for a resource whose ID is r, it will query the routing table in order to find the first peer whose id is greater than r, which means that the peer should be responsible for this resource; and then sends a request message to the destination node. If no such node is found, it finds the largest Node-ID in the interval between the peer and Resource-ID r. In the two hop algorithm, when a peer wants to query the successor of a resource, it sends a lookup request to its chosen peer in the slice containing the key of this resource. The chosen peer then examines its own routing table to identify the successor of the key and forwards the request to that peer. So, one peer can find the resource at most two steps. 6. Joining The Join message defined in [I-D.ietf-p2psip-base] with a Node-ID has contained enough information to construct a joining request. The joining process for a joining peer (JP) with Node-ID n is as follows (using the one hop lookups algorithm as an example). Peng, et al. Expires May 2, 2012 [Page 7] Internet-Draft One Hop Lookups Plugin October 2011 (1) JP MUST connect to its chosen or preconfigured bootstrap node. (2) JP SHOULD send an Attach request to its admitting peer (AP) for Node-ID n. The "send_update" flag should be used to acquire the routing table and other information for AP. (3) JP MUST send a Join to AP, the AP sends the response to the Join. (4) AP MUST do a series of Store requests to JP to store the data that JP will be responsible for. (5) JP SHOULD send Attach request to its predecessor in order to let it knows the arriving of new peer; the predecessor should update its successor information at once. (6) The predecessor detects a change in its routing table for example it has a new successor, it MUST send an event notification message which is a kind of Update messages to its slice leader directly. Of course, the Update message can also be sent by the AD. (7) The slice leader MUST collect all notifications it receives from the peers in his slice and sends Update messages to other slice leaders every several seconds to inform these events. (8) Other slice leaders MUST dispatch the Update messages they received to all the unit leaders in their respective slices. (9) A unit leader SHOULD piggyback this information on its Update message to its successor and predecessor. (10) Other nodes propagate this information from their successors; they SHOULD send it to their successors and vice versa. (11) Peers at unit boundaries SHOULD NOT send information to their neighboring peers outside their unit. The Update messages depending on the RELOAD message formats will be defined in the next section. The peer can find its successor by sending a Find request with the Resource-ID equals its Node-ID. 7. Updates The update message is the primary overlay-specific maintenance message which can be used by the sender to notify other peers the current state of overlay. Every peer in the overlay can maintain and keep the correctess of routing table by sending and receiving update messages. 7.1. Update messages definition The update message defined in this section will be used by the one hop algorithm, D1HT, 1h-Calot and the two hop algorithm. It is a template for one hop like algorithms. Peng, et al. Expires May 2, 2012 [Page 8] Internet-Draft One Hop Lookups Plugin October 2011 7.1.1. Update types enum { eventUpdate (0), dataStructureUpdate (1), (255) } UpdateType; The UpdateType gives an enumeration of update message which includes eventUpdate and dataStructureUpdate. The eventUpdate message will take a list of event notifications which will have an influence on peer's routing table. The dataStructureUpdate message will take a list of data structure information wich includes the routing table items. This structure allows for unknown option types. 7.1.2. Peer types enum { ordinaryPeer (0), unitLeaderPeer (1), sliceLeaderPeer (2), (255) } PeerType; The PeerType gives an enumeration of peer type which can be used by different one hop like algorithms. For example, the concepts of unit leader and slice leader are used in heterogeneous DHTs like the one hop lookups and two hop lookups. And in D1HT and 1h-Calot, all the peers are ordinary. 7.1.3. Event notification types enum { peerJoin (0), peerLeave (1), groupPeerInformation (3), (255) } EventType; The EventType gives an enumeration of event kinds, the common events include peer joining and peer leaving. Almost all of the algorithms need the two events. The third event type is groupPeerInformation which is used in the two hop algorithm to spread group members' information. 7.1.4. PeerContactItem struct struct { Node-ID peer_id IpAddressPort peer_addr_port } PeerContactItem; The struct of PeerContactItem is used to construct the information about peers. The Node-ID and IpAddressPort are defined in [I-D.ietf-p2psip-base] which represents the ID of the peer and peer's Peng, et al. Expires May 2, 2012 [Page 9] Internet-Draft One Hop Lookups Plugin October 2011 IP address and port. 7.1.5. EventNotification struct struct { EventType event_type; PeerType peer_type; uint32 length; select (event_type) { case peerJoin: select (PeerType) { case ordinaryPeer: case unitLeaderPeer: case sliceLeaderPeer: uint32 slice_number; uint32 unit_number; PeerContactItem joining_peer; PeerContactItem replaced_peer; } case peerLeave: select (PeerType) { case ordinaryPeer: case unitLeaderPeer: case sliceLeaderPeer: uint32 slice_number; uint32 unit_number; PeerContactItem leaving_peer; } case groupPeerInformation: select (PeerType) { case ordinaryPeer: case unitLeaderPeer: case sliceLeaderPeer: uint32 group_size; uint32 slice_number; PeerContactItem <0...n> group_peer_list; } } } EventNotificationItem; The structure of EventNotificationItem is an item of EventNotification message. The peer who receives this item can refresh its routing table and other important data structures by analysing the information it takes. The EventNotificationItem structure contains parameters as follows. Peng, et al. Expires May 2, 2012 [Page 10] Internet-Draft One Hop Lookups Plugin October 2011 event_type The type of event the message will take, its value is one of the types defined in the EventType. This structure allows for unknown event types. peer_type The type of peer who wants to get this information, its value is one of the types defined in the PeerType. This structure allows for unknown peer types. slice_number, unit_number The number of slice and unit the peer who caused this event belongs to. These two parameters are only used in the one hop lookup algorithm and the two hop lookup algorithm. joining_peer The contact information of the peer's who is going to be one of the overlay members. replaced_peer The contact information of the peer's who is replaced by the new arrival one. This variable is only used in the one hop lookup algorithm and the two hop lookup algorithm when the slice leader or unit leader changes. leaving_peer The contact information of the peer's who is going to leave the overlay. group_size This variable means the number of peers in one group. This variable is only used in the two hop lookup algorithm. group_peer_list The representatives or entry points' contact information for other slices. This list is only used in the two hop lookup algorithm. length This variable means the length of the remainder of this message. typedf EventNotificationItem <0...n> EventNotificationItemList; struct { uint32 uptime; PeerContactItem detect_peer; uint32 D1HT_TTL; uint32 multicast_range_start; uint32 multicast_range_end; uint32 event_notification_item_counts; EventNotificationItemList event_notification_item_list; } EventNotification; The EventNotificaton struct is the most important message body in Updates. This message can be used in the one hop lookup algorithm, Peng, et al. Expires May 2, 2012 [Page 11] Internet-Draft One Hop Lookups Plugin October 2011 D1HT, 1h-Calot and the two hop lookups algorithm. But different algorithms will use different variables. In the D1HT and 1h-Calot, all the peers are ordinary, so both of them will not use the variables related to unit or slice. The EventNotification structure contains parameters as follows. uptime The time this peer has been up in seconds. detect_peer This variable means the peer's contact information who detects these events. D1HT_TTL This variable is only used in the D1HT algorithm to be a mark for message spreading. Other algorithms can set it as zero. multicast_range_start, multicast_range_end These two variables mean the multicast range marked by the root of a mulcasting tree. They are only used in the 1h-Calot. event_notification_item_counts This variable means the number of events one update message will take. In the 1h-Calot algorithm, every peer will construct the multicasting tree to spread notifications immediately when it detects the overlay changes. So, the variable can be set as 1 in the 1h-Calot. event_notification_item_list This variable means a list of event notifications. In the one hop lookup algorithm, D1HT and the two hop lookup algorithm, this variable can reduce some network traffic. Peng, et al. Expires May 2, 2012 [Page 12] Internet-Draft One Hop Lookups Plugin October 2011 7.1.6. DataStructureContent struct struct { PeerType peer_type; PeerContactItem predecessor_peer; PeerContactItem successor_peer; PeerContactItem routing_table_list <0...n>; uint32 length; select (peer_type) { case ordinaryPeer: uint32 unit_number; uint32 slice_number; PeerContactItem unit_leader; PeerContactItem slice_leader; case unitLeaderPeer: PeerContactItem boundary_peers <2>; case sliceLeaderPeer: PeerContactItem slice_leaders_list <0...n>; PeerContactItem group_peers_list <0...n>; } } DataStructureContent The structure DataStructureContent mainly used in the transferring of data structure during the peer joining procedure or receiving a message contains the send_update flag [I-D.ietf-p2psip-base]. The length of the structure changes in different peer type because different peer has different data structure. The ordinary peer only has predecessor, successor, unit leader, slice leader and routing table information. And the unit leader and slice leader should also know some other information. The DataStructureContent structure contains parameters as follows. routing_table_list This variable represents the peer's routing table which has all of the peers' information in the overlay. In the two hop lookup algorithm, the routing table contains all of the same slice peers's information and other slices' entry peers information. boundary_peers This variable represents the information of peers' who are the boundary of unit. It is used in the one hop lookup algorithm and the two hop lookup algorithm. Peng, et al. Expires May 2, 2012 [Page 13] Internet-Draft One Hop Lookups Plugin October 2011 slice_leaders_list All of the slice leaders' information will be stored in this list, and it can be transferred from one slice leader to another or from the old slice leader to the new one. It is used in the one hop lookup algorithm and the two hop lookup algorithm. 7.1.7. OneHopUpdate message typedef EventNotification<0...n> EventNotificationList; typedef DataStructureContent<0...n> DataStructureContentList; struct { UpdateType update_type; uint32 length; select (update_type) { case eventUpdate: EventNotificationList event_notification_list; case dataStructureUpdate: DataStructureContentList data_structure_list; } } OneHopUpdateReq; This structure is the Update message used in the O (1) protocols. The update message is composed by two kinds of list. Using list can take more information. One of them is the EventNotification structures, and the other is the DataStructureContent structures. All the peers in the overlay can use this update request to inform event notification or transport routing information. struct { uint32 update_response; } OneHopUpdateAns; The structure OneHopUpdateAns is used to be a response to the OneHopUpdateReq message. It is only has a response number to represent the receiver's attitude which may include success, fail and error. This number also can be defined as a type. update_response The response of the Update message, different number has different meaning includes success, error and so on. Peng, et al. Expires May 2, 2012 [Page 14] Internet-Draft One Hop Lookups Plugin October 2011 7.2. Different using strategies The update message defiend above can cover 4 kinds of O (1) protocols, they are the one hop lookup protocol, D1HT, 1h-Calot and the two hop lookup protocol. Different protocol has different using strategies for the same update message. 7.2.1. One hop lookups In the one hop lookups, event notifications can be disseminated in a hierarchical architecture. The slice leader collects all the events happened in its responsible slice and spreads this event notification list to all the other slice leaders. The notifications are composed of peer joining and leaving without group information. It also does not use the variable D1HT_TTL which is designed for D1HT specially. Other slice leaders receive the notification list and then send it to the unit leaders it is responsible for. At last, the unit leader spread this list to the ordinary peers via the keep-alive messages. 7.2.2. D1HT In the D1HT, event can be disseminated by the EDRA (Event Detection and Reporting Algorithm), which is able to notify an event to whe whole system in logartithmic time and yet to have good load-balance properties coupled with very low bandwitdth overhead. This algorithm will use the D1HT_TLL variable defined in the EventNotification structure. In the EDRA, each message will have a Time-To-Live (TTL) counter and will be addressed to its successor. The details of this algorithm can be found in [D1HT]. The update message of D1HT is far different from the one hop lookups algorithm, because all the peers in D1HT are ordinary, so it just use the messages related to the ordinary peers. Other protocols will set the D1HT_TTL as zero or a negative number. 7.2.3. 1h-Calot The overlay architecture of 1h-Calot is same as D1HT because all the peers are ordinary. In fact, the principles of 1h-Calot are extremely simple: multicast maintains the routing tables; information in the routing table is then used to guide multicast and routing. In the 1h-Calot, a resource is stored on the node whose identifier is the closet to the resource's key in absolute distacnce, regardless of the direction (clockwise or counterclockwise), this is the basic of the routing table maintenance algorithm. A new arrival peer informs other peers of its arrival by multicasting a notification through a tree rooted at the new arrival peer. Every message will take a range variable to tell the next peer its responsible informing boundary. 1h-Calot multicasts each notification through a different tree Peng, et al. Expires May 2, 2012 [Page 15] Internet-Draft One Hop Lookups Plugin October 2011 expanded just in time according to the current content of the routing tables. This algorithm will use the multicast_range_start and multicast_range_end variables defined in the EventNotification structure to carry the range information. Beacause the 1h-Calot peer will notify and construct its multicasting tree as soon as the peer detects the overlay changes, so there is only one update message on a multicasting tree which means that the variable event_notification_item_counts will always be one. 7.2.4. Two hop lookups Although the one hop lookups is a very efficient DHT algorithm, for systems of larger size, the bandwidth requirements of this scheme may become too large for a significant fraction of peers. The two hop lookups schema keeps a fixed fraction of the total routing state on each peer and consumes much less bandwidth, and thus scales to a larger system size. The overlay architecture of two hop lookups is as same as the one hop lookups. In addition, every slice leader chooses a group of its own peers for each other slice; the group may be chosen randomly or they may be based on proximity metrics. The slice leader sends routing information about one group which can be carried in the group_information_list defined in the EventNotificationItem structure to exactly one other slice leader. The information about the group is then dissemanated to all members of that slice as in the one hop lookups. At las, each peer not only keeps full routing information about peers in its own slice, but also has routing information for the peers in every other slice. The update message dissemanating is still accomplished by the event notification schema used in the one hop lookups. 8. Leaving The leaving action is also a kind of event, so it can be expressed by the update message. When a peer leaves the overlay peacefully, it needs to send a leaving message to its predecessor or successor which may just include some traditional information like the Node-ID and some security proves like the certificate. And the predecessor or succeor who receives this leaving message will verify its identity and address a update message contains the peer leaving information to its slice leader. The data transfer can also use the update messages, but it sometimes depends on the replication strategies. If a peer leaves the overlay without any notification, the neighbor Peng, et al. Expires May 2, 2012 [Page 16] Internet-Draft One Hop Lookups Plugin October 2011 peer (predecessor or successor) must inform this event to the leader by the update messages, and then the backup peer of the leaving one should transfer the backup data to the new one who is charging the leaving peer's namespace. When a unit leader or slice leader leaves this overlay, the overlay should choose a new one to replace the old one. 9. Replication The replica placement policy of the one hop algorithm can use the way defined in SandStone. It is designed to maximize data reliability and availability, and to maximize network bandwidth utilization. In the SandStone, the subscriber data can be replicated on multiple peers, three on default. We can use a SandStone like replica replacement policy with three replications. The first replica is stored in the primary peer's successor; the second is saved in a peer of different unit but in same slice; and the third is put in a peer from different slice. Above all, one of the most important issues is the choosing of the replica peer. Maybe we can use a special function whose input is peer's Node-ID to calculate the three replica peers, and of course the root peer should know about the replication method in order to be able to search the data. 10. Fault tolerance 10.1. First hop fail In the one hop algorithm, if a query fails on its first attempt it does not return an error to an application. Instead, queries can be rerouted with the RELOAD message RouteQueryAns. We can define the RouteQueryAns as follows. struct { uint32 response_state; PeerContactItem destination_peer; } RouteQueryAns; The structure RouteQueryAns is composed by a state number and a Node-ID. The state number tells the query state and the PeerContactItem gives a peer's information that is closer to the destination. And in most of cases, two hops are enough to locate one peer or resource. If the two hops are still wrong, then we can know the lookup procedure is fail. Peng, et al. Expires May 2, 2012 [Page 17] Internet-Draft One Hop Lookups Plugin October 2011 response_state The response mark of the RouteQueryReq message's; and it may include success, failure and errors. destination_peer If the query is failed, the application can send RouteQueryReq message to this peer again for finding resources. If a lookup query from peer A to peer B because the routing table is outdate or the peer B has left, peer A can retry the query by sending the RouteQueryReq to the peer C who is the successor of peer B. And at that time joining a peer D to be peer B's new successor and peer C's predecessor, then peer C can reply RouteQueryAns message with the redirect_peer of peer D's, and peer A can contact it in this routing step. 10.2. Leaders fail The most important issues are how to keep the correct functioning of unit leaders and slice leaders; we need to recover from their failure. When a slice or unit leader fails, its successor soon detects the failure and becomes the new leader; the successor of a failed slice leader will communicate with its unit leaders and other slice leaders to recover information about the missed events. 11. Security Considerations There is no specific security consideration associated with this draft. 12. IANA Considerations There are no IANA considerations associated to this memo. 13. Acknowledgements 14. References 14.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [I-D.ietf-p2psip-base] Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and Peng, et al. Expires May 2, 2012 [Page 18] Internet-Draft One Hop Lookups Plugin October 2011 H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) Base Protocol", draft-ietf-p2psip-base-15 (work in progress), May 2011. 14.2. Informative References [RFC3174] Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, September 2001. [One-Hop-Lookups] Gupta, A., Liskov, B., and R. Rodrigues, "One Hop Lookups for Peer-to-Peer Overlays", June 2003. [SandStone] Shi, G., Chen, J., Gong, H., Fan, L., Xue, H., Lu, Q., and L. Liang, "SandStone: A DHT based Carrier Grade Distributed Storage System", September 2009. [TwoHopLookups] Gupta, A., Liskov, B., and R. Rodrigues, "Efficient Routing for Peer-to-Peer Overlays". [D1HT] Monnerat, L. and C. Amorim, "D1HT: A Distributed One Hop Hash Table". [1h-Calot] Tang, C., Buco, M., Chang, R., Dwarkadas, S., Luan, L., Edward, E., and C. Ward, "On the Tradeoff among Capacity, Routing Hops, and Being Peer-to-Peer in the Design of Structured Overlay Networks". [SingleHopDHT] Monnerat, L. and C. Amorim, "Peer-to-Peer Single Hop Distributed Hash Tables". Authors' Addresses Jin Peng China Mobile Unit 2, 28 Xuanwumenxi Ave, Xuanwu District Beijing 100053 P.R.China Email: Penjin@chinamobile.com Peng, et al. Expires May 2, 2012 [Page 19] Internet-Draft One Hop Lookups Plugin October 2011 Lifeng Le China Mobile Unit 2, 28 Xuanwumenxi Ave, Xuanwu District Beijing 100053 P.R.China Email: Lelifeng@chinamobile.com Kai Feng Beijing University of Posts and Telecommunications 10 Xi Tu Cheng Rd. Haidian District Beijing 100876 P.R.China Email: fengkai_sunny@139.com Peng, et al. Expires May 2, 2012 [Page 20]