NSIS Working Group                                              R. Bless
Internet-Draft                                                   M. Doll
Intended status: Informational                        Univ. of Karlsruhe
Expires: April 19, 2007                                     Oct 16, 2006


           Inter-Domain Reservation Aggregation for QoS NSLP
                     draft-bless-nsis-resv-aggr-00

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on April 19, 2007.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

QoS NSLP is a recently proposed signaling protocol that allows QoS reservations to be established in the Internet. In order to enable large-scale deployment, inter-domain aggregation should be considered as a mechanism that provides the necessary scalability. This draft describes the major problems that must be solved and also proposes solutions to these problems, requiring only modest modifications and extensions to the currently defined GIST and QoS NSLP specifications.

Bless & Doll             Expires April 19, 2007                 [Page 1]

Internet-Draft    Inter-Domain Reservation Aggregation          Oct 2006

Table of Contents

   1.  Introduction
   2.  Aggregation Concept
     2.1.  Aggregate Setup
     2.2.  Aggregate Use and Changes
     2.3.  Aggregate Teardown
   3.  Problems
     3.1.  Determination of Aggregator and Deaggregator
     3.2.  Signaling between Aggregator and Deaggregator
     3.3.  Route Change Detection for Aggregated Flows in an Aggregate
     3.4.  A Priori Determination of a Flow's Path
   4.  Solution Proposals
     4.1.  Determination of Aggregator and Deaggregator
     4.2.  Signaling Between Aggregate Endpoints
     4.3.  Route Change Detection for Aggregated Flows in an Aggregate
       4.3.1.  IP Layer Solution
       4.3.2.  GIST Layer Solution
       4.3.3.  NSLP Layer Solution
     4.4.  A Priori Determination of a Flow's Path
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1. Introduction

A primary objective of NSIS QoS signaling is to perform resource-based admission control for data flows. Per-flow signaling, however, has scalability issues. Aggregation of resource reservations can be used to achieve better scalability in the control plane (e.g., as proposed in [RFC3175]).
Aggregation achieves two important reductions: a reduction of state (or reservation context) information and a reduction of signaling message processing. Aggregation in the data plane can be achieved by forwarding packets according to the Differentiated Services architecture [RFC2475]. For the sake of simplicity, we assume that the latter is used to provide QoS for packet forwarding and that Autonomous Systems contain one or more Differentiated Services domains.

For the following discussion, we assume that the reader is familiar with the RSVP aggregation concepts described in [RFC3175]. Furthermore, the term "aggregated flow" denotes a flow whose individual reservation is contained in a reservation aggregate that encompasses several such individual reservations.

Currently, QoS NSLP describes the process of reservation aggregation only coarsely, and it supports a single aggregation level using router alert option codepoints. This is usually sufficient if only intra-domain aggregation is considered. But, as an early charter of the NSIS WG stated: "When considering end-to-end communication, it is likely that several administrative domains are traversed. Interworking between domains in which different QoS solutions are deployed is problematic."

When considering the ultimate goal of providing QoS for end-to-end communication from a global perspective, intra-domain aggregation is not sufficient for scalable end-to-end QoS support: if aggregated flows leave an aggregation domain, the next domain sees all individual (i.e., non-aggregated) flows again. Thus, larger transit providers in particular will have to manage a large number of individual flows and will suffer from scalability problems. Moreover, manually establishing static aggregates between providers would incur a huge management overhead. Therefore, we want to design a mechanism that dynamically creates aggregates between different providers on demand.

2. Aggregation Concept

This section briefly describes the concept of aggregated reservations as assumed in this draft.

2.1. Aggregate Setup

Aggregation of QoS reservations is performed between an aggregator and a deaggregator, which is located downstream of the aggregator (cf. also [RFC3175]). Based on some trigger (e.g., the current number of reservations), the aggregator decides to subsume several existing flow reservations along the same path segment (e.g., the same AS hops) into a larger aggregate reservation. In order to make this decision, the aggregator must find a potential deaggregator. The flows (and their reservations) must follow exactly the same path, at least up to the deaggregator. Therefore, the aggregator must either know the actual path taken by the flows (e.g., by using corresponding routing protocol information) or be notified explicitly by the deaggregator. In the latter case, the deaggregator must know that the aggregator is an upstream node that is common to all reservations under consideration.

The encompassing aggregate reservation is set up by appropriate signaling between aggregator and deaggregator. During this procedure, the existing reservations are moved into the aggregate reservation, i.e., all intermediate nodes between aggregator and deaggregator delete all state information related to the individual reservations that are to be aggregated, so that afterwards they only see a single aggregate reservation. This achieves the desired reduction of managed state. Aggregator and deaggregator manage both the individual reservations and the aggregate reservation, i.e., they do not save any state; instead, they need to manage one additional state for the aggregate.

Furthermore, in order to save message processing cost, the aggregate capacity should be somewhat larger than actually required by the subsumed flows.
This way, the aggregator does not need to adapt the aggregate capacity every time a flow joins or leaves the aggregate. Thus, the aggregate capacity should change only infrequently, usually by applying some hysteresis function (cf. the discussion in [RFC3175], Section 1.4.4).

2.2. Aggregate Use and Changes

If a new reservation request arrives at the aggregator, the aggregator must determine a priori whether the new flow "fits" into an existing aggregate. This requires knowing both the flow's route and whether enough residual capacity is left in the aggregate to subsume the new request. If the aggregate capacity is too small, it must be increased before the new reservation is included.

Signaling messages for all aggregated flows should be forwarded directly from aggregator to deaggregator in order to save signaling message processing at the nodes between aggregator and deaggregator. Furthermore, these nodes no longer have any knowledge about the aggregated flow sessions, so signaling messages related to these individual flows must not be sent to them.

2.3. Aggregate Teardown

The aggregate can be torn down some time after the last reservation has left the aggregate. The aggregator will notice either an explicit teardown or a refresh timeout for the last reservation. If no new reservation request shows up within a waiting period, the aggregate reservation is torn down completely.

3. Problems

We see several problems for a QoS NSLP to support inter-domain aggregation, namely:

o Determination of Aggregator and Deaggregator

o Signaling between Aggregator and Deaggregator

o Route Change Detection for Aggregated Flows in an Aggregate

o A Priori Determination of a Flow's Path

These points are discussed in more detail in the following sections.

3.1. Determination of Aggregator and Deaggregator

When aggregation within a domain is considered, choosing an aggregator and a deaggregator for a set of flows is not a problem, because the boundary routers at the borders of the domain ("aggregation region") typically act as aggregators and deaggregators for flows entering and leaving the domain, respectively. Thus, their roles are predetermined.

But if aggregation between domains is considered, it is not obvious which routers should act as aggregators or deaggregators for a set of flows: since flows usually traverse several different administrative domains, there are many possible choices. Aggregates are more efficient the longer they are, because longer aggregates save more state and more control message processing. A set of flows can be aggregated along the path that they share, i.e., along the set of nodes that is traversed by all flows within this set.

In the example of Figure 1, three different flows are shown: f1 from host H1 to sink S1, f2 from H2 to S2, and f3 from H3 to S3. While all three flows can be aggregated along domains Dd, De, and Df, only f2 and f3 can be aggregated from Dd up to Dg. Furthermore, the flows usually enter domain Dd at different ingress routers and join somewhere within the domain. However, they may leave the domain via the same egress router. For the deaggregation domain, the reverse is true: the aggregated flows enter the domain via the same ingress router, but may split sooner or later within the domain, leaving it via different egress routers.

      H1--Da--+                    S1
               \                  /
      H2--Db----Dd----De----Df---+---Dg--S2
               /                       \
      H3--Dc--+                         S3

      f1: data flow H1->S1        Hx: Host x
      f2: data flow H2->S2        Sx: Sink x
      f3: data flow H3->S3        Dx: Domain x

        Example for aggregation of flows along different domains

                                Figure 1

In summary, there are many more choices for an aggregator-deaggregator pair for a set of flows than in the intra-domain case.
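To make the notion of a shared path segment concrete, the following Python sketch computes, for the domain-level paths of Figure 1, the longest run of consecutive domains that all flows in a set have in common; an aggregator-deaggregator pair for that set would have to lie within this run. The path lists (hosts and sinks omitted) are transcribed from Figure 1, while the function names are hypothetical. Real QNEs would work at node rather than domain granularity.

```python
from typing import List

# Domain-level paths of the flows in Figure 1 (hosts and sinks omitted).
F1 = ["Da", "Dd", "De", "Df"]
F2 = ["Db", "Dd", "De", "Df", "Dg"]
F3 = ["Dc", "Dd", "De", "Df", "Dg"]


def contains_run(path: List[str], run: List[str]) -> bool:
    """True if `run` occurs in `path` as a sequence of consecutive hops."""
    n = len(run)
    return any(path[i:i + n] == run for i in range(len(path) - n + 1))


def shared_segment(paths: List[List[str]]) -> List[str]:
    """Longest run of consecutive domains traversed by every path."""
    first = paths[0]
    # Try candidate runs taken from the first path, longest first.
    for length in range(len(first), 0, -1):
        for start in range(len(first) - length + 1):
            run = first[start:start + length]
            if all(contains_run(p, run) for p in paths[1:]):
                return run
    return []


print(shared_segment([F1, F2, F3]))  # ['Dd', 'De', 'Df']
print(shared_segment([F2, F3]))      # ['Dd', 'De', 'Df', 'Dg']
```

Dropping f1 from the set lengthens the shared segment by one domain, which illustrates why the choice of the flow set determines how long, and thus how efficient, the resulting aggregate can be.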
Moreover, it is important to consider who initiates the establishment of an aggregate. In [RFC3175], the deaggregator initiates the reservation, which corresponds nicely to the receiver-initiated reservation scheme of RSVP. In QoS NSLP, both ends are basically able to initiate an aggregate reservation. The more natural choice would be for the aggregator to initiate the establishment of an aggregate reservation. In this case, the aggregator needs knowledge about potential deaggregators. This information may be collected during the establishment of the reservation for a single flow and reported back to the initiator. See Section 4.1 for a possible solution.

3.2. Signaling between Aggregator and Deaggregator

Signaling messages related to flows that are aggregated in an encompassing aggregate should be forwarded directly from aggregator to deaggregator and vice versa. This is necessary because the nodes in between know only the aggregate flow and do not have any information about the individual flows contained in this aggregate. Moreover, the aggregate should not only save state, but also save signaling message processing.

In intra-domain aggregation, the NTLP uses a router alert option to send signaling messages directly from aggregator to deaggregator, i.e., the NSIS QNEs in between do not interpret them. However, this simple and effective scheme does not work for inter-domain aggregation, because the codepoint space is much too small to cover the huge set of potential aggregator-deaggregator pairs.

Moreover, when considering GIST, a further problem arises because the aggregator has to send a Query message periodically for every flow in order to detect any route changes for this flow.
However, this should not cancel out the aggregation gain, i.e., nodes between aggregator and deaggregator should ideally neither process these messages nor store state about these flows. Thus, on the one hand, these Query messages should detect any change in the path between aggregator and deaggregator; on the other hand, the nodes in between should preferably not process these per-flow signaling messages.

3.3. Route Change Detection for Aggregated Flows in an Aggregate

The route of an aggregated flow may change during its lifetime. If the routing change does not affect the part of the data path that is covered by the aggregate, this is not a problem, because it is handled by the usual GIST/QoS NSLP mechanisms. If the route change affects the encompassing aggregate in the same way as the aggregated flows, it is covered by reserving resources for the rerouted aggregate. However, the flow may actually leave the aggregate's path and return to it either before or after the deaggregator (cf. flows f1 and f2, respectively, in Figure 2).

An alternative, as mentioned in [RFC3175], would be to "tunnel" the data packets between aggregator and deaggregator. However, due to the burden that the tunneling overhead places on routers, as well as MTU-related problems, we do not consider such solutions in this draft. Therefore, a mechanism must be defined to detect any route changes affecting aggregated flows.
             +-----+
            /  Dst  \
           |    |    |
           |De  |    |&&&&&&&&&&&&&&+
            \   |   /               &
             +--|--+                &   f2
                |                   &
             +--D--+             +--&--+
            /   #   \           /   &   \
           |Dd  #    |         |Dg  &    |
           |    #%%%%|%%%+     |    &    |
            \   #   /    %      \   &   /
             +--#--+     %       +--&--+
                #        %          &
             +--#--+     %          &
            /   #   \    %          &
           |Dc  #&&&&|&&&%&&&&&&&&&&+
           |    #    |   %
            \   #   /    %
             +--#--+  +--%--+
                #    /   %   \
                #   |Df  %    |   f1
                #    \   %   /
                #     +--%--+
             +--#--+     %
            /   #   \    %
           |Db  #%%%%|%%%+
           |    #    |
            \   #   /
             +--A--+          Dx: Domain x
                |             #:  Aggregate
             +--|--+          A:  Aggregator
            /   |   \         D:  Deaggregator
           |Da  |    |        %:  f1, returning to aggregate route
           |   Src   |            before deaggregator
            \       /         &:  f2, returning to aggregate route
             +-----+              behind deaggregator

                 Possible route changes of aggregated flows

                                Figure 2

3.4. A Priori Determination of a Flow's Path

In order to utilize an already established aggregate reservation, an aggregator must know whether a new incoming reservation can be integrated into this aggregate. This requires that the aggregator is able to determine a priori the path that the flow will take. If the flow runs along the same path as an already established aggregate and the aggregate has enough unused capacity, the aggregator may include the request in the aggregate and forward it directly to the deaggregator.

However, predicting a flow's path is difficult in the inter-domain case: usually only an AS path (i.e., a sequence of AS numbers) for a given destination prefix can be determined from BGP routing table information. Thus, an aggregator usually does not know the exact ingress or egress border routers of a given AS. Multi-homing between ASes in particular makes it difficult to predict an exact path; e.g., flows whose AS paths differ only in their destination AS may enter the same penultimate AS through different ingress routers.
Furthermore, some mechanism must be provided to verify the prediction, i.e., to revert if the prediction was wrong. An alternative to prediction would be to probe the actual path first, preferably without installing any state. This would, however, increase the reservation setup time, because a round-trip signaling message exchange would be required before it could be determined whether an existing aggregate matches the flow.

4. Solution Proposals

This section sketches some proposals for the previously described problems. This is preliminary work, and some details still need to be worked out in forthcoming versions of this draft.

4.1. Determination of Aggregator and Deaggregator

Aggregator and deaggregator could be determined by using a QoS NSLP mechanism that records the route of individual reservations. To this end, each QNE that is able and willing (i.e., if local policy allows it) to serve as a deaggregator may simply append its IP address to a new protocol object ("Route-Record") that holds a list of addresses. This protocol object would be carried in a RESERVE message. Usually, it is sufficient to record only two addresses per domain, i.e., those of the ingress and egress QNEs. The average AS path length is usually well below 4 ASes, so the total number of recorded addresses would still be small. It may also be useful to record the AS number in addition to the QNE addresses. Ideally, the peer identity would be recorded, but due to its non-unique and potentially lengthy format, it is probably harder to process efficiently than IP addresses.

The completed "Route-Record" object would be reported back to potential aggregators in the RESPONSE message. So if aggregation is going to be used, the aggregator would have to request a RESPONSE message by inserting an RII object. If an RII object is already present, the aggregator must inspect the corresponding response.
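The Route-Record idea, together with the deaggregator choice described next, can be sketched in Python as follows. The class and function names are hypothetical, and the addresses are documentation examples. The sketch assumes that each record lists only QNEs downstream of the aggregator, in path order, and it picks the farthest QNE on the common prefix of all records. Because QNEs that are unwilling to deaggregate do not append themselves, matching recorded QNEs only approximates the requirement that the flows follow exactly the same path.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RouteRecord:
    """Hypothetical in-memory form of the proposed Route-Record object."""
    addresses: List[str] = field(default_factory=list)


def qne_forward_reserve(rr: RouteRecord, own_address: str,
                        may_deaggregate: bool) -> None:
    """A QNE appends itself only if local policy allows it to deaggregate."""
    if may_deaggregate:
        rr.addresses.append(own_address)


def choose_deaggregator(records: List[List[str]]) -> Optional[str]:
    """Pick the farthest QNE on the common prefix of all route records."""
    last_common: Optional[str] = None
    for hops in zip(*records):
        if len(set(hops)) != 1:  # the flows' paths split at this position
            break
        last_common = hops[0]
    return last_common


# Route-Record of one flow, filled in hop by hop along the RESERVE path.
rr = RouteRecord()
for addr, willing in [("192.0.2.1", True), ("192.0.2.2", False),
                      ("198.51.100.1", True), ("198.51.100.2", True),
                      ("203.0.113.9", True)]:
    qne_forward_reserve(rr, addr, willing)

# Route-Record of a second flow that shares the path up to 198.51.100.2.
other = ["192.0.2.1", "198.51.100.1", "198.51.100.2", "203.0.113.77"]
print(choose_deaggregator([rr.addresses, other]))  # 198.51.100.2
```

Choosing the last common entry favors long aggregates, in line with the preference for a far-away deaggregator.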
An aggregator may then store the list of traversed QNEs together with the per-flow session data and later pick a deaggregator according to its own criteria. Usually, the aggregator should choose a deaggregator that is far away in order to achieve long and thus efficient aggregates. Once it has determined a QNE to serve as deaggregator, the aggregator sends a new RESERVE towards this deaggregator. The RESERVE message would contain the total capacity of all individual reservations and a list of the session IDs of all flows that should be aggregated. This would require the specification of a new SESSIONID_LIST object, which is contained in the aggregation messages RESERVE and RESPONSE. The list in the RESPONSE message contains all sessions that could not be aggregated, e.g., in case of aggregation conflicts, i.e., when some flows were already aggregated in a way that prevents them from being aggregated as intended by the new aggregation request.

Furthermore, a flag (AGGREGATION bit) in the RESERVE or RESPONSE message could indicate that it is a special type of RESERVE or RESPONSE message containing the additional SESSIONID_LIST object. Using a flag instead of a new message type may have some implementation advantages, because most of the code is identical to normal RESERVE processing. However, it is also possible to define new message types for aggregate establishment (e.g., ARESERVE and ARESPONSE).

4.2. Signaling Between Aggregate Endpoints

The objective is to forward per-flow signaling messages (e.g., refreshing RESERVEs of a flow's session) directly between aggregator and deaggregator, so that no intermediate QNE has to process these messages.

The QNI sends a per-flow message (e.g., a refreshing RESERVE) that arrives at the aggregator.
The aggregator detects that this flow is part of a larger aggregate reservation and performs "aggregate signaling", i.e., it sends the message over a special direct messaging association (MA) that must be established between aggregator and deaggregator for the aggregate. A possible mechanism to establish such a messaging association is described below. At the QoS NSLP level, the aggregator should also insert a BOUND_SESSION_ID object containing the session ID of the aggregate's session. When the signaling message arrives at the deaggregator, the deaggregator notices that the message arrived via the direct messaging association between the aggregate endpoints and removes the inserted BOUND_SESSION_ID. Normal message processing at the QoS NSLP level then continues.

In order to establish a direct signaling message transport between aggregator and deaggregator, the GIST Query message must be conveyed to the deaggregator. This could be done in several ways, e.g., at the IP layer or at the GIST layer, as described in more detail in Section 4.3. In either case, the aggregator creates a new GIST session that solely serves the purpose of transferring signaling messages directly between aggregator and deaggregator. The Query is sent as a Q-mode encapsulated message with the single flow's destination address as IP destination address and a special MRM (further details of this special query encapsulation are described in Section 4.3.2). Once the Query arrives at the deaggregator, the deaggregator sends a UDP-encapsulated Response directly back to the aggregator, and a messaging association should be created. Using an SCTP connection for the messaging association would be a good choice, so that messages for different flows can be mapped to different streams.
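The aggregator and deaggregator behavior described above can be sketched as follows. This is a simplified Python model, not an implementation of QoS NSLP or GIST: the direct MA is reduced to a list of (stream, message) pairs that stands in for an SCTP association, each aggregated flow is assigned its own stream on first use, and all class and function names are hypothetical. The sketch also assumes that the QNI did not already include a BOUND_SESSION_ID of its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NslpMessage:
    msg_type: str                            # e.g. "RESERVE"
    session_id: str                          # session ID of the individual flow
    bound_session_id: Optional[str] = None   # BOUND_SESSION_ID object, if any


@dataclass
class DirectMA:
    """Stand-in for the direct aggregator-deaggregator MA (e.g., SCTP)."""
    streams: Dict[str, int] = field(default_factory=dict)
    sent: List[Tuple[int, NslpMessage]] = field(default_factory=list)

    def stream_for(self, flow_sid: str) -> int:
        # Give each aggregated flow its own stream, allocated on first use.
        return self.streams.setdefault(flow_sid, len(self.streams))

    def send(self, msg: NslpMessage) -> None:
        self.sent.append((self.stream_for(msg.session_id), msg))


def aggregator_forward(msg: NslpMessage, flow_to_aggregate: Dict[str, str],
                       ma_by_aggregate: Dict[str, DirectMA]) -> bool:
    """Send a per-flow message over the aggregate's direct MA if aggregated."""
    agg_sid = flow_to_aggregate.get(msg.session_id)
    if agg_sid is None:
        return False  # not aggregated: normal hop-by-hop processing
    msg.bound_session_id = agg_sid  # refer to the aggregate's session
    ma_by_aggregate[agg_sid].send(msg)
    return True


def deaggregator_receive(msg: NslpMessage, via_direct_ma: bool) -> NslpMessage:
    """Strip the BOUND_SESSION_ID added by the aggregator, then process normally."""
    if via_direct_ma:
        msg.bound_session_id = None  # assumes the QNI sent no BOUND_SESSION_ID
    return msg


ma = DirectMA()
flows = {"F1": "AGG-1", "F2": "AGG-1"}
for sid in ("F1", "F2", "F1"):
    aggregator_forward(NslpMessage("RESERVE", sid), flows, {"AGG-1": ma})
print([stream for stream, _ in ma.sent])  # [0, 1, 0]
```

Mapping flows to separate streams keeps a lost or delayed message of one flow from blocking the messages of the other aggregated flows.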
Signaling messages for individual flows that arrive at the aggregator are mapped to the GIST session for aggregate signaling, i.e., they are sent directly to the deaggregator. A problem, however, is that the session ID of the individual flow must be conveyed by additional means, because GIST must use the session ID of the aggregate signaling session for this encapsulation.

4.3. Route Change Detection for Aggregated Flows in an Aggregate

As mentioned above, it should be possible to establish a messaging association directly between aggregator and deaggregator, e.g., using the GIST bypass mechanism, so that intermediate QNEs drop out of the signaling path. However, if all messages for aggregated individual reservations are passed over this direct messaging association unconditionally, the reservation path and the data path would probably diverge once the route of a flow changes and veers off the aggregate path.

Before describing the details of the method, we summarize the sequence of the overall aggregate operations:

1. The deaggregator is discovered.

2. The aggregator establishes an aggregate reservation.

3. The aggregator initiates a direct signaling messaging association. The messaging associations for all aggregated flows are updated or installed at both sides, so that signaling messages for the aggregated flows use the direct signaling MA.

4. The aggregator periodically performs route change checks for the aggregated flows.

5. Additional flows may be added to the aggregate later (handling leaving flows is straightforward).

Direct forwarding of signaling messages between aggregator and deaggregator becomes a problem if a flow within the aggregate changes its route and leaves the aggregate's path (possibly even rejoining it later on, cf. flow f2 in Figure 2).
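To make the cases that a route change check has to distinguish explicit, the following Python sketch classifies a rerouted flow against the aggregate in the manner of Figure 2, at domain granularity (aggregator in Db, deaggregator in Dd). It assumes global knowledge of both the aggregate's path and the flow's new path, which no single QNE has; it only illustrates the outcomes, not a detection mechanism. The function name and result strings are hypothetical.

```python
from typing import List


def classify_reroute(aggregate_path: List[str], downstream: List[str],
                     flow_path: List[str]) -> str:
    """Classify a rerouted aggregated flow (cf. Figure 2).

    aggregate_path: domains from the aggregator's to the deaggregator's.
    downstream: domains beyond the deaggregator on the original route.
    flow_path: the flow's current route, starting in the aggregator's domain.
    """
    for i, node in enumerate(aggregate_path):
        if i >= len(flow_path) or flow_path[i] != node:
            break  # the flow left the aggregate's path at hop i
    else:
        return "unchanged"
    rest = flow_path[i:]
    if aggregate_path[-1] in rest:
        return "rejoins before deaggregator"  # like f1: QNEs in between skipped
    if any(d in rest for d in downstream):
        return "rejoins behind deaggregator"  # like f2: deaggregator bypassed
    return "leaves for good"


AGGREGATE = ["Db", "Dc", "Dd"]
BEYOND = ["De"]
print(classify_reroute(AGGREGATE, BEYOND, ["Db", "Df", "Dd", "De"]))  # f1
print(classify_reroute(AGGREGATE, BEYOND, ["Db", "Dc", "Dg", "De"]))  # f2
```

In the f1 case the deaggregator still receives the flow's signaling, but some aggregate QNEs are skipped; in the f2 case the deaggregator is never reached, so a node without aggregate state sees the signaling message instead.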
If a flow diverges from the aggregate route, it must establish a new reservation along the new part of the path, starting at the branching QNE. A further problem arises because the QNEs within the aggregate do not know anything about this particular flow session. Furthermore, the individual flow should be removed from the aggregate reservation; otherwise, new requests may have to be rejected unnecessarily. Therefore, the flow's path and the aggregate's path must be checked regularly for divergence.

With RSVP aggregation or intra-domain aggregation for QoS NSLP, this is done automatically by using the router alert option, i.e., per-flow signaling messages are routed along their natural path, possibly swerving from the aggregate's path. The interior nodes still do not have to process the signaling messages. Some boundary node will intercept the message in its role as potential deaggregator and possibly trigger the creation of a new aggregate or initiate the integration of the individual flow into an existing aggregate. For inter-domain aggregation, though, the RAO-based approach is not usable, so another approach must be developed.

Route detection is the task of the NTLP layer. In GIST, a periodic Query per routing state entry is triggered in order to discover new routes or route changes, respectively. The Query is sent with Q-mode encapsulation. This also works with intra-domain aggregation by setting the corresponding RAO. For inter-domain aggregation, we want to use a similar mechanism.

A GIST node cannot easily detect whether the flow's path diverges from the aggregate's route. A GIST node could detect that the IP next hop of the flow and the IP next hop of the aggregate flow differ. But this does not necessarily result in a change of the next GIST peer.
Therefore, a check for route divergence in a GIST node is not a reliable indication that the flow actually left the aggregate route. A better indication would be that a signaling message arrives at a GIST node where the aggregate reservation is unknown (e.g., the session ID of the aggregate is unknown). In this case, the node should not establish a new GIST signaling session for the aggregate, but send an error indication back to the aggregator instead. This would at least detect the case in which a flow leaves the aggregate and hits a different GIST node.

This detection does not cover the case in which a flow diverges from the route and rejoins the aggregate route later, thereby bypassing some GIST nodes on the aggregate route. To detect such situations, a GIST node must either determine whether the previous peer that sent the message is still the same peer as for the aggregate, or use the GIST hop count as an indicator that a QNE was skipped.

Therefore, we define an "aggregate query encapsulation" mode that detects any divergence between the routes of a flow and its encompassing aggregate. This aggregate query encapsulation must not use the direct messaging association between the endpoints, because it would then not detect any route changes. It can, however, also be used to initially establish such a direct signaling relationship.

4.3.1. IP Layer Solution

The most efficient solution would be a bypass at the IP level, like the special Router Alert Option for aggregate signaling. But in this inter-domain case, a simple Router Alert Option codepoint is not enough to cover the huge set of different deaggregators in different domains. One solution would be a new IP option carrying an additional destination address (a new hop-by-hop option in IPv6, which we call Route Verify).
On receipt of such a packet, the router performs one additional routing lookup for the conveyed destination address and compares the next hop for the normal destination address with the next hop for the additional destination address. In our case, the outer IP destination address would be the deaggregator's address, and the additional destination address would be the destination address from the flow's path-coupled MRI. If the next-hop entry is the same for both destination addresses, the router simply forwards the message, keeping the additional option. If the next hops differ, however, flow and aggregate diverge at this router, and the originator of the signaling message should be notified of this fact. This would require sending a special ICMP message back to the aggregator indicating the route divergence. The aggregator must then correlate this ICMP message with the GIST signaling message. However, as mentioned above, route divergence between GIST nodes may not be relevant. Furthermore, because only the IP destination address is used for the routing decision and not the MRI, the resulting routes may differ from MRI-based routing decisions.

4.3.2. GIST Layer Solution

The GIST-only method would be as follows: whenever the aggregator would send a Q-mode encapsulated message for a single reservation, it instead sends a GIST Query message with a new MRM (aggregate forwarding, AF). We call this mode Aggregate Q-mode (AQ-mode).

The AQ-mode encapsulation is as follows: the Query is sent with Q-mode encapsulation, i.e., it has the RAO set, uses the flow's destination address as IP destination address, and uses the aggregator's address as IP source address (the S flag is set in the GIST header). The session ID is that of the single flow, but the message contains the new AF-MRM instead of the normal path-coupled MRM.
The AF-MRM contains a type field, the session ID of the aggregate, and the path-coupled MRI of the single flow. In summary, its structure is as follows:

   MRI = type
         session-id (aggregate flow)
         PC-MRI (single flow)

The type field allows different AQ-modes to be distinguished, in which the contained session IDs and MRIs are interpreted differently. It contains one of the following values:

o "Route Check": Perform a route check for the specified flow only.

o "Establish Direct": Establish a new direct signaling session for the given flow.

o "Add Flow": If the message reaches the endpoint, it should be passed to the NSLP.

The session ID carried within the aggregate forwarding MRI (AF-MRI) is that of the established aggregate (which we call session ID A for now); the PC-MRI is a fully encapsulated path-coupled MRI object (i.e., including the common object header).

We first describe how the route change check is performed and then how a direct signaling messaging association can be established by using the AQ-mode encapsulation.

When the aggregator sends such a signaling message towards the deaggregator, the next GIST node (supporting QoS NSLP) intercepts the message due to the RAO and detects the AF-MRM. It then checks whether session ID A is known. If session ID A is known and the node is not the endpoint (deaggregator) of this session, it forwards the message further downstream according to the flow's PC-MRI, basically unchanged (i.e., only the IP hop count and the GIST hop count are decremented). If the node is the endpoint of the session, message forwarding is terminated and the message has to be processed by GIST, i.e., GIST must refresh the routing state for the single flow. In contrast to a normal Query, the R flag in the GIST header need not be set, so a Response may be suppressed, because the primary objective is to check for diverging routes, which are indicated by Error messages.
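The decision that a GIST node takes for a "Route Check" Query, combining the forwarding rule just described with the R flag, error, and hop count rules given in the following paragraphs, can be summarized by this Python sketch. It is an illustrative model only: message encoding, the IP hop count, and the actual forwarding along the PC-MRI are not modeled, the PC-MRI is reduced to the single flow's session ID, and all names and result strings are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set, Tuple


class AfType(Enum):
    ROUTE_CHECK = "Route Check"
    ESTABLISH_DIRECT = "Establish Direct"
    ADD_FLOW = "Add Flow"


@dataclass
class AqQuery:
    af_type: AfType
    aggregate_sid: str   # session ID A, carried in the AF-MRI
    flow_sid: str        # session ID of the single flow
    source: str          # IP source address, i.e., the aggregator (S flag set)
    gist_hop_count: int
    r_flag: bool


@dataclass
class GistNode:
    known_aggregates: Set[str]     # aggregates with installed state at this node
    endpoint_of: Set[str]          # aggregates for which this node deaggregates
    expected_hops: Dict[str, int]  # hop count stored when the direct MA was set up

    def handle_route_check(self, q: AqQuery) -> Tuple[str, str]:
        if q.af_type is not AfType.ROUTE_CHECK:
            raise ValueError("only 'Route Check' is sketched here")
        if q.aggregate_sid not in self.known_aggregates:
            # No state for session ID A: the Query has left the aggregate's path.
            return ("error-left-aggregate", q.source)
        if q.aggregate_sid not in self.endpoint_of:
            # Interior QNE: forward along the flow's PC-MRI, basically unchanged.
            q.gist_hop_count -= 1
            return ("forward", q.flow_sid)
        if q.gist_hop_count != self.expected_hops[q.aggregate_sid]:
            # Hop count differs from setup time: a GIST node was skipped.
            return ("error-node-skipped", q.source)
        if q.r_flag:
            # Refresh state and answer over the existing direct MA.
            return ("refresh-and-respond", q.source)
        # Deaggregator: just refresh the routing state of the single flow.
        return ("refresh", q.flow_sid)


q = AqQuery(AfType.ROUTE_CHECK, "A", "F", "192.0.2.1",
            gist_hop_count=10, r_flag=False)
qne1, qne2 = GistNode({"A"}, set(), {}), GistNode({"A"}, set(), {})
deagg = GistNode({"A"}, {"A"}, {"A": 8})
for node in (qne1, qne2, deagg):
    print(node.handle_route_check(q))  # forward, forward, refresh
```

Both error cases return the IP source address, reflecting that the error indication is sent directly to the aggregator rather than back along the signaling path.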
If the R flag is set, however, a Response message must be sent directly back to the aggregator via the existing direct messaging association.

If a GIST node receives such an AQ-mode encapsulated message but does not have any installed state for session ID A, it MUST send an error message (yet to be defined, but indicating that the signaling message left the aggregate's path) directly back to the IP source of the Query message, which is the aggregator. The aggregator should then indicate a route change to the QoS NSLP, remove the single flow reservation from the aggregate, and initiate a normal single flow reservation along the further path (optimizations like shifting the aggregate's reserved resources for this flow along the unchanged aggregate path are for further study).

The GIST Hop Count must be checked on reception at the endpoint, because it indicates whether a GIST node was skipped due to rerouting. Its value must be compared with the value that was received and stored during establishment of the direct signaling connection. This requires that the latter is set up immediately after the establishment of the reservation aggregate, as described earlier.

A direct signaling messaging association can be set up by using the same AQ-mode encapsulation for the initial Query, with the AF-MRI type set to "Establish Direct". The session ID carried by GIST is that of the direct signaling MA to be established. The PC-MRI in the AF-MRI contains a description of the MA signaling flow between the two aggregate endpoints. For this AF-MRI type, the R flag must be set (R=1), so that the endpoint (deaggregator) sends a Response message directly back to the aggregator to continue the initial GIST handshake. Therefore, the S flag must be set, too. Details for the last AQ-mode type ("Add Flow") are described in Section 4.4.

4.3.3. NSLP Layer Solution

In principle, it is also possible to use the NSLP layer for the functions described in the previous GIST-related section, i.e., signaling messages are received by the NSLP layer and bypassed if the node is not the deaggregator. The Bound-Session-ID object could be used to refer to the aggregate session, and a new flag in the NSLP common header could indicate that the message should be forwarded unless it has arrived at the deaggregator. If this Bound-Session-ID is unknown, the signaling message has left the aggregate, and an error message should indicate this fact. The disadvantage of this method is that it incurs more processing overhead than bypassing via GIST as described above.

4.4. A Priori Determination of a Flow's Path

This is the most difficult task of inter-domain aggregation. We propose to solve it by using (BGP) routing tables, the GIST AQ-mode, and new QoS NSLP mechanisms.

A QNE may detect that the flow's destination address in a new incoming reservation request is in the same prefix or AS as already aggregated reservations. In this case, it may try to integrate this reservation into the already existing aggregate (which either has some capacity left or must first have its capacity increased). Moreover, the QNE may try to use an existing aggregate if the flow traverses the same AS path as the aggregate, but, as stated in Section 3, this prediction may be inaccurate. Thus, an optimistic approach would try to use the existing aggregate first, while performing path verification in nearly the same way as for route change detection. We therefore use a variant of the AQ-mode for that purpose, too. However, because an AQ-mode Query cannot carry large payloads, it is not well suited to carrying a larger QoS NSLP message such as a RESERVE.
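One possible way for a QNE to pick a candidate aggregate for a new reservation, based only on the destination prefix, is sketched below. It is an assumption-laden illustration: the `Aggregate` record, its capacity fields (in arbitrary bandwidth units), and the preference for the most specific matching prefix are choices made for this sketch; AS-path comparison is omitted, and a returned candidate is only a prediction that still has to be verified on the path.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network, ip_address, ip_network
from typing import Optional, Union


@dataclass
class Aggregate:
    """Hypothetical record of an existing aggregate reservation."""
    session_id: bytes
    dest_prefix: Union[IPv4Network, IPv6Network]
    capacity: float  # reserved capacity of the aggregate (arbitrary units)
    used: float      # capacity already taken by aggregated flows

    def spare(self) -> float:
        return self.capacity - self.used


def select_aggregate(dst: str, demand: float,
                     aggregates: list[Aggregate]) -> tuple[Optional[Aggregate], bool]:
    """Return (candidate aggregate or None, whether its capacity must be increased)."""
    addr = ip_address(dst)
    # Containment is always False across IPv4/IPv6, so mixed tables are safe.
    matches = [a for a in aggregates if addr in a.dest_prefix]
    if not matches:
        return None, False  # no prediction possible: use a normal single flow reservation
    # Assumed policy: prefer the most specific matching destination prefix.
    best = max(matches, key=lambda a: a.dest_prefix.prefixlen)
    return best, best.spare() < demand


aggs = [
    Aggregate(b"A1", ip_network("203.0.113.0/24"), capacity=10.0, used=9.5),
    Aggregate(b"A2", ip_network("203.0.0.0/16"), capacity=100.0, used=10.0),
]
```

With these example aggregates, a request for 1.0 unit towards 203.0.113.9 selects `A1` but flags that its capacity must first be increased, while a request towards 203.0.5.1 falls back to the less specific `A2`, which still has spare capacity.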
Instead of carrying the RESERVE in the AQ-mode Query, the following mechanism is used: the RESERVE for the flow to be newly established is sent directly to the (predicted) deaggregator over the direct signaling messaging association. However, the deaggregator must not forward it until it is certain that the flow follows the path of the aggregate. The RESERVE is therefore held and waits until a path verification message sent in AQ-mode arrives at the deaggregator. This would be a new type of QoS NSLP message, simply carrying a unique message ID (e.g., a 128-bit value) chosen by the aggregator. This message ID must also be contained in the RESERVE message, so that these mutually dependent signaling messages can be matched. If the AQ-mode Query message arrives before the RESERVE message, the deaggregator notes that the message ID was received and can forward the RESERVE immediately upon its arrival, because the "Waiting Condition" is already satisfied. Waiting messages time out after a while, however, because the path prediction may have been wrong and the flow may have diverged from the predicted path. Additionally, an explicit cancellation mechanism could be designed, so that the aggregator can explicitly cancel waiting messages if it has been notified of a diverging route.

The proposed method has the advantage of saving more than one round-trip time compared to a mechanism in which the path is probed first. Furthermore, it is advantageous when the capacity of nested aggregates has to be increased. Details of such a concept and of using waiting conditions for messages in a signaling protocol are described in [DARIS].

5. Security Considerations

Basically, the security considerations of GIST and QoS NSLP apply. Inter-domain aggregation, however, may raise new issues due to differing trust relationships between domains, so not every provider may be willing to accept aggregate reservations.
On the other hand, using the proposed mechanisms for deaggregator discovery, a domain can easily avoid acting as a deaggregator by not writing its own addresses into the Route-Record object, so the particular policy of a provider can be easily realized. Furthermore, domains that share or carry many end-to-end reservations would likely cooperate with each other.

The newly proposed waiting condition for messages cannot be used for DoS attacks that try to exhaust state memory, because every deaggregator accepts such messages only within an aggregate context. Usually, a trust relationship exists between aggregator and deaggregator, and they may also use a secure direct signaling messaging association (which is recommended), so messages from the aggregator can be authenticated. Attackers are not able to send such messages blindly, because the deaggregator would drop them due to their unauthorized origin and a non-matching session ID.

These are preliminary considerations, so they probably do not cover all aspects of the proposed solutions. More details will be provided in future versions of this draft.

6. References

6.1. Normative References

[I-D.ietf-nsis-ntlp]
   Schulzrinne, H. and R. Hancock, "GIST: General Internet Signaling Transport", draft-ietf-nsis-ntlp-11 (work in progress), August 2006.

[I-D.ietf-nsis-qos-nslp]
   Manner, J., "NSLP for Quality-of-Service Signaling", draft-ietf-nsis-qos-nslp-11 (work in progress), June 2006.

6.2. Informative References

[DARIS]
   Bless, R., "Dynamic Aggregation of Reservations for Internet Services", Telecommunications Systems, Volume 26, Issue 1, pp. 33-52, Kluwer, http://tm.uka.de/doc/2003/ictsm-daris-journal-crc-web.pdf, May 2004.

[RFC2475]
   Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.
[RFC3175]
   Baker, F., Iturralde, C., Le Faucheur, F., and B. Davie, "Aggregation of RSVP for IPv4 and IPv6 Reservations", RFC 3175, September 2001.

Authors' Addresses

   Roland Bless
   Institute of Telematics, Universitaet Karlsruhe (TH)
   Zirkel 2
   Karlsruhe 76187
   Germany

   Phone: +49 721 608 6413
   Email: bless@tm.uka.de
   URI:   http://www.tm.uka.de/~bless

   Mark Doll
   Institute of Telematics, Universitaet Karlsruhe (TH)
   Zirkel 2
   Karlsruhe 76187
   Germany

   Phone: +49 721 608 6403
   Email: doll@tm.uka.de
   URI:   http://www.tm.uka.de/~doll

Full Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).