RIFT A. Atlas Internet-Draft Individual Intended status: Standards Track Z. Zhang Expires: April 25, 2019 Juniper Networks October 22, 2018 Policy Guided Prefixes with Routing In Fat Trees draft-atlas-rift-pgp-00 Abstract In a fat tree, it can be sometimes desirable to guide traffic to particular destinations or keep specific flows to certain paths. In RIFT, this traffic steering/engineering is done by using policy- guided prefixes with their associated communities. Routes based on policy-guided prefixes are preferred over regular routes. Any node can originate a policy-guided prefix and advertise it in both north and south directions, and the calculation in both directions are distance vector based. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 25, 2019. Atlas & Zhang Expires April 25, 2019 [Page 1] Internet-Draft rift-pgp October 2018 Copyright Notice Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Specification . . . . . . . . . . . . . . . . . . . . . . . . 3 2.1. Ingress Filtering . . . . . . . . . . . . . . . . . . . . 4 2.2. Applying Policy . . . . . . . . . . . . . . . . . . . . . 4 2.3. Store Policy-Guided Prefix for Route Computation and Regeneration . . . . . . . . . . . . . . . . . . . . . . 5 2.4. Re-origination . . . . . . . . . . . . . . . . . . . . . 6 2.5. Reachability Computation with PGP Consideration . . . . . 6 3. Security Considerations . . . . . . . . . . . . . . . . . . . 7 4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 5. Normative References . . . . . . . . . . . . . . . . . . . . 7 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 1. Introduction In a fat tree, it can be sometimes desirable to guide traffic to particular destinations or keep specific flows to certain paths. In RIFT, this is done by using policy-guided prefixes with their associated communities. Each community is an abstract value whose meaning is determined by configuration. It is assumed that the fabric is under a single administrative control so that the meaning and intent of the communities is understood by all the nodes in the fabric. Any node can originate a policy-guided prefix. Since RIFT uses distance vector concepts in a southbound direction, it is straightforward to add a policy-guided prefix to an S-TIE. For easier troubleshooting, the approach taken in RIFT is that a node's southbound policy-guided prefixes are sent in its S-TIE and the receiver does inbound filtering based on the associated communities (an egress policy is imaginable but would lead to different S-TIEs per adjacency possibly which is not considered in RIFT protocol Atlas & Zhang Expires April 25, 2019 [Page 2] Internet-Draft rift-pgp October 2018 procedures). A southbound policy-guided prefix can only use links in the south direction. If an PGP S-TIE is received on an East-West or northbound link, it must be discarded by ingress filtering. Conceptually, a southbound policy-guided prefix guides traffic from the leaves up to at most the north-most level. It is also necessary to to have northbound policy-guided prefixes to guide traffic from the north-most level down to the appropriate leaves. Therefore, RIFT includes northbound policy-guided prefixes in its N PGP-TIE and the receiver does inbound filtering based on the associated communities. A northbound policy-guided prefix can only use links in the northern direction. If an N PGP TIE is received on an East-West or southbound link, it must be discarded by ingress filtering. By separating southbound and northbound policy-guided prefixes and requiring that the cost associated with a PGP is strictly monotonically increasing at each hop, the path cannot loop. Because the costs are strictly increasing, it is not possible to have a loop between a northbound PGP and a southbound PGP. If East-West links were to be allowed, then looping could occur and issues such as counting to infinity would become an issue to be solved (if complete generality of path - such as including East-West links and using both north and south links in arbitrary sequence - then a Path Vector protocol or a similar solution must be considered). Besides the usage for traffic engineering, PGPs can also be used to ensure nodes are administratively reachable for debugging purpose after certain failures. For example, a node looses all its northbound adjacencies but is not at the top of the fabric. If it detects that some other members at its level are advertising northbound adjacencies MAY inject its loopback address into southbound PGP TIE and become reachable "from the south" that way. Further, a solution may be implemented where based on e.g. a "well known" community such a southbound PGP is reflected at level 0 and advertised as northbound PGP again to allow for "reachability from the north" at the cost of additional flooding. 2. Specification PGPs are advertised in PGPrefixTIEs included in PGP N/S-TIEs. S-PGPs are propagated in south direction only and N-PGPs follow northern direction strictly. THRIFT schema in the base RIFT specification needs to be updated. For example: o TIEElement needs to add "7: optional PGPrefixElement pog_prefixes;" Atlas & Zhang Expires April 25, 2019 [Page 3] Internet-Draft rift-pgp October 2018 o "struct PGPrefixElement" needs to be defined. Should PrefixAttributes be used for PGPrefixElement (do all defined fields in PrefixAttributes apply to PGPrefixElement)? o "struct Community" needs to be referenced in PGPrefixElement Future revisions of this document and the base RIFT specification will coordinate the THRIFT schema. 2.1. Ingress Filtering The set of policy-guided prefixes received in a TIE is subject to ingress filtering and then re-originated to be sent out in the receiver's appropriate TIE. Both the ingress filtering and the re- origination use the communities associated with the policy-guided prefixes to determine the correct behavior. The cost on re- advertisement MUST increase in a strictly monotonic fashion. When a node X receives a PGP S-TIE or a PGP N-TIE that is originated from a node Y which does not have an adjacency with X, all PGPs in such a TIE MUST be filtered. Similarly, if node Y is at the same level as node X, then X MUST filter out PGPs in such S- and N-TIEs to prevent loops. Next, policy can be applied to determine which policy-guided prefixes to accept. Since ingress filtering is chosen rather than egress filtering and per-neighbor PGPs, policy that applies to links is done at the receiver. Because the RIFT adjacency is between nodes and there may be parallel links between the two nodes, the policy-guided prefix is considered to start with the next-hop set that has all links to the originating node Y. A policy-guided prefix has or is assigned the following attributes: cost: This is initialized to the cost received community_list: This is initialized to the list of the communities received. next_hop_set: This is initialized to the set of links to the originating node Y. 2.2. Applying Policy The specific action to apply based upon a community is deployment specific. Here are some examples of things that can be done with communities. The length of a community is a 64 bits number and it can be written as a single field M or as a multi-field (S = M[0-31], Atlas & Zhang Expires April 25, 2019 [Page 4] Internet-Draft rift-pgp October 2018 T = M[32-63]) in these examples. For simplicity, the policy-guided prefix is referred to as P, the processing node as X and the originator as Y. Prune Next-Hops: Community Required: For each next-hop in P.next_hop_set, if the next-hop does not have the community, prune that next-hop from P.next_hop_set. Prune Next-Hops: Avoid Community: For each next-hop in P.next_hop_set, if the next-hop has the community, prune that next-hop from P.next_hop_set. Drop if Community: If node X has community M, discard P. Drop if not Community: If node X does not have the community M, discard P. Prune to ifIndex T: For each next-hop in P.next_hop_set, if the next-hop's ifIndex is not the value T specified in the community (S,T), then prune that next-hop from P.next_hop_set. Add Cost T: For each appearance of community S in P.community_list, if the node X has community S, then add T to P.cost. Accumulate Min-BW T: Let bw be the sum of the bandwidth for P.next_hop_set. If that sum is less than T, then replace (S,T) with (S, bw). Add Community T if Node matches S: If the node X has community S, then add community T to P.community_list. 2.3. Store Policy-Guided Prefix for Route Computation and Regeneration Once a policy-guided prefix has completed ingress filtering and policy, it is almost ready to store and use. It is still necessary to adjust the cost of the prefix to account for the link from the computing node X to the originating neighbor node Y. There are three different policies that can be used: Minimum Equal-Cost: Find the lowest cost C next-hops in P.next_hop_set and prune to those. Add C to P.cost. Minimum Unequal-Cost: Find the lowest cost C next-hop in P.next_hop_set. Add C to P.cost. Maximum Unequal-Cost: Find the highest cost C next-hop in P.next_hop_set. Add C to P.cost. Atlas & Zhang Expires April 25, 2019 [Page 5] Internet-Draft rift-pgp October 2018 The default policy is Minimum Unequal-Cost but well-known communities can be defined to get the other behaviors. Regardless of the policy used, a node MUST store a PGP cost that is at least 1 greater than the PGP cost received. This enforces the strictly monotonically increasing condition that avoids loops. Two databases of PGPs - from N-TIEs and from S-TIEs are stored. When a PGP is inserted into the appropriate database, the usual tie- breaking on cost is performed. Observe that the node retains all PGP TIEs due to normal flooding behavior and hence loss of the best prefix will lead to re-evaluation of TIEs present and re- advertisement of a new best PGP. 2.4. Re-origination A node must re-originate policy-guided prefixes and retransmit them. The node has its database of southbound policy-guided prefixes to send in its S-TIE and its database of northbound policy-guided prefixes to send in its N-TIE. Of course, a leaf does not need to re-originate southbound policy- guided prefixes. 2.5. Reachability Computation with PGP Consideration During reachability computation, after prefixes are attached as specified in section 5.2.6 "Attaching Prefixes" of the RIFT base specification, PGPs are considered. Each policy-guided prefix P has its cost and next_hop_set already stored in the associated database, as specified in Section 2.3; the cost stored for the PGP is already updated to considering the cost of the link to the advertising neighbor. By definition, a policy-guided prefix is preferred to a regular prefix. Atlas & Zhang Expires April 25, 2019 [Page 6] Internet-Draft rift-pgp October 2018 for each policy-guided prefix P: if P not in route_database: add (P, type=PolicyGuided, P.cost, next_hop_set) end if if P in route_database : if (route_database[P].type is not PolicyGuided) or (route_database[P].cost > P.cost): update route_database[P] with (P, PolicyGuided, P.cost, next_hop_set) else if route_database[P].cost == P.cost update route_database[P] with (P, PolicyGuided, P.cost, merge(next_hop_set, route_database[P].next_hop_set)) else // Not preferred route so ignore end if end if end for Figure 1: Adding Routes from Policy-Guided Prefixes Notice that a policy-guided prefix is always preferred to a regular prefix, even if the policy-guided prefix has a larger cost. PGPs may overlap with prefixes introduced by automatic de- aggregation. The topic is under further discussion. The break in connectivity that leads to infeasibility of a PGP is mirrored in adjacency tear-down and according removal of such PGPs. Nevertheless, the underlying link-state flooding will be likely reacting significantly faster than a hop-by-hop redistribution and with that the preference for PGPs may cause intermittent black-holes. 3. Security Considerations To be provided. 4. Acknowledgements 5. Normative References [I-D.ietf-rift-rift] Team, T., "RIFT: Routing in Fat Trees", draft-ietf-rift- rift-03 (work in progress), October 2018. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . Atlas & Zhang Expires April 25, 2019 [Page 7] Internet-Draft rift-pgp October 2018 Authors' Addresses Alia Atlas Individual EMail: akatlas@gmail.com Zhaohui Zhang Juniper Networks EMail: zzhang@juniper.net Atlas & Zhang Expires April 25, 2019 [Page 8]