Intarea working group                             Janardhanan Narasimhan
Internet Draft                                Balaji Venkat Venkataswami
Intended Status: Proposed Standard                          Dell-Force10
Expires: 23 July 2012                                        Rich Groves
                                                               Microsoft
                                                             Peter Hoose
                                                                Facebook
                                                         23 January 2012

                               Traceflow
                draft-janapath-intarea-traceflow-00.txt

Abstract

This document describes a new OAM protocol, TraceFlow, that captures information pertaining to a traffic flow along the path that the flow takes through the network. TraceFlow is ECMP and link-aggregation aware and captures information about the constituent members through which the traffic flow passes. TraceFlow gathers information that is relevant to the flow, such as the Layer 3 address of the outgoing interface, the next hop to which packets of the flow are forwarded, and the effect on the flow of network policies such as access control lists.

This draft requires the TraceFlow protocol to be processed by Layer 3 devices only. Layer 2 devices and MPLS LERs/LSRs along the way are traversed without any processing, as if in a pass-through mode. IP tunnels such as IP-in-IP and IP-in-GRE are likewise expected to pass TraceFlow packets through in pass-through mode. To achieve this, TraceFlow uses a specific UDP destination port to be assigned by IANA.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Janardhanan et.al.
Expires July 2012 [Page 1]

INTERNET DRAFT Traceflow January 2012

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1 Introduction
   1.1 Terminology
2. Motivation
   2.1. Evolution of IP networks
3. Packet Formats
   3.1. Flow Discovery Request/Response Packet Format
   3.2. Flow Discovery Request TLVs
      3.2.1. Flow Descriptor TLV
      3.2.2. Originator Address TLV
      3.2.3. Information Request bitmap TLV
      3.2.4. Termination TLV
   3.3. Flow Discovery Response TLVs
      3.3.1. Information Response TLV
         3.3.1.1 Utilization Anomaly TLV
      3.3.2. Result TLV
      3.3.3. Additional Informational Code TLV
   3.4. TLVs common to Flow Discovery Request and Response
      3.4.1. Encapsulated Packet TLV
      3.4.2. Encapsulated Packet Mask TLV
      3.4.3. Record Route TLV
4. Protocol Operation
      4.0.1 Assessing why redundant responses come through.
   4.1. Using Hardware to gather details for the response packet.
   4.2 Interaction with MPLS based transit devices.
   4.3 Applicability to Layer 2 devices.
   4.4 Applicability to platforms that have trouble determining incoming Interface.
   4.5 Applicability to Network Address Translators
5. Application Scenarios
   5.1. Troubleshooting network failures
   5.2. Network flow planning
      5.2.1 Programmatic migration to mitigate LAG link polarization
6. Security Considerations
7. Hardware pre-requisites for implementing Traceflow.
   7.1 filter to trap packets with UDP destination port
   7.2 Packet injection mode directly to egress port.
   7.3 Packet injection mode through hardware engine but not to output port.
   7.4 Hardware rate limiter support (preventing DOS attacks)
   7.5 RPF check support in hardware (security consideration)
   7.6 Regular Security ACLs in the boundary of the network.
   7.7 Implementing the LAG / ECMP using software state
   7.8 Implementation considerations
      7.7.1 Using ingress port as part of the LAG/ECMP hashing function.
8. IANA Considerations
9. Contributors
APPENDIX A:
   A.1. Encapsulation Format Choices
      A.1.1. Carrying a separate Flow Descriptor TLV inside the Flow
      A.1.2. Using the traffic flow's parameter values in the external header.
   A.2. Layer 4 Protocol Choices and Router Alert option
      A.2.1. UDP Encapsulation
      A.2.2. ICMP Encapsulation
   A.3. Legacy Devices (Not supporting TraceFlow)
   A.4. TTL Scoping
   A.5. Additional Information in the Flow Discovery Response
   A.6. Choices for supporting remote TraceFlow requests
      A.6.1. Terminating the request at the Proxy device and re-originate it
      A.6.2. Source-Routing the request through the Proxy device
   A.7. Applicability to Multicast
   A.8. Applicability to Layer 2 networks
   A.9. Applicability to IPv6
   A.10. Applicability to MPLS
   A.11. Flow Discovery and Response packet fragmentation
9. References
   9.1. Normative References
   9.2. Informative References
Author's Addresses

1 Introduction

The TraceFlow protocol allows a user to determine the path taken by a flow through a network. It provides the capability to collect, at each hop of the network, information relevant to the forwarding of that flow. This information can include individual member information in a link-aggregation group (LAG) or an ECMP set.

There is a need for a mechanism that allows a user to determine the path that a flow takes through a network [3]. Current solutions (such as traceroute) do not identify the exact physical or logical interface through which the flow passes when LAG and/or ECMP are employed or when policy-based routing is in effect. Such information at intermediate hops in the network can be useful to network operators when troubleshooting network failures.

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. Motivation

Network operators have traditionally managed IP networks with classic OAM tools like Ping and Traceroute [2]. Operators typically use Ping to perform end-to-end connectivity checks, and Traceroute to trace the hop-by-hop path to a given destination. Traceroute is also used to isolate the point of failure along the path to a given destination. These tools have performed very well for the IP networks they were designed for.

2.1. Evolution of IP networks

Over time, networks have become more complex and heterogeneous. Layer 2 switches and MPLS LSRs are often intermixed with IP routers. MPLS ping and MPLS traceroute, also known as LSP ping and LSP traceroute, identify the intermediate hops through which they travel using methods such as the router alert label. The relevant RFCs specify these methods for MPLS troubleshooting.
This document does not intend to interfere with the MPLS OAM methods. TraceFlow is intended exclusively for pure Layer 3 troubleshooting and will not troubleshoot Layer 2 device failures or MPLS transit node failures. Plain IP-in-IP tunneling is likewise out of scope for this document.

An increasing number of networks use multipath configurations to improve load-balancing and redundancy. These multipaths can take the form of end-to-end ECMP paths, or of LAGs between directly connected hops. Existing tools such as Ping and Traceroute, which follow the destination-IP-address-based routing model, may not follow the path taken by the actual traffic in multipath and/or policy-based routing scenarios. The forwarding of actual traffic in such scenarios is based on a set of packet header fields.

The OAM tools have not kept up with the requirements of these evolving networks. Hence there is a need to extend them so that operators can execute new OAM functions:

1. Perform Ping or Traceroute based on a set of link-layer and/or TCP/IP header fields of actual user traffic. This feature will be very useful for troubleshooting network problems and for planning and provisioning network resources.

2. Trace end-to-end paths comprising a mix of Layer 2 hops and IP and MPLS routers. Layer 2 hops and MPLS hops are traversed in pass-through mode.

3. Collect more detailed and useful information to enable operators to perform more detailed problem analysis.

This document proposes a new OAM protocol, TraceFlow, that attempts to bridge the gap between today's fast-evolving networks and the traditional OAM tools.

The following section (Section 3) discusses the packet formats used by TraceFlow to avoid forward references in subsequent sections.
It is suggested that first-time readers skip Section 3 and first read the description of Protocol Operation in Section 4. Application scenarios are discussed in Section 5 and the security considerations in Section 6.

3. Packet Formats

3.1. Flow Discovery Request/Response Packet Format

Flow Discovery Request and Response packets follow the general format shown below. The TLVs included in each message type may be different.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |   Hopcount    |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |   Reserved    |           Query ID            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      16-byte opaque System Identifier of the Requestor.       |
   //                                                             //
   |     Used as a unique identifier of the system requesting.     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      16-byte opaque System Identifier of the Responder.       |
   //                                                             //
   |     Used as a unique identifier of the system Responding.     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            TLVs...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Flow Discovery Request packet SHOULD be sent with the DF bit set in the external IP header.

Version: The version number of the protocol. This document defines protocol version 1.

Hopcount: Allows keeping track of the number of transit nodes that processed the Flow Discovery Request packet. This field is decremented at each device that processes the Flow Discovery Request packet. This field also helps in determining whether there were any legacy devices along the way that do not support the TraceFlow protocol.

Length: Length of the packet, including the length of the header. The length of the payload can therefore be determined by subtracting the header length from this Length field.
Type:
   1 Direct Flow Discovery Request - Ping mode
   2 Direct Flow Discovery Request - Traceroute mode
   3 Indirect Flow Discovery Request - Ping mode
   4 Indirect Flow Discovery Request - Traceroute mode
   5 Response for the Flow Discovery Request

Reserved: This field should be set to zero on transmit and ignored on receipt. A future version of the protocol may define a use for it.

Query ID: A unique identifier generated by the originator that allows it to correlate the responses from the transit nodes with the Flow Discovery Request packet it generated.

System Identifier (Requestor and Responder): This is an opaque 16-byte field that is unique per node in the network. It is up to the administrators to define what it means within their network, as long as they ensure that it is unique across all the nodes in that network. The Requestor fills in its own System Identifier in the request packet. The Responder copies the Requestor field from the received packet and fills in the Responder field with its own System Identifier. Thus the Flow Discovery Request packet carries the Requestor System Identifier, and the Response packet carries both the Requestor and the Responder System Identifiers.

The TLVs are divided into three categories:

1. TLVs that can show up in the Flow Discovery Request packet

2. TLVs that can show up in the Flow Discovery Response packet

3. TLVs that can show up in the Flow Discovery Request as well as the Response packet

TLVs that are not understood by an implementation of an earlier version of the protocol are ignored by it. Such TLVs SHOULD be treated as opaque and passed along to the next transit device on the path. These opaque TLVs are therefore transitive for versions of the protocol that do not understand them.

3.2. Flow Discovery Request TLVs

3.2.1. Flow Descriptor TLV

This TLV is included in the Flow Discovery Request packet and identifies the traffic flow that the originator device is interested in probing. This is a mandatory TLV.

The definition of a traffic flow varies from one network to another. Most traffic flows in today's networks can be uniquely identified using fields from the data packet's headers. The TraceFlow protocol requires the first 256 bytes of the traffic flow's data packet to be encoded in this Flow Descriptor TLV. For version 1 and all later versions, these 256 bytes SHOULD include the Layer 2 headers as well. This way, when TraceFlow supports Layer 2 devices, the information in the 256 bytes will help to discover intermediate Layer 2 devices as well.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Value...                    |    padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: The type of the TLV. In this case, the value is 1, meaning Flow Descriptor TLV.

Code: The Code identifies the sub-type of the TLV. In this case, this field is not defined. It SHOULD be set to 0.

Length: The length of the TLV.

Value: The value encoded in this TLV, depending on the Type and the Code specified.

Padding: This might be necessary to ensure the packet ends on a word boundary.

Refer to Section 3.4.1 (Encapsulated Packet TLV), which describes how a data packet can be used to specify the traffic flow.

3.2.2. Originator Address TLV

This TLV carries the address of the originator of the Flow Discovery Request packet. The responses from the intermediate devices processing the request are sent to this address.
This is an optional TLV to be included only when an Indirect Flow Discovery Request is originated.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Value...                    |    padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 2 Originator Address

Code:
   1 IPv4 Address
   2 IPv6 Address

3.2.3. Information Request bitmap TLV

This TLV is used by the originator device to specify the information requested for the flow identified by the Flow Descriptor TLV in the Flow Discovery Request packet. This is an optional TLV. In the absence of this TLV, the transit and end devices processing the Flow Discovery Request packet respond with the default set of information.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Flags...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 3 Information Request

Code:
   1 Incoming Interface related
   2 Outgoing Interface related

   Flags:
      Bit 0 : IP Address
      Bit 1 : SNMP ifName
      Bit 2 : SNMP ifIndex and ifType
      Bit 3 : LAG details.
      Bit 4 : ECMP details. To be specified only for Outgoing interface.
      Bit 5 : Hash algorithm. To be specified only for Outgoing interface.

   Note that the Hash algorithm mask TLVs can be specified in the response packet, but the actual hash algorithm need not be specified in the response packet.

Code:
   3 Global information

   Flags:
      Bit 0 : Next Hop Router Address

3.2.4. Termination TLV

This TLV includes a list of addresses.
If a device notices that it owns any of the addresses listed in this TLV, it MUST NOT forward the Flow Discovery Request packet any further and MUST respond to the originator with a Flow Discovery Response packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |                  Address...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |                  Address...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                                                             //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |                  Address...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Address-type:
   0x1: IPv4 Address
   0x2: IPv6 Address

Address: The address where the request MUST be terminated.

3.3. Flow Discovery Response TLVs

3.3.1. Information Response TLV

This TLV is used by the devices processing the Flow Discovery Request packet to provide the information requested by the originator device. This is a mandatory TLV. It should be included in the response sent to the device originating the Flow Discovery Request packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |   Value...    |    padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 5 Information Response

Code:
   1 Incoming Interface related
   2 Outgoing Interface related

Sub-Code:
   0 : IP Address
   1 : SNMP ifName
   2 : SNMP ifIndex and ifType
   3 : LAG details
   4 : ECMP details. To be specified only for Outgoing interface.
   5 : Hash algorithm. To be specified only for Outgoing interface.
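
The Sub-Code values above use the same numbering as the Flags bits of the Information Request bitmap TLV (Section 3.2.3). The following is a minimal Python sketch, not part of the protocol definition, of how a responder might turn a received Flags word into the list of Sub-Codes to answer. It assumes that Flags is a single 32-bit word and that "Bit 0" is the most significant bit, as in the bit rulers of the packet diagrams; the draft does not state either point. The names `requested_sub_codes`, `FLAGS_WIDTH`, and `OUTGOING_ONLY` are hypothetical. Silently skipping ECMP and Hash algorithm requests for the incoming interface is likewise one possible reading of "To be specified only for Outgoing interface".

```python
from typing import List

# Sub-Code numbering taken from the Information Response TLV list.
SUB_CODES = {
    0: "IP Address",
    1: "SNMP ifName",
    2: "SNMP ifIndex and ifType",
    3: "LAG details",
    4: "ECMP details",
    5: "Hash algorithm",
}

# ECMP details and Hash algorithm apply to the outgoing interface only.
OUTGOING_ONLY = {4, 5}

CODE_INCOMING = 1  # Code 1: Incoming Interface related
CODE_OUTGOING = 2  # Code 2: Outgoing Interface related

# Assumption: Flags is one 32-bit word and Bit 0 is its most significant bit.
FLAGS_WIDTH = 32


def requested_sub_codes(flags: int, code: int) -> List[int]:
    """Return the Sub-Codes requested by a Flags bitmap for a given Code."""
    result = []
    for bit in sorted(SUB_CODES):
        mask = 1 << (FLAGS_WIDTH - 1 - bit)
        if not flags & mask:
            continue  # this item was not requested
        if bit in OUTGOING_ONLY and code != CODE_OUTGOING:
            continue  # assumption: ignore outgoing-only items for other Codes
        result.append(bit)
    return result


if __name__ == "__main__":
    # Bits 0, 3 and 4 set: IP Address, LAG details, ECMP details.
    flags = 0x98000000
    for sub_code in requested_sub_codes(flags, CODE_OUTGOING):
        print(sub_code, SUB_CODES[sub_code])
```

With the same Flags value and Code 1 (incoming interface), the sketch returns only Sub-Codes 0 and 3, because ECMP details are not reported for the incoming interface.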

The LAG and ECMP details are described in more detail below. The following is the frame format used if the originator device requested LAG or ECMP related details.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |        No. of members         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Component Link Information..                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                                                             //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Component Link Information..                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

No. of members: The number of members in the LAG or the ECMP segment that is being described.

Component Link Information: Individual component links are encoded in this field. The "No. of members" field describes how many component links are listed.

The frame format for the "Component Link Information" portion of the TLV is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         SNMP ifIndex                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          SNMP ifType                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Flags             |      SNMP ifName length       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        SNMP ifName...                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

SNMP ifIndex: The ifIndex of the component link being specified.

SNMP ifType: The ifType of the component link being specified.

Flags:
   0x1: If set, the Component Link is administratively down.
   0x2: If set, the Component Link is operationally down.
   If the above cannot be determined, the flags SHOULD be set to 0. The rest of the bits in the Flags field are reserved.

3.3.1.1 Utilization Anomaly TLV

An optional TLV to report a LAG utilization anomaly is also included. The user can configure a threshold on the divergence in utilization between the least utilized member and the most utilized member of the LAG. For example, if the threshold is configured as 80% and the difference in utilization between the least utilized member and the most utilized member of the LAG exceeds that threshold, an anomaly TLV is sent to report the condition. On receiving this Utilization Anomaly TLV, the Originator device can report it to the user, and a subsequent NMS query to the appropriate device can reveal more information about the anomaly.

The format of the Utilization Anomaly TLV is as follows.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |Configured Divergence Threshold|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     SNMP ifIndex of Least used component link in the LAG      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      SNMP ifIndex of Most used component link in the LAG      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Actual Divergence in percentage|          Padding...           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note that, for this optional TLV, the device that reports it to the Originator should keep track of the rate at which each member of the LAG is forwarding traffic and report the divergence in terms of that rate.
If the implementation cannot keep track of the rate, it has to report the divergence in terms of packet counts. The latter, however, may lead to misinterpretation after link up/down events or other conditions.

The following TLV format specifies the packet fields that are used by the hash algorithm configured on the device.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |    No. of hash parameters     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         byte-offset-1         |         no. of bytes          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         byte-offset-2         |         no. of bytes          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Encapsulated Packet ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

No. of hash parameters: The number of parameters in the packet that are used by the hash algorithm to calculate the egress port.

Byte-offset-N: The offset to the start of the Nth parameter that is used by the hash algorithm to calculate the egress port.

No. of bytes: For the byte-offset specified, the number of bytes starting at that offset that are used by the hash algorithm.

Encapsulated Packet: The encapsulated packet received by the device in the Flow Discovery Request packet on the input port is returned in the response packet. This should be the packet that is used in the egress component link calculations by the device processing the Flow Discovery Request packet.

Note that the Hash algorithm mask TLVs can be specified in the response packet, but the actual hash algorithm need not be specified in the response packet.
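
To illustrate how an originator might interpret the (byte-offset, no. of bytes) pairs above, the following Python sketch extracts the byte ranges that feed the hash from the returned Encapsulated Packet. It is an illustration under stated assumptions rather than a definitive implementation: offsets are assumed to be counted from the first byte of the Encapsulated Packet, which the draft does not state explicitly, and the function name `hash_input_bytes` is hypothetical. The example packet is a bare IPv4 header built with Python's standard `struct` module.

```python
import struct
from typing import List, Tuple


def hash_input_bytes(packet: bytes, params: List[Tuple[int, int]]) -> bytes:
    """Concatenate the byte ranges of `packet` that the hash algorithm reads.

    `params` is a list of (byte_offset, no_of_bytes) pairs as carried in the
    hash parameter list. Offsets are assumed to be relative to the start of
    the Encapsulated Packet.
    """
    selected = bytearray()
    for offset, count in params:
        end = offset + count
        if offset < 0 or count < 0 or end > len(packet):
            raise ValueError(
                "hash parameter (%d, %d) is outside the packet" % (offset, count)
            )
        selected += packet[offset:end]
    return bytes(selected)


if __name__ == "__main__":
    # A minimal 20-byte IPv4 header: 10.0.0.1 -> 10.0.0.2, protocol UDP (17).
    ipv4_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, 64, 17, 0,
        bytes([10, 0, 0, 1]),
        bytes([10, 0, 0, 2]),
    )
    # A device hashing on source and destination address would report the
    # pairs (12, 4) and (16, 4) when IPv4 is the first header.
    print(hash_input_bytes(ipv4_header, [(12, 4), (16, 4)]).hex())
    # prints "0a0000010a000002"
```

Comparing these bytes across two flows shows whether the reporting device can distinguish the flows at all; it does not reveal the egress member, since the hash algorithm itself need not be disclosed.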

The following TLV is mandatory.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |           Sub-Code            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type  |             Next Hop Address ...              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The next hop address is encoded as shown above.

Code: 3 Global information

Sub-Code: 1 Next Hop Address

Address-type:
   0x1: IPv4 Address
   0x2: IPv6 Address

Next Hop Address: This field carries the next hop address.

3.3.2. Result TLV

The device processing the Flow Discovery Request packet includes a Result TLV in the response to the originator device to indicate the result of the processing. This TLV is mandatory.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Result Code  |   Sub-code    |       Diagnostic Data..       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Diagnostic Data...               |    padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 7 Result TLV

Result Code: This field carries a value indicating the result of the processing of the Flow Discovery Request packet.

Sub-Code: This field further qualifies the "Result Code" field and provides more information about the result of processing the Flow Discovery Request packet.

Diagnostic Data: This field is used in conjunction with the "Result Code" and "Sub-code" to return any information that may be useful to the originator of the Flow Discovery Request packet.
Its format is defined based on the "Result Code" and "Sub-code" fields.

Result Code: 1 Success
   Result Sub-code: 0

Result Code: 2 Administratively disabled
   Result Sub-code: 0
   Diagnostic Data: A list of Information Request Sub-Codes that are not being fulfilled. These Sub-Codes could indicate whether the outgoing interface is currently disabled or not. If the forwarding tables in hardware point to an interface that has been administratively disabled, this indicates an error in those tables and may confirm that the software state is not in sync with the hardware.

Result Code: 3 Routing failure
   Result Sub-code: 1 No route in table
   Result Sub-code: 2 RPF check failed
   Result Sub-code: 3 ARP failure

Result Code: 4 Packet Error
   Result Sub-code: 1 hopcount = 0
   This may be the case where the TTL has counted down to 0 in IPv4 or the Hop Limit has counted down to 0 in IPv6. With this code, even if the ICMP "Time to Live Exceeded" packets are dropped on the way back, the Originator may be able to determine that the TTL counted down to zero.

Result Code: 5 Malformed packet
   Result Sub-code: 1 Unknown TLVs for this version.
   In this case the packet is not dropped but forwarded with the unknown TLVs. This allows older versions of the protocol to report back to the originator that the packet was processed with one or more unknown TLVs and was forwarded to the next transit device with those TLVs intact.

Result Code: 6 Data-path Error
   Result Sub-code: 1 Fragmentation needed but not allowed by Flow Information TLV in Flow Discovery Request packet

Result Code: 7 Generic Error
   Result Sub-code: 0
   (TBD: Sub-codes to identify the type of error may need to be defined)

3.3.3. Additional Informational Code TLV

This TLV may accompany the Result TLV if the device processing the Flow Discovery Request packet has any additional information that the originator device may be interested in. This TLV is optional.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |     Code      |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Status Code  |   Sub-code    |       Additional Data..       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Additional Data...                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 8 Additional Informational Code

Status Code: 1 ACL drop
   Status Sub-code: 1 Ingress ACL drop
   Status Sub-code: 2 Egress ACL drop

Status Code: 2 Dataplane failure
   Status Sub-code: 1 Switch fabric failure
   Status Sub-code: 2 Linecard failure
   Status Sub-code: 3 Port failure

Status Code: 3 Generic Information
   Status Sub-code: 1 TTL/Hopcount mismatch noticed
   Status Sub-code: 2 Default route used to forward packet
   Status Sub-code: 3 Per-packet load-balancing enabled.

In case of a TTL/Hopcount mismatch, the "Additional Data" field carries the difference between the Hopcount and the IP TTL field values. This may provide an indication of the number of previous-hop routers that did not support the TraceFlow protocol.

3.4. TLVs common to Flow Discovery Request and Response

3.4.1. Encapsulated Packet TLV

This TLV is included in the Flow Discovery Request and is returned in the Flow Discovery Response packet by devices processing the request packet. In the response packet, this TLV contains the encapsulated packet as it was received from the previous-hop device. It helps the originator keep track of how the data packet gets modified along the way. This TLV is mandatory.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type | Code | Length |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Flags | First Hdr | Reserved |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encapsulated Packet... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type: 1  Flow Discovery Request

   Code: 1  Encapsulated traffic flow data packet

   Encapsulated Packet: The first 256 bytes of a data packet belonging to the flow are encapsulated in this field.

   Flags:
      0x1: fan-out option; if set, the transit node SHOULD forward the Flow Discovery Request packet to all possible egress links for the specified flow. Because the fan-out option can create a copy of the packet on every possible egress link in a LAG or ECMP situation, it should be used with caution. An administrative knob should be available on the device to turn this option on or off. Thus, even if fan-out is requested in the Flags, fan-out discovery is done only if the transit device permits it through that knob.

   First Hdr: Specifies the first header that appears in the encapsulated packet. The values defined by this document are:
      0x1: Layer 2 MAC Header
      0x2: IPv4 Header
      0x3: IPv6 Header
      0x4: MPLS Header

3.4.2. Encapsulated Packet Mask TLV

This TLV allows the operator to specify what portion of the encapsulated packet carries flow data and what portion is left unspecified. This allows the intermediate nodes to determine if they have enough information to calculate an egress interface to forward the Flow Discovery Request packet.
If this TLV is omitted from the Flow Discovery Request packet, no portion of the packet is left unspecified and the transit device may use any of the fields to make the forwarding decision. This TLV is optional.

This TLV includes a sequence of tuples.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type | Code | Length |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | No. of tuples | byte-offset-1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | no. of bytes | byte-mask-1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | byte-offset-2 | no. of bytes |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | byte-mask-2 | ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   No. of tuples: Total number of tuples carried in this TLV.

   Byte-offset: The byte offset of the field being specified.

   No. of bytes: The number of bytes to consider, starting at the byte-offset.

   Mask: The mask to be applied to the bytes starting at the byte-offset. It selects which bits, within the range that starts at the byte-offset and spans the given number of bytes, are used in calculating the egress interface on which to forward the Flow Discovery Request packet.

3.4.3. Record Route TLV

This TLV is used to record information about the path taken by a Flow Discovery Request packet as it traverses the network. It is included by the originator, and each transit device processing the Flow Discovery Request packet includes information about its incoming interface in this TLV. This TLV is included in the response sent by the transit nodes (in trace-route mode) to the originator of the Flow Discovery Request packet. This TLV is optional.
However, if it is included by the originator node in the Flow Discovery Request packet, the subsequent nodes SHOULD prepend their addresses to the list of addresses.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Type | Code | Length |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Address-type | Incoming interface Address... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type: 9  Record-Route TLV

   Code: 1

   Address-type:
      0x1: IPv4 Address
      0x2: IPv6 Address

   Incoming interface Address: This field carries the incoming interface address at the device processing the Flow Discovery Request packet. Each node receiving the request packet with this TLV should prepend its incoming interface address to this TLV. The device SHOULD include the Record-Route TLV, as received on its input interface, in the Flow Discovery Response packet it sends out.

4. Protocol Operation

A Flow Discovery Request packet is a UDP packet addressed to a well-known destination port. The source UDP port in the packet is ephemeral. The packet contains a "Flow Descriptor" TLV that allows the originator of the request to encode a flow data packet in the TLV.

On Layer 3 or multi-layer devices that incorporate Layer 3 based forwarding, using a UDP port would be most useful. Hardware support is needed to program a filter that inspects packets for the specific UDP destination port and punts matching packets to software. Layer 2 devices in L2 clouds are passed through, and so are MPLS LSRs. For pure L3 devices, the ability to set up the filter that enables Traceflow should be controlled by a per-device knob.

Certain fields in a traffic flow data packet get modified by the transit devices as the data packet traverses the network.
A transit device that processes a Flow Discovery Request packet would need to edit those fields in the encapsulated data packet that represents the flow. Examples of such fields are the source and destination MAC addresses and the MPLS label stack. Consider a transit device that uses the source or destination MAC address of a data packet in order to determine the egress port. The transit device could choose to pick up the MAC addresses from the external header of the Flow Discovery Request packet or from the encapsulated packet.

TraceFlow can operate in two separate modes:

   1) Trace-route mode: Each transit device and the end node respond to the Flow Discovery Request packet by sending a Flow Discovery Response.

   2) Ping mode: Transit nodes do not send a response message to the originator. The rest of the behavior is the same as in trace-route mode.

The following applies to both Ping and Trace-route mode unless otherwise specified.

The destination address of the Flow Discovery Request packet is the destination address for the desired traffic flow. In Ping mode a separate TLV, the Termination TLV, may be included that specifies a list of addresses. If a device processing the Flow Discovery Request packet notices that one of its IP addresses matches one of the addresses specified in the Termination TLV, then the device MUST NOT forward the Flow Discovery Request packet further; it sends a response packet to the originator instead.

The Flow Discovery Request packet travels the exact same path that a data packet for the specified traffic flow would have followed. This includes the exact physical or logical interface that belongs to a LAG or a set of ECMP paths. It is important that the hardware support a mechanism to determine where the packet would be forwarded, report the result to software, and inject the packet toward the next hop on the way to the destination.
If per-packet load balancing is enabled on the way to the destination, the Flow Discovery Response is ambiguous: another iteration of flow discovery packets through the node may be forwarded across an interface (logical or otherwise) that differs from the one used in the previous iteration. So if per-packet load balancing is enabled on the multipaths that exist (ECMP or otherwise), it is important that the response packet report that the node is so configured. A status code is reserved to note this anomaly. In this case the path taken by a Traceflow packet may differ entirely from that of an actual data packet if two or more ECMP or UCMP paths exist.

The device interested in receiving information about the traffic flow originates a Flow Discovery Request packet. The Flow Descriptor TLV in this packet specifies the flow of interest, whereas the Information Request bitmap TLV specifies the flow related information that the originator device is requesting from each transit router. The Flow Discovery packet needs to be processed by all routers along the path to the destination. This can be achieved by using a well-known UDP port as the destination port in the UDP header.

When a transit device receives a Flow Discovery Request packet, it reads the flow information from the Flow Descriptor TLV, looks up the local forwarding database(s) and determines an egress port or ports for this traffic flow. The transit device forwards the Flow Discovery packet along the egress port calculated using this lookup. The egress port is calculated based on the flow information from the Flow Descriptor TLV in the request packet and not based on the destination IP address in the IP header of the Flow Discovery Request packet. When processing the Flow Discovery Request packet, the transit node MUST consider the packet length specified in the encapsulated packet in the Flow Descriptor TLV.
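As a minimal sketch of the lookup described above, the following Python fragment derives the hashing key from the encapsulated packet rather than from the outer IP header of the request. The names flow_key, select_egress and sample_packet, the IPv4-only parsing, and the CRC32 hash are assumptions made for illustration only; they are not defined by this document, and a real device would obtain the member choice from its forwarding hardware rather than from a software hash.

```python
import ipaddress
import struct
import zlib
from typing import Sequence


def flow_key(encapsulated: bytes) -> bytes:
    """Build a hash key from an encapsulated IPv4 packet.

    The key is source address + destination address + protocol and, for
    TCP or UDP, the source and destination ports. (Illustrative choice of
    fields; real platforms hash on vendor-specific inputs.)
    """
    if len(encapsulated) < 20 or encapsulated[0] >> 4 != 4:
        raise ValueError("expected an IPv4 header")
    ihl = (encapsulated[0] & 0x0F) * 4            # IPv4 header length in bytes
    if ihl < 20:
        raise ValueError("invalid IPv4 header length")
    proto = encapsulated[9]
    key = encapsulated[12:20] + bytes([proto])    # src + dst + protocol
    if proto in (6, 17) and len(encapsulated) >= ihl + 4:
        key += encapsulated[ihl:ihl + 4]          # L4 source and destination port
    return key


def select_egress(encapsulated: bytes, members: Sequence[str]) -> str:
    """Pick one LAG/ECMP member for the flow (hypothetical CRC32 hash)."""
    return members[zlib.crc32(flow_key(encapsulated)) % len(members)]


def sample_packet(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Build a minimal IPv4/UDP packet to stand in for the encapsulated flow."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 28, 0, 0, 64, 17, 0,             # version/IHL ... TTL, UDP, checksum
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    udp_header = struct.pack("!HHHH", sport, dport, 8, 0)
    return ip_header + udp_header


if __name__ == "__main__":
    packet = sample_packet("10.0.0.1", "10.0.0.2", 40000, 53)
    print(select_egress(packet, ["member-1", "member-2", "member-3"]))
```

Because only the encapsulated bytes are passed to select_egress, the outer header of the request cannot influence the choice, which is the property the paragraph above requires.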
The transit device also gathers the relevant information for the flow, which could include details such as:

   1. Incoming and outgoing interface related details such as ifIndex, IP address, LAG and ECMP related information.

   2. Next-hop router information.

The transit device processing the Flow Discovery Request packet may choose to respond to only a subset of the information requested in the Flow Discovery Request packet.

The transit device includes additional information related to the incoming or outgoing LAG or ECMP interface. This additional information includes the number of LAG or ECMP links that are configured, their operational status, and the parameters included in the hashing algorithm that is used to select an egress port for the traffic flow.

This information is sent back to the IP address specified as the Originator IP Address in the Flow Discovery Request packet. If the Indirect Request is used, the Originator Address TLV specifies the IP address; otherwise the source IP address in the outer header is the originator address.

The Flow Discovery Request packet includes a hop count field which is initialized to the same value as the IP header's TTL field. This hop count field is decremented by one at each intermediate hop router that processes the Flow Discovery Request. In conjunction with the TTL field in the IP header, this hop count field can help determine whether any intermediate routers do not support the TraceFlow protocol. When an intermediate hop router detects that the hop count field is greater than the IP header TTL field, it indicates that one or more previous-hop routers do not support the TraceFlow protocol. This information is added to the response sent to the Originator IP Address. Thus the first Traceflow-capable router after one or more devices that do not support Traceflow detects that those previous devices did not support it.
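The hop count comparison described above can be sketched as follows. This is an illustrative Python fragment, not part of the protocol definition: the function names and the initial value of 64 are assumptions, and the simulation assumes that no device resynchronizes the two fields, so the reported difference is cumulative along the path.

```python
from typing import List, Sequence, Tuple


def unsupported_previous_hops(hop_count: int, ip_ttl: int) -> int:
    """Return how many previous routers appear not to support TraceFlow.

    Both fields start at the same value at the originator. Every router
    decrements the IP TTL, but only TraceFlow-capable routers decrement
    the hop count, so the hop count exceeds the TTL by one for each
    router that forwarded the packet without processing it.
    """
    if hop_count < ip_ttl:
        raise ValueError("hop count below IP TTL is not expected")
    return hop_count - ip_ttl


def simulate(path_support: Sequence[bool], initial: int = 64) -> List[Tuple[int, int]]:
    """Walk a path and collect (router index, difference) at capable routers.

    path_support[i] is True when router i processes TraceFlow packets.
    """
    hop_count = ttl = initial
    reports = []
    for index, capable in enumerate(path_support):
        if capable:
            # Compare the fields as received, then decrement both.
            reports.append((index, unsupported_previous_hops(hop_count, ttl)))
            hop_count -= 1
        ttl -= 1  # every router decrements the IP TTL when forwarding
    return reports


if __name__ == "__main__":
    # Routers 1 and 2 do not support TraceFlow; router 3 notices the gap.
    print(simulate([True, False, False, True, True]))
```

The difference computed here is the value that would be carried in the "Additional Data" field of the Additional Informational Code TLV when a TTL/Hopcount mismatch is reported.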
The output at the originator end can be customized to display in the following format:

   Device 1: (Description) (Traceflow capable)
   Unknown Devices : n (where n >= 1)
   Device 2: (Description) (Traceflow capable)

The IP TTL field as well as the hopcount field SHOULD be initialized to values that limit the Flow Discovery Request packet to the desired network boundary. This may be required to restrict Traceflow packets to specific boundaries within an administrative domain, provided that such boundaries are well defined within the domain.

A router can originate periodic Flow Discovery Requests for a traffic flow. The Query ID field in the Flow Discovery Request packet helps the originator identify the responses from the transit routers as they process the request.

When processing a Flow Discovery Request packet, a device along the path towards the destination may encounter an error condition and be unable to continue processing the packet. Some examples of such error conditions are:

   1. TraceFlow protocol has been administratively disabled.

   2. Unicast RPF check failed for the flow specified in the Flow Discovery Request packet.

   3. No route exists in the routing table to route the flow specified in the Flow Descriptor TLV.

   4. IP TTL or the Hop Count field in the Flow Discovery Packet becomes zero.

The "Result TLV" is used to carry this information back to the originator of the Flow Discovery Request packet.

It is also likely that the device is able to successfully process the Flow Discovery Request packet but encounters a condition during the processing that may be of interest to the originator. Some examples of such conditions are:

   1. The flow specified in the Flow Descriptor TLV would be dropped due to Ingress ACL or Egress ACL policies.

   2. A dataplane failure may prevent the specified flow from being successfully switched/routed.

   3. IP TTL and the Hop-count field in the Flow Discovery Request packet do not match, possibly because one or more previous-hop routers do not support the TraceFlow protocol.

   4. The specified flow would be routed using the default route in the routing table.

This information is returned to the originator of the Flow Discovery Request packet using the "Additional Informational Code TLV".

The originator of the Flow Discovery Request packet may set the fan-out bit in the Flow Descriptor TLV to request the transit node to forward the request packet through all possible egress ports for the specified flow. The transit device would process the Flow Discovery Request packet as described above and forward it out of all possible egress ports in multipath scenarios. If the fan-out option is selected, the received Flow Discovery Request packet is forwarded only on the primary port of a LAG interface. The primary port selected may differ from vendor to vendor. This helps reduce the number of redundant request packets generated as a result of the fan-out behavior. The originator of a request packet with the fan-out option enabled may get redundant responses in certain circumstances.

Note that the LAG details are provided in the response packet only if the LAG exists on an L3 device. This is because L2 devices supporting LAG do not have the capability to process the Traceflow protocol for now. In future drafts L2 support may be added to the Traceflow protocol, and at that point it may be dealt with in detail.

4.0.1 Assessing why redundant responses come through.

If a fan-out happens at an initial point in the path towards the destination, the paths may diverge initially and cover a few transit devices before they re-converge at one or more points on the way to the destination.
In this case the multiple fan-out Discovery packets may result in redundant responses from the same re-converged transit devices along the way. This can be used to find out whether totally disjoint paths to the destination exist. If the redundant responses emanate from the ultimate destination, it is reasonably easy to conclude that totally disjoint paths to the destination exist. But if redundant responses arise from transit devices much earlier than the destination, it must be assumed that the paths re-converged before the ultimate destination (the partially disjoint case). This feature can then be used to find all possible paths by correlating the information received at the originator, using a network management station on an appliance or otherwise.

The Flow Discovery Request packet SHOULD pass through the Layer 2 or MPLS routed segments along the path in pass-through mode as data packets. The appendix discusses the possibility of extending the TraceFlow protocol to allow the devices in the Layer 2 and MPLS segments along the path of the traffic flow to respond to the Flow Discovery Request packet. But this is saved for future work.

The discussion so far has assumed that the Flow Discovery Request packet would originate on one device (say device A) and terminate on some other device (say device B). It is likely that a third device (say device C) would be interested in obtaining the flow related information for a flow traversing from device A to device B. In this case, device C sends a Flow Discovery Packet to device A. The Flow Discovery Request type specified in the packet would indicate to device A that this is an indirect request from device C to obtain information relevant to the flow specified in the Flow Descriptor TLV. Device A then generates a new Flow Discovery Request packet with the destination IP set to device B and the Originator IP Address set to device C. All transit routers that process this request would send their responses to device C. See the security considerations for more information on issues with the indirect mode and ways to mitigate them.

4.1. Using Hardware to gather details for the response packet.

It is RECOMMENDED that the TLVs be filled, as far as possible, with information gathered directly by reading the hardware elements that are used in forwarding a flow.

4.2 Interaction with MPLS based transit devices.

The current MPLS ping standard supports ping/traceroute between ingress and egress LSRs only. There is a need for a single probe that traces all types of hops, including MPLS LSRs, which this protocol can address. However, we intend to support only pipe mode (pass-through) tracing, where the entire MPLS LSP is treated as a single interface. Uniform mode, where every hop along the way is traced, is excluded from this scheme. It may however be taken up for future work.

In the MPLS case, the difference in the TTL value allows one to conclude that the MPLS network in the middle passed the packet through. The egress LER can begin to send back the Discovery responses from where the ingress LER left off.

4.3 Applicability to Layer 2 devices.

Layer 2 devices are bypassed entirely with respect to Traceflow in this version of the draft. L2 devices are expected to merely forward the Traceflow frames. Future work may extend Traceflow support to Layer 2 devices.

4.4 Applicability to platforms that have trouble determining incoming Interface.

On platforms that have trouble determining which interface a packet arrived on, appropriate hardware assists are needed to indicate the incoming interface to software.
4.5 Applicability to Network Address Translators

This aspect has not yet been studied well, and future revisions of the draft or addendum documents to this draft may make this behaviour clearer. The aspect to worry about is the return of the response packet to the originator when the outer IP header is subject to translation. Both the encapsulated packet and the outer IP header may need to undergo translation. Normally, firewalls that surround NATs or have built-in NAT capability may drop packets for which the port assignments are not set for pass-through or translation. So some hole punching on the firewall may be required to get the response packet back to the originator. This aspect has to be thought through and documented in subsequent versions, or added as additional drafts modifying the behaviour to enable NAT traversal of Traceflow packets. One advantage, though, is that since the request and response are not ICMP packets, the Traceflow packets may be treated as mere data packets and may pass through without a hitch. Trust boundaries as enforced by firewalls may however not like the intrusion.

5. Application Scenarios

This section discusses troubleshooting and planning applications of this proposal. The application scenarios can broadly be divided into two categories:

   1. Troubleshooting network failures

   2. Network planning

5.1. Troubleshooting network failures

Several network monitoring tools provide the capability to monitor the health of a network by polling information from the network devices (primarily through the use of SNMP). They help in detecting network failures, imminent failures or other anomalies in the network. For troubleshooting these failures, network operators typically rely initially on tools such as ping and traceroute.
Unfortunately they do not provide detailed information about the traffic flow that is affected, for several reasons:

   1. It is likely that ping and traceroute control packets follow a different path through the network compared to the traffic flow that is being investigated, for example when policy-based routing is in effect or when there are one or more ECMP segments along the path of the traffic flow.

   2. Ping and traceroute do not provide details about the constituent members of a port-channel trunk through which the affected flow would have traversed.

   3. It is common practice to rate limit ping and traceroute traffic at the router. As a result, responses to ping and traceroute are not deterministic.

Being able to trace the exact path that a particular flow might have taken through the network and obtain all relevant information about the hops along that path provides the network operator with enough information to troubleshoot a network failure quickly.

By setting the fan-out bit in the Flow Descriptor TLV, the operator should be able to determine all possible paths through the network that traffic to a particular destination may take. Along with the paths, the operator should also be able to obtain information relevant to the traffic flow from transit devices along the paths. This might prove useful in troubleshooting certain types of network problems.

5.2. Network flow planning

During production, it may be useful to know which ephemeral source port can be used to steer the flow onto a suitable LAG member or ECMP component link, by sending Traceflow packets with different ephemeral source ports in a range.

It would also be useful to determine that the network access-lists are properly configured and that the traffic would not get blocked inadvertently by an access-list somewhere.

Typically the issues listed above are discovered once the network is in production.
The ability to exercise the traffic flow's data path before it starts handling production traffic would help the operator to:

   1. Rectify any configuration issues such as ACL policies.

   2. Modify the ephemeral source port to get the flow traffic to flow across a specific constituent member of a port-channel trunk or an ECMP path.

Note that this application of the Traceflow protocol may not be relevant to all types of networks. Campus networks, enterprise networks and datacenters with well defined traffic flow patterns may benefit from the capability to detect the above problems. However, for tier 1 providers this application of TraceFlow has limited relevance as the traffic flows are not well defined.

The operator may use the fan-out bit in the Flow Descriptor TLV to request the transit devices to provide all the paths that traffic flow to a certain destination address would take. This allows the operator to validate the ECMP or LAG configuration in the network.

5.2.1 Programmatic migration to mitigate LAG link polarization

In later versions of the OpenFlow specification, virtual ports such as LAGs are exposed to the OpenFlow forwarding path. It is imperative that the controller have a standards-based ability to discover LAG hashing functionality. Through the Traceflow discovery and fan-out process, the controller is able to proactively determine which action to take to influence flows to move from one LAG member to another. This will aid in the automated troubleshooting of link polarization problems.

6. Security Considerations

This section discusses threats to which TraceFlow might be vulnerable and means by which those threats might be mitigated.

There is a concern that this protocol might allow an external user to probe the detailed path that a flow takes through a network.
The network operator can associate multiple access levels with the different types of information that are included in the response to a Flow Discovery Request packet. For example, only the "Next Hop Router" may be marked as publicly accessible information whereas everything else may be marked as private information. On receiving a Flow Discovery Request packet originating outside the local network, only the publicly accessible information is included in the response to the originator. However, if the request was originated locally, the device includes all requested information in the response.

The Result TLV and Additional Informational Code TLV provide detailed information about the processing of the Flow Discovery Request packet and may leak information about the locally configured policies. The amount of information to be included in these TLVs should also depend on whether the request was originated externally or internally. The network operator may choose to silently drop the Flow Discovery Request packet, without providing any indication of the reason for doing so, if the request was originated externally.

Today most network operators throttle conventional OAM traffic (for example ping and traceroute) that is serviced by the device to protect against Denial-of-Service attacks. Such mechanisms should be employed for TraceFlow packets for the same reason.

Rate limiting of packets punted to software can cover traffic relating to the management plane. Many platforms offer to rate limit to M packets per second or per minute. Facilities like these can be used to admit only a rate-limited quantum of traffic to the management plane, as would be the case for Traceflow traffic. M would be a user-configurable option with a default set to a suitable quantum. Hardware assisted rate limiting would be a pre-requisite for this feature.

7. Hardware pre-requisites for implementing Traceflow.
7.1 Filter to trap packets with UDP destination port

Filters with a corresponding PUNT-to-software action should be programmable in hardware to trap packets whose UDP destination port identifies them as Traceflow packets. Platforms that support hardware-based filtering would benefit most from this filter support. All Layer 3 devices would be appropriate for programming this filter. However, please note that the UDP port based filter will not be, and SHOULD NOT be, applied to MPLS packets or IP-in-IP tunneled packets. Tunneled packets of this kind, be it MPLS or IP-in-IP (including IP-GRE), are out of scope of this document.

7.2 Packet injection mode directly to egress port.

To make Traceflow take the proper output member in a LAG or ECMP case, a packet injection mode should be supported in hardware. Once the software control plane for Traceflow gets the packet, the updated packet should be sent to the appropriate next-hop transit device through the LAG or ECMP member calculated by the hardware algorithm. For this purpose the hardware should support a packet injection mode directly to the egress port, without interference from the hardware forwarding engine. In this mode the software control plane sends the packet to the egress port, bypassing the hardware forwarding engine, so that it takes the appropriate LAG or ECMP member.

7.3 Packet injection mode through hardware engine but not to output port.

To make Traceflow report correctly which LAG / ECMP member the packet will go out on, the hardware should provide an assist that lets the CPU inject the packet to obtain the forwarding result without routing or switching the packet onto the next-hop.
7.4 Hardware rate limiter support (preventing DoS attacks)

A hardware rate limiter based on filters should be supported so that DoS attacks cannot be mounted on the control plane / the software part of the Traceflow engine. Normally the control plane of the Traceflow engine resides in the Route Processor Module of the transit devices or of the end device to which Traceflow traceroute and ping packets, respectively, are sent. The hardware rate limiter uses the filter to count the number of packets per unit time, such as a minute, to determine whether too many Traceflow packets are being sent to the control plane in the Route Processor Module. This is another requirement on the hardware.

7.5 RPF check support in hardware (security consideration)

To implement security across trust boundaries, the Reverse Path Forwarding check (RPF check) should be enabled on the domain's boundary devices. This ensures that IP addresses internal to the domain are not used by outside entities to initiate a Traceflow from outside the boundary of the domain in question.

7.6 Regular Security ACLs in the boundary of the network.

Apart from the RPF check, which checks whether an Originator IP address internal to the network is being spoofed by an entity outside the boundary, regular security ACLs should be programmed at the boundary to ensure that outside entities are not allowed to send Traceflow packets across the boundary and into the inside of a network domain.

7.7 Implementing the LAG / ECMP using software state

Earlier, the exact same hashing function or functions that the hardware implements were required to be implemented in the software control plane of the Traceflow engine in the Route Processor Module. The purpose was to determine the LAG or ECMP member through which the packet would be forwarded if sent through hardware.
This mimicking is not sufficient, as hardware and software may not be synchronized at that point in time. If the hardware and software are out of sync with each other, mimicking the hardware in software gives a result that differs from what the hardware would actually do: the hardware, if actually exercised, might forward the packet incorrectly. In effect, the hardware assist should support packet injection from the CPU and provide the required results back to the CPU Traceflow control process.

7.8 Implementation considerations

Several aspects of hardware utilize internal packet headers to determine properties of an incoming packet such as ingress port, ACL based packet drops, etc. The codes corresponding to the reasons why a packet is dropped should be determined through the packet injection mode available in the hardware, in part by utilizing these internal headers. This is because, when a packet that was to be forwarded is actually dropped in hardware, reason codes such as ACL based drops, policing, etc. should be available to the software control plane so that it can construct the Traceflow response packet with the appropriate fields.

7.7.1 Using ingress port as part of the LAG/ECMP hashing function.

The LAG / ECMP hashing function on certain platforms also uses the ingress port to arrive at the LAG / ECMP member on which the packet is to be forwarded. Normally, platforms that support packet injection mode provide the ability to inject a packet into the hardware Forwarding Engine and make it look as if the packet came in on a specific ingress port. On some vendor platforms this may not be possible. Platforms where the ingress port is not an input to the hashing function can support Traceflow with normal packet injection. When the ingress port is involved, CPU injection MAY be used.
In that case the LAG or ECMP member that the packet takes MAY be different from the one that would actually be chosen if the ingress port were taken into account. This arises only because the ingress port is part of the hashing function that determines the LAG / ECMP member and some platforms do not support packet injection from software with the ingress port taken into consideration.

8. IANA Considerations

The TraceFlow protocol needs a UDP port assignment to be used as the destination port in TraceFlow packets.

9. Contributors

The original version of this document was submitted to the IETF on August 16th 2008 by A. Viswanathan, S. Krishnamurthy, R. Manur and V. Zinjuvadia, who at that time were part of Force10 Networks, with inputs and suggestions from Shane Amante. We would like to acknowledge their contribution to this draft in its original version.

This document was prepared using Nroff Internet Draft Editor.

APPENDIX A:

A.1. Encapsulation Format Choices

A.1.1. Carrying a separate Flow Descriptor TLV inside the Flow Discovery Request packet

This is the approach selected for this proposal. To specify a flow, the originating device encapsulates an entire data packet belonging to the traffic flow of interest in the Flow Descriptor TLV. If a data packet from the traffic flow is not readily available, the operator may have to generate a data packet from the available traffic flow information and encapsulate that in the Flow Descriptor TLV. Future revisions of this document may update the Flow Descriptor TLV if there is a need to allow it to carry individual flow parameters (such as the Source IP Address, Destination IP Address, UDP/TCP Port numbers, etc.) in sub-TLV format rather than as an encapsulated data packet.

A.1.2. Using the traffic flow's parameter values in the external header.
This approach encapsulates the Flow Discovery Request packet by using the traffic flow's header as its outer header, which ensures that the Flow Discovery Request packet takes the same path as the traffic flow would. A Layer 2 EtherType could be used to differentiate this OAM packet from the data packets belonging to the traffic flow. This approach was not selected because it requires intermediate devices to process a new EtherType, which hardware may limit. Moreover, the OAM packet would likely have to stop at each intermediate device anyway in order to gather the relevant information for the specified traffic flow. If the Flow Discovery Request packet does not use a special EtherType, it would be difficult for the network operator to filter these OAM packets, as they would be indistinguishable from the traffic flow. Moreover, such TraceFlow OAM packets may be considered 'spoofed' packets.

Even though this approach is not selected for the TraceFlow protocol in this document, it could help the TraceFlow protocol support certain networks with legacy devices (not supporting TraceFlow). This approach may be reconsidered in future revisions of this document.

A.2. Layer 4 Protocol Choices and Router Alert option

A.2.1. UDP Encapsulation

This approach has been selected in this proposal. The Traceflow packets are UDP packets with a well-known destination port number (to be requested from IANA).

A.2.2. ICMP Encapsulation

This approach involves sending TraceFlow packets as ICMP packets. It was not selected in this proposal because the UDP approach is simpler.

A.3. Legacy Devices (Not supporting TraceFlow)

It is necessary that the entire flow information available through the encapsulated packet in the Flow Discovery Request packet be used in determining the egress port.
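
To make this concrete, the sketch below pulls the flow information out of an encapsulated IPv4 packet such as the one carried in the Flow Descriptor TLV. The function name extract_flow_info and the returned dictionary are hypothetical, the TLV framing around the packet is omitted, and the set of fields a device actually uses for egress selection is platform-specific and not prescribed here.

```python
import socket
import struct


def extract_flow_info(packet: bytes) -> dict:
    """Parse an IPv4 header and, for TCP/UDP, the Layer 4 ports.
    Fragments and IP options beyond the header length are not
    examined in this sketch."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4
    if header_len < 20 or len(packet) < header_len:
        raise ValueError("bad IPv4 header length")
    protocol = packet[9]
    info = {
        "src_ip": socket.inet_ntoa(packet[12:16]),
        "dst_ip": socket.inet_ntoa(packet[16:20]),
        "protocol": protocol,
        "src_port": None,
        "dst_port": None,
    }
    # TCP (6) and UDP (17) headers both begin with the source and
    # destination ports, 16 bits each.
    if protocol in (6, 17) and len(packet) >= header_len + 4:
        info["src_port"], info["dst_port"] = struct.unpack_from(
            "!HH", packet, header_len
        )
    return info


# Sample encapsulated packet: a minimal IPv4 header followed by a
# UDP header. The checksums are left as 0 for brevity.
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 28, 0, 0, 64, 17, 0,
    socket.inet_aton("192.0.2.1"),
    socket.inet_aton("198.51.100.7"),
)
udp_header = struct.pack("!HHHH", 40000, 53, 8, 0)
flow_info = extract_flow_info(ip_header + udp_header)
```

A TraceFlow-capable device would feed fields like these, rather than the outer header of the request packet, into its egress selection.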
If the Flow Discovery Request packet reaches a legacy device that does not support TraceFlow, the request packet is likely to be forwarded along a different egress link from the one on which the data packets belonging to the traffic flow would have been forwarded. Hence the information received from the transit routers beyond the legacy device in a TraceFlow probe may not be useful. However, if the legacy device does not employ LAGs, ECMP paths or policy-based routing, the TraceFlow packet may proceed in the direction that the traffic flow would have taken, and subsequent transit nodes may still be able to provide useful and relevant information to the originator of the Flow Discovery Request packet.

A.4. TTL Scoping

Conventional traceroute employs TTL Scoping to determine the path followed by a packet under destination-address-based hop-by-hop routing. The TraceFlow protocol does not employ TTL Scoping in the current specification. However, using TraceFlow with TTL Scoping has certain applications in networks that contain legacy devices that do not support TraceFlow. This may be explored in future revisions of this document if there is interest in the community in solving this problem.

An implementation may allow the operator to send TraceFlow packets with TTL Scoping, just like conventional traceroute. In such a mode the following points should be noted:

1) The originator node may receive multiple packets from a transit node: an ICMP 'TTL Expired' packet and a TraceFlow response packet.

2) In this mode, the transit devices SHOULD send the TraceFlow response packet only if the TTL has also expired for that Flow Discovery Request packet on that device. This is needed to prevent duplicate Flow Discovery Response packets from the transit node for each request packet that the originator device sends when performing TTL Scoping.

A.5. Additional Information in the Flow Discovery Response

This document lists the information that can be requested by the originator of the TraceFlow Flow Discovery Request packet and that may be included by the transit devices in their response. Future revisions of this document may modify this list based on feedback from the community. For example, QoS-related statistics and queue depth information may be included in the Flow Discovery Response packets for the traffic flow being investigated.

A.6. Choices for supporting remote TraceFlow requests

A.6.1. Terminating the request at the Proxy device and re-originating it

This approach was selected in this proposal. For indirect Flow Discovery Requests, the originating device sends the request to a proxy device that is the intended starting point for probing the flow and gathering relevant information about it. The proxy device receives the Flow Discovery Request packet, processes it and re-originates a Flow Discovery Request towards the destination of the flow.

A.6.2. Source-Routing the request through the Proxy device

This approach involves sending the Flow Discovery Request with the IP Source Routing option, which forces the packet to be received by the proxy device that is the intended starting point for probing the flow and gathering relevant information about it. It was not selected for this proposal.

A.7. Applicability to Multicast

Multicast networks have also evolved into more complex heterogeneous networks in recent years. These advancements place more burden on the multicast OAM tools employed by network operators. Troubleshooting network problems, monitoring network performance, and network planning and provisioning become difficult because of the gap between the complexity of the network and the capabilities of the OAM tools.
Mtrace [4] has evolved into a useful OAM tool that addresses some of the problems faced in multicast networks. However, it does not address all the problems discussed in this document. We believe that the TraceFlow protocol can be extended to assist network operators with their multicast deployments. Specific mechanics of any such extensions may be defined in later versions of the draft.

A.8. Applicability to Layer 2 networks

The Layer 2 devices in the path taken by the TraceFlow packets should be able to snoop on the higher layer headers in the packet to determine that it is a TraceFlow Flow Discovery Request packet. Most of the TraceFlow packet processing and operations discussed in this document should apply to Layer 2 devices as well. However, the current version of the draft treats Layer 2 devices as pass-through; refer to section 4.3 for more discussion of this issue. Specific mechanics of any separate extensions necessary for Layer 2 networks may be defined in later versions of the protocol.

A.9. Applicability to IPv6

The TraceFlow protocol described in this document should apply to IPv6 networks or IPv4-IPv6 dual stack networks with straightforward extensions. Specific mechanics of extensions to address IPv6 networks may be defined in later versions of the draft.

A.10. Applicability to MPLS

MPLS networks are to be considered at a later point in time. Revisions or addenda to this proposal to include MPLS networks are currently out of scope of this document.

A.11. Flow Discovery and Response packet fragmentation

It is highly RECOMMENDED that the network allow the Flow Discovery Request packet to travel through to the destination without fragmentation. The Flow Discovery Response packet that is originated by the transit devices processing the request packet may be fragmented on its way to the originator device.

9. References

9.1. Normative References

[KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC1776] Crocker, S., "The Address is the Message", RFC 1776, April 1 1995.

[TRUTHS] Callon, R., "The Twelve Networking Truths", RFC 1925, April 1 1996.

9.2. Informative References

[EVILBIT] Bellovin, S., "The Security Flag in the IPv4 Header", RFC 3514, April 1 2003.

[RFC5513] Farrel, A., "IANA Considerations for Three Letter Acronyms", RFC 5513, April 1 2009.

[RFC5514] Vyncke, E., "IPv6 over Social Networks", RFC 5514, April 1 2009.

Authors' Addresses

Janardhanan Narasimhan.P,
Dell-Force10,
Olympia Technology Park,
Fortius block, 7th & 8th Floor,
Plot No. 1, SIDCO Industrial Estate,
Guindy, Chennai - 600032.
TamilNadu, India.
Tel: +91 (0) 44 4220 8400
Fax: +91 (0) 44 2836 2446
Email: Pathangi_janardhanan@dell.com

Balaji Venkat Venkataswami,
Dell-Force10,
Olympia Technology Park,
Fortius block, 7th & 8th Floor,
Plot No. 1, SIDCO Industrial Estate,
Guindy, Chennai - 600032.
TamilNadu, India.
Tel: +91 (0) 44 4220 8400
Fax: +91 (0) 44 2836 2446
Email: BALAJI_VENKAT_VENKAT@dell.com

Richard Groves,
Microsoft Corporation,
One Microsoft Way,
Redmond, WA 98052
Email: rgroves@microsoft.com

Peter Hoose,
Facebook,
Willow Rd.,
Menlo Park, CA 94025
Email: phoose@fb.com