ANIMA T. Eckert, Ed. Internet-Draft Huawei Intended status: Informational M. Behringer Expires: July 20, 2018 January 16, 2018 Using Autonomic Control Plane for Stable Connectivity of Network OAM draft-ietf-anima-stable-connectivity-08 Abstract OAM (Operations, Administration and Maintenance - as per BCP161, (RFC6291) processes for data networks are often subject to the problem of circular dependencies when relying on connectivity provided by the network to be managed for the OAM purposes. Provisioning while bringing up devices and networks tends to be more difficult to automate than service provisioning later on, changes in core network functions impacting reachability cannot be automated because of ongoing connectivity requirements for the OAM equipment itself, and widely used OAM protocols are not secure enough to be carried across the network without security concerns. This document describes how to integrate OAM processes with an autonomic control plane in order to provide stable and secure connectivity for those OAM processes. This connectivity is not subject to aforementioned circular dependencies. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on July 20, 2018. Eckert & Behringer Expires July 20, 2018 [Page 1] Internet-Draft AN Stable Connectivity OAM January 2018 Copyright Notice Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1. Self-dependent OAM Connectivity . . . . . . . . . . . . . 3 1.2. Data Communication Networks (DCNs) . . . . . . . . . . . 3 1.3. Leveraging a generalized autonomic control plane . . . . 4 2. Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1. Stable Connectivity for Centralized OAM . . . . . . . . . 5 2.1.1. Simple Connectivity for Non-GACP capable NMS Hosts . 6 2.1.2. Challenges and Limitation of Simple Connectivity . . 7 2.1.3. Simultaneous GACP and data-plane Connectivity . . . . 8 2.1.4. IPv4-only NMS Hosts . . . . . . . . . . . . . . . . . 9 2.1.5. Path Selection Policies . . . . . . . . . . . . . . . 12 2.1.6. Autonomic NOC Device/Applications . . . . . . . . . . 14 2.1.7. Encryption of data-plane connections . . . . . . . . 15 2.1.8. Long Term Direction of the Solution . . . . . . . . . 15 2.2. Stable Connectivity for Distributed Network/OAM . . . . . 16 3. Architectural Considerations . . . . . . . . . . . . . . . . 16 3.1. No IPv4 for GACP . . . . . . . . . . . . . . . . . . . . 16 4. Security Considerations . . . . . . . . . . . . . . . . . . . 17 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 19 7. Change log [RFC Editor: Please remove] . . . . . . . . . . . 19 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 22 8.1. Normative References . . . . . . . . . . . . . . . . . . 22 8.2. Informative References . . . . . . . . . . . . . . . . . 23 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 24 1. Introduction Eckert & Behringer Expires July 20, 2018 [Page 2] Internet-Draft AN Stable Connectivity OAM January 2018 1.1. Self-dependent OAM Connectivity OAM (Operations, Administration and Maintenance - as per BCP161, [RFC6291]) for data networks is often subject to the problem of circular dependencies when relying on the connectivity service provided by the network to be managed. OAM can easily but unintentionally break the connectivity required for its own operations. Avoiding these problems can lead to complexity in OAM. This document describes this problem and how to use an autonomic control plane to solve it without further OAM complexity: The ability to perform OAM on a network device requires first the execution of OAM necessary to create network connectivity to that device in all intervening devices. This typically leads to sequential, 'expanding ring configuration' from a NOC (Network Operations Center). It also leads to tight dependencies between provisioning tools and security enrollment of devices. Any process that wants to enroll multiple devices along a newly deployed network topology needs to tightly interlock with the provisioning process that creates connectivity before the enrollment can move on to the next device. When performing change operations on a network, it likewise is necessary to understand at any step of that process that there is no interruption of connectivity that could lead to removal of connectivity to remote devices. This includes especially change provisioning of routing, forwarding, security and addressing policies in the network that often occur through mergers and acquisitions, the introduction of IPv6 or other mayor re-hauls in the infrastructure design. Examples include change of an IGP or areas, PA (Provider Aggregatable) to PI (Provider Independent) addressing, or systematic topology changes (such as L2 to L3 changes). All these circular dependencies make OAM complex and potentially fragile. When automation is being used, for example through provisioning systems, this complexity extends into that automation software. 1.2. Data Communication Networks (DCNs) In the late 1990s and early 2000, IP networks became the method of choice to build separate OAM networks for the communications infrastructure within Network Providers. This concept was standardized in ITU-T G.7712/Y.1703 [ITUT] and called "Data Communications Networks" (DCN). These were (and still are) physically separate IP(/MPLS) networks that provide access to OAM interfaces of all equipment that had to be managed, from PSTN (Public Eckert & Behringer Expires July 20, 2018 [Page 3] Internet-Draft AN Stable Connectivity OAM January 2018 Switched Telephone Network) switches over optical equipment to nowadays Ethernet and IP/MPLS production network equipment. Such DCN provide stable connectivity not subject to aforementioned problems because they are a separate network entirely, so change configuration of the production IP network is done via the DCN but never affects the DCN configuration. Of course, this approach comes at a cost of buying and operating a separate network and this cost is not feasible for many providers, most notably smaller providers, most enterprises and typical IoT networks (Internet of Things). 1.3. Leveraging a generalized autonomic control plane One of the goals of the IETF ANIMA (Autonomic Networking Integrated Model and Approach ) working group is the specification of a secure and automatically built inband management plane that provides similar stable connectivity as a DCN, but without having to build a separate DCN. It is clear that such 'in-band' approach can never achieve fully the same level of separation, but the goal is to get as close to it as possible. This goal of this document is to discuss how such an inband management plane can be used to support the DCN-like OAM use-case, leverage its stable connectivity and details the options of deploying it incrementally - short and long term. The evolving ANIMA working groups specification [I-D.ietf-anima-autonomic-control-plane] ) calls this inband management plane the "Autonomic Control Plane" (ACP). The discussions in this document are not depending on the specification of that ACP, but only on a set of high level constraints decided early on in the work for the ACP. Unless being specific about details of the ACP, this document uses the term "Generalized ACP" (GACP) and is applicable to any designs that meet those high level constraints. For example - but not limited to - variations of the ACP protocol choices. The high level constraints of a GACP assumed and discussed in this document are as follows: VRF Isolation: The GACP is a virtual network ("VRF") across network devices - its routing and forwarding are separate from other routing and forwarding in the network devices. Non-GACP routing/ forwarding is called the "data-plane". IPv6 only addressing: The GACP provides only IPv6 reachability. It uses ULA addresses ([RFC4193]) that are routed in a location independent fashion for example through per network device subnet Eckert & Behringer Expires July 20, 2018 [Page 4] Internet-Draft AN Stable Connectivity OAM January 2018 prefixes. Automatic addressing in the GACP is therefore simple & stable: it does not require allocation by address registries, addresses are identifiers, they do not change when devices move, and no engineering of the address space to the network topology is necessary. NOC connectivity: NOC equipment (controlling OAM operations) either has access to the GACP directly or has an IP subnet connection to a GACP-edge device. Closed Group Security: GACP devices have cryptographic credentials to mutually authenticate each other as members of a GACP. Traffic across the GACP is authenticated with these credentials and then encrypted. The only traffic permitted in & out of the GACP that is not authenticated by these credentials is through explicit configuration the traffic from/to the aforementioned non-GACP NOC equipment with subnet connections to a GACP-edge device (as a transition method). The GACP must be built autonomic and its function must not be disruptable by operator or automated (NMS/SDN) configuration/ provisioning actions. These are allowed to only impact the "data- plane". This aspect is not currently covered in this document. Instead, it focusses on the impact of the above constraints: IPv6 only, dual connectivity and security. 2. Solutions 2.1. Stable Connectivity for Centralized OAM The ANI is the "Autonomic Networking Infrastructure" consisting of secure zero touch Bootstrap (BRSKI - [I-D.ietf-anima-bootstrapping-keyinfra]), GeneRic Autonomic Signaling Protocol (GRASP - [I-D.ietf-anima-grasp]), and Autonomic Control Plane (ACP - [I-D.ietf-anima-autonomic-control-plane]). Refer to [I-D.ietf-anima-reference-model] for an overview of the ANI and how its components interact and [RFC7575] for concepts and terminology of ANI and autonomic networks. This section describes stable connectivity for centralized OAM via the GACP, for example via the ACP with or without a complete ANI, starting by what we expect to be the most easy to deploy short-term option. It then describes limitation and challenges of that approach and their solutions/workarounds to finish with the preferred target option of autonomic NOC devices in Section 2.1.6. This order was chosen because it helps to explain how simple initial use of a GACP can be, how difficult workarounds can become (and Eckert & Behringer Expires July 20, 2018 [Page 5] Internet-Draft AN Stable Connectivity OAM January 2018 therefore what to avoid), and finally because one very promising long-term solution alternative is exactly like the most easy short- term solution only virtualized and automated. In the most common case, OAM will be performed by one or more applications running on a variety of centralized NOC systems that communicate with network devices. We describe differently advanced approaches to leverage a GACP for stable connectivity. There is a wide range of options, some of which are simple, some more complex. Three stages can be considered: o There are simple options described in sections Section 2.1.1 through Section 2.1.3 that we consider to be good starting points to operationalize the use of a GACP for stable connectivity today. These options require only network and OAN/NOC device configuration. o The are workarounds to connect a GACP to non-IPv6 capable NOC devices through the use of IPv4/IPv6 NAT (Network Address Translation) as described in section Section 2.1.4. These workarounds are not recommended but if such non-IPv6 capable NOC devices need to be used longer term, then this is the only option to connect them to a GACP. o Near to long term options can provide all the desired operational, zero touch and security benefits of an autonomic network, but a range of details for this still have to be worked out and development work on NOC/OAM equipment is necessary. These options are discussed in sections Section 2.1.5 through Section 2.1.8. 2.1.1. Simple Connectivity for Non-GACP capable NMS Hosts In the most simple candidate deployment case, the GACP extends all the way into the NOC via one or more "GACP-edge-devices". See also section 6.1 of [I-D.ietf-anima-autonomic-control-plane]. These devices "leak" the (otherwise encrypted) GACP natively to NMS hosts. They act as the default routers to those NMS hosts and provide them with IPv6 connectivity into the GACP. NMS hosts with this setup need to support IPv6 (see e.g. [RFC6434]) but require no other modifications to leverage the GACP. Note that even though the GACP only uses IPv6, it can of course support OAM for any type of network deployment as long as the network devices support the GACP: The data-plane can be IPv4 only, dual-stack or IPv6 only. It is always separate from the GACP, therefore there is no dependency between the GACP and the IP version(s) used in the data-plane. Eckert & Behringer Expires July 20, 2018 [Page 6] Internet-Draft AN Stable Connectivity OAM January 2018 This setup is sufficient for troubleshooting such as SSH into network devices, NMS that performs SNMP read operations for status checking, software downloads into autonomic devices, provisioning of devices via NETCONF and so on. In conjunction with otherwise unmodified OAM via separate NMS hosts it can provide a good subset of the stable connectivity goals. The limitations of this approach are discussed in the next section. Because the GACP provides 'only' for IPv6 connectivity, and because addressing provided by the GACP does not include any topological addressing structure that operations in a NOC often relies on to recognize where devices are on the network, it is likely highly desirable to set up DNS (Domain Name System - see [RFC1034]) so that the GACP IPv6 addresses of autonomic devices are known via domain names that include the desired structure. For example, if DNS in the network was set up with names for network devices as devicename.noc.example.com, and the well-known structure of the data- plane IPv4 addresses space was used by operators to infer the region where a device is located in, then the GACP address of that device could be set up as devicename_.acp.noc.example.com, and devicename.acp.noc.example.com could be a CNAME to devicename_.acp.noc.example.com. Note that many networks already use names for network equipment where topological information is included, even without a GACP. 2.1.2. Challenges and Limitation of Simple Connectivity This simple connectivity of non-autonomic NMS hosts suffers from a range of challenges (that is, operators may not be able to do it this way) or limitations (that is, operator cannot achieve desired goals with this setup). The following list summarizes these challenges and limitations. The following sections describe additional mechanisms to overcome them. Note that these challenges and limitations exist because GACP is primarily designed to support distributed ASA (Autonomic Service Agent, a piece of autonomic software) in the most lightweight fashion, but not mandatorily require support for additional mechanisms to best support centralized NOC operations. It is this document that describes additional (short term) workarounds and (long term) extensions. 1. (Limitation) NMS hosts cannot directly probe whether the desired so called 'data-plane' network connectivity works because they do not directly have access to it. This problem is similar to probing connectivity for other services (such as VPN services) that they do not have direct access to, so the NOC may already Eckert & Behringer Expires July 20, 2018 [Page 7] Internet-Draft AN Stable Connectivity OAM January 2018 employ appropriate mechanisms to deal with this issue (probing proxies). See Section 2.1.3 for candidate solutions. 2. (Challenge) NMS hosts need to support IPv6 which often is still not possible in enterprise networks. See Section 2.1.4 for some workarounds. 3. (Limitation) Performance of the GACP may be limited versus normal 'data-plane' connectivity. The setup of the GACP will often support only non-hardware accelerated forwarding. Running a large amount of traffic through the GACP, especially for tasks where it is not necessary will reduce its performance/ effectiveness for those operations where it is necessary or highly desirable. See Section 2.1.5 for candidate solutions. 4. (Limitation) Security of the GACP is reduced by exposing the GACP natively (and unencrypted) into a subnet in the NOC where the NOC devices are attached to it. See Section 2.1.7 for candidate solutions. These four problems can be tackled independently of each other by solution improvements. Combining some of these solutions improvements together can lead towards a candidate long term solution. 2.1.3. Simultaneous GACP and data-plane Connectivity Simultaneous connectivity to both GACP and data-plane can be achieved in a variety of ways. If the data-plane is IPv4-only, then any method for dual-stack attachment of the NOC device/application will suffice: IPv6 connectivity from the NOC provides access via the GACP, IPv4 will provide access via the data-plane. If as explained above in the simple case, an autonomic device supports native attachment to the GACP, and the existing NOC setup is IPv4 only, then it could be sufficient to attach the GACP device(s) as the IPv6 default router to the NOC subnet and keep the existing IPv4 default router setup unchanged. If the data-plane of the network is also supporting IPv6, then the most compatible setup for NOC devices is to have two IPv6 interfaces. One virtual ((e.g. via IEEE 802.1Q [IEEE802.1Q]) or physical interface connecting to a data-plane subnet, and another into an GACP connect subnet. See section 8.1 of [I-D.ietf-anima-autonomic-control-plane] for more details. That document also specifies how the NOC devices can receive auto configured addressing and routes towards the ACP connect subnet if it supports [RFC6724] and [RFC4191]. Eckert & Behringer Expires July 20, 2018 [Page 8] Internet-Draft AN Stable Connectivity OAM January 2018 Configuring a second interface on a NOC host may be impossible or be seen as undesired complexity. In that case the GACP edge device needs to provide support for a "Combined ACP and data-plane interface" as also described in section 8.1 of [I-D.ietf-anima-autonomic-control-plane]. This setup may not work with auto configuration and all NOC host network stacks due to limitations in those network stacks. They need to be able to perform RFC6724 source address selection rule 5.5 including caching of next- hop information. For security reasons, it is not considered appropriate to connect a non-GACP router to a GACP connect interface. The reason is that the GACP is a secured network domain and all NOC devices connecting via GACP connect interfaces are also part of that secure domain - the main difference is that the physical link between the GACP edge device and the NOC devices is not authenticated/encrypted and therefore, needs to be physically secured. If the secure GACP was extendable via untrusted routers then it would be a lot more difficult to verify the secure domain assertion. Therefore the GACP edge devices are not supposed to redistribute routes from non-GACP routers into the GACP. 2.1.4. IPv4-only NMS Hosts One architectural expectation for the GACP as described in Section 1.3 is that all devices that want to use the GACP do support IPv6. Including NMS hosts. Note that this expectation does not imply any requirements against the data-plane, especially no need to support IPv6 in it. The data-plane could be IPv4 only, IPv6 only, dual stack or it may not need to have any IP host stack on the network devices. The implication of this architectural decision is the potential need for short-term workarounds when the operational practices in a network do not yet meet these target expectations. This section explains when and why these workarounds may be operationally necessary and describes them. However, the long term goal is to upgrade all NMS hosts to native IPv6, so the workarounds described in this section should not be considered permanent. Most network equipment today supports IPv6 but it is by far not ubiquitously supported in NOC backend solutions (HW/SW), especially not in the product space for enterprises. Even when it is supported, there are often additional limitations or issues using it in a dual stack setup or the operator mandates for simplicity single stack for all operations. For these reasons an IPv4 only management plane is still required and common practice in many enterprises. Without the desire to leverage the GACP, this required and common practice is not Eckert & Behringer Expires July 20, 2018 [Page 9] Internet-Draft AN Stable Connectivity OAM January 2018 a problem for those enterprises even when they run dual stack in the network. We discuss these workarounds here because it is a short term deployment challenge specific to the operations of a GACP. To connect IPv4 only management plane devices/applications with a GACP, some form of IP/ICMP translation of packets IPv4<->IPv6 is necessary. The basic mechanisms for this are defined in SIIT ([RFC7915]). There are multiple solutions using this mechanism. To understand the possible solutions, we consider the requirements: 1. NMS hosts need to be able to initiate connections to any GACP device for management purposes. Examples include provisioning via Netconf/(SSH), SNMP poll operations or just diagnostics via SSH connections from operators. Every GACP device/function that needs to be reachable from NMS hosts needs to have a separate IPv4 address. 2. GACP devices need to be able to initiate connections to NMS hosts for example to initiate NTP or radius/diameter connections, send syslog or SNMP trap or initiate Netconf Call Home connections after bootstrap. Every NMS host needs to have a separate IPv6 address reachable from the GACP. When connections from GACP devices are made to NMS hosts, the IPv4 source address of these connections as seen by the NMS Host must also be unique per GACP device and the same address as in (1) to maintain the same addressing simplicity as in a native IPv4 deployment. For example in syslog, the source-IP address of a logging device is used to identify it, and if the device shows problems, an operator might want to SSH into the device to diagnose it. Because of these requirements, the necessary and sufficient set of solutions are those that provide 1:1 mapping of IPv6 GACP addresses into IPv4 space and 1:1 mapping of IPv4 NMS host space into IPv6 (for use in the GACP). This means that stateless SIIT based solutions are sufficient and preferred. Note that GACP devices may use multiple IPv6 addresses in the GACP. For example, [I-D.ietf-anima-autonomic-control-plane] section 6.10 defines multiple useful addressing sub-schemes supporting this option. All those addresses may then need to be reachable through the IPv6/IPv4 address translation. The need to allocate for every GACP device one or multiple IPv4 addresses should not be a problem if - as we assume - the NMS hosts can use private IPv4 address space ([RFC1918]). Nevertheless even with RFC1918 address space it is important that the GACP IPv6 addresses can efficiently be mapped into IPv4 address space without too much waste. Eckert & Behringer Expires July 20, 2018 [Page 10] Internet-Draft AN Stable Connectivity OAM January 2018 The currently most flexible mapping scheme to achieve this is [RFC7757] because it allows configured IPv4 <-> IPv6 prefix mapping. Assume the GACP uses the ACP "Zone Addressing" Sub-Scheme and there are 3 registrars. In the Zone Addressing Sub-Scheme, there is for each registrar a constant /112 prefix for which in RFC7757 an EAM (Explicit Address Mapping) into a /16 (e.g.: RFC1918) prefix into IPv4 can be configured. Within the registrars /112 prefix, Device- Numbers for devices are sequentially assigned: with V-bit effectively two numbers are assigned per GACP device. This also means that if IPv4 address space is even more constrained, and it is known that a registrar will never need the full /15 extent of Device-Numbers, then a longer than /112 prefix can be configured into the EAM to use less IPv4 space. When using the ACP "Vlong Addressing" Sub-Scheme, it is unlikely that one wants or need to translate the full /8 or /16 bits of addressing space per GACP device into IPv4. In this case, the EAM rules of dropping trailing bits can be used to map only N bits of the V-bits into IPv4. This does imply though that only V-addresses that differ in those high-order N V-bits can be distinguished on the IPv4 side. Likewise, the IPv4 address space used for NMS hosts can easily be mapped into an address prefix assigned to a GACP connect interface. A full specification of a solution to perform SIIT in conjunction with GACP connect following the considerations below is outside the scope of this document. To be in compliance with security expectations, SIIT has to happen on the GACP edge device itself so that GACP security considerations can be taken into account. E.g.: that IPv4 only NMS hosts can be dealt with exactly like IPv6 hosts connected to a GACP connect interface. Note that prior solutions such as NAT64 ([RFC6146]) may equally be useable to translate between GACP IPv6 address space and NMS Hosts IPv4 address space, and that as workarounds this can also be done on non GACP Edge Devices connected to a GACP connect interface. The details vary depending on implementation because the options to configure address mappings vary widely. Outside of EAM, there are no standardized solutions that allow for mapping of prefixes, so it will most likely be necessary to explicitly map every individual (/128) GACP device address to an IPv4 address. Such an approach should use automation/scripting where these address translation entries are created dynamically whenever a GACP device is enrolled or first connected to the GACP network. Overall, the use of NAT is especially subject to the ROI (Return On Investment) considerations, but the methods described here may not be Eckert & Behringer Expires July 20, 2018 [Page 11] Internet-Draft AN Stable Connectivity OAM January 2018 too different from the same problems encountered totally independent of GACP when some parts of the network are to introduce IPv6 but NMS hosts are not (yet) upgradeable. 2.1.5. Path Selection Policies As mentioned above, a GACP is not expected to have high performance because its primary goal is connectivity and security, and for existing network device platforms this often means that it is a lot more effort to implement that additional connectivity with hardware acceleration than without - especially because of the desire to support full encryption across the GACP to achieve the desired security. Some of these issues may go away in the future with further adoption of a GACP and network device designs that better tender to the needs of a separate OAM plane, but it is wise to plan for even long-term designs of the solution that does NOT depend on high-performance of the GACP. This is opposite to the expectation that future NMS hosts will have IPv6, so that any considerations for IPv4/NAT in this solution are temporary. To solve the expected performance limitations of the GACP, we do expect to have the above describe dual-connectivity via both GACP and data-plane between NOC application devices and devices with GACP. The GACP connectivity is expected to always be there (as soon as a device is enrolled), but the data-plane connectivity is only present under normal operations but will not be present during e.g. early stages of device bootstrap, failures, provisioning mistakes or during network configuration changes. The desired policy is therefore as follows: In the absence of further security considerations (see below), traffic between NMS hosts and GACP devices should prefer data-plane connectivity and resort only to using the GACP when necessary, unless it is an operation known to be so much tied to the cases where the GACP is necessary that it makes no sense to try using the data-plane. An example are SSH connections from the NOC into a network device to troubleshoot network connectivity. This could easily always rely on the GACP. Likewise, if an NMS host is known to transmit large amounts of data, and it uses the GACP, then its performance need to be controlled so that it will not overload the GACP performance. Typical examples of this are software downloads. There is a wide range of methods to build up these policies. We describe a few: Eckert & Behringer Expires July 20, 2018 [Page 12] Internet-Draft AN Stable Connectivity OAM January 2018 Ideally, a NOC system would learn and keep track of all addresses of a device (GACP and the various data-plane addresses). Every action of the NOC system would indicate via a "path-policy" what type of connection it needs (e.g. only data-plane, GACP-only, default to data-plane, fallback to GACP,...). A connection policy manager would then build connection to the target using the right address(es). Shorter term, a common practice is to identify different paths to a device via different names (e.g. loopback vs. interface addresses). This approach can be expanded to GACP uses, whether it uses NOC system local names or DNS. We describe example schemes using DNS: DNS can be used to set up names for the same network devices but with different addresses assigned: One name (name.noc.example.com) with only the data-plane address(es) (IPv4 and/or IPv6) to be used for probing connectivity or performing routine software downloads that may stall/fail when there are connectivity issues. One name (name- acp.noc.example.com) with only the GACP reachable address of the device for troubleshooting and probing/discovery that is desired to always only use the GACP. One name with data-plane and GACP addresses (name-both.noc.example.com). Traffic policing and/or shaping at the GACP edge in the NOC can be used to throttle applications such as software download into the GACP. MPTCP (Multipath TCP -see [RFC6824]) with additional future work could be a very attractive candidate to automate the use of both data-plane and GACP and minimize or fully avoid the need for the above mentioned logical names to pre-set the desired connectivity (data-plane-only, GACP only, both). MPTCP supports signaling between the peers with which additional available addresses can be exchanged and then additional subflows between those addresses can be built. MPTCP also allows to declare subflows to be backup so they do not carry traffic unless non-backup subflows fail. Taken together, this could be used to automatically leverage the stable connectivity of the GACP and the potentially higher throughput of the data plane and simplify the DNS setup: (Only) the GACP address of GACP devices could to be put into DNS. When MPTCP builds the first subflow across the GACP, it exchanges the data-plane address. A data-plane subflow is built. Once this is successful, the peers declare the GACP subflow to be a backup flow and only the data-plane subflow is used to carry traffic. Making this idea work requires additional specification work outside the scope of this document. This includes but is not necessarily limited to defining the following aspects: Eckert & Behringer Expires July 20, 2018 [Page 13] Internet-Draft AN Stable Connectivity OAM January 2018 The above described behavior/policy for MPTCP must be controllable by applications or libraries acting on behalf of applications. APIs and/or data models for this control need to be defined. As outlined above, applications for example may choose to only perform transfers if the data-plane is actually available because of performance limitations of the GACP, so the application needs to be made aware if the setup of the data-plane subflow fails. Or transfers may want to only use the GACP because the connection performs configuration changes that are likely known to bring down the data-plane. It should be sufficient to make these policies work on a per connection basis, and not change during the lifetime of a connection for different data items. GACP and data-plane addresses exist in different VRFs. The MPTCP stack needs to be able to access multiple VRFs and know which local address is existing in which VRF. There also needs to be a logic to determine which peer address is expected to be reachable via which local VRF/address. Because the GACP is an isolated network, data- plane addresses are not reachable via the GACP therefore devices can determine if a peer address is a data-plane address. 2.1.6. Autonomic NOC Device/Applications Setting up connectivity between the NOC and autonomic devices when the NOC device itself is non-autonomic is as mentioned in the beginning a security issue. It also results as shown in the previous paragraphs in a range of connectivity considerations, some of which may be quite undesirable or complex to operationalize. Making NMS hosts autonomic and having them participate in the GACP is therefore not only a highly desirable solution to the security issues, but can also provide a likely easier operationalization of the GACP because it minimizes NOC-special edge considerations - the GACP is simply built all the way automatically, even inside the NOC and only authorized and authenticate NOC devices/applications will have access to it. Supporting the ACP according to [I-D.ietf-anima-autonomic-control-plane] all the way into an application device requires implementing the following aspects in it: AN bootstrap/enrollment mechanisms, the secure channel for the ACP and at least the host side of IPv6 routing setup for the ACP. Minimally this could all be implemented as an application and be made available to the host OS via e.g. a tap driver to make the ACP show up as another IPv6 enabled interface. Having said this: If the structure of NMS hosts is transformed through virtualization anyhow, then it may be considered equally Eckert & Behringer Expires July 20, 2018 [Page 14] Internet-Draft AN Stable Connectivity OAM January 2018 secure and appropriate to construct (physical) NMS host system by combining a virtual GACP enabled router with non-GACP enabled NOC- application VMs via a hypervisor, leveraging the configuration options described in the previous sections but just virtualizing them. 2.1.7. Encryption of data-plane connections When combining GACP and data-plane connectivity for availability and performance reasons, this too has an impact on security: When using the GACP, the traffic will be mostly encryption protected, especially when considering the above described use of application devices with GACP. If instead the data-plane is used, then this is not the case anymore unless it is done by the application. The simplest solution for this problem exists when using GACP capable NMS hosts, because in that case the communicating GACP capable NMS host and the GACP network device have credentials they can mutually trust (same GACP domain). In result, data-plane connectivity that does support this can simply leverage TLS/DTLS ([RFC5246])/([RFC6347]) with those GACP credentials for mutual authentication - and does not incur new key management. If this automatic security benefit is seen as most important, but a "full" GACP stack into the NMS host is unfeasible, then it would still be possible to design a stripped down version of GACP functionality for such NOC hosts that only provides enrollment of the NOC host with the GACP cryptographic credentials but without directly participating in the GACP encryption method. Instead, the host would just leverage TLS/DTLS using its GACP credentials via the data-plane with GACP network devices as well as indirectly via the GACP with the above mentioned GACP connect into the GACP. When using the GACP itself, TLS/DTLS for the transport layer between NMS hosts and network device is somewhat of a double price to pay (GACP also encrypts) and could potentially be optimized away, but given the assumed lower performance of the GACP, it seems that this is an unnecessary optimization. 2.1.8. Long Term Direction of the Solution If we consider what potentially could be the most lightweight and autonomic long term solution based on the technologies described above, we see the following direction: 1. NMS hosts should at least support IPv6. IPv4/IPv6 NAT in the network to enable use of a GACP is long term undesirable. Having IPv4 only applications automatically leverage IPv6 connectivity Eckert & Behringer Expires July 20, 2018 [Page 15] Internet-Draft AN Stable Connectivity OAM January 2018 via host-stack translation may be an option but this has not been investigated yet. 2. Build the GACP as a lightweight application for NMS hosts so GACP extends all the way into the actual NMS hosts. 3. Leverage and as necessary enhance MPTCP with automatic dual- connectivity: If an MPTCP unaware application is using GACP connectivity, the policies used should add subflow(s) via the data-plane and prefer them. 4. Consider how to best map NMS host desires to underlying transport mechanisms: With the above mentioned 3 points, not all options are covered. Depending on the OAM, one may still want only GACP, only data-plane, or automatically prefer one over the other and/ or use the GACP with low performance or high-performance (for emergency OAM such as countering DDoS). It is as of today not clear what the simplest set of tools is to enable explicitly the choice of desired behavior of each OAM. The use of the above mentioned DNS and MPTCP mechanisms is a start, but this will require additional thoughts. This is likely a specific case of the more generic scope of TAPS. 2.2. Stable Connectivity for Distributed Network/OAM Today, many distributed protocols implement their own unique security mechanisms. KARP (Keying and Authentication for Routing Protocols, see [RFC6518]) has tried to start to provide common directions and therefore reduce the re-invention of at least some of the security aspects, but it only covers routing-protocols and it is unclear how well it is applicable to a potentially wider range of network distributed agents such as those performing distributed OAM. The common security of a GACP can help in these cases. Furthermore, GRASP ([I-D.ietf-anima-grasp]) can run on top of a GACP as a security and transport substrate and provide common local & remote neighbor discovery and peer negotiation mechanism, further allowing to unifying & reuse future protocol designs. 3. Architectural Considerations 3.1. No IPv4 for GACP The GACP is intended to be IPv6 only, and the prior explanations in this document show that this can lead to some complexity when having to connect IPv4 only NOC solutions, and that it will be impossible to Eckert & Behringer Expires July 20, 2018 [Page 16] Internet-Draft AN Stable Connectivity OAM January 2018 leverage the GACP when the OAM agents on a GACP network device do not support IPv6. Therefore, the question was raised whether the GACP should optionally also support IPv4. The decision not to include IPv4 for GACP as something that is considered in the use cases in this document is because of the following reasons: In SP networks that have started to support IPv6, often the next planned step is to consider moving out IPv4 from a native transport to just a service on the edge. There is no benefit/need for multiple parallel transport families within the network, and standardizing on one reduces OPEX and improves reliability. This evolution in the data-plane makes it highly unlikely that investing development cycles into IPv4 support for GACP will have a longer term benefit or enough critical short-term use-cases. Support for IPv6-only for GACP is purely a strategic choice to focus on the known important long term goals. In other types of networks as well, we think that efforts to support autonomic networking is better spent in ensuring that one address family will be supported so all use cases will long-term work with it, instead of duplicating effort into IPv4. Especially because auto-addressing for the GACP with IPv4 would be more complex than in IPv6 due to the IPv4 addressing space. 4. Security Considerations In this section, we discuss only security considerations not covered in the appropriate sub-sections of the solutions described. Even though GACPs are meant to be isolated, explicit operator misconfiguration to connect to insecure OAM equipment and/or bugs in GACP devices may cause leakage into places where it is not expected. Mergers/Acquisitions and other complex network reconfigurations affecting the NOC are typical examples. GACP addresses are ULA addresses. Using these addresses also for NOC devices as proposed in this document is not only necessary for above explained simple routing functionality but it is also more secure than global IPv6 addresses. ULA addresses are not routed in the global Internet and will therefore be subject to more filtering even in places where specific ULA addresses are being used. Packets are therefore less likely to leak to be successfully injected into the isolated GACP environment. The random nature of a ULA prefix provides strong protection against address collision even though there is no central assignment Eckert & Behringer Expires July 20, 2018 [Page 17] Internet-Draft AN Stable Connectivity OAM January 2018 authority. This is helped by the expectation that GACPs are never expected to connect all together, but only few GACPs may ever need to connect together, e.g. when mergers and acquisitions occur. Note that the GACP constraints demands that only packets from connected subnet prefixes are permitted from GACP connect interfaces, limiting the scope of non-cryptographically secured transport to a subnet within a NOC that instead has to rely on physical security (only connect trusted NOC devices to it). To help diagnose packets that unexpectedly leaked for example from another GACP (that was meant to be deployed separately), it can be useful to voluntarily list your own the ULA GACP prefixes on some site(s) on the Internet and hope that other users of GACPs do the same so that you can look up unknown ULA prefix packets seen in your network. Note that this does not constitute registration. https://www.sixxs.net/tools/grh/ula/ was a site to list ULA prefixes but it is not open for new listings anymore since the mid of 2017. The authors are not aware of other active Internet sites to list ULA use. Note that there is a provision in [RFC4193] for non-locally assigned address space (L bit = 0), but there is no existing standardization for this, so these ULA prefixes must not be used. According to [RFC4193] section 4.4, PTR records for ULA addresses should not be installed into the global DNS (no guaranteed ownership). Hence also the need to rely on voluntary lists (and in prior paragraph) to make the use of an ULA prefix globally known. Nevertheless, some legacy OAM applications running across the GACP may rely on reverse DNS lookup for authentication of requests (e.g.: TFTP for download of network firmware/config/software). Operators may therefore need to use a private DNS setup for the GACP ULA addresses. This is the same setup that would be necessary for using RFC1918 addresses in DNS. See for example [RFC1918] section 5, last paragraph. In [RFC6950] section 4, these setups are discussed in more detail. Any current and future protocols must rely on secure end-to-end communications (TLS/DTLS) and identification and authentication via the certificates assigned to both ends. This is enabled by the cryptographic credentials mechanisms of the GACP. If DNS and especially reverse DNS are set up, then it should be set up in an automated fashion when the GACP address for devices are assigned. In the case of the ACP, DNS resource record creation can be linked to the autonomic registrar backend so that the DNS and Eckert & Behringer Expires July 20, 2018 [Page 18] Internet-Draft AN Stable Connectivity OAM January 2018 reverse DNS records are actually derived from the subject name elements of the ACP device certificates in the same way as the autonomic devices themselves will derive their ULA addresses from their certificates to ensure correct and consistent DNS entries. If an operator feels that reverse DNS records are beneficial to its own operations but that they should not be made available publically for "security" by concealment reasons, then the case of GACP DNS entries is probably one of the least problematic use cases for split- DNS: The GACP DNS names are only needed for the NMS hosts intending to use the GACP - but not network wide across the enterprise. 5. IANA Considerations This document requests no action by IANA. 6. Acknowledgements This work originated from an Autonomic Networking project at cisco Systems, which started in early 2010 including customers involved in the design and early testing. Many people contributed to the aspects described in this document, including in alphabetical order: BL Balaji, Steinthor Bjarnason, Yves Herthoghs, Sebastian Meissner, Ravi Kumar Vadapalli. The author would also like to thank Michael Richardson, James Woodyatt and Brian Carpenter for their review and comments. Special thanks to Sheng Jiang and Mohamed Boucadair for their thorough review. 7. Change log [RFC Editor: Please remove] 08: IESG review fixes. * Spell check. * https://raw.githubusercontent.com/anima-wg/autonomic-control- plane/01908364cfc7259009603bf2b261354b0cc26913/draft-ietf- anima-stable-connectivity/draft-ietf-anima-stable-connectivity- 08.txt * Eric Rescorla (comments):Typos, listing ULA on internet benefits. Other comments from Eric where addressed via commits for other reviewers already. * https://raw.githubusercontent.com/anima-wg/autonomic-control- plane/c02252710fbd7aea15aff550fb393eb36584658b/draft-ietf- anima-stable-connectivity/draft-ietf-anima-stable-connectivity- 08.txt Eckert & Behringer Expires July 20, 2018 [Page 19] Internet-Draft AN Stable Connectivity OAM January 2018 * Mirja Kuehlewind (discuss) / Yoshifumi Nishida: Fixed and Rewrote MPTCP text to be more explanatory, answering questions raised in disucss. * https://raw.githubusercontent.com/anima-wg/autonomic-control- plane/14d5f9b66b8318bc160cee74ad152c0b926b4042/draft-ietf- anima-stable-connectivity/draft-ietf-anima-stable-connectivity- 08.txt * Matthew Miller/Alissa Cooper: syntactic nits. * https://raw.githubusercontent.com/anima-wg/autonomic-control- plane/9bff109281e8b3c22522c3144cbf0f13e5000498/draft-ietf- anima-stable-connectivity/draft-ietf-anima-stable-connectivity- 08.txt * Suresh Krishnan (comment): rewrote first paragraph of 2.1.4 (incomprehensible). * https://raw.githubusercontent.com/anima-wg/autonomic-control- plane/f2d8a85c2cc65ca7f823abac0f57d19c744f9b65/draft-ietf- anima-stable-connectivity/draft-ietf-anima-stable-connectivity- 08.txt * Alvaro Retana: Made references normative where the authors think is is important to understand all or parts for the mechanisms described in this document. * Alvaro Retana: Clarified that the discussions in this document are not specific to the ANI ACP, but instead rely primarily on a set of design constraints for any type of autonomic inband management network. Called this the GACP (generalized ACP). Mayor add: explanation of those design constraints in section 1.3. Textual fixes ACP -> GACP throughout the document, but without semantic changes. * https://raw.githubusercontent.com/anima-wg/autonomic-control- plane/d26df831da2953779c3b3ac41ec118cbbe43373e/draft-ietf- anima-stable-connectivity/draft-ietf-anima-stable-connectivity- 08.txt * Co-author organization fix. 07: Fixed ID nits. 06: changed "split-horizon" term to "private-DNS" and reworded the paragraph about it. Eckert & Behringer Expires July 20, 2018 [Page 20] Internet-Draft AN Stable Connectivity OAM January 2018 05: Integrated fixes from Brian Carpenters review. See github draft-ietf-anima-stable-connectivity/04-brian-carpenter-review- reply.txt. Details on semantic/structural changes: * Folded most comments back into draft-ietf-anima-autonomic- control-plane-09 because this stable connectivity draft was suggesting things that are better written out and standardized in the ACP document. * Section on simultaneous ACP and data-plane connectivity section reduced/rewritten because of this. * Re-emphasized security model of ACP - ACP-connect can not arbitrarily extend ACP routing domain. * Re-wrote much of NMS section to be less suggestive and more descriptive, avoiding the term NAT and referring to relevant RFCs (SIIT etc.). * Main additional text in IPv4 section is about explaining how we suggest to use EAM (Explicit Address Mapping) which actuall would well work with the Zone and Vlong Addressing Sub-Schemes of ACP. * Moved, but not changed section of "why no IPv4 in ACP" before IANA considerations to make structure of document more logical. * Refined security considerations: explained how optional ULA prefix listing on Internet is only for diagnostic purposes. Explained how this is useful because DNS must not be used. Explained how split horizon DNS can be used nevertheless. 04: Integrated fixes from Mohamed Boucadairs review (thorough textual review). 03: Integrated fixes from thorough Shepherd review (Sheng Jiang). 01: Refresh timeout. Stable document, change in author association. 01: Refresh timeout. Stable document, no changes. 00: Changed title/dates. individual-02: Updated references. Eckert & Behringer Expires July 20, 2018 [Page 21] Internet-Draft AN Stable Connectivity OAM January 2018 individual-03: Modified ULA text to not suggest ULA-C as much better anymore, but still mention it. individual-02: Added explanation why no IPv4 for ACP. individual-01: Added security section discussing the role of address prefix selection and DNS for ACP. Title change to emphasize focus on OAM. Expanded abstract. individual-00: Initial version. 8. References 8.1. Normative References [RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G., and E. Lear, "Address Allocation for Private Internets", BCP 5, RFC 1918, DOI 10.17487/RFC1918, February 1996, . [RFC4191] Draves, R. and D. Thaler, "Default Router Preferences and More-Specific Routes", RFC 4191, DOI 10.17487/RFC4191, November 2005, . [RFC4193] Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast Addresses", RFC 4193, DOI 10.17487/RFC4193, October 2005, . [RFC6724] Thaler, D., Ed., Draves, R., Matsumoto, A., and T. Chown, "Default Address Selection for Internet Protocol Version 6 (IPv6)", RFC 6724, DOI 10.17487/RFC6724, September 2012, . [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure, "TCP Extensions for Multipath Operation with Multiple Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013, . [RFC7575] Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A., Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic Networking: Definitions and Design Goals", RFC 7575, DOI 10.17487/RFC7575, June 2015, . [RFC7757] Anderson, T. and A. Leiva Popper, "Explicit Address Mappings for Stateless IP/ICMP Translation", RFC 7757, DOI 10.17487/RFC7757, February 2016, . Eckert & Behringer Expires July 20, 2018 [Page 22] Internet-Draft AN Stable Connectivity OAM January 2018 [RFC7915] Bao, C., Li, X., Baker, F., Anderson, T., and F. Gont, "IP/ICMP Translation Algorithm", RFC 7915, DOI 10.17487/RFC7915, June 2016, . 8.2. Informative References [I-D.ietf-anima-autonomic-control-plane] Eckert, T., Behringer, M., and S. Bjarnason, "An Autonomic Control Plane (ACP)", draft-ietf-anima-autonomic-control- plane-13 (work in progress), December 2017. [I-D.ietf-anima-bootstrapping-keyinfra] Pritikin, M., Richardson, M., Behringer, M., Bjarnason, S., and K. Watsen, "Bootstrapping Remote Secure Key Infrastructures (BRSKI)", draft-ietf-anima-bootstrapping- keyinfra-09 (work in progress), October 2017. [I-D.ietf-anima-grasp] Bormann, C., Carpenter, B., and B. Liu, "A Generic Autonomic Signaling Protocol (GRASP)", draft-ietf-anima- grasp-15 (work in progress), July 2017. [I-D.ietf-anima-reference-model] Behringer, M., Carpenter, B., Eckert, T., Ciavaglia, L., Pierre, P., Liu, B., Nobre, J., and J. Strassner, "A Reference Model for Autonomic Networking", draft-ietf- anima-reference-model-05 (work in progress), October 2017. [IEEE802.1Q] International Telecommunication Union, "802.1Q-2014 - IEEE Standard for Local and metropolitan area networks - Bridges and Bridged Networks", 2014. [ITUT] International Telecommunication Union, "Architecture and specification of data communication network", ITU-T Recommendation G.7712/Y.1703, Noevember 2001. This is the earliest but superceeded version of the series. See "https://www.itu.int/rec/T-REC-G.7712/en" for current versions. [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987, . Eckert & Behringer Expires July 20, 2018 [Page 23] Internet-Draft AN Stable Connectivity OAM January 2018 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, DOI 10.17487/RFC5246, August 2008, . [RFC6146] Bagnulo, M., Matthews, P., and I. van Beijnum, "Stateful NAT64: Network Address and Protocol Translation from IPv6 Clients to IPv4 Servers", RFC 6146, DOI 10.17487/RFC6146, April 2011, . [RFC6291] Andersson, L., van Helvoort, H., Bonica, R., Romascanu, D., and S. Mansfield, "Guidelines for the Use of the "OAM" Acronym in the IETF", BCP 161, RFC 6291, DOI 10.17487/RFC6291, June 2011, . [RFC6347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer Security Version 1.2", RFC 6347, DOI 10.17487/RFC6347, January 2012, . [RFC6434] Jankiewicz, E., Loughney, J., and T. Narten, "IPv6 Node Requirements", RFC 6434, DOI 10.17487/RFC6434, December 2011, . [RFC6518] Lebovitz, G. and M. Bhatia, "Keying and Authentication for Routing Protocols (KARP) Design Guidelines", RFC 6518, DOI 10.17487/RFC6518, February 2012, . [RFC6950] Peterson, J., Kolkman, O., Tschofenig, H., and B. Aboba, "Architectural Considerations on Application Features in the DNS", RFC 6950, DOI 10.17487/RFC6950, October 2013, . Authors' Addresses Toerless Eckert (editor) Futurewei Technologies Inc. 2330 Central Expy Santa Clara 95050 USA Email: tte+ietf@cs.fau.de Michael H. Behringer Email: michael.h.behringer@gmail.com Eckert & Behringer Expires July 20, 2018 [Page 24]