RAW                                                         F. Theoleyre
Internet-Draft                                                      CNRS
Intended status: Standards Track                         G. Papadopoulos
Expires: January 6, 2020                                  IMT Atlantique
                                                            July 5, 2019

   Operations, Administration and Maintenance (OAM) features for RAW
                   draft-theoleyre-raw-oam-support-00

Abstract

   The wireless medium presents significant specific challenges to achieve properties similar to those of wired deterministic networks. At the same time, a number of use cases cannot be solved with wires and justify the extra effort of going wireless. This document presents some of these use-cases.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 6, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Theoleyre & Papadopoulos     Expires January 6, 2020            [Page 1]
Internet-Draft             OAM features for RAW                July 2019
Table of Contents

   1. Introduction
     1.1. Terminology
   2. OAM to provision appropriately the resources
   3. Operation
   4. Administration
     4.1. Worst-case constraint
     4.2. Energy efficiency constraint
   5. Maintenance
   6. Informative References
   Authors' Addresses

1. Introduction

   RAW (Reliable and Available Wireless) is an effort to provide deterministic behavior over a network that includes a wireless physical layer. Making wireless communication reliable and available is even more challenging than it is with wires, due to the numerous causes of transmission loss that add to the congestion losses and the delays caused by overbooked shared resources.

   To provide quality of service along a multihop path that is composed of wired and wireless hops, additional methods need to be considered to cope with potentially lossy wireless communication.

   Traceability belongs to Operations, Administration, and Maintenance (OAM), which is the toolset for fault detection and isolation, and for performance measurement. More can be found on OAM Tools in .

   The main purpose of this document is to detail the requirements of the OAM features recommended to construct a predictable communication infrastructure on top of a collection of wireless networks. In particular, we expect to provide packet loss evaluation, self-testing and automated adaptation to enable trade-offs between resilience and energy consumption.

   This document describes the benefits, problems, and trade-offs for using OAM in wireless networks to provide availability and predictability.
   In this document, the term OAM is used according to its definition in [RFC6291]. We expect to implement an OAM framework in RAW networks to maintain a real-time view of the network infrastructure and of its ability to respect the Service Level Agreements (delay, reliability) assigned to each data flow.

1.1. Terminology

   o OAM entity: a data flow to be controlled;

   o OAM end-devices: the source or destination of a data flow;

   o defect: a temporary change in the network characteristics (e.g. link quality degradation because of temporary external interference or a mobile obstacle);

   o fault: a definite change which may affect the network performance, e.g. a node runs out of energy.

2. OAM to provision appropriately the resources

   RAW networks aim to make communications predictable on top of a wireless network infrastructure. Most critical applications will define a Service Level Agreement (SLA) to be respected for the data flows they generate. Thus, the wireless networks have to be dimensioned to respect these SLAs.

   To respect strict guarantees, RAW relies on a PCE (Path Computation Element), which has to schedule the transmissions in the different wireless networks. Thus, resources have to be provisioned to handle any defect. OAM represents the core of the overprovisioning process, and keeps the network operational by updating the schedule dynamically.

   Fault-tolerance also assumes that multiple paths have to be provisioned so that an end-to-end circuit keeps on existing whatever the conditions. OAM is in charge of controlling the replication process.

   To be energy-efficient, reserving some dedicated out-of-band resources for OAM seems unrealistic, and only in-band solutions are considered here.

3. Operation

   RAW expects to operate fault-tolerant networks. Thus, we need mechanisms able to detect faults before they impact the network performance.
   We make a distinction between the two following complementary mechanisms:

   o Detection: the network detects that a fault occurred, i.e. the network has deviated from its expected behavior. While the network must report an alarm, the cause may not be identified precisely. For instance, the end-to-end reliability has decreased significantly, or a buffer overflow occurs;

   o Identification: the network has isolated and identified the cause of the fault. For instance, the quality of a specific link has decreased, requiring more retransmissions, or the level of external interference has locally increased.

   This two-step process is required since RAW expects to rely on wireless networks. Thus, we have to minimize the amount of statistics and measurements to exchange:

   o energy efficiency: low-power devices have to limit the volume of monitoring information since every bit consumes energy;

   o bandwidth: wireless networks exhibit a bandwidth significantly lower than wired, best-effort networks.

   Thus, localized and centralized mechanisms have to be combined, and additional control packets have to be triggered only after a fault detection.

4. Administration

   To take proper decisions, the network has to expose a collection of metrics, including:

   o Packet losses: the time-window average and maximum values of the number of packet losses have to be measured. Many critical applications stop working if a few consecutive packets are dropped;

   o Received Signal Strength Indicator (RSSI): a very common metric in wireless networks to denote the link quality. The radio chipset is in charge of translating a received signal strength into a normalized quality indicator;

   o Delay: the time elapsed between a packet generation / enqueuing and its reception by the next hop;

   o Buffer occupancy: the number of packets present in the buffer, for each of the existing flows.
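   As an illustration of the packet-loss metric listed above, the following minimal sketch tracks, for one flow, the loss ratio over a sliding window and the longest run of consecutive losses within that window. It is not part of any RAW specification: the class name LossWindow, the window size and the per-packet received/lost input are assumptions made for the example, and how a node learns that a packet was lost is left out.

```python
from collections import deque


class LossWindow:
    """Hypothetical per-flow sliding window over the last `size` expected packets."""

    def __init__(self, size):
        # True = packet received, False = packet lost; old entries are evicted
        # automatically once the window is full.
        self.outcomes = deque(maxlen=size)

    def record(self, received):
        self.outcomes.append(bool(received))

    def loss_ratio(self):
        """Fraction of lost packets in the current window (0.0 if empty)."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def max_consecutive_losses(self):
        """Longest run of back-to-back losses in the current window."""
        longest = current = 0
        for received in self.outcomes:
            current = 0 if received else current + 1
            longest = max(longest, current)
        return longest


window = LossWindow(size=8)
for received in [True, False, False, True, False, False, False, True]:
    window.record(received)

print(window.loss_ratio())              # 5 losses out of 8 packets
print(window.max_consecutive_losses())  # longest burst of losses
```

   Reporting the longest burst alongside the average reflects the remark above that many critical applications fail after a few consecutive drops, which an average alone would hide.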
   These metrics should be collected:

   o per virtual circuit, to measure the end-to-end performance for a given flow. Each of the paths has to be isolated in multipath strategies;

   o per radio channel, to measure e.g. the level of external interference, and to be able to apply counter-measures (e.g. blacklisting);

   o per device, to detect a misbehaving node when it relays the packets of several flows.

4.1. Worst-case constraint

   RAW aims to enable real-time communications on top of a heterogeneous architecture. Since wireless networks are known to be lossy, RAW has to implement strategies to improve the reliability on top of unreliable links. Typically, Hybrid Automatic Repeat reQuest (ARQ) has to enable retransmissions based on the end-to-end reliability and latency requirements.

   To take correct decisions, the controller needs to know the distribution of packet losses for each flow, and for each hop of the paths. In other words, average end-to-end statistics are not enough: the statistics must allow the controller to predict the worst case.

4.2. Energy efficiency constraint

   RAW also targets low-power wireless networks, where energy represents a key constraint. Thus, we have to take care of the energy and bandwidth consumption. The following techniques aim to reduce the cost of such maintenance:

   o piggybacking: some control information is inserted in the data packets if it does not cause the packet to be fragmented (i.e. the MTU is not exceeded). Information Elements represent a standardized way to handle such information;

   o flags/fields: we have to set flags in the packets to be monitored so that they can be tracked accurately. A sequence number field may help to detect packet losses. Similarly, path inference tools such as [ipath] insert additional information in the headers to identify the path followed by a packet a posteriori.

5. Maintenance

   RAW needs to implement a self-healing and self-optimization approach. The network must continuously retrieve its own state to assess the relevance of a reconfiguration, quantifying:

   o the cost of the sub-optimality: resources may not be used optimally (e.g. a better path exists);

   o the reconfiguration cost: the controller needs to trigger some reconfigurations. During this transient period, resources may be reserved twice and control packets have to be transmitted.

   Thus, reconfiguration may only be triggered if the gain is significant.

   Since RAW expects to support real-time flows, we have to support soft reconfiguration, where the new resources are reserved before the old ones are released. Some mechanisms have to be proposed so that packets are forwarded through the new track only when the resources are ready to be used, while keeping the global state consistent (no packet re-ordering, duplication, etc.).

   In particular, RAW has to support the following modifications:

   o patching a schedule, relocating some radio resources (radio channel, timeslots);

   o a device can be reset safely (e.g. firmware upgrade), all the flows being forwarded temporarily through alternative paths;

   o a better path (delay, reliability, energy consumption) has been identified.

6. Informative References

   [ipath]    Gao, Y., Dong, W., Chen, C., Bu, J., Wu, W., and X. Liu, "iPath: path inference in wireless sensor networks", 2016.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu, D., and S. Mansfield, "Guidelines for the Use of the "OAM" Acronym in the IETF", BCP 161, RFC 6291, DOI 10.17487/RFC6291, June 2011.
Authors' Addresses

   Fabrice Theoleyre
   CNRS
   Building B
   300 boulevard Sebastien Brant - CS 10413
   Illkirch - Strasbourg 67400
   FRANCE

   Phone: +33 368 85 45 33
   Email: theoleyre@unistra.fr
   URI: http://www.theoleyre.eu

   Georgios Z. Papadopoulos
   IMT Atlantique
   Office B00 - 102A
   2 Rue de la Chataigneraie
   Cesson-Sevigne - Rennes 35510
   FRANCE

   Phone: +33 299 12 70 04
   Email: georgios.papadopoulos@imt-atlantique.fr