Network Working Group                                         K. Larsson
Internet-Draft                                          Deutsche Telekom
Intended status: Standards Track                         18 October 2023
Expires: 20 April 2024


               Mapping YANG Data to Label-Set Time Series
                      draft-kll-yang-label-tsdb-00

Abstract

   This document proposes a standardized approach for representing
   YANG-modeled configuration and state data, for storage in Time Series
   Databases (TSDBs) that identify time series using a label-set.  It
   outlines procedures for translating YANG data representations to fit
   within the label-centric structures of TSDBs and vice versa.  This
   mapping ensures clear and efficient storage and querying of
   YANG-modeled data in TSDBs.

Discussion Venues

   This note is to be removed before publishing as an RFC.

   Source for this draft and an issue tracker can be found at
   https://github.com/plajjan/draft-kll-yang-label-tsdb.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 20 April 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Larsson                   Expires 20 April 2024                 [Page 1]

Internet-Draft               yang-label-tsdb                October 2023

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Specification of the Mapping Procedure
     2.1.  Example: Packet Counters in IETF Interfaces Model
     2.2.  Mapping values
     2.3.  Choice
     2.4.  Host / device name
   3.  Querying YANG modeled time series data
     3.1.  Basic Queries
     3.2.  Filtering by Labels
     3.3.  Time-based Queries
     3.4.  Aggregations
     3.5.  Combining Filters
     3.6.  Querying Enumeration Types
   4.  Requirements on time series databases
     4.1.  Support for String Values
     4.2.  Sufficient Path Length
     4.3.  High Cardinality
   5.  Normative References
   Author's Address

1.  Introduction

   The aim of this document is to define rules for representing
   configuration and state data defined using the YANG data modeling
   language [RFC7950] as time series using a label-centric model.

   The majority of modern Time Series Databases (TSDBs) employ a
   label-centric model.  In this structure, time series are identified
   by a set of labels, each consisting of a key-value pair.
   These labels facilitate efficient querying, aggregation, and
   filtering of data over time intervals.  Such a model contrasts with
   the hierarchical nature of YANG-modeled data.  The challenge,
   therefore, lies in ensuring that YANG-defined data, with its inherent
   structure and depth, can be seamlessly integrated into the flat,
   label-based structure of most contemporary TSDBs.

   This document seeks to bridge this structural gap, laying out rules
   and guidelines to ensure that YANG-modeled configuration and state
   data can be effectively stored, queried, and analyzed within
   label-centric TSDBs.

2.  Specification of the Mapping Procedure

   Instances of YANG data nodes are mapped to metrics.  Only nodes that
   carry a value are mapped.  This includes leafs and presence
   containers.  The hierarchical path to a value, including non-presence
   containers and lists, forms the path that is used as the name of the
   metric.  The path is formed by joining the names of the YANG data
   nodes with an underscore (_).  Special symbols in node names, such as
   the hyphen (-), are replaced with _.

   List keys are mapped into labels.  The path to the list key is
   transformed in the same way as the primary name of the metric.
   Compound keys have each key part as a separate label.

2.1.  Example: Packet Counters in IETF Interfaces Model

   Consider the in-unicast-pkts leaf from the IETF interfaces model that
   captures the number of incoming unicast packets on an interface:

   Original YANG Instance-Identifier:

      /interfaces/interface[name='eth0']/statistics/in-unicast-pkts

   Following the mapping rules defined above:

   1.  The path components, including containers and list names, are
       transformed into the metric name by joining the node names with
       _.  Special symbols, such as -, are replaced with _.

       Resulting Metric Name:
       interfaces_interface_statistics_in_unicast_pkts

   2.  The list key "predicate", which in this case is the interface
       name (eth0), is extracted and stored as a separate label.
       The label key represents the complete path to the key.

       Resulting Label: interfaces_interface_name = eth0

   3.  The leaf value, which represents the actual packet counter,
       remains unchanged and is directly mapped to the value in the time
       series database.  For instance, if the packet counter reads
       5,432,100 packets:

       Value: 5432100

   4.  As part of the standard labels, a server identification string is
       also included.  A typical choice of identifier might be the
       hostname.  For this example, let's assume the device name is
       router-01:

       Label: host = router-01

   Final Mapping in the TSDB:

   *  Metric: interfaces_interface_statistics_in_unicast_pkts

   *  Value: 5432100

   *  Labels:

      -  host = router-01

      -  interfaces_interface_name = eth0

2.2.  Mapping values

   Leaf values are mapped based on their intrinsic type:

   *  All integer types are mapped to integers and retain their native
      representation.  Note that some implementations only support
      floats for numeric values.

   *  decimal64 values are mapped to floats.  The value should be
      rounded or truncated so as to minimize the loss of information.

   *  Enumeration types are mapped using their string representation.

   *  String types remain unchanged.

2.3.  Choice

   Choice constructs from YANG are disregarded and not enforced during
   the mapping process.  Given the temporal nature of TSDBs, where data
   spans across time, different choice branches could be active in a
   single data set, rendering validation and storage restrictions
   impractical.

2.4.  Host / device name

   There is an implicit host label identifying the server, typically set
   to the name of the host originating the time series data.

   Instance data retrieved from YANG-based servers does not generally
   identify the server it originates from.
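
   Taken together, the rules in Section 2 (metric name from the path,
   list keys as labels, and the implicit host label) can be illustrated
   with a short, non-normative Python sketch.  It assumes a simplified
   instance-identifier without module prefixes, as in the example of
   Section 2.1, with key predicates written as [key='value'] and key
   values that contain neither ' nor ].  The names map_instance and
   sanitize are hypothetical and not defined by this document; the value
   mapping of Section 2.2 is left out.

```python
import re

# One path step: "/name" followed by zero or more "[key='value']" predicates.
STEP = re.compile(r"/([^/\[]+)((?:\[[^\]]*\])*)")
# One key predicate of the assumed form [key='value'].
PREDICATE = re.compile(r"\[([^=\]]+)='([^']*)'\]")


def sanitize(name):
    """Replace every character other than letters, digits and '_' with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def map_instance(path, value, host):
    """Return (metric name, labels, value) for one value-carrying node."""
    names = []                # sanitized node names seen so far
    labels = {"host": host}   # implicit host label (Section 2.4)
    for step in STEP.finditer(path):
        names.append(sanitize(step.group(1)))
        # Each list key becomes a label named by the full path to the key.
        for key, key_value in PREDICATE.findall(step.group(2)):
            labels["_".join(names + [sanitize(key)])] = key_value
    return "_".join(names), labels, value


metric, labels, value = map_instance(
    "/interfaces/interface[name='eth0']/statistics/in-unicast-pkts",
    5432100,
    "router-01",
)
```

   For the instance in Section 2.1 this yields the metric name
   interfaces_interface_statistics_in_unicast_pkts with the labels
   host = router-01 and interfaces_interface_name = eth0.  Compound keys
   produce one label per key part, since each predicate of a step is
   handled separately.
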
   As a time series database is likely going to contain data from
   multiple servers, the host label is used to identify the source of
   the data.

3.  Querying YANG modeled time series data

   The process of storing YANG-modeled data in label-centric TSDBs, as
   defined in the previous sections, inherently structures the data in a
   way that leverages the querying capabilities of modern TSDBs.  This
   section provides guidelines on how to construct queries to retrieve
   this data effectively.

3.1.  Basic Queries

   To retrieve all data points related to incoming unicast packets from
   the IETF interfaces model:

   *  InfluxQL:

         SELECT * FROM interfaces_interface_statistics_in_unicast_pkts

   *  PromQL:

         interfaces_interface_statistics_in_unicast_pkts

3.2.  Filtering by Labels

   To retrieve incoming unicast packets specifically for the interface
   eth0:

   *  InfluxQL:

         SELECT * FROM interfaces_interface_statistics_in_unicast_pkts
         WHERE interfaces_interface_name = 'eth0'

   *  PromQL:

         interfaces_interface_statistics_in_unicast_pkts{interfaces_interface_name="eth0"}

   Similarly, to filter by device / host name:

   *  InfluxQL:

         SELECT * FROM interfaces_interface_statistics_in_unicast_pkts
         WHERE host = 'router-01'

   *  PromQL:

         interfaces_interface_statistics_in_unicast_pkts{host="router-01"}

3.3.  Time-based Queries

   To retrieve data points from the last 24 hours:

   *  InfluxQL:

         SELECT * FROM interfaces_interface_statistics_in_unicast_pkts
         WHERE time > now() - 24h

   Prometheus fetches data based on the configured scrape interval and
   retention policies, so time-based filters in PromQL often center on
   range vectors.  For data over the last 24 hours:

   *  PromQL:

         interfaces_interface_statistics_in_unicast_pkts[24h]

3.4.  Aggregations

   To get the average number of incoming unicast packets over the last
   hour (the InfluxQL query returns one average per 10-minute interval):

   *  InfluxQL:

         SELECT MEAN(value)
         FROM interfaces_interface_statistics_in_unicast_pkts
         WHERE time > now() - 1h GROUP BY time(10m)

   *  PromQL:

         avg_over_time(interfaces_interface_statistics_in_unicast_pkts[1h])

3.5.  Combining Filters

   To retrieve the sum of incoming unicast packets for eth0 on router-01
   over the last day:

   *  InfluxQL:

         SELECT SUM(value)
         FROM interfaces_interface_statistics_in_unicast_pkts
         WHERE interfaces_interface_name = 'eth0'
         AND host = 'router-01' AND time > now() - 24h

   *  PromQL:

         sum_over_time(interfaces_interface_statistics_in_unicast_pkts{interfaces_interface_name="eth0", host="router-01"}[24h])

3.6.  Querying Enumeration Types

   In YANG models, enumerations are defined types with a set of named
   values.  The oper-status leaf in the IETF interfaces model is an
   example of such an enumeration, representing the operational status
   of an interface.  For instance, the oper-status might have values
   such as up, down, or testing.

   To query interfaces that have an oper-status of up:

   *  InfluxQL:

         SELECT * FROM interfaces_interface_oper_status
         WHERE value = 'up'

   *  PromQL:

         interfaces_interface_oper_status{value="up"}

   Similarly, to filter interfaces with oper-status of down:

   *  InfluxQL:

         SELECT * FROM interfaces_interface_oper_status
         WHERE value = 'down'

   *  PromQL:

         interfaces_interface_oper_status{value="down"}

   This approach allows us to effectively query interfaces based on
   their operational status, leveraging the enumeration mapping within
   the TSDB.

4.  Requirements on time series databases

   This document specifies a mapping to a conceptual representation, not
   a particular concrete interface.  To effectively support the mapping
   of YANG-modeled data into a label-centric model, certain requirements
   must be met by the Time Series Databases (TSDBs).
   These requirements ensure that the data is stored and retrieved in a
   consistent and efficient manner.

4.1.  Support for String Values

   Several YANG leaf types carry string values, including the string
   type itself and all types derived from it, as well as enumerations,
   which are saved using their string representation.  The chosen TSDB
   must support the storage and querying of string values.  Not all
   TSDBs inherently offer this capability, and thus it is imperative to
   ensure compatibility.

4.2.  Sufficient Path Length

   YANG data nodes, especially when representing deep hierarchical
   structures, can result in long paths.  When transformed into metric
   names or labels within the TSDB, these paths might exceed typical
   character limits imposed by some databases.  It is essential for the
   TSDB to accommodate these potentially long names to ensure data
   fidelity and avoid truncation or loss of information.

4.3.  High Cardinality

   Given the possibility of numerous unique label combinations
   (especially with dynamic values like interface names, device names,
   etc.), the chosen TSDB should handle high cardinality efficiently.
   High cardinality can degrade database performance and increase query
   times, so it is essential for the TSDB to have mechanisms to manage
   it.

5.  Normative References

   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
              RFC 7950, DOI 10.17487/RFC7950, August 2016.

Author's Address

   Kristian Larsson
   Deutsche Telekom
   Email: kristian@spritelink.net