core A. Bierman Internet-Draft YumaWorks Intended status: Standards Track P. van der Stok Expires: August 13, 2016 consultant February 10, 2016 YANG Hash draft-bierman-core-yang-hash-00 Abstract This document describes an encoding of YANG names to 30 bit hashes. This document extends the CoMI draft to reduce payload and URI of CoMI network requests. The technique can be applied to other YANG based applications to reduce the payload by replacing the YANG names with 30 bit numbers. Note Discussion and suggestions for improvement are requested, and should be sent to core@ietf.org. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on August 13, 2016. Copyright Notice Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of Bierman & van der Stok Expires August 13, 2016 [Page 1] Internet-Draft Yhash February 2016 publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.1. Tree Diagrams . . . . . . . . . . . . . . . . . . . . 4 2. YANG Hash Generation . . . . . . . . . . . . . . . . . . . . 4 3. Re-Hash Error Procedure . . . . . . . . . . . . . . . . . . . 5 4. Re-Hashing Names Procedure . . . . . . . . . . . . . . . . . 6 5. ietf-yang-hash YANG Module . . . . . . . . . . . . . . . . . 7 6. YANG Re-Hash Examples . . . . . . . . . . . . . . . . . . . . 9 6.1. Multiple Modules . . . . . . . . . . . . . . . . . . . . 11 6.2. Same Module . . . . . . . . . . . . . . . . . . . . . . . 11 7. Retrieval of Rehashed Data . . . . . . . . . . . . . . . . . 12 8. YANG Hash representations . . . . . . . . . . . . . . . . . . 13 8.1. YANG Hash in payload . . . . . . . . . . . . . . . . . . 13 8.2. YANG Hash in URL . . . . . . . . . . . . . . . . . . . . 13 9. YANG Hash Examples . . . . . . . . . . . . . . . . . . . . . 14 10. Security Considerations . . . . . . . . . . . . . . . . . . . 16 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 16 13. Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . 17 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 14.1. Normative References . . . . . . . . . . . . . . . . . . 17 14.2. Informative References . . . . . . . . . . . . . . . . . 18 Appendix A. Hash clash probability . . . . . . . . . . . . . . . 18 Appendix B. Hash clash storage overhead . . . . . . . . . . . . 21 B.1. Server tables . . . . . . . . . . . . . . . . . . . . . . 21 B.2. Client tables . . . . . . . . . . . . . . . . . . . . . . 22 B.3. Table summary . . . . . . . . . . . . . . . . . . . . . . 22 Appendix C. payload reduction . . . . . . . . . . . . . . . . . 23 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 1. Introduction The Constrained Application Protocol (CoAP) [RFC7252] is designed for Machine to Machine (M2M) applications such as smart energy and building control. Constrained devices need to be managed in an automatic fashion to handle the large quantities of devices that are expected in future installations. The messages between devices need to be as small and infrequent as possible. The implementation complexity and runtime resources need to be as small as possible. Bierman & van der Stok Expires August 13, 2016 [Page 2] Internet-Draft Yhash February 2016 The drafts [I-D.ietf-netconf-restconf] and [I-D.vanderstok-core-comi] describe REST-like interfaces to access structured data defined in YANG [RFC6020]. The payload format CBOR [RFC7049] can be used to reduce the size of the transported payload. In that case the size of the payload depends for a large part on the YANG names. Reducing the names significantly reduces the payload size further (see Appendix C). This draft proposes a hashing technique to encode the YANG names into 30 bit numbers. 1.1. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. Readers of this specification should be familiar with all the terms and concepts discussed in [RFC2578]. The following terms are defined in the NETCONF protocol [RFC6241]: client, configuration data, data-store, and server. The following terms are defined in the YANG data modelling language [RFC6020]: container, data node, key, key leaf, leaf, leaf-list, and list. The following terms are defined in RESTCONF protocol [I-D.ietf-netconf-restconf]: data resource, data-store resource, edit operation, query parameter, target resource, and unified data-store. The following terms are defined in this document: YANG hash: CoMI object identifier, which is a 30-bit numeric hash of the YANG object identifier string for the object. Rehash bit: Bit 31. If a particular YANG hash value is a re-hash for an identifier, then the rehash bit will be set in the object identifier. This allows the server to return descendant nodes that have been rehashed, instead of returning an error for an entire GET request. Data-node instance: An instance of a data-node specified in a YANG module present in the server. The instance is stored in the memory of the server. Notification-node instance: An instance of a schema node of type notification, specified in a YANG module present in the server. Bierman & van der Stok Expires August 13, 2016 [Page 3] Internet-Draft Yhash February 2016 The instance is generated in the server at the occurrence of the corresponding event and appended to a stream. 1.1.1. Tree Diagrams A simplified graphical representation of the data model is used in the YANG modules specified in this document. The meaning of the symbols in these diagrams is as follows: Brackets "[" and "]" enclose list keys. Abbreviations before data node names: "rw" means configuration data (read-write) and "ro" state data (read-only). Symbols after data node names: "?" means an optional node, "!" means a presence container, and "*" denotes a list and leaf-list. Parentheses enclose choice and case nodes, and case nodes are also marked with a colon (":"). Ellipsis ("...") stands for contents of subtrees that are not shown. 2. YANG Hash Generation The association between string value and string number is done through a hash algorithm. The 30 least significant bits of the "murmur3" 32-bit hash algorithm are used. This hash algorithm is described online at [murmur3]. Implementations are available online [murmur-imp]. When converting 4 input bytes to a 32-bit integer in the hash algorithm, the Little-Endian convention MUST be used. The "murmur3_32" hash function is executed for the entire path string. The value '42' is used as the seed for the hash function. The YANG hash is subsequently calculated by taking the 30 least significant bits. The resulting 30-bit number is used by the server, unless the value is already being used for a different object by the server. In this case, the re-hash procedure in Section 3 is executed. The hash is generated for the string representing the object path identifier. A canonical representation of the path identifier is used. The module name is used to identify the namespace of the object node. The prefix cannot be used because it is allowed to change over time. The module name is never allowed to change. Bierman & van der Stok Expires August 13, 2016 [Page 4] Internet-Draft Yhash February 2016 The module name MUST be present in the identifier for the first node in the object path identifier. If a child node in the object path identifier is from the same module namespace as its parent, then the module-name MUST NOT be used in the identifier. If a child node in the object path identifier is not from the same module namespace as its parent, then the module-name MUST be used in the identifier. Choice and case node names are not included in the path expression. Only 'container', 'list', 'leaf', 'leaf-list', and 'anyxml' nodes are listed in the path expression. The YANG Hash value is calculated for all data nodes in the module, even if the server only implements a subset of these objects. This includes all "data-def", "rpc", "notification", and external data nodes derived from "augment" statements. Example: the following canonical identifier is used for the 'mtu' leaf in the ietf-interfaces module: /ietf-interfaces:interfaces/interface/mtu Example: the following canonical identifier is used for the 'ipv4' container in the ietf-ip module, which augments the 'interface' list in the ietf-interfaces module: /ietf-interfaces:interfaces/interface/ietf-ip:ipv4 3. Re-Hash Error Procedure In most cases, the hash function is expected to produce unique values for all the node names supported by a constrained server. Given a known set of YANG modules, both server and client can calculate the YANG hashes independently, and offline. Even though collisions are expected to happen rather rarely, they need to be considered (see Appendix A for clash probabilities). Collisions can be detected before deployment, if the vendor knows which modules are supported by the server, and hence all YANG hashes can be calculated. Collisions occur at a given server dependent on the set of modules supported by the server. The client needs to discover any re-hash mappings on a per server basis. Bierman & van der Stok Expires August 13, 2016 [Page 5] Internet-Draft Yhash February 2016 If the server needs to re-hash any YANG name, then it MUST create a "rehash" entry for all its rehashed node names, as described in Section 4. A re-hashed object identifier has the rehash bit set in the identifier, every time it is sent from the server to the client. This allows the client to identify nodes for which a "reverse rehash" entry may need to be retrieved (see Section 6). A client does not need to retrieve the rehash map before retrieving or configuring rehashed data nodes. If any node identifier provided by the client is not available because it has been rehashed, the server MUST return a rehash error, containing the 'rehash' entries for all the invalid nodes which were specified by the client. It is possible that none of the node identifiers provided by the client in a GET method are invalid and rehashed, but rather one or more descendant nodes within the selected subtree(s) have been rehashed. In this case, a rehash error is not returned. Instead the requested subtree(s) are returned, and the rehash bit is set for any descendant node(s) that have been rehashed. The client will strip off the rehash bit and retrieve the 'revhash' entry for these nodes (if not already done). 4. Re-Hashing Names Procedure A hash collision occurs if two different path identifier strings have the same hash value. If the server has around 1000 node names in its YANG modules, then the probability of a collision is a half per mil (see Appendix A). If a hash collision occurs on the server, then the node name that is causing the conflict has to be altered, such that the new hash value does not conflict with any value already in use by the server. For example, rehashing could be done by prefixing a "~" character in front of the clashing name and execute murmur3 on the thus modified name. If necessary the prefixing can be done multiple times until the clashes are resolved. Using the prefixing is not needed from an inter-operability point of view but provides a procedure for client and server to calculate the rehash without reading the "rehash" entry. Bierman & van der Stok Expires August 13, 2016 [Page 6] Internet-Draft Yhash February 2016 5. ietf-yang-hash YANG Module The "ietf-yang-hash" YANG module is used by the server to report any objects that have been mapped to produce a new hash value that does not conflict with any other YANG hash values used by the server. YANG tree diagram for "ietf-yang-hash" module: +--ro yang-hash +--ro rehash* [hash] +--ro hash uint32 +--ro object* +--ro module string +--ro newhash uint32 +--ro path? string file "ietf-yang-hash@2016-02-10.yang" module ietf-yang-hash { namespace "urn:ietf:params:xml:ns:yang:ietf-yang-hash"; prefix "yh"; organization "IETF CORE (Constrained RESTful Environments) Working Group"; contact "WG Web: WG List: WG Chair: Carsten Bormann WG Chair: Andrew McGregor Editor: Peter van der Stok Editor: Andy Bierman description "This module contains re-hash information for the CoMI protocol. Copyright (c) 2016 IETF Trust and the persons identified as Bierman & van der Stok Expires August 13, 2016 [Page 7] Internet-Draft Yhash February 2016 authors of the code. All rights reserved. Redistribution and use in source and binary forms, with or without modification, is permitted pursuant to, and subject to the license terms contained in, the Simplified BSD License set forth in Section 4.c of the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info). This version of this YANG module is part of RFC XXXX; see the RFC itself for full legal notices."; // RFC Ed.: replace XXXX with actual RFC number and remove this // note. // RFC Ed.: remove this note // Note: extracted from draft-bierman-core-yang-hash-00.txt // RFC Ed.: update the date below with the date of RFC publication // and remove this note. revision 2016-02-10 { description "Initial revision."; reference "RFC XXXX: YANG Hash."; } container yang-hash { config false; description "Contains information on the YANG Hash values used by the server."; list rehash { key hash; description "Each entry describes an re-hash mapping in use by the server."; leaf hash { type uint32; description "The hash value that has a collision. This hash value cannot be used on the server. The rehashed value for each affected object must be used instead."; } list object { Bierman & van der Stok Expires August 13, 2016 [Page 8] Internet-Draft Yhash February 2016 min-elements 2; description "Each entry identifies one of the objects involved in the hash collision and contains the rehash information for that object."; leaf module { type string; mandatory true; description "The module name identifying the module namespace for this object."; } leaf newhash { type uint32; mandatory true; description "The new hash value for this object. The rehash bit is not set in this value."; } leaf path { type string; description "The object path identifier string used in the original YANG hash calculation. This object MUST be included for any objects in the rehash entry with the same 'module' value."; } } } } } 6. YANG Re-Hash Examples In this example there are two YANG modules, "foo" and "bar". module foo { namespace "http://example.com/ns/foo"; prefix "f"; Bierman & van der Stok Expires August 13, 2016 [Page 9] Internet-Draft Yhash February 2016 revision 2015-06-07; container A { list B { key name; leaf name { type string; } leaf counter1 { type uint32; } } } } module bar { namespace "http://example.com/ns/bar"; prefix "b"; import foo { prefix f; } revision 2015-06-07; augment /f:A/f:B { leaf counter2 { type uint32; } } } This set of 3 YANG modules containing a total of 7 objects produces the following object list. Note that actual hash values are not shown, since these modules do not actually cause the YANG Hash clashes described in the examples. Object Path Hash foo: container /foo:A h1 list /foo:A/B h2 leaf /foo:A/B/name h3 leaf /foo:A/B/counter1 h4 bar: leaf /foo:A/B/bar1:counter2 h5 Bierman & van der Stok Expires August 13, 2016 [Page 10] Internet-Draft Yhash February 2016 6.1. Multiple Modules In this example, assume that the 'B' and 'counter2' objects produce the same hash value, so 'h2' and 'h5' both have the same value (e.g. '1234'): The client might retrieve an entry from the list "/foo:A/B", which would cause this subtree to be returned. Instead, the server will return a message with the resource type "core.mg.yang-hash", representing the "yang-hash" data structure. Only the entry for the requested identifier is returned, even if multiple 'rehash' list entries exist. REQ: GET example.com/mg/h2?keys="entry2" RES: 4.00 "Bad Request" (Content-Format: application/cbor) { "ietf-yang-hash:yang-hash" : { "rehash" : [ { "hash" : 1234, "object" : [ { "module" : "foo", "newhash" : 5678 }, { "module" : "bar", "newhash" : 8182 } ] } ] } } 6.2. Same Module In this example, assume that the 'B', 'counter1', and 'counter2' objects produce the same hash value, so 'h2', 'h4', and 'h5' objects all have the same value (e.g. '1234'): The client might retrieve an entry from the list "/foo:A/B", which would cause this subtree to be returned. Instead, the server will return a message with the resource type "core.mg.yang-hash", representing the "yang-hash" data structure. Only the entry for the Bierman & van der Stok Expires August 13, 2016 [Page 11] Internet-Draft Yhash February 2016 requested identifier is returned, even if multiple 'rehash' list entries exist. REQ: GET example.com/mg/h2?keys="entry2" RES: 4.00 "Bad Request" (Content-Format: application/cbor) { "ietf-yang-hash:yang-hash" : { "rehash" : [ { "hash" : 1234, "object" : [ { "module" : "foo", "newhash" : 5678, "path" : "/foo:A/B" }, { "module" : "foo", "newhash" : 2134, "path" : "/foo:A/B/counter1" }, { "module" : "bar", "newhash" : 8182, "path" : "/foo:A/B/bar:counter2" } ] } ] } } 7. Retrieval of Rehashed Data In this example, assume that the 'B', 'counter1', and 'counter2' objects produce the same hash value, so 'h2', 'h4', and 'h5' objects all have the same value (e.g. '1234'): The client might retrieve the top-level container "/foo:A", which would cause this subtree to be returned. Since the identifier (h1) has not been re-hashed, the server will return the requested data. The new hashes for 'h2', 'h4',and 'h5' will be returned, except the rehash bit will be set for these identifiers. Bierman & van der Stok Expires August 13, 2016 [Page 12] Internet-Draft Yhash February 2016 The notation "R+" indicates that the rehash bit is set. REQ: GET example.com/mg/h1 RES: 2.05 Content (Content-Format: application/cbor) { h1 : { R+5678 : { { h3 : "entry1"}: {R+2134: 615, R+8182: 7}, { h3 : "entry2"}: {R+2134: 491, R+8182: 26} } } } The client will notice that the rehash bit is set for 3 nodes. The client will need to retrieve the full "yang-hash" container at this point, if that has not already been done. The rehashed identifiers will be in "rehash" list, contained in the "newhash" leaf for the "object" list. 8. YANG Hash representations YANG hashes are represented in two fashions. 8.1. YANG Hash in payload When a YANG hash value is printed in the payload, error-path or other string, then the lowercase hexadecimal representation is used. Leading zeros are used so the value uses 8 hex characters. 8.2. YANG Hash in URL When a URL contains a YANG hash, it is encoded using base64url "URL and Filename safe" encoding as specified in [RFC4648]. The hash H is represented as a 30-bit integer, divided into five 6-bit integers as follows: B1 = (H & 0x3f000000) >> 24 B2 = (H & 0xfc0000) >> 18 B3 = (H & 0x03f000) >> 12 B4 = (H & 0x000fc0) >> 6 B5 = H & 0x00003f Bierman & van der Stok Expires August 13, 2016 [Page 13] Internet-Draft Yhash February 2016 Subsequently, each 6-bit integer Bx is translated into a character Cx using Table 2 from [RFC4648], and a string is formed by concatenating the characters in the order C1, C2, C3, C4, C5. For example, the YANG hash 0x29abdcca is encoded as "pq9zK". 9. YANG Hash Examples The YANG hash value for 'current-datetime' is calculated by constructing the schema node identifier for the object: /ietf-system:system-state/clock/current-datetime The 30 bit murmur3 hash value (see Section 2) is calculated on this string with hash: 0x047c468b and EfEaM. The request using this hash value is shown below: REQ: GET example.com/mg/EfEaM RES: 2.05 Content (Content-Format: application/cbor) { 0x047c468b : "2014-10-26T12:16:31Z" } The YANG hash values for 'clock', 'current-datetime', and 'boot- datetime' are calculated by constructing the schema node identifier for the objects, and then calculating the 30 bit murmur3 hash values (shown in parenthesis): /ietf-system:system-state/clock (0x021ca491 and CDKSQ) /ietf-system:system-state/clock/current-datetime (0x047c468b) /ietf-system:system-state/clock/boot-datetime (0x1fb5f4f8) The YANG hash values for 'neighbor', 'ip', and 'link-layer-address' are calculated by constructing the schema node identifier for the objects, and then calculating the 30 bit murmur3 hash values (shown in parenthesis): /ietf-interfaces:interfaces/interface/ietf-ip:ipv6/neighbor (0x2445e478 and kReR4) /ietf-interfaces:interfaces/interface/ietf-ip:ipv6/neighbor/ip (0x2283ed40 and ig-la) /ietf-interfaces:interfaces/interface/ietf-ip:ipv6/neighbor/ link-layer-address (0x3d6915c7) The YANG translation of the SMI specifying the ipNetToMediaTable [RFC4293] yields: Bierman & van der Stok Expires August 13, 2016 [Page 14] Internet-Draft Yhash February 2016 container IP-MIB { container ipNetToPhysicalTable { list ipNetToPhysicalEntry { key "ipNetToPhysicalIfIndex ipNetToPhysicalNetAddressType ipNetToPhysicalNetAddress"; leaf ipNetToMediaIfIndex { type: int32; } leaf ipNetToPhysicalIfIndex { type if-mib:InterfaceIndex; } leaf ipNetToPhysicalNetAddressType { type inet-address:InetAddressType; } leaf ipNetToPhysicalNetAddress { type inet-address:InetAddress; } leaf ipNetToPhysicalPhysAddress { type yang:phys-address { length "0..65535"; } } leaf ipNetToPhysicalLastUpdated { type yang:timestamp; } leaf ipNetToPhysicalType { type enumeration { ... } } leaf ipNetToPhysicalState { type enumeration { ... } } leaf ipNetToPhysicalRowStatus { type snmpv2-tc:RowStatus; } } } The YANG hash values for 'ipNetToPhysicalEntry' and its child nodes are calculated by constructing the schema node identifier for the objects, and then calculating the 30 bit murmur3 hash values (shown in parenthesis): /IP-MIB:IP-MIB/ipNetToPhysicalTable (0x0aba15cc and kuhXM) /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry (0xo6aaddbc and Gqt28) /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ ipNetToPhysicalIfIndex (0x346b3071) Bierman & van der Stok Expires August 13, 2016 [Page 15] Internet-Draft Yhash February 2016 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ ipNetToPhysicalNetAddressType (0x3650bb64) /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ ipNetToPhysicalNetAddress (0x06fd4d91) /IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ ipNetToPhysicalPhysAddress (0x26180bcb) /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ ipNetToPhysicalLastUpdated (0x3d6bbe90) /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ ipNetToPhysicalType (0x35ecbb3d) /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ ipNetToPhysicalState (0x13038bb5) /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ ipNetToPhysicalRowStatus (0x09e1fa37) The YANG Hash values for the YANG Patch request objects are calculated as follows: 0x2c3f93c7: /ietf-yang-patch:yang-patch 0x2fb8873e: /ietf-yang-patch:yang-patch/patch-id 0x011640f0: /ietf-yang-patch:yang-patch/comment 0x16804b72: /ietf-yang-patch:yang-patch/edit 0x2bd93228: /ietf-yang-patch:yang-patch/edit/edit-id 0x1959d8c9: /ietf-yang-patch:yang-patch/edit/operation 0x1346e0aa: /ietf-yang-patch:yang-patch/edit/target 0x0750e196: /ietf-yang-patch:yang-patch/edit/point 0x0b45277e: /ietf-yang-patch:yang-patch/edit/where 0x2822c407: /ietf-yang-patch:yang-patch/edit/value 10. Security Considerations The replacement of name-strings by numbers does not affect the security of the transmitted requests. 11. IANA Considerations No considerations for IANA apply. 12. Acknowledgements We are very grateful to Bert Greevenbosch who suggested the use of hashes and specified the use of murmur3. Many thanks for their contributions go to Alexander Pelov, Juergen Schonwalder, Anuj Sehgal and Michel Veillette. Bierman & van der Stok Expires August 13, 2016 [Page 16] Internet-Draft Yhash February 2016 This material is based upon work supported by the The Space & Terrestrial Communications Directorate (S&TCD) under Contract No. W15P7T-13-C-A616. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of The Space & Terrestrial Communications Directorate (S&TCD) . 13. Changelog Version 0 is extracted from comi draft version 8 [I-D.vanderstok-core-comi]. Changed Appendix A, and added Appendix C. 14. References 14.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/ RFC2119, March 1997, . [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, . [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for the Network Configuration Protocol (NETCONF)", RFC 6020, DOI 10.17487/RFC6020, October 2010, . [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, October 2013, . [RFC7252] Shelby, Z., Hartke, K., and C. Bormann, "The Constrained Application Protocol (CoAP)", RFC 7252, DOI 10.17487/ RFC7252, June 2014, . [murmur3] , "murmurhash family", Web http://en.wikipedia.org/wiki/MurmurHash, . [murmur-imp] , "murmurhash implementation", Web https://code.google.com /p/smhasher/, . Bierman & van der Stok Expires August 13, 2016 [Page 17] Internet-Draft Yhash February 2016 14.2. Informative References [RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. Schoenwaelder, Ed., "Structure of Management Information Version 2 (SMIv2)", STD 58, RFC 2578, DOI 10.17487/ RFC2578, April 1999, . [RFC4293] Routhier, S., Ed., "Management Information Base for the Internet Protocol (IP)", RFC 4293, DOI 10.17487/RFC4293, April 2006, . [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, . [I-D.ietf-netconf-restconf] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF Protocol", draft-ietf-netconf-restconf-07 (work in progress), July 2015. [I-D.vanderstok-core-comi] Stok, P., Bierman, A., Schoenwaelder, J., and A. Sehgal, "CoAP Management Interface", draft-vanderstok-core-comi-08 (work in progress), October 2015. [coll-prob] Preshing, j., "Hash collision probabilities", Web http://preshing.com/20110504/hash-collision-probabilities, May 2011. [birthday] Wikipedia, , "Birthday problem", Web https:// en.wikipedia.org/wiki/Birthday_problem, . Appendix A. Hash clash probability +--------+---------+---------+---------+---------+---------+--------+ | Number | 28 bits | 29 bits | 30 bits | 31 bits | 32 bits | 33 | | of | | | | | | bits | | names | | | | | | | +--------+---------+---------+---------+---------+---------+--------+ | 10 | 1,7E-07 | 8,4E-08 | 4,2E-08 | 2,1E-08 | 1,1E-08 | 5,2E-0 | | | | | | | | 9 | | | | | | | | | | 100 | 1,8E-05 | 9,2E-06 | 4,6E-06 | 2,3E-06 | 1,2E-06 | 5,8E-0 | | | | | | | | 7 | Bierman & van der Stok Expires August 13, 2016 [Page 18] Internet-Draft Yhash February 2016 | | | | | | | | | 200 | 7,4E-05 | 3,7E-05 | 1,9E-05 | 9,3E-06 | 4,6E-06 | 2,3E-0 | | | | | | | | 6 | | | | | | | | | | 10^3 | 1,9E-03 | 9,3E-04 | 4,7E-04 | 2,3E-04 | 1,2E-04 | 5,8E-0 | | | | | | | | 5 | | | | | | | | | | 4000 | 3,0E-02 | 1,5E-02 | 7,5E-03 | 3,7E-03 | 1,9E-03 | 9,3E-0 | | | | | | | | 4 | | | | | | | | | | 10^4 | 1,9E-01 | 9,3E-02 | 4,6E-02 | 2,3E-02 | 1,2E-02 | 5,8E-0 | | | | | | | | 3 | +--------+---------+---------+---------+---------+---------+--------+ Table 1: Probability of one or more clashes This appendix calculates the probability of a hash clash as function of the hash size and the number of YANG names. The standard way to calculate the probability of a clash is to calculate the probability that no clashes occur [birthday], [coll-prob]. The probability of no clashes when generating k numbers with a hash size of N=2^bits is given by: D(N,k) = ((N-1)/N)*((N-2)/N)*....(N-(k-1))/N (1) which can be approximated with: exp(-k*(k-1)/2N) (2) The probability that one or more clashes occur is given by: 1 - exp(-k*(k-1)/2N) ~ k*(k-1)/2N (3) Table 1 shows the probabilities for a given set of values of N=2^bits and number of YANG node names k. Probabilities which are larger than 0.5 are not shown because the used approximations are not accurate any more. Bierman & van der Stok Expires August 13, 2016 [Page 19] Internet-Draft Yhash February 2016 The overhead in servers and clients depends on the number of clashes. Therefore it is interesting to know the probability that more than one clash occurs. The probability of generating k numbers with a hash size of N=2^bits, where 2 numbers are identical and all the rest is different, is composed of the following parts. The probability that the second number is equal to the first is 1/N. The possible number of configurations of 2 equal numbers out of k is given by SUM_i=1,k-1 (i). The probability of k-1 different numbers is given by D(N,k-1). The probability of generating exactly one clash of two numbers is given by: (SUM_1,k-1 (i))*D(N,k-1)/N Where we used formula (1). Working out the summation and using (2), the probability that exactly one pair of hashes clashes is given by: (k*(k-1)/2N)*exp(-(k-1)*(k-2)/2N) The probability that more than one pair clashes is given by the probability that a clash occurs minus the probability that only one pair clashes. This leads to: 1 - exp(-k*(k-1)/2N) - (k*(k-1)/2N)*exp(-(k-1)*(k-2)/2N) Substituting formula 3, gives: k*(k-1)/2N - k*(k-1)/2N + (k*(k-1)^2*(k-2)/4N^2 = (k*(k-1)^2*(k-2)/4N^2 +--------+---------+---------+---------+---------+---------+--------+ | Number | 28 bits | 29 bits | 30 bits | 31 bits | 32 bits | 33 | | of | | | | | | bits | | names | | | | | | | +--------+---------+---------+---------+---------+---------+--------+ | 10 | 2,3E-14 | 5,6E-15 | 1,4E-15 | 3,5E-16 | 8,8E-17 | 2,2E-1 | | | | | | | | 7 | | | | | | | | | | 100 | 3,3E-10 | 8,3E-11 | 2,1E-11 | 5,2E-12 | 1,3E-12 | 3,3E-1 | | | | | | | | 3 | | | | | | | | | | 200 | 5,4E-09 | 1,4E-09 | 3,4E-10 | 8,5E-1 | 2,1E-11 | 5,3E-1 | | | | | | | | 2 | | | | | | | | | | 10^3 | 3,5E-06 | 8,6E-07 | 2,2E-07 | 5,4E-08 | 1,4E-08 | 3,4E-0 | | | | | | | | 9 | | | | | | | | | | 4000 | 8,9E-04 | 2,2E-04 | 5,6E-05 | 1,4E-05 | 3,5E-06 | 8,7E-0 | Bierman & van der Stok Expires August 13, 2016 [Page 20] Internet-Draft Yhash February 2016 | | | | | | | 7 | | | | | | | | | | 10^4 | 3,5E-02 | 8,7E-03 | 2,2E-03 | 5,4E-04 | 1,4E-04 | 3,4E-0 | | | | | | | | 5 | +--------+---------+---------+---------+---------+---------+--------+ Table 2: Probability of more than 2 entries equal clashes The corresponding probabilities are shown in Table 2. Assuming a hash size of 2^30, and about 1000 YANG nodes in a server, the probability of one clashing pair is 0.5*10^-3, and the probability that more clashes occur is 2*10^-7. Appendix B. Hash clash storage overhead Clashes may occur in servers dynamically during the operation of their clients, and clashes must be handled on a per server basis in the client. When rehashing is possible, clashing names on a given server are prefixed with a character (for example "~") and are rehashed, thus leading to hash values which uniquely identify the data nodes in the server. This appendix calculates the storage space needed when a clash occurs in a set of servers running the same server code. Appendix A shows that more than one clash in a server set is exceptional, which suggests at most two clashing object names in a given server. The sizes of server and client tables needed to handle the clashes in client and server are calculated separately, because they differ significantly. B.1. Server tables When a request arrives at the server, the server must relate the incoming hash value to the memory locations where the related values are stored. In the server a translation table must be provided that relates a hash value to a memory address where either the raw data or a description of the data (as prescribed by the YANG compiler) are stored. The required storage space is a sequence of (32 bit yang hash, 64 bit memory address) for every YANG data node. The translation table size in a server is 12 bytes times the number of YANG data nodes in the server. For every clashing hash value the following server clash table entries are needed: Clashed hash value, module name, and new hash. To reduce table size in the client, module name can be replaced with a 1 byte module identifier. The module identifier represents the index value of an array of module names. Server clash table size is: 2 hashes (8 bytes) + 1 module identifier (1 byte) Bierman & van der Stok Expires August 13, 2016 [Page 21] Internet-Draft Yhash February 2016 B.2. Client tables In the client, the compiled code must refer to a hash value. To cope with on-the-fly rehashing, the compiled code needs to invoke a procedure that returns the possibly rehashed value as function of the original hash value, module name, and server address. The client needs to store a client clash table containing: the clashed hash value, module name, server IPv6 address (or name), and rehash value for as many rehashes occurring in a given server. Many servers contain an identical set of YANG modules. The servers containing the same module set belong to the same server type. The server type is used to administrate the hash clash occurrence. To reduce client clash table size, module name can be replaced with a 1 byte module identifier. The module identifier represents the index value of an array of module names. A table of IPv6 server addresses must already exist in the client. To reduce client clash table size further, the server IPv6 address can be replaced with a 1 byte server type identifier. The server table can be ordered according to server type. A table with server type and pointer to sub-table start suffices to find all IPv6 addresses belonging to a server type. The client clash table reduces to clashed hash value (4 bytes), module identifier (1 byte), server type identifier (1 byte) and rehash value (4 bytes). B.3. Table summary Sizes of all the tables are: Server clash table: 9 bytes per clashing object name. Client clash table: 10 bytes per server type, per clashing object name. Array of module names: Sum of module name sizes. Server identifier table: 1 byte server type + 4 bytes pointer per server type. The existence of the translation table in a server is required independent of rehashing. The table sizes calculated to estimate the storage requirements coming from CoMI clashes. Assume the following numbers: o 500 data nodes per server o 10 server types Bierman & van der Stok Expires August 13, 2016 [Page 22] Internet-Draft Yhash February 2016 o 30 modules o Module name is on average 20 bytes o Maximum of 2 clashing object names occurring in 2 server types This yields the following overhead estimates: Server tables size: * Server clash table: 2*9 bytes represents 18 bytes * Module name array: 30*20 represents 600 bytes Client table sizes: * Client clash table: 10*2*2 represents 40 bytes for 2 object names in 2 server types. * Module name array: 30*20 represents 600 bytes. * Server identifier table: 10*5 = 50 bytes In conclusion: 1. Storage space size in client is independent of number of servers but depends on number of server types. 2. There is a common storage size for the module array of 600 bytes. 3. Assuming 2 clashing object names in 2 server types, additional storage space in client is 40 bytes and in server 18 bytes. 4. When the module array is suppressed (removing 600 bytes storage space), the server clash table and the client clash table increase with 40 bytes and 80 bytes respectively. Appendix C. payload reduction The hashing of the YANG identifier in the transported payload reduces the size of the YANG objects transported in the payload. This note calculates the payload size reduction and the number of YANG objects that can be transported in a single 802.154 CoAP packet. The payload is assumed to be a sequence of maps, where the entry ("ident" : value) is referred to as a map, as shown below. The value can be an integer or a string. Bierman & van der Stok Expires August 13, 2016 [Page 23] Internet-Draft Yhash February 2016 { "ident1" : value 1, "ident2" : value 2. Etc ..... } In general, we can assume that the payload size (SI) of a YANG identifier string lies between 6 to 128 bytes with an average of 30-40 bytes. The payload size (SV) of a value can range between 1 byte to 128 bytes. The transport format is CBOR. The overhead coming from CBOR is composed of: o One byte to indicate the number of maps ( > 2 maps in the example above). o Two bytes per map: one describing the identifier, and one describing the value. When the CBOR byte describes an integer (major type 0), the size of the following integer is 0, when integer < 24. Otherwise the size of the following integer is 1 to 8 bytes. The size of the string remains unchanged when preceded by a CBOR byte (major type 2). When the string size < 24 , no extra bytes are needed; when the string size lies between 24 and 128 one extra CBOR byte overhead is needed. The parameters SE, SV and SI are introduced with the following meaning: o SE is the size of the identifier encoded by a hash or other unsigned integer, with 0 < SE < 5; because the hash size is 4 bytes and the minimum managed encoded identifier is assumed to take 1 byte. o SV is the size of the value, with 0 <= SV < 128. o SI is the size of the original YANG hash identifier string, with 4 < SI < 64. The impact of the conversion from YANG identifier string to unsigned integer is straightforward to calculate per map. It is assumed that one CBOR byte or two CBOR bytes are needed dependent on the sizes SI and SV. A map size with the original YANG identifier string is given by: o Map size is: 2 + SI + SV, when SI, SV < 24; o Map size is: 3 + SI + SV, when SI < 24 and SV > 23, or SI > 23 and SV < 24; Bierman & van der Stok Expires August 13, 2016 [Page 24] Internet-Draft Yhash February 2016 o Map size is: 4 + SI + SV, when SI, SV > 23. The map size when SI is converted to an unsigned integer with size SE is given by: o Encoded map size is: 2 + SE + SV, when SV < 24; o Encoded map size is: 3 + SE + SV, when SV > 23. The improvement of the conversion can be written as o (2+SE+SV)/(2+SI+SV) when SI, SV < 24; o (2+SE+SV)(3+SI+SV) when SI > 23 and SV < 24; o (3+SE+SV)/(3+SI+SV) when SI < 24, and SV > 23; o (3+SE+SV)/(4+SI+SV) when SI, SV > 23. Table 3 shows the size reduction for different SI and SV values and SE = 4: +-------+------+------+------+------+------+-------+------+ | SV-> | 0 | 4 | 10 | 20 | 30 | 40 | 60 | +-------+------+------+------+------+------+-------+------+ | SI=6 | 0,75 | 0,83 | 0,89 | 0,93 | 0,95 | 0,96 | 0,97 | | | | | | | | | | | SI=10 | 0,83 | 0,88 | 0,91 | 0,94 | 0,95 | 0,96 | 0,97 | | | | | | | | | | | SI=20 | 0,91 | 0,92 | 0,94 | 0,95 | 0,96 | 0,97 | 0,98 | | | | | | | | | | | SI=30 | 0,97 | 0,97 | 0,98 | 0,98 | 0,98 | 0,99 | 0,99 | | | | | | | | | | | SI=40 | 0,98 | 0,98 | 0,98 | 0,98 | 0,99 | 0,99 | 0,99 | +-------+------+------+------+------+------+-------+------+ Table 3: Payload size reduction as function of SI and SV Another way to look at it is to see how many maps fit in a packet. Assuming a 802.15.4 packet with short addresses, the IEEE header is 13 bytes. The CoAP header assuming only mesh traffic takes 6 bytes. The URI is composed of 3 options, where each option header takes 1 byte: total of 3 bytes. The URI is assumed to be split up in the following 3 parts: o URI-host "example.net" takes 13 bytes, which necessitates 1 length byte: total of 14 bytes. Bierman & van der Stok Expires August 13, 2016 [Page 25] Internet-Draft Yhash February 2016 o URI-path = "mg" takes 4 bytes. o URI-path = "hash value" takes 7 bytes. Total URI takes 3+14+4+7 = 28 bytes. For the CoMI payload, there remains 127 -13 - 6 - 28 = 80 bytes. Subtracting the byte indicating the number of CBOR maps, the payload size for the maps is 79 bytes. N.B. no security overhead is included. When the YANG string identifier needs to be stored, the number of storable maps is given by: o 79/(2+SI+SV) for SI, SV < 24; o 79/(3+SI+SV) for SI < 24 and SV > 23, or SI > 23 and SV < 24; o 79/(4+SI+SV) for SI, SV > 23. Table 4 shows the number of transported maps for different values of SI and SV. +-------+----+---+-----+-----+-----+-----+-----+ | SV-> | 0 | 4 | 10 | 20 | 30 | 40 | 60 | +-------+----+---+-----+-----+-----+-----+-----+ | SI=6 | 9 | 6 | 4 | 2 | 2 | 1 | 1 | | | | | | | | | | | SI=10 | 6 | 4 | 3 | 2 | 1 | 1 | 1 | | | | | | | | | | | SI=20 | 3 | 3 | 2 | 1 | 1 | 1 | 0 | | | | | | | | | | | SI=30 | 2 | 2 | 1 | 1 | 1 | 1 | 0 | | | | | | | | | | | SI=40 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | +-------+----+---+-----+-----+-----+-----+-----+ Table 4: Nr of transported maps as function of SI and SV Assuming the encoding of the YANG identifier to an unsigned integer with size 0 < SE < 5, the number of storable maps is given by: o 79/(2+SE+SV) for SV < 24; o 79/(3+SE+SV) for SV > 23. Table 5 shows the number of transported maps for different values of SE and SV, independent of SI. Bierman & van der Stok Expires August 13, 2016 [Page 26] Internet-Draft Yhash February 2016 +------+----+----+-----+-----+-----+-----+-----+ | SV-> | 0 | 4 | 10 | 20 | 30 | 40 | 60 | +------+----+----+-----+-----+-----+-----+-----+ | SE=1 | 26 | 11 | 6 | 3 | 2 | 1 | 1 | | | | | | | | | | | SE=2 | 19 | 9 | 5 | 3 | 2 | 1 | 1 | | | | | | | | | | | SE=3 | 15 | 8 | 5 | 3 | 2 | 1 | 1 | | | | | | | | | | | SE=4 | 13 | 7 | 4 | 3 | 2 | 1 | 1 | +------+----+----+-----+-----+-----+-----+-----+ Table 5: Nr of transported maps as function of SE and SV, The value of SI for the average YANG string identifier is 30. When the size of the value part of the map is less than 10 (SV < 10) and SI=30, the impact of the hashing is significant: 10 maps can be transported instead of 1 or 2. Further reducing the value of SE from 4 to 1 increases the number of transported maps with a factor 2 to 1,2. Authors' Addresses Andy Bierman YumaWorks 685 Cochran St. Suite #160 Simi Valley, CA 93065 USA Email: andy@yumaworks.com Peter van der Stok consultant Phone: +31-492474673 (Netherlands), +33-966015248 (France) Email: consultancy@vanderstok.org URI: www.vanderstok.org Bierman & van der Stok Expires August 13, 2016 [Page 27]