Thing-to-Thing Research Group                                  K. Hartke
Internet-Draft                                                  Ericsson
Intended status: Experimental                           October 22, 2018
Expires: April 25, 2019


           Constrained Internationalized Resource Identifiers
                       draft-hartke-t2trg-ciri-00

Abstract

   This document specifies Constrained Internationalized Resource
   Identifier References, a serialization of Internationalized Resource
   Identifier (IRI) references that encodes the IRI components as
   Concise Binary Object Representation (CBOR) data items rather than a
   string of characters.  This intends to simplify parsing, reference
   resolution, and comparison of IRIs in Constrained RESTful
   Environments (CoRE).

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Hartke                   Expires April 25, 2019                 [Page 1]

Internet-Draft              Constrained IRIs                October 2018

Table of Contents

   1.  Introduction
     1.1.  Requirements Notation
   2.  Data Model
     2.1.  Options
     2.2.  Option Sequences
   3.  CBOR
   4.  Python
     4.1.  Reference Resolution
     4.2.  IRI Recomposition
     4.3.  CoAP Encoding
   5.  Security Considerations
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Acknowledgements
   Author's Address

1.  Introduction

   URI references [RFC3986] and, albeit less prevalently, IRI references
   [RFC3987] are the standard way to link to resources in hypertext
   formats like HTML [W3C.REC-html52-20171214] and the CoRE Link Format
   [RFC6690].  They encode the components of a resource reference either
   as an absolute URI/IRI or as a relative reference that must be
   resolved against a base URI/IRI to obtain an absolute URI/IRI.
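   For orientation, conventional string-based reference resolution can
   be illustrated with the resolver in Python's standard library.  This
   is only a sketch of the behavior defined by RFC 3986 and is not part
   of the format specified in this document.  The base URI, the helper
   name "resolve_all", and the example.com targets are assumptions made
   for illustration; an "http" URI is used because urljoin only
   resolves references for schemes on its internal list.

```python
from urllib.parse import urljoin

# Hypothetical base URI, used only for illustration.
BASE = 'http://example.com/a/b/c?x=1'


def resolve_all(base, references):
    """Resolve each reference string against base (RFC 3986 rules)."""
    return {ref: urljoin(base, ref) for ref in references}


# A relative path, a path with a dot segment, an absolute path,
# a query-only reference, and a fragment-only reference.
resolved = resolve_all(BASE, ['d', '../d', '/d', '?y=2', '#f'])
for ref, target in resolved.items():
    print(f'{ref!r:8} -> {target}')
```

   Each kind of reference replaces a different tail of the base; the
   string-based resolver has to parse both inputs to find out which.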
   URI and IRI references are strings of characters where the characters
   are chosen from a limited subset of the repertoires of US-ASCII and
   Unicode characters, respectively.  The individual components of a URI
   or IRI reference are delimited by a number of reserved characters,
   which necessitates the use of percent-encoding when these reserved
   characters are used in a non-delimiting function.

   The resolution of references involves parsing URI/IRI references into
   their components, combining those components with the components of
   the base URI/IRI, merging paths, removing dot segments, and
   recomposing the result back into a character string.

   Altogether, the proper processing of URIs is quite complex.  This can
   be a problem particularly in constrained environments [RFC7228],
   where devices often have severe code size limitations.  As a result,
   many implementations in these environments choose to implement only
   an ad-hoc, informally-specified, bug-ridden, non-interoperable subset
   of less than half of RFC 3986.

   This document specifies Constrained IRI References, a serialization
   format for IRI references that encodes the IRI components as Concise
   Binary Object Representation (CBOR) [RFC7049] data items rather than
   as a string of characters.  Assuming that a CBOR implementation is
   already present on a device, typical operations on Constrained IRI
   References such as parsing, reference resolution, and comparison can
   be implemented much more easily than with the original format.  A
   full implementation that covers all corner cases of the specification
   fits in a relatively small amount of code.

   As a result of the simplification, Constrained IRI References are not
   capable of expressing all IRI references that are permitted by the
   syntax of RFC 3987.
   The supported subset includes all Constrained Application Protocol
   (CoAP) URIs [RFC7252], most Hypertext Transfer Protocol (HTTP) URIs
   [RFC7230], and many other URIs and IRIs.

1.1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Data Model

   The data model for Constrained IRI References is very similar to the
   encoding of a request URI in CoAP messages [RFC7252]: The components
   of an IRI reference are encoded as a sequence of _options_.  Each
   option consists of an _option number_ identifying the type of option
   (scheme, host name, etc.) and the _option value_.

2.1.  Options

   The following types of options are defined:

   scheme
      Specifies the IRI scheme.  Since IRI schemes have the same syntax
      as URI schemes, the option value MUST match the "scheme" rule
      defined in Section 3.1 of RFC 3986.

   host.name
      Specifies the host of the IRI authority as a registered name.

   host.ip
      Specifies the host of the IRI authority as an IPv4 address
      (4 bytes) or an IPv6 address (16 bytes).

   port
      Specifies the port number.  The option value MUST be an unsigned
      integer in the range 0 to 65535 (inclusive).

   path.type
      Specifies the type of the IRI path for reference resolution.
      Possible values are 0 (absolute-path), 1 (append-path),
      2 (relative-path), and 3 (append-relation).

   path
      Specifies one segment of the IRI path.  This option can occur more
      than once.

   query
      Specifies one argument of the IRI query.  This option can occur
      more than once.

   fragment
      Specifies the fragment identifier.

   The value of a "host.name", "path", "query", or "fragment" option can
   be any Unicode string.  No percent-encoding is performed.

2.2.  Option Sequences

   [Figure 1: Structure of a Well-Formed Sequence of Options]

   A sequence of options is considered _well-formed_ if:

   o  the sequence of options is empty or starts with a "scheme",
      "host.name", "host.ip", "port", "path.type", "path", "query", or
      "fragment" option;

   o  any "scheme" option is followed by either a "host.name" or a
      "host.ip" option;

   o  any "host.name" option is followed by a "port" option;

   o  any "host.ip" option is followed by a "port" option;

   o  any "port" option is followed by a "path", "query", or "fragment"
      option or is at the end of the sequence;

   o  any "path.type" option is followed by a "path", "query", or
      "fragment" option or is at the end of the sequence;

   o  any "path" option is followed by a "path", "query", or "fragment"
      option or is at the end of the sequence;

   o  any "query" option is followed by a "query" or "fragment" option
      or is at the end of the sequence; and

   o  any "fragment" option is at the end of the sequence.

   A well-formed sequence of options is considered _absolute_ if the
   sequence of options starts with a "scheme" option.  A well-formed
   sequence of options is considered _relative_ if the sequence of
   options is empty or starts with an option other than the "scheme"
   option.

   An absolute sequence of options is considered _normalized_ if the
   result of resolving the sequence of options against any base IRI
   reference is equal to the input.  (It doesn't matter what base IRI it
   is resolved against, since it is already absolute.)
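   The rules above can be read as a table of permitted successors.  The
   following is a minimal sketch under that reading, not a normative
   implementation: the function "classify", the table "FOLLOWERS", the
   BEGIN/END markers, and the example.com sequences are all names
   assumed for illustration.  The option numbers reuse the labels given
   in the CDDL in Section 3, and a sequence is modeled as a list of
   (option number, option value) pairs.

```python
# Option numbers as labeled in the CDDL; BEGIN and END are
# hypothetical markers for the start and end of a sequence.
BEGIN = 0
SCHEME, HOST_NAME, HOST_IP, PORT = 1, 2, 3, 4
PATH_TYPE, PATH, QUERY, FRAGMENT = 5, 6, 7, 8
END = 9

# For each option, the set of options that may follow it.
FOLLOWERS = {
    BEGIN: {SCHEME, HOST_NAME, HOST_IP, PORT, PATH_TYPE, PATH, QUERY,
            FRAGMENT, END},
    SCHEME: {HOST_NAME, HOST_IP},
    HOST_NAME: {PORT},
    HOST_IP: {PORT},
    PORT: {PATH, QUERY, FRAGMENT, END},
    PATH_TYPE: {PATH, QUERY, FRAGMENT, END},
    PATH: {PATH, QUERY, FRAGMENT, END},
    QUERY: {QUERY, FRAGMENT, END},
    FRAGMENT: {END},
}


def classify(options):
    """Return 'absolute', 'relative', or 'malformed'."""
    previous = BEGIN
    for number, _value in options:
        if number not in FOLLOWERS[previous]:
            return 'malformed'
        previous = number
    if END not in FOLLOWERS[previous]:
        return 'malformed'
    if options and options[0][0] == SCHEME:
        return 'absolute'
    return 'relative'


# Would correspond to coap://example.com:5683/sensors/temp?u=C
absolute = [(SCHEME, 'coap'), (HOST_NAME, 'example.com'), (PORT, 5683),
            (PATH, 'sensors'), (PATH, 'temp'), (QUERY, 'u=C')]
# A relative-path reference with one segment.
relative = [(PATH_TYPE, 2), (PATH, 'light')]
# A scheme that is not followed by a host option.
broken = [(SCHEME, 'coap'), (PATH, 'light')]

print(classify(absolute), classify(relative), classify(broken))
```

   Because every option is already a separate data item, the check is a
   single pass over the sequence and needs no string parsing.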
   The following operations can be performed on a sequence of options:

   resolve(base, href)
      Resolves a well-formed sequence of options `href` against an
      absolute sequence of options `base`.  This operation MUST be
      performed by applying any algorithm that is functionally
      equivalent to the reference implementation in Section 4.1 of this
      document.

   recompose(href)
      Recomposes an IRI reference string from an absolute sequence of
      options `href`.  This operation MUST be performed by applying any
      algorithm that is functionally equivalent to the reference
      implementation in Section 4.2 of this document.

      To reduce variability, it is RECOMMENDED to uppercase the letters
      in the hexadecimal notation when percent-encoding octets
      [RFC3986] and to follow the recommendations of Section 4 of
      RFC 5952 for the text representation of IPv6 addresses [RFC5952].

   decompose(iriref)
      Decomposes an IRI reference string into a sequence of options.
      This operation MUST be performed by applying any algorithm that
      returns a sequence of options such that
      `recompose(decompose(x))` is equivalent to `x`.

   coap(href)
      Constructs CoAP options from an absolute, normalized sequence of
      options.  This operation MUST be performed by recomposing the
      sequence of options to an IRI reference string as described
      above, mapping the IRI to a URI as specified in Section 3.1 of
      RFC 3987, and decomposing the URI into CoAP options as specified
      in Section 6.4 of RFC 7252.  A concise implementation is
      illustrated in Section 4.3 of this document.

3.  CBOR

   A sequence of options is serialized as an array in Concise Binary
   Object Representation (CBOR) [RFC7049] as follows.  The structure is
   presented in the Concise Data Definition Language (CDDL)
   [I-D.ietf-cbor-cddl].
   ciri = [?(scheme: 1, text .regexp "[A-Za-z][A-Za-z0-9+.-]*"),
           ?(host.name: 2, text //
             host.ip: 3, bytes .size 4 / bytes .size 16),
           ?(port: 4, uint .size 2),
           ?(path.type: 5, path-type),
           *(path: 6, text),
           *(query: 7, text),
           ?(fragment: 8, text)]

   path-type = &(absolute-path: 0, append-path: 1,
                 relative-path: 2, append-relation: 3)

4.  Python

   The following Python 3.6 code shows how to work with a sequence of
   options.

   import enum

   class Option(enum.IntEnum):
       _BEGIN = 0
       SCHEME = 1
       HOST_NAME = 2
       HOST_IP = 3
       PORT = 4
       PATH_TYPE = 5
       PATH = 6
       QUERY = 7
       FRAGMENT = 8
       _END = 9

   class PathType(enum.IntEnum):
       ABSOLUTE_PATH = 0
       APPEND_PATH = 1
       RELATIVE_PATH = 2
       APPEND_RELATION = 3

   # _TRANSITIONS[p] lists the options that may follow option p;
   # Option._END means that the sequence may end after p.
   _TRANSITIONS = [
       # after _BEGIN
       (Option.SCHEME, Option.HOST_NAME, Option.HOST_IP, Option.PORT,
        Option.PATH_TYPE, Option.PATH, Option.QUERY, Option.FRAGMENT,
        Option._END),
       # after SCHEME
       (Option.HOST_NAME, Option.HOST_IP),
       # after HOST_NAME
       (Option.PORT,),
       # after HOST_IP
       (Option.PORT,),
       # after PORT
       (Option.PATH, Option.QUERY, Option.FRAGMENT, Option._END),
       # after PATH_TYPE
       (Option.PATH, Option.QUERY, Option.FRAGMENT, Option._END),
       # after PATH
       (Option.PATH, Option.QUERY, Option.FRAGMENT, Option._END),
       # after QUERY
       (Option.QUERY, Option.FRAGMENT, Option._END),
       # after FRAGMENT
       (Option._END,)]

   def is_well_formed(href):
       previous = Option._BEGIN
       for option, _ in href:
           if option not in _TRANSITIONS[previous]:
               return False
           previous = option
       if Option._END not in _TRANSITIONS[previous]:
           return False
       return True

   def is_absolute(href):
       return is_well_formed(href) and \
           (len(href) != 0 and href[0][0] == Option.SCHEME)

   def is_relative(href):
       return is_well_formed(href) and \
           (len(href) == 0 or href[0][0] != Option.SCHEME)

4.1.  Reference Resolution

   The following Python 3.6 code defines how to resolve a sequence of
   options that might be relative to a given base IRI.
   def resolve(base, href, relation=0):
       if not is_absolute(base) or not is_well_formed(href):
           return None
       result = []
       type = PathType.RELATIVE_PATH
       # The first option of href determines how much of the base is
       # kept.
       option = href[0][0] if href else Option._END
       if option == Option.HOST_IP:
           option = Option.HOST_NAME
       elif option == Option.PATH_TYPE:
           type = href[0][1]
           href = href[1:]
           option = Option.PATH
       if option != Option.PATH or type == PathType.ABSOLUTE_PATH:
           _copy_until(base, result, option)
       else:
           # Keep the base path and adjust it according to the path
           # type.
           _copy_until(base, result, Option.QUERY)
           if type == PathType.APPEND_RELATION:
               _append_and_normalize(result, Option.PATH,
                                     str(relation))
           elif type == PathType.RELATIVE_PATH:
               _remove_last_path_segment(result)
       _copy_until(href, result, Option._END)
       _append_and_normalize(result, Option._END, None)
       return result

   def _copy_until(input, output, end):
       # Copy options from input up to, but excluding, option `end`.
       for option, value in input:
           if option >= end:
               break
           _append_and_normalize(output, option, value)

   def _append_and_normalize(output, option, value):
       if option == Option.PATH:
           # Dot segments are resolved instead of being appended.
           if value == '.':
               return
           if value == '..':
               _remove_last_path_segment(output)
               return
       elif option > Option.PATH:
           # Drop a sole empty path segment that directly follows the
           # authority or an absolute-path marker.
           if len(output) >= 2 and \
              output[-1] == (Option.PATH, '') and (
              output[-2][0] < Option.PATH_TYPE or (
              output[-2][0] == Option.PATH_TYPE and
              output[-2][1] == PathType.ABSOLUTE_PATH)):
               _remove_last_path_segment(output)
           if option > Option.FRAGMENT:
               return
       output.append((option, value))

   def _remove_last_path_segment(output):
       if len(output) >= 1 and output[-1][0] == Option.PATH:
           del output[-1]

4.2.  IRI Recomposition

   The following Python 3.6 code defines how to recompose an IRI from a
   sequence of options that encodes an absolute IRI reference.
   def recompose(href):
       if not is_absolute(href):
           return None
       result = ''
       no_path = True
       first_query = True
       for option, value in href:
           if option == Option.SCHEME:
               result += value + ':'
           elif option == Option.HOST_NAME:
               result += '//' + _encode_ireg_name(value)
           elif option == Option.HOST_IP:
               result += '//' + _encode_ip_address(value)
           elif option == Option.PORT:
               result += ':' + str(value)
           elif option == Option.PATH:
               result += '/' + _encode_path_segment(value)
               no_path = False
           elif option == Option.QUERY:
               if no_path:
                   result += '/'
                   no_path = False
               result += '?' if first_query else '&'
               result += _encode_query_argument(value)
               first_query = False
           elif option == Option.FRAGMENT:
               if no_path:
                   result += '/'
                   no_path = False
               result += '#' + _encode_fragment(value)
       if no_path:
           result += '/'
           no_path = False
       return result

   def _encode_ireg_name(s):
       return ''.join(c if _is_ireg_name_char(c)
                        else _encode_pct(c) for c in s)

   def _encode_ip_address(b):
       if len(b) == 4:
           return '.'.join(str(c) for c in b)
       elif len(b) == 16:
           return '[' + ... + ']'  # see RFC 5952

   def _encode_path_segment(s):
       return ''.join(c if _is_isegment_char(c)
                        else _encode_pct(c) for c in s)

   def _encode_query_argument(s):
       return ''.join(c if _is_iquery_char(c) and c != '&'
                        else _encode_pct(c) for c in s)

   def _encode_fragment(s):
       return ''.join(c if _is_ifragment_char(c)
                        else _encode_pct(c) for c in s)

   def _encode_pct(s):
       return ''.join('%{0:0>2X}'.format(c) for c in s.encode('utf-8'))

   def _is_ireg_name_char(c):
       return _is_iunreserved(c) or _is_sub_delim(c)

   def _is_isegment_char(c):
       return _is_ipchar(c)

   def _is_iquery_char(c):
       return _is_ipchar(c) or _is_iprivate(c) or c == '/' or c == '?'

   def _is_ifragment_char(c):
       return _is_ipchar(c) or c == '/' or c == '?'
   def _is_ipchar(c):
       return _is_iunreserved(c) or _is_sub_delim(c) or \
              c == ':' or c == '@'

   def _is_iunreserved(c):
       return _is_alpha(c) or _is_digit(c) or \
              c == '-' or c == '.' or c == '_' or c == '~' or \
              _is_ucschar(c)

   def _is_alpha(c):
       return c >= 'A' and c <= 'Z' or c >= 'a' and c <= 'z'

   def _is_digit(c):
       return c >= '0' and c <= '9'

   def _is_sub_delim(c):
       return c == '!' or c == '$' or c == '&' or c == '\'' or \
              c == '(' or c == ')' or c == '*' or c == '+' or \
              c == ',' or c == ';' or c == '='

   def _is_ucschar(c):
       return c >= '\U000000A0' and c <= '\U0000D7FF' or \
              c >= '\U0000F900' and c <= '\U0000FDCF' or \
              c >= '\U0000FDF0' and c <= '\U0000FFEF' or \
              c >= '\U00010000' and c <= '\U0001FFFD' or \
              c >= '\U00020000' and c <= '\U0002FFFD' or \
              c >= '\U00030000' and c <= '\U0003FFFD' or \
              c >= '\U00040000' and c <= '\U0004FFFD' or \
              c >= '\U00050000' and c <= '\U0005FFFD' or \
              c >= '\U00060000' and c <= '\U0006FFFD' or \
              c >= '\U00070000' and c <= '\U0007FFFD' or \
              c >= '\U00080000' and c <= '\U0008FFFD' or \
              c >= '\U00090000' and c <= '\U0009FFFD' or \
              c >= '\U000A0000' and c <= '\U000AFFFD' or \
              c >= '\U000B0000' and c <= '\U000BFFFD' or \
              c >= '\U000C0000' and c <= '\U000CFFFD' or \
              c >= '\U000D0000' and c <= '\U000DFFFD' or \
              c >= '\U000E1000' and c <= '\U000EFFFD'

   def _is_iprivate(c):
       return c >= '\U0000E000' and c <= '\U0000F8FF' or \
              c >= '\U000F0000' and c <= '\U000FFFFD' or \
              c >= '\U00100000' and c <= '\U0010FFFD'

4.3.  CoAP Encoding

   The following Python 3.6 code shows how to construct CoAP options
   from an absolute sequence of options.  For simplicity, the code does
   not omit CoAP options with their default value in a CoAP request.
   def coap(href, to_proxy=False):
       if not is_absolute(href):
           return None
       result = b''
       previous = 0
       for option, value in href:
           if option == Option.SCHEME:
               pass
           elif option == Option.HOST_NAME:
               opt = 3  # Uri-Host
               val = value.encode('utf-8')
               result += _encode_coap_option(opt - previous, val)
               previous = opt
           elif option == Option.HOST_IP:
               opt = 3  # Uri-Host
               if len(value) == 4:
                   val = '.'.join(str(c) for c in value).encode('utf-8')
               elif len(value) == 16:
                   val = b'[' + ... + b']'  # see RFC 5952
               result += _encode_coap_option(opt - previous, val)
               previous = opt
           elif option == Option.PORT:
               opt = 7  # Uri-Port
               val = value.to_bytes((value.bit_length() + 7) // 8, 'big')
               result += _encode_coap_option(opt - previous, val)
               previous = opt
           elif option == Option.PATH:
               opt = 11  # Uri-Path
               val = value.encode('utf-8')
               result += _encode_coap_option(opt - previous, val)
               previous = opt
           elif option == Option.QUERY:
               opt = 15  # Uri-Query
               val = value.encode('utf-8')
               result += _encode_coap_option(opt - previous, val)
               previous = opt
           elif option == Option.FRAGMENT:
               pass
       if to_proxy:
           (option, value) = href[0]
           opt = 39  # Proxy-Scheme
           val = value.encode('utf-8')
           result += _encode_coap_option(opt - previous, val)
           previous = opt
       return result

   def _encode_coap_option(delta, value):
       length = len(value)
       delta_nibble = _encode_coap_option_nibble(delta)
       length_nibble = _encode_coap_option_nibble(length)
       result = bytes([delta_nibble << 4 | length_nibble])
       if delta_nibble == 13:
           delta -= 13
           result += bytes([delta])
       elif delta_nibble == 14:
           delta -= 256 + 13
           result += bytes([delta >> 8, delta & 255])
       if length_nibble == 13:
           length -= 13
           result += bytes([length])
       elif length_nibble == 14:
           length -= 256 + 13
           result += bytes([length >> 8, length & 255])
       result += value
       return result

   def _encode_coap_option_nibble(n):
       if n < 13:
           return n
       elif n < 256 + 13:
           return 13
       elif n < 65536 + 256 + 13:
           return 14

5.  Security Considerations

   Parsers of Constrained IRI References must operate on input that is
   assumed to be untrusted.  This means that parsers MUST fail
   gracefully in the face of malicious inputs.  Additionally, parsers
   MUST be prepared to deal with resource exhaustion (e.g., resulting
   from the allocation of big data items) or exhaustion of the call
   stack (stack overflow).  See Section 8 of RFC 7049 [RFC7049] for
   security considerations relating to parsing CBOR.

   The security considerations discussed in Section 7 of RFC 3986
   [RFC3986] and Section 8 of RFC 3987 [RFC3987] also apply to
   Constrained IRI References.

6.  IANA Considerations

   This document has no IANA actions.

7.  References

7.1.  Normative References

   [I-D.ietf-cbor-cddl]
              Birkholz, H., Vigano, C., and C. Bormann, "Concise data
              definition language (CDDL): a notational convention to
              express CBOR and JSON data structures",
              draft-ietf-cbor-cddl-05 (work in progress), August 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, DOI 10.17487/RFC3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987,
              January 2005.

   [RFC7049]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049,
              October 2013.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

7.2.  Informative References

   [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
              Address Text Representation", RFC 5952,
              DOI 10.17487/RFC5952, August 2010.
   [RFC6690]  Shelby, Z., "Constrained RESTful Environments (CoRE) Link
              Format", RFC 6690, DOI 10.17487/RFC6690, August 2012.

   [RFC7228]  Bormann, C., Ersue, M., and A. Keranen, "Terminology for
              Constrained-Node Networks", RFC 7228,
              DOI 10.17487/RFC7228, May 2014.

   [RFC7230]  Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
              Protocol (HTTP/1.1): Message Syntax and Routing",
              RFC 7230, DOI 10.17487/RFC7230, June 2014.

   [RFC7252]  Shelby, Z., Hartke, K., and C. Bormann, "The Constrained
              Application Protocol (CoAP)", RFC 7252,
              DOI 10.17487/RFC7252, June 2014.

   [W3C.REC-html52-20171214]
              Faulkner, S., Eicholz, A., Leithead, T., Danilo, A., and
              S. Moon, "HTML 5.2", World Wide Web Consortium
              Recommendation REC-html52-20171214, December 2017.

Acknowledgements

   Thanks to Christian Amsuess and Ari Keranen for helpful comments and
   discussions that have shaped the document.

Author's Address

   Klaus Hartke
   Ericsson
   Torshamnsgatan 23
   Stockholm  SE-16483
   Sweden

   Email: klaus.hartke@ericsson.com