json                                                         N. Williams
Internet-Draft                                              Cryptonector
Intended status: Standards Track                             May 9, 2014
Expires: November 10, 2014


            JavaScript Object Notation (JSON) Text Sequences
                    draft-ietf-json-text-sequence-02

Abstract

   This document describes the JSON text sequence format and associated
   media type.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 10, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Williams                Expires November 10, 2014               [Page 1]

Internet-Draft            JSON Text Sequences                   May 2014

Table of Contents

   1.    Introduction and Motivation
   1.1.  Conventions used in this document
   2.    JSON Text Sequence Format
   3.    Use for Logfiles, or How to Resynchronize Following
         Truncated entries
   4.    Security Considerations
   5.    IANA Considerations
   6.    Acknowledgements
   7.    Normative References
         Author's Address

1.  Introduction and Motivation

   The JavaScript Object Notation (JSON) [RFC7159] is a very handy
   serialization format. However, when serializing a large sequence of
   values as an array, or a possibly indeterminate-length or
   never-ending sequence of values, JSON becomes difficult to work with.

   Consider a sequence of one million values, each possibly 1 kilobyte
   when encoded, for a total of roughly one gigabyte. If processing such
   a dataset requires first parsing it entirely, then the result is very
   inefficient and the processing will be limited by virtual memory.
   "Online" (a.k.a., "streaming") parsers help, but they are neither
   widely available nor widely used, nor are they easy to use.

   Ideally such datasets could be parsed and processed one element at a
   time. Even if each element must be parsed in a not-online manner due
   to local choice of parser, the result will usually be sufficiently
   online: limited by the size of the biggest element in the sequence
   rather than by the size of the sequence.

   This document describes the concept and format of "JSON text
   sequences", which are specifically not JSON texts themselves but are
   composed of JSON texts.

1.1.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  JSON Text Sequence Format

   The ABNF [RFC5234] for the JSON text sequence format is as follows:

   JSON-sequence = *ws1 *(JSON-text *ws2 %x0A *ws1)
   ws1 = %x20 / %x09 / %x0A / %x0D
   ws2 = %x20 / %x09 / %x0D
   JSON-text = <JSON text, as defined in [RFC7159]>

                   Figure 1: JSON text sequence ABNF

   A JSON text sequence is a sequence of zero or more JSON texts, each
   surrounded by any number of JSON whitespace characters and always
   followed by a newline.

   Requirements:

   o  JSON text sequence encoders MUST emit a newline after any JSON
      text.

   An input of 'truefalse' is not a valid sequence of two JSON values,
   true and false! Neither is 'true0' a valid sequence of true and
   zero. Some existing JSON parsers that might be used to construct
   sequence parsers might in fact accept such sequences, resulting in
   erroneous parsing of sequences of two or more numbers. E.g., a
   sequence of two numbers, 4 and 2, encoded without the required
   whitespace between them would parse incorrectly as the number 42.
   This ambiguity is resolved by requiring that encoders emit a
   whitespace separator (specifically: a newline) after each text.

3.  Use for Logfiles, or How to Resynchronize Following Truncated
    entries

   The JSON Text Sequence format is useful for logfiles, as those are
   generally (and atomically) appended to on an ongoing basis. I.e.,
   logfiles are of indeterminate length, at least right up until they
   are closed.

   A problem comes up with this use case: it is difficult to guarantee
   that append writes will complete. Therefore it's possible (if
   unlikely) to end up with truncated log entries (which may fail to
   parse as JSON texts) followed by other entries. The mechanics of
   such failures are not explained here (but consider power failures).
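
   As a non-normative illustration of the encoder requirement in
   Section 2 and of the truncation problem described above, the
   following Python sketch writes each JSON text followed by a newline
   and then simulates an append that was cut short. The names
   encode_sequence and parse_lines are hypothetical, and the reader
   assumes one JSON text per line (no internal newlines), which this
   document permits but does not require.

```python
import json

def encode_sequence(values):
    """Serialize each value as a JSON text followed by a newline (%x0A)."""
    # json.dumps emits no raw newlines by default, so each text is one line.
    return "".join(json.dumps(v) + "\n" for v in values)

def parse_lines(text):
    """Yield (ok, value_or_raw_line) for every non-empty line.

    Assumption: one JSON text per line (no internal newlines).
    """
    for line in text.split("\n"):
        if not line.strip():
            continue  # extra whitespace between texts is allowed
        try:
            yield True, json.loads(line)
        except ValueError:  # json.JSONDecodeError is a ValueError
            yield False, line

seq = encode_sequence([4, 2, {"msg": "hello"}])
# Without the newline separator, 4 and 2 would run together as "42".

# Simulate a truncated append followed by two complete entries.
damaged = seq + '{"msg": "trunc' + encode_sequence(
    [{"msg": "next"}, {"msg": "later"}])
results = list(parse_lines(damaged))
```

   In this sketch the truncated text and the entry appended directly
   after it share a line and fail to parse together, while the entry
   after that is read normally.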

   Fortunately, as long as all texts in the logfile sequence are
   followed by a newline, it is possible to detect a subsequent entry
   written after an entry that fails to parse. Figure 2 shows an ABNF
   rule for detecting the boundary between a non-truncated (and some
   truncated) JSON text and the next JSON text in a sequence.

   boundary = endchar *ws2 %x0A *ws1 startchar
   endchar = ( "}" / "]" / %x22 / "e" / "l" / DIGIT )
   startchar = ( "{" / "[" / %x22 / "t" / "f" / "n" / "-" / DIGIT )

                  Figure 2: ABNF for resynchronization

   To resynchronize after failing to parse a JSON text, simply search
   for a boundary as described in Figure 2. A boundary found this way
   might be the boundary between the truncated entry and the subsequent
   entry, or it might be a subsequent boundary.

   Scanning backwards for boundaries will not work reliably unless JSON
   texts written to logfiles are stripped of internal newlines!

4.  Security Considerations

   All the security considerations of JSON [RFC7159] apply.

   There is no end of sequence indicator. This means that "end of
   file", "end of transmission", and so on, can be indistinguishable
   from a logical end of sequence. Applications where this matters
   should denote end of sequence by convention (e.g., Content-Length in
   HTTP).

   JSON text sequence parsers based on non-incremental, non-online JSON
   text parsers will not be able to efficiently parse JSON texts in
   which newlines appear; attempting to parse such sequences with
   non-incremental, non-online JSON text parsers creates a compute
   resource exhaustion vulnerability.

   The first requirement given in Section 2 (otherwise-ambiguous JSON
   texts must be separated by whitespace) is critical and must be
   adhered to. It is best to always emit a whitespace separator after
   every JSON text emitted.

   The resynchronization heuristic for logfiles is imperfect and might
   skip a valid entry following a truncated one.
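
   A minimal, non-normative Python sketch of the Figure 2 boundary
   search illustrates this imperfection. The regular expression is one
   possible transcription of the ABNF, and resync is a hypothetical
   helper name; neither is part of this specification.

```python
import json
import re

# One possible transcription of Figure 2 over bytes:
#   boundary = endchar *ws2 %x0A *ws1 startchar
# startchar is matched with a lookahead so the match ends where the
# next JSON text begins.
BOUNDARY = re.compile(
    rb'[}\]"el0-9]'        # endchar
    rb'[ \t\r]*\n'         # *ws2 %x0A
    rb'[ \t\n\r]*'         # *ws1
    rb'(?=[{\["tfn\-0-9])' # startchar (not consumed)
)

def resync(buf, pos):
    """Return the offset of the JSON text that follows the first boundary
    found at or after `pos`, or None if no boundary is found."""
    m = BOUNDARY.search(buf, pos)
    return m.end() if m else None

# A truncated entry, then a complete entry appended directly after it,
# then one more complete entry.
log = b'{"a":"hel{"b":2}\n{"c":3}\n'
start = resync(log, 0)
recovered = json.loads(log[start:].decode("utf-8"))
```

   Here the search lands on {"c":3}: the complete entry {"b":2}, which
   was appended directly after the truncated text, is skipped along
   with it.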
   Purposefully appending a truncated (or invalid) JSON text to a JSON
   text sequence logfile can cause the subsequent entry to be
   invisible. Logfile writers SHOULD validate (parse) any untrusted
   JSON text inputs and SHOULD remove internal newlines from them, thus
   enabling reliable backwards scanning for sequence element boundaries.

5.  IANA Considerations

   The MIME media type for JSON text sequences is application/json-seq.

   Type name: application

   Subtype name: json-seq

   Required parameters: n/a

   Optional parameters: n/a

   Encoding considerations: binary

   Security considerations: See this document, Section 4.

   Interoperability considerations: Described herein.

   Published specification: this document.

   Applications that use this media type: JSON text sequences have been
   used in applications written with the jq programming language.

6.  Acknowledgements

   Phillip Hallam-Baker proposed the use of JSON text sequences for
   logfiles and pointed out the need for resynchronization. James
   Manger contributed the ABNF for resynchronization.

7.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC7159]  Bray, T., "The JavaScript Object Notation (JSON) Data
              Interchange Format", RFC 7159, March 2014.

Author's Address

   Nicolas Williams
   Cryptonector, LLC

   Email: nico@cryptonector.com