Network Working Group                                       H. Levkowetz
Internet-Draft                                              Elf Tools AB
Intended status: Informational                        September 18, 2018
Expires: March 22, 2019

Implementation notes for RFC 7991, "The 'xml2rfc' Version 3 Vocabulary"
draft-levkowetz-xml2rfc-v3-implementation-notes-01

Abstract

This memo documents issues and observations found while implementing RFC 7991. Individual notes are organised into separate sections, depending on their character.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on March 22, 2019.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Levkowetz                 Expires March 22, 2019                [Page 1]
Internet-Draft      RFC7991 Implementation Notes         September 2018

Table of Contents

1.  Introduction
2.  Fitness for Purpose
   2.1.  Degraded Table of Contents
   2.2.  Justification of Tables and Artwork
   2.3.  RFC Publication Date Policy
3.  Issues with the Schema
   3.1.  RFC 7991
      3.1.1.  In Section 2.5.5, "name" Attribute
      3.1.2.  In Section 2.12, <br>
      3.1.3.  In Section 2.20, <dl>
      3.1.4.  New Section 2.20.4, "indent" Attribute
      3.1.5.  In Section 2.29, <li>
      3.1.6.  In Section 2.32, <name>
      3.1.7.  In Section 2.42, <references>
      3.1.8.  In Section 2.45.1, "category" Attribute
      3.1.9.  In Section 2.45.7, "number" Attribute
      3.1.10. In Section 2.53.3 and 2.53.4.
      3.1.11. In Section 2.63.2, <ul> "empty" attribute
      3.1.12. In Section 3.4.2, "hangIndent" Attribute
      3.1.13. In Appendix C. Relax NG schema
      3.1.14. Use of the term 'counter'.
   3.2.  RFC 7998
      3.2.1.  In Section 5.2.6, Attribute Default Value Insertion
      3.2.2.  In Section 5.4.2.1, Compare <rfc> "submissionType" and <seriesInfo> "stream".
      3.2.3.  In Section 5.4.6, "pn" Numbering.
4.  Non-Schema Issues
   4.1.  RFC 7991
      4.1.1.  In Section 2.17, <date>
      4.1.2.  In Section 2.47, <seriesInfo>
      4.1.3.  In Appendix A.1.1: TLP switch-over date discrepancies
      4.1.4.  Index
      4.1.5.  Anchors
      4.1.6.  In Section 2.5.7, "type" Attribute
   4.2.  RFC 7992
      4.2.1.  In Section 8.1.1, Index Contents
   4.3.  RFC 7994
      4.3.1.  Additional Guidance
   4.4.  RFC 7998
      4.4.1.  In Section 5.2.3, <date> Insertion
      4.4.2.  In Section 5.2.4, "prepTime" Insertion
      4.4.3.  In Section 5.2.6, Attribute Default Value Insertion
      4.4.4.  In Section 5.2.7, "toc" Attribute
      4.4.5.  In Section 5.2.8, "removeInRFC" Warning Paragraph
      4.4.6.  In Section 5.3.1, "month" Attribute
      4.4.7.  In Section 5.3.2, ASCII Attribute Processing
      4.4.8.  New Section: "keepWithNext" Normalisation
      4.4.9.  In Section 5.4.2, <boilerplate> Insertion
      4.4.10. In Section 5.4.2.1, Compare <rfc> submissionType and <seriesInfo> "stream".
      4.4.11. In Section 5.4.2.2, "Status of this Memo" Insertion
      4.4.12. In Section 5.4.3, <reference> "target" Insertion
      4.4.13. In Section 5.4.4, <name> Slugification
      4.4.14. In Section 5.4.6, "pn" Numbering.
      4.4.15. In Section 5.4.7, <iref> Numbering
      4.4.16. In Section 5.4.8.2, "derivedContent" Insertion (without Content)
      4.4.17. In Section 5.5.1, <artwork> Processing
      4.4.18. In Section 5.5.2, <sourcecode> Processing
      4.4.19. In Section 5.4.8.2, "derivedContent" Insertion.
      4.4.20. In Section 5.4.9, <relref> Processing
      4.4.21. In Section 5.6.3, <link> Processing
      4.4.22. New Section for Index
5.  Security Considerations
6.  Informative References
Author's Address

1.  Introduction

Tool support for [RFC7991] and related specifications was implemented during 2017 and 2018, split into the following parts, all implemented as individual modes of the Python-based xml2rfc processor [XML2RFC]:

*  An XML converter from vocabulary version 2 [RFC7749] to version 3 [RFC7991]

*  A Normalisation processor, "PrepTool", [RFC7998]

*  An XML to plain text converter [RFC7994] for the version 3 vocabulary

*  An XML to HTML converter [RFC7992] for the version 3 vocabulary (pending as of 08 Jul 2018)

*  An HTML to PDF converter [RFC7995] for the version 3 vocabulary (pending as of 08 Jul 2018)

During the implementation work, a number of issues with the specification were found (this was expected at the outset by all parties), and a number of observations were made about limitations of the specification and of the vocabulary version 3 schema, and also about limitations in the specification of the work to be done.

The purpose of this memo is to collect those issues and observations in one place.

2.  Fitness for Purpose

The introduction to [RFC7991] states:

"This document defines the "xml2rfc" version 3 vocabulary: an XML-based language used for writing RFCs and Internet-Drafts. It is heavily derived from the version 2 vocabulary that is also under discussion. This document obsoletes the v2 grammar described in RFC 7749."

However, an unstated assumption seems to have been that the new tools and formatters would be used primarily to produce HTML output, in order to transition to publication of renderings of RFCs in more modern formats than plain-text ASCII. This is a reasonable and worthwhile goal, but as a result, the schema as specified in [RFC7991] has some drawbacks compared with the version 2 vocabulary when used to produce Internet-Drafts in the text format common within the IETF (Internet Engineering Task Force) at this time.

2.1.  Degraded Table of Contents

Lack of pagination has little impact on direct online readability, but when comparing the output of the new text formatter with the old one, one aspect leaps out: since there is no pagination, the table of contents simply lists the section headers to a certain depth, without any accompanying page numbers. This makes a surprising difference in how useful the table of contents is in getting an initial feel for the document. The at-a-glance information that lets a reader know whether this is a document of 10 pages or 100 is simply lacking.

Recommendation: Add support for pagination in a future version of the text formatter.

2.2.  Justification of Tables and Artwork

The version 3 schema deprecates the previously available 'align' attribute for artwork and tables, and the PrepTool will remove these attributes if used. This removes a feature that some authors appreciated. In the text formatter, the effect is simply to make all tables and artwork left-aligned, which may not be the most readable and polished output, but for the HTML formatter it also potentially removes the option of letting text flow around smaller artwork and tables in a controlled way.

Recommendation: Make the 'align' attribute for artwork and tables available again. (The current text formatter code already has support for the 'align' attribute for these elements; but since the attribute is stripped away by the PrepTool, the code is never invoked.)

2.3.  RFC Publication Date Policy

The specification [RFC7998] says that an error should be generated if a specification is found with missing elements; but the RFC Editor publishes documents (except for April 1st RFCs) with only year and month, no day of month.
The specification disallows this, and in effect makes it impossible for the RFC Editor to publish documents according to the current policy regarding publication date format.

Recommendation: Revert to the old behaviour, where the tool in RFC mode would issue a date with or without day depending on whether the <date> element had a day attribute or not.

3.  Issues with the Schema

3.1.  RFC 7991

3.1.1.  In Section 2.5.5, "name" Attribute

"A filename suitable for the contents (such as for extraction to a local file)."

Given the existing use of "name" on seriesInfo, this attribute name has a semantic dissonance.

Recommendation: Deprecate "name" for use on <artwork> and <sourcecode>, and instead use "file", which for <sourcecode> will be explicitly rendered, as established as best current practice for YANG modules (see for instance RFC 6087 [RFC6087]).

3.1.2.  In Section 2.12, <br>

A number of elements permit a mixed content model (see Section 3.1.5.2, "Mixed Content Model"): <blockquote>, <dd>, <li>, <td>, and <th>. However, when using the simpler of the two content schemas, two of them (<td> and <th>) permit inline line breaks through the use of <br> elements; the others do not. This seems terribly arbitrary.

Recommendation: Remove the <br> element completely. Alternatively, permit it to be used in all places where 'text' and non-block elements may be used (that is, in inline context). The current v3 text renderer implementation renders <br> as a newline in all inline contexts.

3.1.3.  In Section 2.20, <dl>
The current specification says:

"The "hanging" attribute defines whether or not the term appears on the same line as the definition. hanging="true" indicates that the term is to the left of the definition, while hanging="false" indicates that the term will be on a separate line."

This does not match established typographic terminology. In typographic terminology, "hanging indent" describes the case where the indentation of the second and subsequent lines of a paragraph is greater than the indentation of the first line. Whether the definition in a definition list starts on the first line or not has nothing to do with the presence of hanging indent; our definition lists will _always_ have hanging indent.

The 'hanging' attribute also describes something different from what the term has been used to describe in the version 2 vocabulary. This will be confusing to users. A more descriptive name for the attribute we're talking about would be 'start-definition-on-first-line', but that's unwieldy. Maybe 'newline="false"' to start the definition on the first line, or something like 'definition-start="first"'?

Recommendation: Change this to a different term that is more descriptive and does not use typographically incorrect terminology.

3.1.4.  New Section 2.20.4, "indent" Attribute

The deprecation of the "hangIndent" attribute on <list> leaves no opportunity to control the size of the hanging indent. In some definition lists, it is desirable to have a wide indentation, in order to clearly show the terms; in other cases it is more important to allow for a larger text volume than the width of the terms would allow.

Recommendation: Add an "indent" attribute on <dl> to control the size of the hanging indent.

3.1.5.  In Section 2.29, <li>
3.1.5.1.  Unordered lists with arbitrary symbols

When <li> is used with <ul empty="true">, the rendering is under-specified: the specification says "no label will be shown", but doesn't say whether list indentation (leading white-space) should be eliminated or not.

If the intention is to make it possible to render unordered lists with arbitrary symbols, chosen on a per-list-item basis, the current attributes of <li> are insufficient to indent and line-wrap list items properly with <ul empty="true">. It is not possible, for instance, to use <ul> lists to generate XML for a table of contents, since if the width of the bullet (the section number, in this case) is unknown, the proper indentation and line wrapping cannot be determined.

Recommendation: Add an explicit "bullet" attribute to support this use case.

3.1.5.2.  Mixed Content Model

The mixed content model for <li>, which permits either text and inline elements like sub, sup, bcp14, _or_ block elements such as <t>, <dl>, <ul>, etc., is non-intuitive and may be hard for users to keep straight.

Recommendation: Consider simplifying the schema by requiring that text and inline elements are always placed within a <t> element. This would also apply to the other elements that today have alternative content models: <blockquote>, <dd>, <td>, and <th>.

3.1.6.  In Section 2.32, <name>

So the <name> element can contain text or <tt>, and <tt> can contain other markup like <em> and <strong> etc., but why cannot <name> contain <em> etc. directly?

3.1.7.  In Section 2.42, <references>

The v3 schema cannot properly model multiple reference subsections contained within one numbered section. The v2 formatter handled this by silently inserting a containing section, but with the introduction of the preptool, which in theory should produce a master file from which various formatters would produce equivalent results, this becomes troublesome, as the automatic insertion of a container section is specified for the HTML formatter, in Section 9.8 of RFC 7992, but not for the text formatter. It would be much better to make the prepped XML explicitly show exactly what should be rendered, and not rely on formatters to silently insert elements.

Recommendation: Update the schema to make it possible for <references> to contain <references>, and have the prepped XML explicitly show both the encapsulating section and the subsections. The current preptool implementation does this.

3.1.8.  In Section 2.45.1, "category" Attribute

Changing the "category" attribute of <rfc> to a name value in an additional <seriesInfo> makes it much harder than it needs to be to look it up. It also makes the semantics of <seriesInfo> less clear.

Recommendation: Remove this, and keep the "category" attribute on <rfc>.

3.1.9.  In Section 2.45.7, "number" Attribute

The RFC number attribute in the <rfc> element is used as a switch to control whether an RFC or an Internet-Draft is produced. Moving what is effectively an important controlling switch for the operation of the formatters from the main element down into what is arguably an obscure combination of attribute values on a <seriesInfo> element several levels down from the main element feels wrong.
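As an illustration of the lookup difference described above, the following Python sketch reads the RFC number both ways. The document fragment, the number 9999, and the helper function names are made up for this example; only the "number" attribute on <rfc> and the "name" and "value" attributes on <seriesInfo> are taken from the vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal document fragment; 9999 is a placeholder number.
DOC = """<rfc number="9999">
  <front>
    <seriesInfo name="RFC" value="9999"/>
    <seriesInfo name="DOI" value="10.17487/RFC9999"/>
  </front>
</rfc>"""


def number_from_root(root):
    """Read the RFC number directly from the root element's attribute."""
    return root.get("number")


def number_from_series_info(root):
    """Search <front> for the <seriesInfo> entry whose name is "RFC"."""
    for info in root.findall("./front/seriesInfo"):
        if info.get("name") == "RFC":
            return info.get("value")
    return None


root = ET.fromstring(DOC)
direct = number_from_root(root)
nested = number_from_series_info(root)
# A consistency check of this kind is one way a tool could compare the two.
consistent = direct == nested
```

The first lookup is a single attribute read on the root element, while the second has to walk child elements and match on a combination of attribute values.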
Recommendation: Don't deprecate the number attribute on <rfc>, but require that the preptool checks that the number attribute matches what's in the <seriesInfo> set. Explicitly mention that the presence of the number attribute on <rfc> causes the generation of an RFC rather than an Internet-Draft by the formatters.

3.1.10.  In Sections 2.53.3 and 2.53.4

3.1.10.1.  Unnecessary limitation on where the "keepWithNext" attribute can be used

Why is keepWithNext permitted only on <t>? It would be very natural to expect to be able to say keepWithNext for 2 tables, or 2 figures, or 2 lists.

Recommendation: Permit keepWithNext on all elements that can be siblings to <t>.

3.1.10.2.  Violation of KISS and DRY principles

keepWithNext on one element is equivalent to keepWithPrevious on the following element, provided the following element can have a keepWithPrevious attribute. Providing both violates both KISS and DRY.

Recommendation: Keep only one of these two attributes, preferably keepWithNext.

3.1.11.  In Section 2.63.2, <ul> "empty" attribute

In v2, this results in a list using space as the bullet, thus each list entry is indented as with other bullet symbols. However, this leaves no way to get list entries with arbitrary text that are not indented, in order to produce lists such as those used in a table of contents or an index.

Furthermore, the specification does not indicate whether <ul empty="true"> should be rendered with space as a bullet, or without any bullet and indentation. A clarification would be good.

The current implementation introduces a new attribute "bare" with the possible values "false" | "true" to signal this. The default is "true" (which differs from the default v2 implementation).

Using the extra attribute "bare" works, but is maybe clumsier than necessary.

3.1.12.  In Section 3.4.2, "hangIndent" Attribute

"Deprecated. Use <dl> instead."

This causes capability loss. The "hangIndent" attribute not only signalled that hanging indent should be used, but also gave the size of the indent. No equivalent control has been provided for the <dl> element in the version 3 vocabulary.

3.1.13.  In Appendix C. Relax NG schema

The "colspan" attribute is given a default value of "0"; this should be "1". "0" is not otherwise defined in the text, and the only reasonable interpretation would be to hide the cell (make it occupy zero columns).

The "rowspan" attribute is given a default value of "0"; this should be "1". "0" is not otherwise defined in the text, and the only reasonable interpretation would be to hide the cell (make it occupy zero rows).

3.1.14.  Use of the term 'counter'

The classical meaning of this term is a monotonically increasing sequence of integers, globally unique or unique within a context. In this document, it is instead meant to indicate section, table, and figure numbers, which for sections are not plain counters. To make things more interesting, in other contexts in the document, the notation "-nnn", which would normally indicate a dash followed by digits, i.e., a counter, is also re-interpreted to include section numbers: strings of numbers including embedded period signs. This is bad terminology.

Recommendation: Instead of "counter", use "number" as the attribute value, and explicitly say "Section number, Figure number, Table number or ordered list labels" in the description. Use "-n.n" instead of "-nnn".

3.2.  RFC 7998

3.2.1.  In Section 5.2.6, Attribute Default Value Insertion

The "stream" attribute has a default value of "IETF". The effect of setting default values after the XInclude processing is to set stream="IETF" on all references which don't have a stream set. This is probably not right. The current implementation removes the default value for the "stream" attribute from the schema.

3.2.2.  In Section 5.4.2.1, Compare <rfc> "submissionType" and <seriesInfo> "stream"

It doesn't seem like a good fit to have <seriesInfo> tag attributes that all have to be set to the same value.
This is not DRY, and unnecessarily introduces the possibility of conflict, as a result of multiple <seriesInfo> elements being permitted. (This is relevant to the v3 schema, not the preptool.)

3.2.3.  In Section 5.4.6, "pn" Numbering

The list of elements that are given p- or paragraph tags is severely limited, and since the presence of a pn= attribute is required in order to make internal <xref> instances work, this limits the elements that can be referenced with HTML fragment identifiers. Why? Why is
                  and
                • present, but not
                    ,
                    ,
?

The current implementation adds p- numbering to ,
                      ,
                      ,
                        ,
, which all are allowed to have pn= attributes according to the schema.

4.  Non-Schema Issues

4.1.  RFC 7991

4.1.1.  In Section 2.17, <date>

4.1.1.1.  Current Date Requirement

"When the prep tool is used to create Internet-Drafts, it will reject a submitted Internet-Draft that has a <date> element in the boilerplate for itself that is anything other than today."

It is not up to the format definition to set policy for acceptance or rejection of draft submissions. The matter is more complex than the text assumes; see for instance datatracker issue #2422. In addition to being inappropriate, this text also quietly changes policy from +/- 3 days to +/- 0 days, without saying that it updates RFC 4228 [RFC4228], which is the current specification of permissible dates in draft submissions. Finally, enforcing this would cause _a lot_ of grief and problems.

This specification item has been ignored in the implementation.

4.1.1.2.  Date Specification in References

"Bibliographic references: In dates in <reference> elements, the date information can have prose text for the month or year. For example, vague dates (year="ca. 2000"), date ranges (year="2012-2013"), non-specific months (month="Second quarter"), and so on are allowed."

The text regarding prose text for month and year in bibliographic references is not workable. How should month and year be combined? Some bibliographic references may have date text which requires year first, others year last, and so on. Mixing the described fuzziness into the otherwise strict year, month, day format makes little sense when the result of combining the year, month and day attributes cannot be predictably and correctly rendered.

Recommendation: Instead of the current specification, permit either that the <date> element may have text content, or an alternative attribute to be used for rendering if year, month, or day cannot be specified exactly.
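The rendering problem can be sketched in a few lines of Python. The function below is a hypothetical illustration, not xml2rfc code: it joins day, month, and year in one fixed order, and accepts a "text" argument that stands in for the free-form alternative proposed in the recommendation (the argument name is an assumption made for this example).

```python
def render_date(year="", month="", day="", text=""):
    """Render a reference date.

    `text` is a hypothetical free-form override; when it is given, it is
    used verbatim. Otherwise the parts are joined as "day month year".
    """
    if text:
        return text
    return " ".join(part for part in (day, month, year) if part)


# Strict values combine predictably.
strict = render_date(year="2018", month="September", day="18")

# Prose values are joined in the same fixed order, whether or not that
# order suits the particular reference.
fuzzy = render_date(year="2012-2013", month="Second quarter")

# With an explicit rendering string, the author decides the order.
explicit = render_date(text="2012-2013, second quarter")
```

A formatter that only sees separate prose values has no way to know which ordering the author intended, whereas a single rendering string leaves nothing to guess.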
4.1.2.  In Section 2.47, <seriesInfo>

The possible and forbidden combinations of attributes for this element have now become so convoluted that it's really hard to understand how to use it correctly. This needs a serious reconsideration.

The 'name' attribute is mandatory, and only 3 values are permitted: "RFC", "Internet-Draft", and "DOI". But it is also mandatory to set the name to "" for a <seriesInfo> with a status attribute. Hmm... So there are 4, not 3 permitted values: "RFC", "Internet-Draft", "DOI", and "". This means that all reference files which have things like name="ISO", name="W3C Recommendation", etc., have become illegal.

This limitation on "name" attributes has not been enforced in the current implementation.

4.1.3.  In Appendix A.1.1: TLP switch-over date discrepancies

There are discrepancies between the switch-over dates given in the specification and those given by the Trust statements:

*  TLP3.0: The specification says 2009-11-01 but the TLP statement says effective date 2009-09-12.

*  TLP4.0: The specification says 2010-04-01 but the TLP statement says effective date 2009-12-28.

The dates on which TLP 4 started to be used in published RFCs seem to match the stated effective date of 2009-12-28, based on a scan of some RFCs around that date.

The current implementation uses the official dates in the preptool, not the dates in RFC 7991.

RFC 7991 also states this about the pre5378 text:

"this text appears under "Copyright Notice", unless the document was published before November 2009, in which case it appears under "Status of This Memo"."

This does not agree at all with what actual RFCs contain; they seem to consistently have this text under Copyright Notice.

4.1.4.  Index

There is no guidance on the structure of an index, if one is to be generated by the preptool.

4.1.5.  Anchors

Section 5.1 of RFC 7992 says in part:

"The prep tool produces XML with anchor attributes in all elements that need them."

This is rather vital information regarding the content of the prepped XML when building a formatter; unfortunately, it is not mentioned in RFC 7991.

4.1.6.  In Section 2.5.7, "type" Attribute

4.1.6.1.  How should a 'src' attribute be handled when no 'type' is given?

The v3 schema does not require the 'type' attribute on <artwork> to have a value, which makes sense when there's no 'src' attribute to include. But if there is a 'src' attribute and no value for 'type', how should the 'src' value be handled? The easiest and most explicit handling would be to require a 'type' value if there is a 'src' attribute; a more doubtful alternative would be to use something like the Linux 'file' magic command to try to guess at the content type that 'src' points at.

Recommendation: Warn if there is a 'src' and no 'type' value, and ignore the 'src' in that case.

4.1.6.2.  Missing information on how to handle various types

"The RFC Series Editor will maintain a complete list of the preferred values on the RFC Editor web site, and that list is expected to be updated over time. Thus, a consumer of v3 XML should not cause a failure when it encounters an unexpected type or no type is specified. The table will also indicate which type of art can appear in plain-text output (for example, type="svg" cannot)."

The RFC Series Editor has not yet provided such a table. It is definitely desired, in order to be able to deal correctly with plain-text output.

4.2.  RFC 7992

4.2.1.  In Section 8.1.1, Index Contents

The index has an extra <div> enclosing the contents, starting directly after the heading, while sections explicitly do not have a div here. This irregularity seems quite unnecessary, and makes the formatter code more complex than need be. Could we please align the two?

4.3.  RFC 7994

4.3.1.  Additional Guidance

*