Network Working Group H. Levkowetz
Internet-Draft Elf Tools AB
Intended status: Informational July 16, 2018
Expires: January 17, 2019
Implementation notes for RFC 7991, "The 'xml2rfc' Version 3 Vocabulary"
draft-levkowetz-xml2rfc-v3-implementation-notes-00
Abstract
This memo documents issues and observations found while implementing
RFC 7991. Individual notes are organised into separate sections
according to their character.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 17, 2019.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Levkowetz Expires January 17, 2019 [Page 1]
Internet-Draft RFC7991 Implementation Notes July 2018
Table of Contents
1. Introduction
2. Fitness for Purpose
2.1. Degraded Table of Contents
2.2. Justification of Tables and Artwork
2.3. RFC Publication Date Policy
3. Issues with the Schema
3.1. RFC 7991
3.1.1. In Section 2.5.5, "name" Attribute
3.1.2. In Section 2.20, <dl>
3.1.3. New Section 2.20.4, "indent" Attribute
3.1.4. In Section 2.29, <li>
3.1.5. In Section 2.32, <name>
3.1.6. In Section 2.42, <references>
3.1.7. In Section 2.45.1, "category" Attribute
3.1.8. In Section 2.53.3 and 2.53.4.
3.1.9. In Section 2.63.2, <ul> "empty" attribute
3.1.10. In Section 3.4.2, "hangIndent" Attribute
3.1.11. In Appendix C. Relax NG schema
3.1.12. Use of the term 'counter'.
3.2. RFC 7998
3.2.1. In Section 5.2.6, Attribute Default Value Insertion
3.2.2. In Section 5.4.2.1, Compare "submissionType" and "stream".
3.2.3. In Section 5.4.6, "pn" Numbering.
4. Non-Schema Issues
4.1. RFC 7991
4.1.1. In Section 2.17, <date>
4.1.2. In Section 2.47, <seriesInfo>
4.1.3. In Appendix A.1.1: TLP switch-over date discrepancies
4.1.4. Index
4.1.5. Anchors
4.2. RFC 7992
4.2.1. In Section 8.1.1, Index Contents
4.3. RFC 7994
4.3.1. Additional Guidance
4.4. RFC 7998
4.4.1. In Section 5.2.3, <date> Insertion
4.4.2. In Section 5.2.4, "prepTime" Insertion
4.4.3. In Section 5.2.6, Attribute Default Value Insertion
4.4.4. In Section 5.2.7, "toc" Attribute
4.4.5. In Section 5.2.8, "removeInRFC" Warning Paragraph
4.4.6. In Section 5.3.1, "month" Attribute
4.4.7. In Section 5.3.2, ASCII Attribute Processing
4.4.8. New Section: "keepWithNext" Normalisation
4.4.9. In Section 5.4.2, <boilerplate> Insertion
4.4.10. In Section 5.4.2.1, Compare submissionType and "stream".
4.4.11. In Section 5.4.2.2, "Status of this Memo" Insertion
4.4.12. In Section 5.4.3, <reference> "target" Insertion
4.4.13. In Section 5.4.4, <name> Slugification
4.4.14. In Section 5.4.6, "pn" Numbering.
4.4.15. In Section 5.4.7, <iref> Numbering
4.4.16. In Section 5.4.8.2, "derivedContent" Insertion (without Content)
4.4.17. In Section 5.5.1, <artwork> Processing
4.4.18. In Section 5.5.2, <sourcecode> Processing
4.4.19. In Section 5.4.8.2, "derivedContent" Insertion.
4.4.20. In Section 5.4.9, <relref> Processing
4.4.21. In Section 5.6.3, <link> Processing
4.4.22. New Section for Index
5. Informative References
Author's Address
1. Introduction
Tool support for [RFC7991] and related specifications was implemented
during 2017 and 2018, split into the following parts, each implemented
as a separate mode of the Python-based xml2rfc processor [XML2RFC]:
* An XML converter from vocabulary version 2 [RFC7749] to version 3
[RFC7991]
* A Normalisation processor, "PrepTool" [RFC7998]
* An XML to plain text converter [RFC7994] for the version 3
vocabulary
* An XML to HTML converter [RFC7992] for the version 3 vocabulary
(pending as of 08 Jul 2018)
* An HTML to PDF converter [RFC7995] for the version 3 vocabulary
(pending as of 08 Jul 2018)
During the implementation work, a number of issues with the
specification have been found (this was expected at the outset by all
parties), and a number of observations have been made about
limitations of the specification and the vocabulary version 3 schema,
as well as limitations in the specification of the work to be done.
The purpose of this memo is to collect those issues and observations
in one place.
2. Fitness for Purpose
The introduction to [RFC7991] states:
"This document defines the "xml2rfc" version 3 vocabulary: an XML-
based language used for writing RFCs and Internet-Drafts. It is
heavily derived from the version 2 vocabulary that is also under
discussion. This document obsoletes the v2 grammar described in
RFC 7749."
However, an unstated assumption seems to have been that the new tools
and formatters would be used primarily to produce HTML output, in
order to transition to publication of renderings of RFCs in more
modern formats than plain-text ASCII.
This is a reasonable and worthwhile goal, but as a result, the schema
as specified in [RFC7991] has some drawbacks compared with the
version 2 vocabulary when used to produce Internet-Drafts in the text
format common within the IETF (Internet Engineering Task Force) at
this time.
2.1. Degraded Table of Contents
Lack of pagination has little impact on direct online readability,
but when comparing the output of the new text formatter with the old
one, one aspect leaps out: Since there is no pagination, the table of
contents simply lists the section headers to a certain depth, without
any accompanying page numbers. This makes a surprising difference in
how useful the table of contents is in getting an initial feel for
the document. The at-a-glance information which lets a reader know
if this is a document of 10 pages or 100 is simply lacking.
Recommendation: Add support for pagination in a future version of
the text formatter.
2.2. Justification of Tables and Artwork
The version 3 schema deprecates the previously available 'align'
attribute for artwork and tables, and the PrepTool will remove these
attributes if used. This makes a previous feature that was
appreciated by some authors unavailable. In the text formatter, the
effect is simply to make all tables and artwork left-aligned, which
may not be the most readable and polished output, but for the HTML
formatter it also potentially removes the option of letting text flow
around smaller artwork and tables in a controlled way.
Recommendation: Make the 'align' attribute for artwork and tables
available again. (The current text formatter code already has
support for the 'align' attribute for these elements; but since
the attribute is stripped away by the PrepTool, the code is never
invoked.)
2.3. RFC Publication Date Policy
The specification [RFC7998] says that an error should be generated if
a specification is found with missing elements; but the RFC
Editor publishes documents (except for April 1st RFCs) with only year
and month, no day of month. The specification disallows this, and in
effect makes it impossible for the RFC Editor to publish documents
according to the current policy regarding publication date format.
Recommendation: Revert to the old behaviour, where the tool in
RFC mode would issue a date with or without day depending on
whether the <date> element had a day attribute or not.
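The old behaviour can be sketched roughly as follows. This is only an
illustration, not the xml2rfc implementation: the helper name
format_pub_date is hypothetical, and the <date> attributes are assumed
to hold a month name plus numeric year and day.

```python
import xml.etree.ElementTree as ET


def format_pub_date(date_elem):
    """Render a publication date, including the day only if one was given.

    Hypothetical helper: assumes 'month' holds a month name and that
    'year' and 'day' hold digits.
    """
    year = date_elem.get("year")
    month = date_elem.get("month")
    day = date_elem.get("day")
    if day:
        # Full date, as used for April 1st RFCs.
        return f"{int(day)} {month} {year}"
    # Year and month only, matching current RFC Editor practice.
    return f"{month} {year}"


print(format_pub_date(ET.fromstring('<date year="2018" month="July"/>')))
print(format_pub_date(ET.fromstring('<date year="2018" month="April" day="1"/>')))
```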
3. Issues with the Schema
3.1. RFC 7991
3.1.1. In Section 2.5.5, "name" Attribute
"A filename suitable for the contents (such as for extraction to a
local file)."
Given the existing use of "name" on seriesInfo, this attribute name
has a semantic dissonance.
Recommendation: Deprecate "name" for use on <artwork> and
<sourcecode>, and instead use "file", which for <sourcecode> will
be explicitly rendered, as established as best current practice
for YANG modules (see for instance RFC 6087 [RFC6087]).
3.1.2. In Section 2.20, <dl>
The current specification says:
"The "hanging" attribute defines whether or not the term appears
on the same line as the definition. hanging="true" indicates that
the term is to the left of the definition, while hanging="false"
indicates that the term will be on a separate line."
This does not match established typographic terminology. In
typographic terminology, "hanging indent" describes the case where
the indentation of the second and subsequent lines of a paragraph is
greater than the indentation of the first line. Whether the
definition in a definition list starts on the first line or not has
nothing to do with the presence of hanging indent; our definition
lists will _always_ have hanging indent.
The 'hanging' attribute also describes something different from what
the term has been used to describe in the version 2 vocabulary. This
will be confusing to users.
A more descriptive name for the attribute we're talking about would
be 'start-definition-on-first-line', but that's unwieldy. Maybe
'newline="false"' to start the definition on the first line, or
something like 'definition-start="first"'?
Recommendation: Change this to a different term that is more
descriptive and does not use typographically incorrect
terminology.
3.1.3. New Section 2.20.4, "indent" Attribute
The deprecation of the "hangIndent" attribute on <list> leaves no
opportunity to control the size of the hanging indent. In some
definition lists, it is desirable to have a wide indentation in
order to clearly show the terms; in other cases it is more important
to allow for a larger text volume than the width of the terms would
allow.
Recommendation: Add an "indent" attribute on <dl> to control the
size of the hanging indent.
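The two controls discussed in this and the preceding section can be
illustrated with a small text-rendering sketch. The function and its
parameters are hypothetical: "newline" stands in for whatever the
renamed attribute ends up being called, and "indent" is the hanging
indent in characters. Note that the continuation lines are indented
in both cases; only the placement of the first line of the
definition differs.

```python
import textwrap


def render_dl_entry(term, definition, indent=6, newline=False, width=72):
    """Render one definition-list entry as plain text (illustrative only)."""
    pad = " " * indent
    if newline or len(term) >= indent:
        # Term on its own line; the definition starts on the next line.
        body = textwrap.fill(definition, width=width,
                             initial_indent=pad, subsequent_indent=pad)
        return term + "\n" + body
    # The definition starts on the same line as the term.
    return textwrap.fill(definition, width=width,
                         initial_indent=term.ljust(indent),
                         subsequent_indent=pad)


print(render_dl_entry("foo", "A short definition."))
print(render_dl_entry("foo", "A short definition.", newline=True))
```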
3.1.4. In Section 2.29, <li>
3.1.4.1. Unordered lists with arbitrary symbols
When <li> is used with <ul empty="true">, the rendering is
under-specified: the specification says "no label will be shown", but
doesn't say whether list indentation (leading white-space) should be
eliminated or not.
If the intention is to make it possible to render unordered lists
with arbitrary symbols, chosen on a per-list-item basis, the current
attributes of <li> are insufficient to indent and line-wrap list
items properly with <ul empty="true">.
It is not possible, for instance, to use <ul> lists to generate XML
for a table of contents, since if the width of the bullet (the section
number, in this case) is unknown, the proper indentation and line
wrapping cannot be determined.
Recommendation: Add an explicit "bullet" attribute to support this
use case.
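Why the bullet width matters can be seen in a minimal wrapping
sketch. The helper name render_item and the two-space gap after the
bullet are assumptions made for illustration; the point is only that
the hanging indent of the continuation lines is derived from the
bullet, so the renderer has to know it.

```python
import textwrap


def render_item(bullet, text, width=40):
    """Wrap a list item so continuation lines align under the text."""
    first = bullet + "  "
    # The hanging indent depends directly on the width of the bullet.
    hang = " " * len(first)
    return textwrap.fill(text, width=width,
                         initial_indent=first, subsequent_indent=hang)


print(render_item("2.1.", "Degraded Table of Contents in the text output"))
print(render_item("4.4.10.", "Compare submissionType and stream attributes"))
```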
3.1.4.2. Mixed Content Model
The mixed content model for <li> (either text and inline elements
like sub, sup, bcp14, _or_ block-level elements such as <t>) is
non-intuitive and may be hard for users to keep straight.
Recommendation: Consider simplifying the schema by requiring that
text and inline elements always are placed within a <t> element.
This would apply also to other elements that today have alternative
content models: <blockquote>, <dd>, <td>, and <th>.
3.1.5. In Section 2.32, <name>
So the element can contain text or , and can contain
other markup like and etc., but why cannot contain
etc. directly?
3.1.6. In Section 2.42, <references>
The v3 schema cannot properly model multiple reference subsections
contained within one numbered section. The v2 formatter handled this
by silently inserting a containing section, but with the introduction
of the preptool, which in theory should produce a master file from
which various formatters would produce equivalent results, this
becomes troublesome, as the automatic insertion of a container
section is specified for the HTML formatter, in Section 9.8 of RFC
7992, but not for the text formatter. It would be much better to
make the prepped XML explicitly show exactly what should be rendered,
and not rely on formatters silently inserting elements.
Recommendation: Update the schema to make it possible for
<references> to contain <references>, and have the prepped XML
explicitly show both the encapsulating section and the
subsections. The current preptool implementation does this.
3.1.7. In Section 2.45.1, "category" Attribute
Changing the "category" attribute of <rfc> to a name value in an
additional <seriesInfo> makes it much harder than it needs to be to
look it up. It also makes the semantics of <seriesInfo> less clear.
Recommendation: Remove this, and keep the "category" attribute on
<rfc>.
3.1.8. In Section 2.53.3 and 2.53.4.
3.1.8.1. Unnecessary limitation on where the "keepWithNext" attribute
can be used
Why is keepWithNext permitted only on <t>? It would be very natural
to expect to be able to say keepWithNext for 2 tables, or 2 figures,
or 2 lists.
Recommendation: Permit keepWithNext on all elements that can be
siblings to <t>.
3.1.8.2. Violation of KISS and DRY principles
keepWithNext on one element is equivalent to keepWithPrevious on
the following element, provided the following element can have a
keepWithPrevious attribute. Providing both violates both KISS and
DRY.
Recommendation: Keep only one of these two attributes, preferably
keepWithNext.
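The equivalence can be made concrete with a short normalisation
sketch. The function name normalise_keep is hypothetical, and the
sketch assumes that keepWithNext is permitted on whatever element
precedes the one carrying keepWithPrevious.

```python
import xml.etree.ElementTree as ET


def normalise_keep(parent):
    """Rewrite keepWithPrevious as keepWithNext on the preceding sibling."""
    children = list(parent)
    for prev, elem in zip(children, children[1:]):
        if elem.get("keepWithPrevious") == "true":
            # The same constraint, expressed on the earlier element.
            prev.set("keepWithNext", "true")
            del elem.attrib["keepWithPrevious"]
    return parent


section = ET.fromstring(
    '<section><t>Intro:</t><t keepWithPrevious="true">Detail.</t></section>'
)
normalise_keep(section)
print(ET.tostring(section, encoding="unicode"))
```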
3.1.9. In Section 2.63.2, <ul> "empty" attribute
In v2, this results in a list using space as the bullet, thus each
list entry is indented as with other bullet symbols. However, this
leaves no way to get list entries with arbitrary text that are not
indented, in order to produce lists such as those used in the Table
of Contents and Index.
The current implementation introduces a new attribute "bare" with the
possible values "false" | "true" to signal this. This works, but is
maybe clumsier than necessary.
3.1.10. In Section 3.4.2, "hangIndent" Attribute
"Deprecated. Use <dl> instead."
This causes capability loss. The "hangIndent" attribute not only
signalled that hanging indent should be used, but also gave the size
of the indent. No equivalent control has been provided for the <dl>
element in the version 3 vocabulary.
3.1.11. In Appendix C. Relax NG schema
The "colspan" attribute is given a default value of "0"; this should
be "1". "0" is not otherwise defined in the text, and the only
reasonable interpretation would be to hide the cell (make it occupy
zero columns).
The "rowspan" attribute is given a default value of "0"; this should
be "1". "0" is not otherwise defined in the text, and the only
reasonable interpretation would be to hide the cell (make it occupy
zero rows).
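The practical effect of the default can be shown with a small
column-counting sketch. The helper row_width is hypothetical and only
models "colspan"; the same reasoning applies to "rowspan".

```python
import xml.etree.ElementTree as ET


def row_width(tr, default_span=1):
    """Number of columns a <tr> occupies, given a default colspan."""
    return sum(int(cell.get("colspan", default_span)) for cell in tr)


row = ET.fromstring('<tr><td>a</td><td colspan="2">b</td><td>c</td></tr>')
print(row_width(row))                  # default "1": 4 columns
print(row_width(row, default_span=0))  # default "0": 2 columns
```

With a default of "0", the two cells without an explicit "colspan"
contribute nothing to the width of the row.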
3.1.12. Use of the term 'counter'.
The classical meaning of this term is a monotonically increasing
sequence of integers, globally unique or unique within a context. In
this document, it is instead meant to indicate section, table, figure
numbers, which for sections are not plain counters.
To make matters more interesting, in other contexts in the document,
the notation "-nnn", which also would normally indicate a dash
followed by digits, i.e., a counter, is also re-interpreted to include
section numbers: strings of numbers including embedded period signs.
This is bad terminology.
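The difference between the two readings of "-nnn" can be shown with
two regular expressions. The patterns and the sample values below are
illustrative assumptions, not taken from the specification.

```python
import re

# "-nnn" read literally: a dash followed by digits only.
COUNTER = re.compile(r"-\d+$")
# The wider reading: digits with optional embedded periods.
NUMBER = re.compile(r"-\d+(\.\d+)*$")

for value in ("f-4", "t-3", "s-1.4.2"):
    print(value, bool(COUNTER.search(value)), bool(NUMBER.search(value)))
```

A section number such as "1.4.2" only matches the wider pattern,
which is why a plain counter is a misleading description of it.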
Recommendation: Instead of "counter", use "number" as the attribute
value, and explicitly say "Section number, Figure number,
Table number or ordered list labels" in the description. Use
"-n.n" instead of "-nnn".
3.2. RFC 7998
3.2.1. In Section 5.2.6, Attribute Default Value Insertion
The "stream" attribute has a default value of "IETF".
The effect of setting default values after the XInclude processing is
to set stream="IETF" on all reference <seriesInfo> elements which
don't have a stream set. This is probably not right.
The current implementation removes the default value for the "stream"
attribute from the schema.
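The side effect can be reproduced with a minimal sketch. It assumes
that the "stream" attribute in question is the one on <seriesInfo>;
the function insert_defaults and the sample document are hypothetical
and much simpler than real preptool input.

```python
import xml.etree.ElementTree as ET


def insert_defaults(root, tag, attr, default):
    """Set attr=default on every <tag> element that lacks the attribute."""
    for elem in root.iter(tag):
        if attr not in elem.attrib:
            elem.set(attr, default)


doc = ET.fromstring(
    "<rfc><front><seriesInfo name='RFC' value='9999' stream='IRTF'/></front>"
    "<back><references><reference anchor='X'>"
    "<seriesInfo name='ISO' value='8601'/>"
    "</reference></references></back></rfc>"
)
insert_defaults(doc, "seriesInfo", "stream", "IETF")
for si in doc.iter("seriesInfo"):
    print(si.get("name"), si.get("stream"))
```

The reference to a non-IETF document ends up marked as belonging to
the IETF stream, simply because it was already part of the tree when
the defaults were applied.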
3.2.2. In Section 5.4.2.1, Compare "submissionType" and "stream".
It doesn't seem like a good fit to have tag attributes that all have
to be set to the same value. This is not DRY, and unnecessarily
introduces the possibility of conflict, as a result of multiple
<seriesInfo> elements being permitted. (Relevant to the v3 schema,
not the preptool.)
3.2.3. In Section 5.4.6, "pn" Numbering.
The list of elements that are given p- or paragraph tags is severely
limited, and since the presence of a pn= attribute is required in
order to make internal instances work, this limits the elements that
can be referenced with HTML fragment identifiers. Why?
Why is
and
present, but not ,
,
?
The current implementation adds p- numbering to ,
,
,
,
, which all are allowed to have pn= attributes according to
the schema.
4. Non-Schema Issues
4.1. RFC 7991
4.1.1. In Section 2.17, <date>
4.1.1.1. Current Date Requirement
"When the prep tool is used to create Internet-Drafts, it will
reject a submitted Internet-Draft that has a <date> element in the
boilerplate for itself that is anything other than today."
It is not up to the format definition to set policy for acceptance or
rejection of draft submissions. The matter is more complex than the
text assumes; see for instance datatracker issue #2422. In addition
to being inappropriate, this text also quietly changes policy from
+/- 3 days to +/- 0 days, without saying that it updates RFC 4228
[RFC4228], which is the current specification of permissible dates in
draft submissions. Finally, enforcing this would cause _a lot_ of
grief and problems.
This specification item has been ignored in the implementation.
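The difference between the two policies amounts to a single
parameter, as the following sketch shows. The function name
date_acceptable is hypothetical; it illustrates the comparison only
and is not the datatracker's or the preptool's actual check.

```python
from datetime import date, timedelta


def date_acceptable(draft_date, today, tolerance_days=3):
    """True if draft_date is within +/- tolerance_days of today."""
    return abs(draft_date - today) <= timedelta(days=tolerance_days)


today = date(2018, 7, 16)
print(date_acceptable(date(2018, 7, 14), today))                    # +/- 3 days
print(date_acceptable(date(2018, 7, 14), today, tolerance_days=0))  # +/- 0 days
```

A draft dated two days before submission passes under the +/- 3 day
policy but would be rejected under the text quoted above.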
4.1.1.2. Date Specification in References
"Bibliographic references: In dates in <reference> elements, the
date information can have prose text for the month or year. For
example, vague dates (year="ca. 2000"), date ranges
(year="2012-2013"), non-specific months (month="Second quarter"),
and so on are allowed."
The text regarding prose text for month and year in bibliographic
references is not workable. How should month and year be combined?
Some bibliographic references may have date text which requires year
first, others year last, and so on. Mixing the described fuzziness
into the otherwise strict year, month, date format makes little sense
when the result of combining the year, month and date attributes
cannot be predictably and correctly rendered.
Recommendation: Instead of the current specification, permit either
that the <date> element may have text content, or an alternative
attribute to be used for rendering if year, month, or day cannot
be specified exactly.
4.1.2. In Section 2.47, <seriesInfo>
The possible and forbidden combinations of attributes for this
element have now become so convoluted that it's really hard to
understand how to use it correctly. This needs a serious
reconsideration.
The 'name' attribute is mandatory, and only 3 values are permitted:
"RFC", "Internet-Draft", and "DOI". But it is also mandatory to set
the name to "" for a <seriesInfo> with a status attribute. Hmm...
So there are 4, not 3 permitted values: "RFC", "Internet-Draft",
"DOI", and "".
This means that all reference files which have things like name="ISO",
name="W3C Recommendation", etc., etc., have become illegal.
This limitation on <seriesInfo> "name" attributes has not been
enforced in the current implementation.
4.1.3. In Appendix A.1.1: TLP switch-over date discrepancies
There are discrepancies between the specified switch-over dates in
the specification, and those given by the Trust statements:
* TLP3.0: The specification says 2009-11-01 but the TLP statement
says effective date 2009-09-12.
* TLP4.0: The specification says 2010-04-01 but the TLP statement
says effective date 2009-12-28. The dates on which TLP 4 started
to be used in published RFCs seem to match the stated effective
date of 2009-12-28, based on a scan of some RFCs around that date.
The current implementation uses the official dates in the preptool,
not the dates in RFC 7991.
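The consequence of the discrepancy can be seen by looking up the TLP
version for a given date under both sets of switch-over dates listed
above. The helper tlp_version is hypothetical and models only TLP 3.0
and 4.0; earlier and later versions are deliberately left out.

```python
from datetime import date

# Switch-over dates as listed above.
TRUST_DATES = {"3.0": date(2009, 9, 12), "4.0": date(2009, 12, 28)}
SPEC_DATES = {"3.0": date(2009, 11, 1), "4.0": date(2010, 4, 1)}


def tlp_version(pub_date, switch_dates):
    """Return the latest TLP version in effect on pub_date, else None."""
    version = None
    for name in sorted(switch_dates):
        if pub_date >= switch_dates[name]:
            version = name
    return version


pub = date(2010, 2, 1)
print(tlp_version(pub, TRUST_DATES))  # dates from the Trust statements
print(tlp_version(pub, SPEC_DATES))   # dates from the specification
```

For a document dated 2010-02-01, the Trust dates select TLP 4.0,
while the dates in the specification still select TLP 3.0.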
RFC 7991 also states this about the pre5378 text: this text appears
under "Copyright Notice", unless the document was published before
November 2009, in which case it appears under "Status of This Memo".
This does not agree at all with what actual RFCs contain; they seem
to consistently have this text under Copyright Notice.
4.1.4. Index
There is no guidance on the structure of an index, if one is to be
generated by the preptool.
4.1.5. Anchors
Section 5.1 of RFC 7992 says in part:
"The prep tool produces XML with anchor attributes in all elements
that need them."
This is rather vital information regarding the content of the prepped
XML when building a formatter; unfortunately, it is not mentioned in
RFC 7991.
4.2. RFC 7992
4.2.1. In Section 8.1.1, Index Contents
The index has an extra
enclosing the contents, starting
directly after
, while sections explicitly do not have a div
here. This irregularity seems quite unnecessary, but makes the
formatter code more complex than need be. Could we please align the
two?
4.3. RFC 7994
4.3.1. Additional Guidance
*