Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Updates: 5234 (if approved)                                   P. Kyzivat
Intended Status: Standards Track
Expires: September 14, 2017                               March 13, 2017
Constrained ABNF
draft-seantek-constrained-abnf-02
Abstract
This document extends the base definition of ABNF (Augmented Backus-
Naur Form) to express a rule that is constrained by another rule. If
a rule B is constrained by rule A, then every production generated by
rule B must also be generated by rule A. By creating subordinate
production forms, ABNF-using specifications can formally denote the
relationship between a general rule and specific subsets of that
rule, while preserving ABNF's context-free nature.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is
at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 14, 2017.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Leonard & Kyzivat Standards Track [Page 1]
Internet-Draft Constrained ABNF March 2017
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
1. Introduction
Augmented Backus-Naur Form (ABNF) [RFC5234] is a formal syntax that
is popular among many Internet specifications. As a context-free
grammar, all rules heretofore are expressible in the form:
rule = elements
where a rule can be applied regardless of the context of the
elements.
Many Internet documents employ this syntax. However, many
specifications define protocols with extension points where certain
rules identify a field that can take on different productions with
different semantics. For example, a <value> field might generally
permit any <VCHAR>, but when the <value> field has "IPv6:" *VCHAR, it
has a peculiar meaning. The traditional ABNF approach is to enumerate
a long list of production rules that comply with the general pattern,
in the alternative, and then to tack on the generic pattern named as
the <other-value>. This syntax works adequately for a base specification
but makes it difficult to extend a rule in subsequent specifications
in a way that formally names the conformance to the <other-value> (as
opposed to extending the rule with novel syntax). Furthermore, since
ABNF does not imply an order of operations, a production that matches
a specific rule will also match the generic catch-all rule. The
traditional approach constructs an ambiguous grammar, even though the
standards authors do not intend the grammar to be ambiguous. These
limitations hamper computational ABNF parsers as well as ABNF efforts
for services such as syntax highlighting, automatic grammar checking,
and compiling into target computer languages.
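The ambiguity described above can be illustrated with a minimal Python
sketch. The rule names <ipv6-value> and <other-value>, and their
translation into regular expressions, are assumptions made purely for
illustration; they are not taken from any specification.

```python
import re

# Hypothetical grammar, written the traditional way:
#   value       = ipv6-value / other-value
#   ipv6-value  = "IPv6:" *VCHAR
#   other-value = *VCHAR
VCHAR = r"[\x21-\x7E]"
RULES = {
    # ABNF quoted strings are case-insensitive, hence the (?i:...) group.
    "ipv6-value": re.compile(r"(?i:IPv6:)" + VCHAR + "*"),
    "other-value": re.compile(VCHAR + "*"),
}

def matching_rules(text):
    """Return the names of all alternatives that generate the whole text."""
    return [name for name, rx in RULES.items() if rx.fullmatch(text)]

# A specific value is generated by both alternatives, so the grammar is
# ambiguous for it; a generic value is generated by the catch-all only.
specific = matching_rules("IPv6:::1")
generic = matching_rules("example")
```

Here <specific> lists both alternatives, which is the ambiguity that the
traditional approach cannot avoid.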
2. Constrained Grammar
This document provides a syntax for an ABNF rule that is constrained
by another rule. We observe the following relation:
If rule B is constrained by rule A, then every production
generated by rule B must also be generated by rule A.
A few comments are in order with this proposal. First of all, ABNF is
a context-free grammar; this proposal attempts to preserve this
nature. There are other ways to express "constraints" that are
context-sensitive, but extending ABNF in such ways would make it a
context-sensitive grammar (which implies more capable automata for
parsing and generating such languages).
Second of all, the relation "A constrains B" can, in other contexts,
be understood as a "subclassing", "subtyping", or "conjunctive"
operation. As with conjunction, the underlying operation is
commutative: a rule "B-constrained-by-A" produces the same results as
"A-constrained-by-B". However, from a rule-naming perspective, the
first name (the constrained) is the name being defined; the second
name (the constraint) is only an operand of the ternary operator.
Therefore, the syntax proposed in Section 3 is not commutative.
Third of all, it is possible to express a constraint relationship
that cannot generate any production--not even the empty string. This
is consonant with [RFC5234] ABNF. With generic [RFC5234] ABNF, it is
possible to repeat rule definitions in a way that makes them
impossible to be satisfied, such as <rule = 4ALPHA> followed by
<rule = 2DIGIT>. This syntax behaves exactly the same way, except
that it permits creating new rule names.
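The second and third observations can be sketched in Python by modeling
a constrained rule as a conjunction of two full-string matches. The
helper name constrained() and the regular-expression stand-ins for the
ABNF elements are assumptions for illustration, and checking a handful
of sample strings is a demonstration, not a proof.

```python
import re

def constrained(elements, constraint):
    """Return a predicate for 'elements constrained by constraint'.

    A string is accepted only if both patterns generate it in full.
    """
    e, c = re.compile(elements), re.compile(constraint)
    return lambda text: bool(e.fullmatch(text) and c.fullmatch(text))

ALPHA4 = r"[A-Za-z]{4}"  # 4ALPHA
DIGIT2 = r"[0-9]{2}"     # 2DIGIT
ALPHAS = r"[A-Za-z]*"    # *ALPHA

samples = ["", "ab", "12", "abcd", "abcde", "ab12"]

# Commutativity: either order of operands accepts the same samples.
b_by_a = constrained(ALPHA4, ALPHAS)
a_by_b = constrained(ALPHAS, ALPHA4)
same = all(b_by_a(s) == a_by_b(s) for s in samples)

# Unsatisfiable constraint: no string is both 4ALPHA and 2DIGIT.
empty = constrained(ALPHA4, DIGIT2)
accepted = [s for s in samples if empty(s)]
```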
3. Constraint Syntax
To write a constrained rule, use the ternary syntax
rulename ^ constraint = elements. The constraint can be a list of
elements, but in most formulations will simply be another rulename.
The reflexive case where constraint is rulename has no effect and is
analogous to duplicating a rule in a list of rules as in [RFC5234]
ABNF. The following enhancement to [RFC5234] permits this
constrained-rule syntax as an incremental element:
rulelist =/ constrainedrule
constrainedrule = rulename constrained-by constraint
                  *c-wsp "=" *c-wsp elements c-nl
constraint = elements
constrained-by = *c-wsp "^" *c-wsp
In the constrained-rule production (constrainedrule), the "^" and "="
together act as a ternary operator that takes the name of the rule,
the constraint, and the elements.
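As a rough illustration of the three operands, the following Python
sketch splits a single constrained rule into its parts. It is
deliberately simplified: it assumes one rule per line, no comments, and
a constraint that is a single rulename (the common case noted above).
The names CONSTRAINED_RULE and parse_constrained_rule are hypothetical.

```python
import re

# rulename = ALPHA *(ALPHA / DIGIT / "-"), per RFC 5234.
RULENAME = r"[A-Za-z][A-Za-z0-9-]*"

# name "^" constraint "=" elements; "=/" is rejected by the lookahead.
CONSTRAINED_RULE = re.compile(
    rf"(?P<name>{RULENAME})[ \t]*\^[ \t]*(?P<constraint>{RULENAME})"
    r"[ \t]*=(?!/)[ \t]*(?P<elements>.+)"
)

def parse_constrained_rule(line):
    """Split a constrained rule into (name, constraint, elements).

    Returns None if the line is not a constrained rule in the
    simplified form this sketch understands.
    """
    m = CONSTRAINED_RULE.fullmatch(line.strip())
    if m is None:
        return None
    return m.group("name"), m.group("constraint"), m.group("elements").strip()

parts = parse_constrained_rule('from ^ field = "From:" mailbox-list CRLF')
```

An ordinary rule such as <field = field-name ":" unstructured CRLF>
yields None, since it has no "^" on its left-hand side.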
A processor that wishes to reduce constrainedrule to an [RFC5234]
rule can do so by conjoining the constraint with the elements. The
synthesized rule is then appended in situ to the constraining rule in
the alternative (with "/"), in elements where the constraining rule
is named. A basic [RFC5234] processor will match or generate
productions to both the constraining and constrained rules. [[NB:
This synthesis demonstrates that constraint syntax amounts to
syntactic sugar, rather than a fundamental change to ABNF.]]
A rule is defined either regularly ("=") or as constrained ("^" "=").
The incremental alternative cannot be used with the constraint
syntax. [[NB: This is a change from -00 to -01.]] However, a rule
first defined as a constrained rule can be further refined with
incremental alternatives.
While this syntax suggests that a binary "^" operator could be
defined, such an operator is undesirable for a couple of reasons.
Firstly, arbitrary <elements ^ elements> syntax serves no parsing or
generating purpose: it would be clearer for an author to combine the
two prior to writing a specification. A hypothetical
<name = *ALPHA ^ 1*10ALPHA> is better written <name = 1*10ALPHA> in
the first place. Similarly, <name2 = "!" 1*10ALPHA ^ 10ASCII> is
better written <name2 = "!" 9ALPHA> in the first place. [[NB: From
this perspective, idealized ABNF can be understood as being in
Disjunctive Normal Form, given the existence of "=/".]]
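The two hypothetical rewrites above can be spot-checked with a short
Python sketch. The regular expressions are hand translations of the
ABNF, ASCII is assumed to mean any 7-bit character (%x00-7F), and the
helper names both() and one() are illustrative. Agreement on a finite
sample set is a sanity check rather than a proof of equivalence.

```python
import re
from itertools import product

def both(p, q):
    """Conjunction: accept text only if both patterns match it in full."""
    return lambda t: bool(re.fullmatch(p, t) and re.fullmatch(q, t))

def one(p):
    """Accept text only if the single pattern matches it in full."""
    return lambda t: bool(re.fullmatch(p, t))

# <name = *ALPHA ^ 1*10ALPHA> versus <name = 1*10ALPHA>
name_conj = both(r"[A-Za-z]*", r"[A-Za-z]{1,10}")
name_flat = one(r"[A-Za-z]{1,10}")

# <name2 = "!" 1*10ALPHA ^ 10ASCII> versus <name2 = "!" 9ALPHA>
name2_conj = both(r"![A-Za-z]{1,10}", r"[\x00-\x7F]{10}")
name2_flat = one(r"![A-Za-z]{9}")

# Short strings over a tiny alphabet, plus longer runs near the limits.
samples = ["".join(c) for n in range(4) for c in product("a!1", repeat=n)]
samples += ["a" * n for n in range(12)]
samples += ["!" + "a" * n for n in range(12)]

agree = all(
    name_conj(s) == name_flat(s) and name2_conj(s) == name2_flat(s)
    for s in samples
)
```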
Secondly, conjunction inverts the dependency relationship between
symbols in ABNF, reversing the directionality of edges in the
reachability graph. In [RFC5234] ABNF, every nonterminal symbol (rule
name) declares its dependent symbols, which are known at the time of
parsing to be terminal or nonterminal. Having parsed a rule, an ABNF
parser knows exactly which nonterminal symbols to look for before
returning. Therefore, the parsing of subsequent rule definitions that
do not have sought-after names can be skipped. The existence of a
general-purpose conjunctive operator, however, implies that any rule
definition could name a rule that constrains the subject rule,
meaning that an ABNF parser would have to parse every rule to
completion before returning. By moving the constraint to the left-
hand-side of the "=", a parser need not parse the right-hand side of
every rule. (Notably, a parser will have to parse all of the elements
of the constraint, which is a reason why authors should exercise
REstraint by limiting the constraint to a single named rule.)
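One reading of this argument can be sketched in Python: a processor
scans only the left-hand side of each rule, and parses a right-hand
side only when the rule's name or its constraint is a sought-after
name. The function rules_to_parse and the regular expressions are
hypothetical; the sketch assumes one string per rule, no comments, and
single-rulename constraints.

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9-]*"
# Left-hand side: rulename, optional "^ constraint", then "=" or "=/".
HEAD = re.compile(rf"\s*({NAME})\s*(?:\^\s*({NAME})\s*)?=/?")
# Right-hand-side tokens that are not rule names: strings and num-vals.
NOT_NAMES = re.compile(r'"[^"]*"|%[bdx][0-9A-Fa-f.-]+')

def rules_to_parse(rule_texts, start):
    """Return sorted names of rules whose right-hand side must be parsed."""
    heads = []
    for text in rule_texts:
        m = HEAD.match(text)
        heads.append((m.group(1), m.group(2), text[m.end():]))
    wanted, done = {start}, []
    progress = True
    while progress:
        progress = False
        for i, (name, constraint, rhs) in enumerate(heads):
            if i in done or not (name in wanted or constraint in wanted):
                continue  # right-hand side is skipped, never examined
            done.append(i)
            progress = True
            wanted.add(name)
            if constraint:
                wanted.add(constraint)
            wanted.update(re.findall(NAME, NOT_NAMES.sub(" ", rhs)))
    return sorted({heads[i][0] for i in done})

RULES = [
    'fields = *(trace *field / *resent-field) *field',
    'resent-field = "Resent-" field-name ":" unstructured CRLF',
    'field = field-name ":" unstructured CRLF',
    'field-name = 1*ftext',
    'ftext = %d33-57 / %d59-126',
    'unrelated = 1*DIGIT',
    'subject ^ field = "Subject:" unstructured CRLF',
    'resent-to ^ resent-field = "Resent-To:" address-list CRLF',
]
parsed = rules_to_parse(RULES, "field")
```

Starting from <field>, the sketch never examines the right-hand sides
of <unrelated> or <resent-to>; with a general conjunctive operator on
the right-hand side, it could not rule them out without parsing them.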
Stylistically, authors are encouraged to put constraint syntax below
the rule that defines the constraint. If the constraining rule refers
to novel rules, those rules may be defined prior to the constrained
rules. For example, the relevant parts of [RFC5322] could be written:
; trace is not relevant to this example
fields = *(trace *field / *resent-field) *field
resent-field = "Resent-" field-name ":" unstructured CRLF
field = field-name ":" unstructured CRLF
field-name = 1*ftext
ftext = %d33-57 / %d59-126
; obs-unstruct removed
unstructured = (*([FWS] VCHAR) *WSP)
resent-date ^ resent-field = "Resent-Date:" date-time CRLF
resent-from ^ resent-field = "Resent-From:" mailbox-list CRLF
resent-sender ^ resent-field = "Resent-Sender:" mailbox CRLF
resent-to ^ resent-field = "Resent-To:" address-list CRLF
resent-cc ^ resent-field = "Resent-Cc:" address-list CRLF
resent-bcc ^ resent-field = "Resent-Bcc:"
                            [address-list / CFWS] CRLF
resent-msg-id ^ resent-field = "Resent-Message-ID:" msg-id CRLF
orig-date ^ field = "Date:" date-time CRLF
from ^ field = "From:" mailbox-list CRLF
sender ^ field = "Sender:" mailbox CRLF
reply-to ^ field = "Reply-To:" address-list CRLF
to ^ field = "To:" address-list CRLF
cc ^ field = "Cc:" address-list CRLF
bcc ^ field = "Bcc:" [address-list / CFWS] CRLF
message-id ^ field = "Message-ID:" msg-id CRLF
in-reply-to ^ field = "In-Reply-To:" 1*msg-id CRLF
references ^ field = "References:" 1*msg-id CRLF
subject ^ field = "Subject:" unstructured CRLF
comments ^ field = "Comments:" unstructured CRLF
keywords ^ field = "Keywords:" phrase *("," phrase) CRLF
4. Effects on RFC 5234
Formally, this document updates [RFC5234] but does not modify it in
situ. Authors need to reference this document if they want to include
these enhancements; bare references to [RFC5234] do not include this
specification. This directive follows a model whereby document
authors can choose whether to invoke particular enhancements to ABNF.
As time goes on, the IETF can determine how often these enhancements
are invoked, and can decide whether to include them as part of a
revision to the base [RFC5234].
5. IANA Considerations
This document implies no IANA considerations.
6. Security Considerations
Security is truly believed to be irrelevant to this document.
7. References
7.1. Normative References
[RFC5234] Crocker, D., Ed., and P. Overell, "Augmented BNF for
Syntax Specifications: ABNF", STD 68, RFC 5234, DOI
10.17487/RFC5234, January 2008,
<http://www.rfc-editor.org/info/rfc5234>.
7.2. Informative References
[RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322, DOI
10.17487/RFC5322, October 2008,
<http://www.rfc-editor.org/info/rfc5322>.
Authors' Addresses
Sean Leonard
Penango, Inc.
5900 Wilshire Boulevard
21st Floor
Los Angeles, CA 90036
USA
EMail: dev+ietf@seantek.com
URI: http://www.penango.com/
Paul Kyzivat
Massachusetts
United States
EMail: pkyzivat@alum.mit.edu