Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Updates: 5234 (if approved)                             October 10, 2015
Intended Status: Standards Track
Expires: April 12, 2016


             Comprehensive Core Rules and Imports for ABNF
                 draft-seantek-abnf-more-core-rules-03

Abstract

   This document extends the base definition of ABNF (Augmented
   Backus-Naur Form) to include comprehensive support for certain
   symbols related to ASCII, and defines an import syntax.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 12, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Leonard                      Standards Track                    [Page 1]

Internet-Draft              More Core Rules                 October 2015


1.  Comprehensive Core Rule Update and Import Syntax

   Augmented Backus-Naur Form (ABNF) [RFC5234] is a formal syntax that
   is widely used in Internet specifications.  Many Internet documents
   employ this syntax along with the Core Rules defined in Appendix B.1
   of [RFC5234].  However, the Core Rules do not define many symbols in
   the ASCII range that these relying documents also need, forcing
   authors to define them as local rules.  Different documents sometimes
   define these common symbols in different ways, causing confusion or
   incompatibility when the rules are misread or are combined with other
   sets of rules.

   Furthermore, [RFC5234] does not state whether referencing [RFC5234]
   for ABNF automatically defines its Core Rules.  [RFC5234] also lacks
   a syntax for importing rules from other specifications; instead,
   authors have had to name the rules and their sources in the
   specification prose.  While this method has served authors well, it
   has hampered machine-readable ABNF efforts such as syntax
   highlighting, automatic grammar checking, and compilation into target
   computer languages.

   This document provides Core Rules that include comprehensive support
   for certain symbols, namely DELETE (DEL) and the C0 control
   characters in [ASCII86], which for purposes of this document is
   equivalent to [RFC0020].

   To import a rule, define the rule with a local rule name, and put the
   reference to the rule in a prose-val.  The syntax is:

      "<" [ rulename "@" ] (import-ref / import-uri) ">"

   The form import-ref is a document reference.  In IETF-related
   publications, import-ref will be enclosed in square brackets, such as
   "[RFC1605]".  The form import-uri is intended to be a Uniform
   Resource Identifier (URI) [RFC3986], but a machine implementation is
   not required to validate conformance to the URI production of
   [RFC3986].  Fragment components might be present, but only if the
   resource defines the fragment to mean a range of text (i.e., not just
   a point in the text).
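   The import form lends itself to mechanical extraction, which is one
   of the motivations stated in this section.  The Python sketch below
   is illustrative only and is not part of this specification:
   parse_import, Import, and IMPORT_RULE are hypothetical names.
   Because this draft does not yet fix the ABNF for import-ref and
   import-uri, the sketch assumes that an import-ref is a bracketed run
   of visible characters and that an import-uri is a run of visible
   characters other than ">", so free-form prose containing spaces is
   not mistaken for an import.  It also applies the default stated in
   this section: when the optional 'rulename "@"' prefix is absent, the
   referenced rule takes the name of the local rule.

```python
import re
from dataclasses import dataclass
from typing import Optional

# RFC 5234: rulename = ALPHA *(ALPHA / DIGIT / "-")
RULENAME = r"[A-Za-z][A-Za-z0-9-]*"

# Hypothetical matcher for a one-line rule defined by import:
#   local-name = "<" [ rulename "@" ] (import-ref / import-uri) ">"
# ASSUMPTIONS (this draft does not fix these productions):
#   import-ref = "[" 1*(VCHAR except "]" and ">") "]"
#   import-uri = 1*(VCHAR except ">")   ; no spaces, unlike most prose
IMPORT_RULE = re.compile(
    rf"^\s*(?P<local>{RULENAME})\s*=\s*<"
    rf"(?:(?P<remote>{RULENAME})@)?"
    r"(?:(?P<ref>\[[\x21-\x3D\x3F-\x5C\x5E-\x7E]+\])"
    r"|(?P<uri>[\x21-\x3D\x3F-\x7E]+))"
    r">\s*$"
)


@dataclass(frozen=True)
class Import:
    local_name: str   # rule name in the importing grammar (before "=")
    remote_name: str  # rule name inside the referenced source
    source: str       # import-ref (e.g. "[RFC3986]") or import-uri text
    is_uri: bool      # True for import-uri, False for import-ref


def parse_import(rule_line: str) -> Optional[Import]:
    """Return an Import if rule_line defines a rule by import, else None."""
    m = IMPORT_RULE.match(rule_line)
    if m is None:
        return None  # ordinary rule or free-form prose-val
    local = m.group("local")
    # Without 'rulename "@"', the referenced rule has the local rule's name.
    remote = m.group("remote") or local
    if m.group("ref") is not None:
        return Import(local, remote, m.group("ref"), is_uri=False)
    return Import(local, remote, m.group("uri"), is_uri=True)


if __name__ == "__main__":
    # example.com is a placeholder host, not a real grammar source.
    for line in (
        "URI = <[RFC3986]>",
        "absolute = <absolute-URI@[RFC3986]>",
        "grammar = <https://example.com/grammar.abnf>",
        "note = <any free-form prose text>",
    ):
        print(line, "->", parse_import(line))
```

   Under these assumptions, a prose-val such as <user@example.com> would
   be read as the rule "user" imported from the import-uri
   "example.com"; the sketch does not try to resolve that ambiguity, and
   it does not validate import-uri against the URI production of
   [RFC3986], which this section does not require.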
   When the 'rulename "@"' syntax is present, the rulename preceding the
   "@" specifies the name of the rule in the reference.  When the
   'rulename "@"' syntax is absent, the name of the rule in the
   reference is the same as the name of the rule being defined (i.e.,
   the rulename preceding the "=").

   [[DISCUSS: Alternative delimiters?  Right now this syntax shares
   < > with prose-val; this is intentional for compatibility and to
   reduce symbol proliferation.]]

   [[DISCUSS: ABNF for this ABNF?  The author considers it very
   undesirable to import URI normatively from RFC 3986.  URI is very
   complicated and RFC 3986 predates RFC 5234 anyway.  Need clean break
   with the past.  import-uri = 1*VCHAR could work since VCHAR does not
   include spaces, and most free-form prose will include at least one
   space.]]

   Formally, this document does not make changes to [RFC5234].  Authors
   need to reference this document if they want to include these
   enhancements; bare references to [RFC5234] do not include this
   specification (or, for that matter, [RFC7405]).  This approach
   follows a model whereby document authors can choose whether to invoke
   particular enhancements to ABNF.  As time goes on, the IETF can
   determine how often these enhancements are invoked, and decide
   whether to include them in a revision of [RFC5234].

   A reference to this document invokes the import syntax enhancement,
   as well as all of the Core Rules of Appendix A (i.e., the Core Rules
   do not have to be imported).  Appendix A of this document is meant
   to mirror Appendix B.1 of [RFC5234].  Document authors who reference
   this document should use the rules of Appendix A, and should not
   attempt to redefine or augment them (except for backward
   compatibility with prior documents).

2.  IANA Considerations

   This document has no IANA actions.

3.  Security Considerations

   Security is believed to be irrelevant to this document.

4.  References

4.1.  Normative References

   [ASCII86]  American National Standards Institute, "Coded Character
              Set -- 7-bit American Standard Code for Information
              Interchange", ANSI X3.4, 1986.

   [RFC0020]  Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

4.2.  Informative References

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              8.0.0", The Unicode Consortium, August 2015.

   [RFC1345]  Simonsen, K., "Character Mnemonics and Character Sets",
              RFC 1345, June 1992.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC5198]  Klensin, J. and M. Padlipsky, "Unicode Format for Network
              Interchange", RFC 5198, March 2008.

   [RFC7405]  Kyzivat, P., "Case-Sensitive String Support in ABNF",
              RFC 7405, December 2014.

Appendix A.  Comprehensive Core Rules

   Certain basic rules are in uppercase, such as SP, HTAB, CRLF, DIGIT,
   ALPHA, etc.

         ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z

         BIT            =  "0" / "1"

         CHAR           =  %x01-7F
                                ; any 7-bit US-ASCII character,
                                ;  excluding NUL

         CR             =  %x0D
                                ; carriage return

         CRLF           =  CR LF
                                ; Internet standard newline

         CTL            =  %x00-1F / %x7F
                                ; controls

         DIGIT          =  %x30-39
                                ; 0-9

         DQUOTE         =  %x22
                                ; " (Double Quote)

         HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

         HTAB           =  %x09
                                ; horizontal tab

         LF             =  %x0A
                                ; linefeed

         LWSP           =  *(WSP / CRLF WSP)
                                ; Use of this linear-white-space rule
                                ;  permits lines containing only white
                                ;  space that are no longer legal in
                                ;  mail headers and have caused
                                ;  interoperability problems in other
                                ;  contexts.
                                ; Do not use when defining mail
                                ;  headers and use with caution in
                                ;  other contexts.
         OCTET          =  %x00-FF
                                ; 8 bits of data

         SP             =  %x20

         VCHAR          =  %x21-7E
                                ; visible (printing) characters

         WSP            =  SP / HTAB
                                ; white space

         NUL            =  %d0
         SOH            =  %d1
         STX            =  %d2
         ETX            =  %d3
         EOT            =  %d4
         ENQ            =  %d5
         ACK            =  %d6
         BEL            =  %d7
         BS             =  %d8
         HT             =  %d9     ; also defined as HTAB
         VT             =  %d11
         FF             =  %d12    ; (literally used in every RFC)
         SO             =  %d14
         SI             =  %d15
         DLE            =  %d16
         DC1            =  %d17
         DC2            =  %d18
         DC3            =  %d19
         DC4            =  %d20
         NAK            =  %d21
         SYN            =  %d22
         ETB            =  %d23
         CAN            =  %d24
         EM             =  %d25
         SUB            =  %d26
         ESC            =  %d27
         FS             =  %d28
         GS             =  %d29
         RS             =  %d30
         US             =  %d31
         DEL            =  %d127

Appendix B.  Guidance for Rule Names for C1 Controls and Other Desiderata

   Internet protocols have been migrating to Unicode, and specifically
   to UTF-8, for general text encoding.  Authors need to consider the
   presence and possible effects of characters and code points beyond
   ASCII; see [RFC5198].  Therefore, the following rule names MAY take
   on special meanings.  This document does not formally define these
   rule names, nor does it prohibit other specifications from using
   them.  However, authors ought to use these rule names only in their
   normal and natural senses.  For the underlying sources, consult
   [UNICODE] and [RFC1345].

   ABNF rules resolve into a string of terminal values.  Such a value
   "is merely a non-negative integer"; only context can furnish a
   specific mapping of values into a character set [RFC5234].
   Therefore, even if Unicode is specified, terminal values beyond %x7F
   may be encoded as different bit combinations depending on the
   encoding form (for example, UTF-8 versus UTF-16).  This document does
   not purport to change the character set of ABNF itself, which
   remains [ASCII86] (see [RFC5234]).

   [[DISCUSS: What if you include ABNF in a UTF-8 document and you
   really want to use characters beyond ASCII in literals?  Foreseeable?
   Dangerous?]]

   ASCII        terminal values between 0 - 7F (cf. CHAR)
                [[DISCUSS: migrate to Appendix A?]]
   C0           synonym for CTL
                [[DISCUSS: migrate to Appendix A?]]
   UNICODE      terminal values representing 0 - 10FFFF
   BEYONDASCII  terminal values representing 80 - 10FFFF
                [[DISCUSS: these definitions include all code points,
                including surrogate code points, which are not valid or
                encodable in UTF-8.]]
   C1           terminal values representing 80 - 9F
   PAD          terminal value representing 80
   HOP          terminal value representing 81
   BPH          terminal value representing 82
   NBH          terminal value representing 83
   IND          terminal value representing 84
   NEL          terminal value representing 85
   NL           terminal value(s) possibly representing CRLF, CR, LF,
                NEL, or any combination thereof (but not LS or PS)
   SSA          terminal value representing 86
   ESA          terminal value representing 87
   HTS          terminal value representing 88
   HTJ          terminal value representing 89
   VTS          terminal value representing 8A
   PLD          terminal value representing 8B
   PLU          terminal value representing 8C
   RI           terminal value representing 8D
   SS2          terminal value representing 8E
   SS3          terminal value representing 8F
   DCS          terminal value representing 90
   PU1          terminal value representing 91
   PU2          terminal value representing 92
   STS          terminal value representing 93
   CCH          terminal value representing 94
   MW           terminal value representing 95
   SPA          terminal value representing 96
   EPA          terminal value representing 97
   SOS          terminal value representing 98
   SGCI         terminal value representing 99
   SCI          terminal value representing 9A
   CSI          terminal value representing 9B
   ST           terminal value representing 9C
   OSC          terminal value representing 9D
   PM           terminal value representing 9E
   APC          terminal value representing 9F
   NBSP         terminal value representing A0
   SHY          terminal value representing AD
   LS           terminal value representing 2028
   PS           terminal value representing 2029

Author's Address

   Sean Leonard
   Penango, Inc.
   5900 Wilshire Boulevard
   21st Floor
   Los Angeles, CA  90036
   USA

   EMail: dev+ietf@seantek.com
   URI:   http://www.penango.com/