INTERNET-DRAFT                                                   S. Legg
draft-legg-xed-rxer-02.txt                           Adacel Technologies
Intended Category: Standards Track                             D. Prager
                                                       Deakin University
                                                           June 16, 2004

               Robust XML Encoding Rules for ASN.1 Types

Copyright (C) The Internet Society (2004). All Rights Reserved.

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Distribution of this document is unlimited. Technical discussion of this document should take place on the XED developers mailing list. Please send editorial comments directly to the editor.

This Internet-Draft expires on 16 December 2004.

Abstract

This document defines a set of Abstract Syntax Notation One (ASN.1) encoding rules, called the Robust XML Encoding Rules or RXER, that produce an Extensible Markup Language (XML) representation for values of any given ASN.1 data type.

Legg & Prager             Expires 16 December 2004              [Page 1]

INTERNET-DRAFT            Robust XML Encoding Rules        June 16, 2004

Table of Contents

   1. Introduction
   2. Conventions
   3. Definitions
      3.1. Qualified Reference Names
   4. General Considerations
   5. Standalone RXER Encodings
   6. Encoding Rules
      6.1. Identifiers
      6.2. Type Referencing Notations
      6.3. Restricted Character String Types
      6.4. BIT STRING
      6.5. BOOLEAN
      6.6. CHARACTER STRING
      6.7. CHOICE
      6.8. EMBEDDED PDV
      6.9. ENUMERATED
      6.10. EXTERNAL
      6.11. GeneralizedTime
      6.12. INSTANCE OF
      6.13. INTEGER
      6.14. NULL
      6.15. ObjectDescriptor
      6.16. OBJECT IDENTIFIER and RELATIVE-OID
      6.17. OCTET STRING
      6.18. REAL
      6.19. SEQUENCE and SET
      6.20. SEQUENCE OF and SET OF
      6.21. UTCTime
      6.22. Open Type
      6.23. AnyType
   7. RXER Transfer Syntax
   8. Relationship to XER
   9. Security Considerations
   10. Acknowledgements
   11. References
      11.1. Normative References
      11.2. Informative References
   Authors' Addresses
   Full Copyright Statement

1. Introduction

This document defines a set of Abstract Syntax Notation One (ASN.1) [X680] encoding rules, called the Robust XML Encoding Rules or RXER, that produce an Extensible Markup Language (XML) [XML] representation of ASN.1 values of any given ASN.1 type.

An ASN.1 value is regarded as analogous to the content of an element. The RXER encoding of an ASN.1 value is the well-formed and valid content of an element in an XML document [XML] conforming to XML namespaces [XMLNS]. Simple ASN.1 data types such as PrintableString, INTEGER and BOOLEAN define character data content, while the ASN.1 combining types (i.e., SET, SEQUENCE, SET OF, SEQUENCE OF and CHOICE) define element content. The element names are provided by the identifiers of the components in combining type definitions (i.e., elements correspond to the NamedType notation).

Note that "ASN.1 value" does not mean a Basic Encoding Rules (BER) [X690] encoded value. The ASN.1 value is an abstract concept that is independent of any particular encoding. BER is just one possible encoding of an ASN.1 value. This document defines another possible encoding.

Rules for canonical RXER encodings will be introduced in a revision of this document.

The effect of ASN.1 encoding instructions on RXER encodings will be covered in a revision of this document.

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", and "MAY" in this document are to be interpreted as described in BCP 14, RFC 2119 [BCP14]. The key word "OPTIONAL" is used exclusively with its ASN.1 meaning.

Throughout this document "type" shall be taken to mean an ASN.1 type, and "value" shall be taken to mean an ASN.1 abstract value.

A reference to an ASN.1 production [X680] (e.g., Type, NamedType) is a reference to the text in an ASN.1 specification corresponding to that production.

The specification of RXER makes use of definitions from the XML Information Set (Infoset) [ISET]. In particular, information item property names are presented per the Infoset, e.g., [local name]. In the sections that follow, the term "element" shall be taken to mean an Infoset element information item.

Literal character strings to be used in the RXER encoding appear within double quotes; however, the double quotes are not part of the literal value and do not appear in the encoding.

This document uses the namespace prefix "xsi:" to stand for the namespace name "http://www.w3.org/2001/XMLSchema-instance", though in practice any valid namespace prefix is permitted in RXER encodings.

3. Definitions

The root element of an XML document is the [document element] of the document information item corresponding to the XML document.

The normalized content of an element information item is the list of information items formed by taking, in order, each character and element information item in the [children] of the element information item (thus eliminating any comments or PIs from consideration when determining the correctness of an RXER encoding). If the normalized content contains only character information items, then its string value is the sequence of [character codes] of those character information items in order; otherwise, its string value is empty.

Note that the normalized content definition is for descriptive purposes only. There is no requirement for RXER encodings to actually be normalized.
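
The definitions of normalized content and string value can be made concrete with a small Python sketch built on the standard library's xml.dom.minidom. It is illustrative only: the function names normalized_content and string_value are hypothetical, and the sketch assumes the parser has already replaced entity references with their replacement text (minidom does this for the predefined entities and character references). CDATA sections are treated as ordinary character data.

```python
from xml.dom import Node, minidom

# DOM node types that correspond to character and element information items.
_KEPT = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE, Node.ELEMENT_NODE)


def normalized_content(element):
    """Return the character and element children of a DOM element, in
    document order, dropping comments and processing instructions."""
    return [child for child in element.childNodes if child.nodeType in _KEPT]


def string_value(element):
    """Return the string value of an element: the concatenated character
    data if the normalized content holds only characters, otherwise ""."""
    content = normalized_content(element)
    if any(node.nodeType == Node.ELEMENT_NODE for node in content):
        return ""
    return "".join(node.data for node in content)


doc = minidom.parseString("<value>ab<!-- note -->c<?pi x?><![CDATA[<d>]]></value>")
print(repr(string_value(doc.documentElement)))  # 'abc<d>'
```

Here the comment and processing instruction are ignored, and the CDATA section contributes its characters like any other text.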

White space is a sequence of one or more space (U+0020), tab (U+0009), carriage return (U+000D) or line feed (U+000A) characters.

3.1. Qualified Reference Names

A Qualified Reference Name is a qualified name [XMLNS] that uniquely identifies a particular type definition. Not all type definitions have a Qualified Reference Name. A Type has a Qualified Reference Name if one of the following applies:

   a) the Type is a typereference (not a DummyReference) or an ExternalTypeReference in a DefinedType in a ReferencedType, and the ASN.1 module in which the referenced type is defined has a namespace name [XEDNS],

   b) the Type comprises one of the productions in Table 1 of the specification for ASN.1 Schema [ASD], or

   c) the Type is a typereference (not a DummyReference) or an ExternalTypeReference in a DefinedType in a ReferencedType, and the ASN.1 module in which the referenced type is defined is SchemaLanguageIntegration [GLUE].

In case a), the Qualified Reference Name is the qualified name with the namespace name of the module in which the referenced type is defined as the namespace name, and the typereference as the local part.

In case b), the Qualified Reference Name is the qualified name with the namespace name "http://xmled.info/ns/ASN.1" and the local part as indicated in Table 1.

In case c), the Qualified Reference Name is the qualified name with the namespace name "http://xmled.info/ns/ASN.1" and the typereference as the local part.

Note that the Qualified Reference Name is the same qualified name that would be used to reference the corresponding type in the ASN.1 Schema representation [ASD] of the ASN.1 specification, or in the XML Schema derivation [CXSD] of the ASN.1 specification.

4. General Considerations

An RXER encoding is permitted to contain XML comments, processing instructions (PIs), CDATA sections, character references and parsed entity references in any position allowed for a well-formed and valid XML document [XML]. However, note that the environment in which an RXER encoding is used may disallow processing instructions and entity references. If entity references (to other than the predefined entities) are used, then the XML document containing the RXER encoding must necessarily contain a document type declaration, and the internal or external subset of the document type definition (DTD) must contain a declaration for the entity.

Although comments and PIs are permitted in RXER encodings, there is no provision for representing comments and PIs in ASN.1 abstract values; therefore, applications using RXER MAY discard any comments or PIs in received encodings.

Similarly, there is no provision for representing entity references in ASN.1 abstract values; therefore, applications using RXER MAY replace entity references with their replacement text at any time.

The [attributes] of any element in an RXER encoding are permitted to contain an attribute information item with the [local name] "type" and the [namespace name] "http://www.w3.org/2001/XMLSchema-instance" (i.e., xsi:type [XSD1]), provided the Type of the corresponding NamedType has a Qualified Reference Name (see Section 3.1) that can be used to identify the type.

Any element in an RXER encoding is permitted to have namespace declaration attributes [XMLNS]. However, note that, with the possible exception of the root element, the [namespace name] of an element in an RXER encoding is required to have no value (i.e., non-root element names in RXER encodings are unqualified).

5. Standalone RXER Encodings

The RXER encoding of some value generates only the content of an element.
When the value being encoded is only part of some larger XML document (which is, for example, the way ASN.1 Schema [ASD] uses RXER), then it is the responsibility of the specification invoking RXER to determine the context of the enclosing element (i.e., its [local name] and [namespace name]).

RXER can also be used to generate an entire XML document from the encoding of a value. This is termed a Standalone RXER Encoding of the value.

ASN.1 does not have a concept analogous to the root element of an XML document. That is, ASN.1 does not allow a NamedType to appear on its own, outside of an enclosing combining type. This means that the rules for encoding the root element in a Standalone RXER Encoding differ from those that apply to any other element in an RXER encoding.

In a Standalone RXER Encoding, the [local name] of the root element SHALL be "value", and the [namespace name] of the root element SHALL have no value.

If the ASN.1 type of the value being encoded has a Qualified Reference Name (see Section 3.1), then the [attributes] of the root element SHOULD contain an attribute information item with the [local name] "type" and the [namespace name] "http://www.w3.org/2001/XMLSchema-instance" (i.e., an xsi:type attribute). The [normalized value] of this attribute SHALL be the Qualified Reference Name of the ASN.1 type. Where the xsi:type attribute is present, appropriate namespace declaration attributes for the namespace names in the attribute's name and value MUST be added to the root element's [attributes]. The namespace prefixes are the encoder's choice.

The [attributes] and [children] of the root element (i.e., its content) are generated by the normal application of the encoding rules in Section 6 to the value being encoded.

6. Encoding Rules

The following sections describe the RXER encoding for values of each of the ASN.1 type notations.
ASN.1 values are uniformly regarded as analogous to the content of an element, not as complete elements in their own right. Examples of encodings in the following sections use a start tag and end tag to delimit the content. These start and end tags are for illustration only and are not part of the encoding of the abstract value. In normal use, the name of the enclosing element is provided by the context of the abstract value, e.g., an enclosing SEQUENCE type.

In every case described in the following sections, if the encoding of an ASN.1 value produces no content, then the enclosing element MAY be encoded as an empty element (i.e., using an empty-element tag).

6.1. Identifiers

An identifier, as defined in ASN.1 notation (Clause 11.3 of X.680 [X680]), is a character string that begins with a Latin lowercase letter (U+0061-U+007A) and is followed by zero, one or more Latin letters (U+0041-U+005A, U+0061-U+007A), decimal digits (U+0030-U+0039) and hyphens (U+002D). A hyphen is not permitted to be the last character, and a hyphen is not permitted to be followed by another hyphen. The case of letters in an identifier is always significant.

ASN.1 identifiers are used for the [local name] of child elements and may also appear in the character data content of elements.

6.2. Type Referencing Notations

A value of a type with a defined type name is encoded according to the type definition on the right-hand side of the type assignment for the type name.

A value of a type denoted by the use of a parameterized type with actual parameters is encoded according to the parameterized type with the DummyReferences [X683] substituted with the actual parameters.

A value of a tagged or constrained type is encoded as a value of the type without the tag or constraint, respectively. Tags do not appear in the XML encodings defined by this document.
See X.680 [X680] and X.682 [X682] for the details of ASN.1 constraint notation.

A value of a fixed type denoted by an ObjectClassFieldType is encoded according to that fixed type (see Section 6.22 for the case of an ObjectClassFieldType denoting an open type).

A value of a selection type is encoded according to the type referenced by the selection type.

A value of a type described by TypeFromObject notation [X681] is encoded according to the denoted type.

A value of a type described by ValueSetFromObjects notation [X681] is encoded according to the governing type.

6.3. Restricted Character String Types

A value of a restricted character string type is encoded such that the normalized content is the sequence of character information items representing the characters in the string.

Depending on the ASN.1 string type, and an application's internal representation of that string type, a character may need to be translated to or from the equivalent ISO 10646 character code [UCS]. The NumericString, PrintableString, IA5String, VisibleString (ISO646String), BMPString, UniversalString and UTF8String character encodings use the same character codes as ISO 10646. For the remaining string types (GeneralString, GraphicString, TeletexString, T61String and VideotexString), see X.680 [X680].

Note that a consequence of defining the RXER encoding in terms of the XML Infoset is the implied requirement for ampersand ('&', U+0026) and left angle bracket ('<', U+003C) characters in string values to be escaped appropriately [XML].

Certain characters (e.g., control characters) are not legal characters for XML. These characters are encoded as the replacement character (U+FFFD). When decoding, the replacement character is retained if it is a permitted character for the string type; otherwise, it is converted to U+0000 if that character is permitted by the string type; otherwise, it is discarded.
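
The character handling rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the legal characters are taken from the XML 1.0 Char production, escaping uses the standard library's xml.sax.saxutils.escape (which escapes '&', '<' and '>'), and the hypothetical printable_permitted predicate (the PrintableString alphabet) stands in for whatever permitted-alphabet check a real implementation would have. None of the function names come from this document.

```python
from xml.sax.saxutils import escape

REPLACEMENT = "\uFFFD"

# PrintableString alphabet, used here only as an example permitted-alphabet check.
_PRINTABLE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
                 "0123456789 '()+,-./:=?")


def is_xml_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def encode_string_content(value: str) -> str:
    """Replace characters that are illegal in XML with U+FFFD, then escape
    '&', '<' and '>' so the result can appear as element content."""
    return escape("".join(ch if is_xml_char(ch) else REPLACEMENT for ch in value))


def decode_replacement(text: str, permitted) -> str:
    """Apply the decoding rule for U+FFFD to already-unescaped character
    data, given a predicate for the string type's permitted characters."""
    out = []
    for ch in text:
        if ch != REPLACEMENT or permitted(ch):
            out.append(ch)            # ordinary character, or U+FFFD is allowed
        elif permitted("\u0000"):
            out.append("\u0000")      # fall back to U+0000
        # otherwise the replacement character is discarded
    return "".join(out)


def printable_permitted(ch: str) -> bool:
    return ch in _PRINTABLE
```

With printable_permitted, a received U+FFFD is simply dropped; with an IA5String-style predicate such as `lambda ch: ord(ch) <= 0x7F`, it would be mapped back to U+0000 instead.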

All white space characters in the RXER encoding of a value of a restricted character string type are significant, i.e., part of the abstract value.

Examples

The content of each of the following elements is the RXER encoding of an IA5String value:

      <value>Don't run with scissors!</value>

      <value>Markup (e.g., &lt;value&gt;) has to be escaped.</value>

      <value>Markup (e.g., <![CDATA[<value>]]>) has to be escaped.</value>

6.4. BIT STRING

A value of the BIT STRING type without a NamedBitList is encoded such that the string value of the normalized content is either a binary digit string or a hexadecimal digit string, optionally preceded by and/or followed by white space characters. A hexadecimal digit string MAY be used only if the number of bits in the BIT STRING value is a multiple of eight; otherwise, a binary digit string is used.

A binary digit string is a sequence of zero, one or more of the binary digit characters "0" and "1" (i.e., U+0030 and U+0031). Each bit in the BIT STRING value is encoded as a binary digit, in order from the first bit to the last bit.

A hexadecimal digit string is a sequence of zero, one or more pairs of the hexadecimal digit characters "0"-"9", "A"-"F" and "a"-"f" (i.e., U+0030-U+0039, U+0041-U+0046 and U+0061-U+0066). Each group of eight bits in the BIT STRING value is encoded as a pair of hexadecimal digits, where the first bit of the group is the most significant. An odd number of hexadecimal digits is not permitted.

If a hexadecimal digit string is used, then the enclosing element's [attributes] SHALL contain an attribute information item with the [local name] "type" and the [namespace name] "http://www.w3.org/2001/XMLSchema-instance". The [normalized value] of this attribute SHALL be the qualified name with namespace name "http://www.w3.org/2001/XMLSchema" and local part "hexBinary" (e.g., xsi:type="xsd:hexBinary").
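
The binary and hexadecimal forms described above can be sketched in Python as follows. The bit string is modelled as a list of 0/1 integers (an assumption about the application's internal representation), the caller is assumed to know from the xsi:type attribute whether received content uses the hexadecimal form, and the function names are illustrative rather than part of RXER.

```python
WHITE_SPACE = " \t\r\n"          # the white space characters defined in Section 3
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def encode_bit_string(bits, use_hex=False):
    """Encode a list of 0/1 values as a binary or hexadecimal digit string.
    When use_hex is True, the enclosing element also needs
    xsi:type="xsd:hexBinary"; adding that attribute is left to the caller."""
    if not use_hex:
        return "".join("1" if b else "0" for b in bits)
    if len(bits) % 8 != 0:
        raise ValueError("hexadecimal form requires a multiple of eight bits")
    digits = []
    for i in range(0, len(bits), 8):
        octet = 0
        for b in bits[i:i + 8]:          # first bit of the group is most significant
            octet = (octet << 1) | (1 if b else 0)
        digits.append(f"{octet:02X}")
    return "".join(digits)


def decode_bit_string(content, is_hex):
    """Decode element character data back into a list of 0/1 values."""
    s = content.strip(WHITE_SPACE)
    if is_hex:
        if len(s) % 2 != 0 or not set(s) <= HEX_DIGITS:
            raise ValueError("invalid hexadecimal digit string")
        bits = []
        for i in range(0, len(s), 2):
            octet = int(s[i:i + 2], 16)
            bits.extend((octet >> shift) & 1 for shift in range(7, -1, -1))
        return bits
    if not set(s) <= {"0", "1"}:
        raise ValueError("invalid binary digit string")
    return [int(c) for c in s]


bits = [0, 0, 1, 0, 1, 0, 0, 1]
print(encode_bit_string(bits))                # 00101001
print(encode_bit_string(bits, use_hex=True))  # 29
```

Decoding with is_hex=True corresponds to receiving content that carries xsi:type="xsd:hexBinary".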

A value of the BIT STRING type with a NamedBitList is encoded such that the string value of the normalized content is either as described above for the BIT STRING type without a NamedBitList, or a list of identifiers separated by one or more white space characters, optionally preceded by and/or followed by white space characters. In the latter case, each "1" bit in the BIT STRING value is represented by its corresponding identifier from the NamedBitList, in any order.

Examples

Consider this type definition:

      BIT STRING { black(0), red(1), orange(2), yellow(3),
                   green(4), blue(5), indigo(6), violet(7) }

The content of each of the following elements is an RXER encoding of the same abstract value:

      <value>green violet orange</value>

      <value>00101001</value>

      <value>00101001</value>

      <value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xsi:type="xsd:hexBinary">29</value>

6.5. BOOLEAN

The BOOLEAN value TRUE is encoded such that the string value of the normalized content is the literal "true" or "1", at the encoder's option, optionally preceded by and/or followed by white space characters.

The BOOLEAN value FALSE is encoded such that the string value of the normalized content is the literal "false" or "0", at the encoder's option, optionally preceded by and/or followed by white space characters.

The RXER encoding of BOOLEAN values is intended to conform to the lexical representation of the XML Schema [XSD2] boolean datatype.

Examples

The content of each of the following elements is the RXER encoding of a BOOLEAN value:

      <value>1</value>

      <value>false</value>

      <value>false</value>

6.6. CHARACTER STRING

A value of the unrestricted CHARACTER STRING type is encoded according to the corresponding SEQUENCE type defined in Clause 40.5 of X.680 [X680].

6.7. CHOICE

A value of a CHOICE type other than a ChoiceOfStrings type [RFC3641] or the AnyType type [GLUE] is encoded such that the normalized content is a single child element information item, corresponding to the actual chosen alternative, optionally preceded by and/or followed by white space character information items.

The chosen alternative corresponds to some NamedType in the CHOICE type definition. The [local name] of the child element corresponding to the chosen alternative SHALL be the identifier of the corresponding NamedType, the [namespace name] of the child element SHALL have no value, and the content of the child element SHALL be the encoding of the value of the chosen alternative according to the Type of this NamedType.

Examples

Consider this type definition:

      CHOICE {
          name          [0] IA5String,
          serialNumber  [1] INTEGER
      }

The content of each of the following elements is the RXER encoding of a value of the above type:

      <value>
          <name>Bob</name>
      </value>

      <value>
          <name>Alice</name>
      </value>

      <value>
          <serialNumber>344</serialNumber>
      </value>

A value of a ChoiceOfStrings type is encoded such that the string value of the normalized content is the encoding of the value of the chosen alternative. The enclosing element's [attributes] MAY contain an attribute information item with the [local name] "type" and the [namespace name] "http://www.w3.org/2001/XMLSchema-instance" to identify the chosen alternative. The [normalized value] of this attribute SHALL be the qualified name with namespace name "http://xmled.info/ns/ASN.1" and local part "BMPString", "GeneralString", "GraphicString", "IA5String", "ISO646String", "NumericString", "PrintableString", "TeletexString", "T61String", "UniversalString", "UTF8String", "VideotexString" or "VisibleString", as appropriate.

If the ChoiceOfStrings value has no character data, then the enclosing element MAY be encoded as an empty element (i.e., using an empty-element tag).

6.8. EMBEDDED PDV

A value of the EMBEDDED PDV type is encoded according to the corresponding SEQUENCE type defined in Clause 33.5 of X.680 [X680].

6.9. ENUMERATED

A value of an ENUMERATED type is encoded such that the string value of the normalized content is the identifier corresponding to the actual value, optionally preceded by and/or followed by white space characters.

Examples

Consider this type definition:

      ENUMERATED { sunday, monday, tuesday, wednesday,
                   thursday, friday, saturday }

The content of each of the following elements is the RXER encoding of a value of the above type:

      <value>monday</value>

      <value>thursday</value>

6.10. EXTERNAL

A value of the EXTERNAL type is encoded according to the corresponding SEQUENCE type defined in Clause 8.18.1 of X.690 [X690].

6.11. GeneralizedTime

A value of the GeneralizedTime type is encoded such that the string value of the normalized content is optional leading white space characters followed by a date, the letter "T", a time of day, optional fractional seconds, an optional time zone and optional trailing white space characters.

The date is two decimal digits representing the century, followed by two decimal digits representing the year, "-" (U+002D), two decimal digits representing the month, "-" (U+002D), and two decimal digits representing the day.

The time of day is two decimal digits representing the hour, followed by ":" (U+003A), two decimal digits representing the minutes, ":" (U+003A), and two decimal digits representing the seconds.

The fractional seconds component is a period "." (U+002E) followed by zero, one or more decimal digits (U+0030-U+0039). A GeneralizedTime value with fractional hours or minutes is first converted to the equivalent time with whole minutes and seconds and, if necessary, fractional seconds.
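
The conversion of fractional hours or minutes can be illustrated with a short Python sketch. It takes only the time-of-day part of a GeneralizedTime value (no date or time zone), accepts either a period or a comma as the decimal sign (an assumption following ISO 8601 conventions), and uses exact rational arithmetic from the standard fractions module so that no rounding occurs. The function name rxer_time_of_day is hypothetical.

```python
from fractions import Fraction

DIGITS = set("0123456789")


def rxer_time_of_day(t: str) -> str:
    """Convert a GeneralizedTime time of day ("hh", "hhmm" or "hhmmss",
    each optionally followed by a fraction) into "hh:mm:ss[.f]" form."""
    whole, _, frac = t.replace(",", ".").partition(".")
    if len(whole) not in (2, 4, 6) or not set(whole + frac) <= DIGITS:
        raise ValueError(f"unsupported time of day: {t!r}")

    hours = int(whole[0:2])
    minutes = int(whole[2:4]) if len(whole) >= 4 else 0   # omitted minutes -> 00
    seconds = int(whole[4:6]) if len(whole) == 6 else 0   # omitted seconds -> 00

    # The fraction applies to the last component present: hours, minutes or seconds.
    unit = {2: 3600, 4: 60, 6: 1}[len(whole)]
    fraction = Fraction(int(frac), 10 ** len(frac)) if frac else Fraction(0)
    total = hours * 3600 + minutes * 60 + seconds + fraction * unit

    whole_seconds = total.numerator // total.denominator
    remainder = total - whole_seconds
    h, rest = divmod(whole_seconds, 3600)
    m, s = divmod(rest, 60)
    result = f"{h:02d}:{m:02d}:{s:02d}"
    if remainder:
        # remainder's denominator divides 10**len(frac), so this is an exact integer
        digits = str(int(remainder * 10 ** len(frac))).rjust(len(frac), "0")
        result += "." + digits.rstrip("0")
    return result
```

For instance, "1230.5" (half a minute past 12:30) becomes "12:30:30", and "1230.01" becomes "12:30:00.6".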

The minutes are encoded as "00" if the GeneralizedTime value omits minutes. The seconds are encoded as "00" if the GeneralizedTime value omits seconds.

The time zone, if present, is either the letter "Z" (U+005A) to indicate Coordinated Universal Time, a "+" (U+002B) followed by a time zone differential, or a "-" (U+002D) followed by a time zone differential.

A time zone differential indicates the difference between local time (the time specified by the preceding date and time of day) and Coordinated Universal Time. Coordinated Universal Time can be calculated from the local time by subtracting the differential. For example, "2004-06-15T02:00:00+10:00" denotes the same instant as "2004-06-14T16:00:00Z".

A time zone differential is encoded as two decimal digits representing hours, the character ":" (U+003A), and two decimal digits representing minutes. The minutes are encoded as "00" if the GeneralizedTime value omits minutes from the time zone differential.

The RXER encoding of GeneralizedTime values is intended to conform to the lexical representation of the XML Schema [XSD2] dateTime datatype.

Examples

The content of each of the following elements is the RXER encoding of a GeneralizedTime value:

      <value>2004-06-15T12:00:00Z</value>

      <value>2004-06-15T02:00:00+10:00</value>

      <value>2004-06-15T12:00:00.5</value>

6.12. INSTANCE OF

A value of the INSTANCE OF type is encoded according to the corresponding SEQUENCE type defined in Annex C of X.681 [X681].

6.13. INTEGER

A value of the INTEGER type without a NamedNumberList is encoded such that the string value of the normalized content is a number string representing the integer value, optionally preceded by and/or followed by white space characters.

A number string is a sequence of one or more of the decimal digit characters "0" to "9" (U+0030-U+0039), with an optional leading sign, either "+" (U+002B) or "-" (U+002D). Multiple leading zero digits are permitted in a number string.
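
A decoder has to be stricter than a general-purpose integer parser: Python's int(), for example, also accepts underscores between digits and non-ASCII decimal digits. The sketch below (function names hypothetical) therefore checks the number string syntax with an ASCII-only regular expression before converting, after stripping only the four white space characters defined in Section 3.

```python
import re

WHITE_SPACE = " \t\r\n"
# Optional sign followed by one or more ASCII decimal digits.
NUMBER_STRING = re.compile(r"[+-]?[0-9]+")


def decode_integer(content: str) -> int:
    """Decode the character data of an INTEGER without a NamedNumberList."""
    s = content.strip(WHITE_SPACE)
    if NUMBER_STRING.fullmatch(s) is None:
        raise ValueError(f"not a number string: {content!r}")
    return int(s)          # leading zeros such as "00167" are allowed


def encode_integer(value: int) -> str:
    """Encode an integer as a number string (no sign for non-negative values)."""
    return str(value)


print(decode_integer("  00167\n"))   # 167
print(encode_integer(-42))           # -42
```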

A value of an INTEGER type with a NamedNumberList is encoded such that the string value of the normalized content is either a number string or the identifier corresponding to the actual INTEGER value, optionally preceded by and/or followed by white space characters.

The RXER encoding of INTEGER values is intended to conform to the lexical representation of the XML Schema [XSD2] integer datatype.

Examples

Consider this type definition:

      INTEGER { zero(0), one(1) }

The content of each of the following elements is the RXER encoding of a value of the above type:

      <value>0</value>

      <value>zero</value>

      <value>2</value>

      <value>00167</value>

6.14. NULL

A value of the NULL type is encoded such that the normalized content is empty.

Examples

      <value/>

      <value></value>

6.15. ObjectDescriptor

A value of the ObjectDescriptor type is encoded according to the GraphicString type.

6.16. OBJECT IDENTIFIER and RELATIVE-OID

A value of the OBJECT IDENTIFIER type or RELATIVE-OID type is encoded such that the string value of the normalized content is a "." (U+002E) separated list of the object identifier components of the value, optionally preceded by and/or followed by white space characters.

Each object identifier component is encoded as a non-negative number string. A non-negative number string is either the digit character "0" (U+0030), or a non-zero decimal digit character (U+0031-U+0039) followed by zero, one or more of the decimal digit characters "0" to "9" (U+0030-U+0039).

Examples

The content of each of the following elements is the RXER encoding of an OBJECT IDENTIFIER value:

      <value>2.5.6.0</value>

      <value>2.5.4.10</value>

      <value>2.5.4.3</value>

6.17. OCTET STRING

A value of the OCTET STRING type is encoded such that the string value of the normalized content is the hexadecimal digit string representation of the octets, optionally preceded by and/or followed by white space characters. The octets are encoded in order from the first octet to the last octet.
Each octet is encoded as a pair of the hexadecimal digit characters "0"-"9", "A"-"F" and "a"-"f" (i.e., U+0030-U+0039, U+0041-U+0046 and U+0061-U+0066), where the first digit in the pair corresponds to the four most significant bits of the octet. An odd number of hexadecimal digits is not permitted.

The RXER encoding of OCTET STRING values is intended to conform to the lexical representation of the XML Schema [XSD2] hexBinary datatype.

Examples

The content of each of the following elements is the RXER encoding of an OCTET STRING value:

      <value>27F69A0300</value>

      <value>efA03bFF</value>

6.18. REAL

A value of the REAL type is encoded such that the string value of the normalized content is the character string "0" if the value is positive zero, the character string "-0" if the value is negative zero, the character string "INF" if the value is positive infinity, the character string "-INF" if the value is negative infinity, the character string "NaN" if the value is not a number, or a real number otherwise, optionally preceded by and/or followed by white space characters in each case.

A real number is the mantissa followed by either "E" (U+0045) or "e" (U+0065) and the exponent. If the exponent is zero, then the "E" or "e" and the exponent MAY be omitted. The mantissa is a sequence of one or more of the decimal digit characters "0" to "9" (U+0030-U+0039), optionally containing a single period "." (U+002E), with an optional leading sign, either "+" (U+002B) or "-" (U+002D). Multiple leading zero digits are permitted. The exponent is encoded as a number string (see Section 6.13).

The RXER encoding of REAL values is intended to be compatible with the lexical representation of the XML Schema [XSD2] double datatype (but allows real values outside the range permitted by double).

Examples

The content of each of the following elements is the RXER encoding of a REAL value:

      <value>3.14159</value>

      <value>1.0e6</value>

      <value>INF</value>

      <value>-01e-06</value>

6.19. SEQUENCE and SET

A value of a SEQUENCE type or a SET type is encoded such that the normalized content is a series of zero, one or more child element information items (one for each component value actually present in the SEQUENCE or SET value), optionally preceded by, followed by, and/or separated by white space character information items.

Each component value corresponds to some NamedType in the SEQUENCE or SET type definition. The [local name] of the child element corresponding to the component value SHALL be the identifier of the corresponding NamedType, the [namespace name] of the child element SHALL have no value, and the content of the child element SHALL be the encoding of the component value according to the Type of the NamedType.

The component values are encoded in the order of their corresponding NamedType definitions in the SEQUENCE or SET type definition. In the case of the SET type, this is a deliberate departure from BER, where the components of a SET can be encoded in any order.

If the SEQUENCE or SET type is extensible [X680], then the RXER decoder must be capable of skipping over any child element with a name that is not recognised, on the assumption that the sender is using a more recent definition of the SEQUENCE or SET type.

Examples

Consider this type definition:

      SEQUENCE {
          name        [0] IA5String OPTIONAL,
          partNumber  [1] INTEGER,
          quantity    [2] INTEGER DEFAULT 0
      }

The content of each of the following elements is the RXER encoding of a value of the above type:

      <value>
          <partNumber>23</partNumber>
      </value>

      <value>
          <name>chisel</name>
          <partNumber>37</partNumber>
          <quantity>0</quantity>
      </value>

      <value>
          <partNumber>1543</partNumber>
          <quantity>29</quantity>
      </value>

6.20. SEQUENCE OF and SET OF

A value of a SEQUENCE OF or SET OF type is encoded such that the normalized content is a series of zero, one or more child elements (one for each component value), optionally preceded by, followed by, and/or separated by white space character information items.

The [namespace name] of each child element SHALL have no value, and the content of each child element SHALL be the encoding of the corresponding component value according to the component Type.

For a value of a SEQUENCE OF NamedType or SET OF NamedType, the [local name] of each child element SHALL be the identifier of the NamedType.

For a value of a SEQUENCE OF Type or SET OF Type, the [local name] of each child element SHALL be the literal "item".

If the SEQUENCE OF or SET OF value has no component values, then the enclosing element MAY be encoded as an empty element (i.e., using an empty-element tag).

Examples

Consider this type definition:

      SEQUENCE OF INTEGER

The content of the following element is the RXER encoding of a value of the above type:

      <value>
          <item>12</item>
          <item>9</item>
          <item>7</item>
      </value>

Consider this type definition:

      SEQUENCE OF timeStamp GeneralizedTime

The content of the following element is the RXER encoding of a value of the above type:

      <value>
          <timeStamp>2004-06-15T12:14:56Z</timeStamp>
          <timeStamp>2004-06-15T12:18:13Z</timeStamp>
          <timeStamp>2004-06-15T01:00:25Z</timeStamp>
      </value>

6.21. UTCTime

A value of the UTCTime type is encoded such that the string value of the normalized content is optional leading white space characters followed by a date, the letter "T", a time of day, an optional time zone and optional trailing white space characters.

The date is two decimal digits representing the century, followed by two decimal digits representing the year, "-" (U+002D), two decimal digits representing the month, "-" (U+002D), and two decimal digits representing the day. A UTCTime value does not indicate the century; therefore, the century in the RXER encoding is generated from the year value as follows.
If the year is in the range 50-99 then the century is "19", otherwise the century is "20". Note that RXER encoded UTCTime values with a four digit year outside the range 1950 to 2049 are illegal. RXER decoders MUST discard the century before passing a UTCTime value to an application. The time of day is two decimal digits representing the hour, followed by ":" (U+003A), two decimal digits representing the minutes, ":" (U+003A), and two decimal digits representing the seconds. The seconds are encoded as "00" if the UTCTime value omits seconds. The time zone, if present, is either the letter "Z" (U+005A) to indicate Coordinated Universal Time, a "+" (U+002B) followed by a time zone differential, or a "-" (U+002D) followed by a time zone differential. A time zone differential indicates the difference between local time (the time specified by the preceding date and time of day) and Coordinated Universal Time. Coordinated Universal Time can be calculated from the local time by subtracting the differential. A time zone differential is encoded as two decimal digits representing hours, the character ":" (U+003A), and two decimal digits representing minutes. The RXER encoding of UTCTime values is intended to conform to the Legg & Prager Expires 16 December 2004 [Page 19] INTERNET-DRAFT Robust XML Encoding Rules June 16, 2004 lexical representation of the XML Schema [XSD2] dateTime datatype. The inclusion of two digits for the century in the RXER encoding of a UTCTime value is not intended to alter UTCTime abstract values, nor to alter how applications might already calculate a suitable century for UTCTime values. The reason for including the century in the encoding is to allow the UTCTime type to be mapped [CXSD] to something meaningful in XML Schema (i.e., dateTime) so that XML Schema aware toolkits will invoke reasonably sensible default processing of UTCTime values. 6.22. 
Open Type A value of an open type denoted by an ObjectClassFieldType [X.681] is encoded according to the specific Type of the value. If the encoding of the value does not generate an attribute information item with the [local name] "type" and the [namespace name] "http://www.w3.org/2001/XMLSchema-instance" (i.e., xsi:type, see Sections 6.4 & 6.7) and the specific Type of the value of the open type has a Qualified Reference Name (see Section 3.1) then the [attributes] of the enclosing element SHOULD contain an attribute information item with the [local name] "type" and the [namespace name] "http://www.w3.org/2001/XMLSchema-instance" (i.e., xsi:type), where the [normalized value] of this attribute SHALL be the Qualified Reference Name. The xsi:type attribute is added by RXER encoders for the benefit of XML Schema validators. For an RXER decoder, the actual type in an open type value is generally determined by an associated component relation constraint [X682], hence RXER decoders MAY ignore the xsi:type attribute. Where the xsi:type attribute is present, appropriate namespace declaration attributes for the namespace names in the attribute's name and value MUST be added to the enclosing element's [attributes] if not already in the [in-scope namespaces] for the element. The namespace prefixes are the encoder's choice. Examples The content of the following element is the RXER encoding of an open type value containing a BOOLEAN value: true Legg & Prager Expires 16 December 2004 [Page 20] INTERNET-DRAFT Robust XML Encoding Rules June 16, 2004 6.23. AnyType The AnyType type [GLUE] is used to embed arbitrary XML within ASN.1 abstract values. The RXER encoding of a value of the AnyType type is intended to be Infoset equivalent to the original XML used to populate the AnyType value. The character string in the attributes or context component of the text alternative of an AnyType value is an XML textual representation of a sequence of attribute information items. 
The character string in the content component of the text alternative of an AnyType value is an XML textual representation of a sequence of character, comment, processing instruction and child element information items.

A value of the AnyType type is encoded such that:

a) the [children] of the enclosing element are the same as the sequence of information items represented by the content component, and

b) the [attributes] of the enclosing element include the attribute information items represented by the attributes component, plus the namespace declarations in the context component that are not already defined in the [in-scope namespaces] of the enclosing element.

The character string in the prolog component of the text alternative of an AnyType value is text conforming to the prolog production of XML [XML]. It is used to interpret entity references in the context, attributes or content components.

Any entity references in the context, attributes or content components MUST either be replaced in the RXER encoding by their replacement text, or the corresponding entity declarations in the prolog component must be added to the DTD of the XML document containing the RXER encoding. Note that the latter may not be possible because of a conflict with an existing entity declaration of the same name. Such a conflict can be resolved by renaming one of the entities throughout the RXER encoding (to some unused name of the encoder's choosing); however, applications will generally find it easier to expand entity references at the earliest opportunity.

If the content component is absent, then the enclosing element MAY be encoded as an empty element (i.e., using an empty-element tag).
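Rules (a) and (b), together with the empty-element option, can be illustrated with the following minimal Python sketch. It is not a complete RXER encoder and makes several assumptions that are not part of this specification: the function name encode_anytype is hypothetical, the context component is assumed to have been parsed already into a mapping from namespace prefix to namespace name, entity references are assumed to have been expanded already, and the caller supplies the enclosing element's name and its [in-scope namespaces].

```python
from typing import Dict, Optional


def encode_anytype(tag: str,
                   context: Dict[str, str],
                   attributes: str,
                   content: Optional[str],
                   in_scope: Dict[str, str]) -> str:
    """Build the enclosing element for an AnyType value (sketch only).

    tag:        name of the enclosing element (supplied by the caller)
    context:    namespace declarations from the context component,
                pre-parsed as prefix -> namespace name ("" = default)
    attributes: the attributes component, as XML attribute text
    content:    the content component, as XML text, or None if absent
    in_scope:   namespaces already in scope for the enclosing element
    Assumes entity references were already expanded and that namespace
    names need no escaping inside a double-quoted attribute value.
    """
    parts = [tag]
    # Rule (b): add only the context declarations not already in scope.
    for prefix, uri in context.items():
        if in_scope.get(prefix) != uri:
            name = "xmlns" if prefix == "" else "xmlns:" + prefix
            parts.append(f'{name}="{uri}"')
    # Rule (b): the attributes component is copied unchanged.
    if attributes:
        parts.append(attributes)
    start = " ".join(parts)
    # An absent content component MAY use an empty-element tag.
    if content is None:
        return f"<{start}/>"
    # Rule (a): the [children] are exactly the content component.
    return f"<{start}>{content}</{tag}>"


if __name__ == "__main__":
    # Hypothetical values loosely modelled on the context and attributes
    # components of the Example in this section.
    print(encode_anytype(
        "foo",
        {"ns": "http://www.example.com/SLI"},
        'bar="0" ns:foo="1"',
        "\n  true\n  \n",
        {}))
```

A namespace declaration from the context component is omitted when the same prefix is already bound to the same namespace name in the enclosing element's scope, which is the behaviour rule (b) describes; a real encoder would additionally have to handle entity references as described above.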
Example

Consider the following AnyType value represented in ASN.1 value notation [X680]:

   text:{
       context    "xmlns:ns=""http://www.example.com/SLI""",
       attributes "bar=""0"" ns:foo=""1""",
       content    { lf, " true", lf, " ", lf }
   }

The content of the following element is an RXER encoding of the above AnyType value:

   true

7. RXER Transfer Syntax

The following OBJECT IDENTIFIER has been assigned by Adacel Technologies, under an arc assigned to Adacel by Standards Australia, to identify the Robust XML Encoding Rules:

   { 1 2 36 79672281 0 2 }

This OBJECT IDENTIFIER would be used, for example, to describe the transfer syntax for an RXER encoded data-value in an EMBEDDED PDV value.

8. Relationship to XER

RXER and XER [X693] are separate, distinctly different and incompatible ASN.1 encoding rules for producing XML markup from ASN.1 abstract values. RXER is therefore also unrelated to the XML ASN.1 Value Notation of X.680 [X680], on which XER is based.

There is usually a requirement on applications specified in ASN.1 to maintain backward compatibility with the encodings generated by previous versions. The encodings in question are typically BER. Even with this backward compatibility constraint, there is still considerable leeway for specification writers to rewrite the earlier specification. Examples include renaming types, factoring out an in-line type definition as a named type (or the reverse), and replacing a type definition with an equivalent parameterized reference. These changes produce no change to BER, DER, CER, PER [X691], or GSER [RFC3641] encodings (so specification writers have felt free to make such changes to improve their specifications), but can change the [local name] of elements in the XER encoding.
The RXER encoding is immune to this problem; thus RXER encodings are more stable than XER encodings over successive revisions of an ASN.1 specification, which has an obvious benefit for interoperability.

RXER allows entity references, comments and processing instructions (PIs) in encodings; XER does not. RXER is conformant with XML namespaces [XMLNS], while XER does not allow qualified names at all.

RXER has also been designed so that it is possible to generate, from any arbitrary ASN.1 specification, a compatible XML Schema that will validate correct RXER encodings [CXSD]. The same is not generally true of XER, except by making changes to the original ASN.1 specification.

9. Security Considerations

RXER does not necessarily enable the exact octet encoding of values of the TeletexString, VideotexString, GraphicString or GeneralString types to be reconstructed, so a transformation from DER to RXER and back to DER may not reproduce the original DER encoding. Therefore, RXER MUST NOT be used to re-encode, whether for storage or transmission, ASN.1 abstract values whose original binary encoding must be recoverable. Such recovery is needed for the verification of digital signatures. In such cases, protocols ought to use DER or a DER-reversible encoding.

When interpreting security-sensitive fields, and in particular fields used to grant or deny access, implementations MUST ensure that any comparisons are done on the underlying abstract value, regardless of the particular encoding used.

10. Acknowledgements

This document and the technology it describes are a product of a joint research project between Adacel Technologies Limited and Deakin University on leveraging existing directory technology to produce an XML-based directory service.

11. References

11.1. Normative References

[BCP14]   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[XEDNS]   Legg, S. and D. Prager, "The XML Enabled Directory: IANA Considerations", draft-legg-xed-iana-xx.txt, a work in progress, to be published.

[GLUE]    Legg, S. and D. Prager, "The XML Enabled Directory: Schema Language Integration", draft-legg-xed-glue-xx.txt, a work in progress, June 2004.

[X680]    ITU-T Recommendation X.680 (07/02) | ISO/IEC 8824-1, Information technology - Abstract Syntax Notation One (ASN.1): Specification of basic notation.

[X681]    ITU-T Recommendation X.681 (07/02) | ISO/IEC 8824-2, Information technology - Abstract Syntax Notation One (ASN.1): Information object specification.

[X682]    ITU-T Recommendation X.682 (07/02) | ISO/IEC 8824-3, Information technology - Abstract Syntax Notation One (ASN.1): Constraint specification.

[X683]    ITU-T Recommendation X.683 (07/02) | ISO/IEC 8824-4, Information technology - Abstract Syntax Notation One (ASN.1): Parameterization of ASN.1 specifications.

[X690]    ITU-T Recommendation X.690 (07/02) | ISO/IEC 8825-1, Information technology - ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER).

[UCS]     ISO/IEC 10646-1:2000, Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane.

[XML]     Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E. and F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third Edition)", W3C Recommendation, http://www.w3.org/TR/2004/REC-xml-20040204, February 2004.

[XMLNS]   Bray, T., Hollander, D. and A. Layman, "Namespaces in XML", W3C Recommendation, http://www.w3.org/TR/1999/REC-xml-names-19990114, January 1999.

[ISET]    Cowan, J. and R. Tobin, "XML Information Set", W3C Recommendation, http://www.w3.org/TR/2001/REC-xml-infoset-20011024, October 2001.

[XSD1]    Thompson, H., Beech, D., Maloney, M. and N. Mendelsohn, "XML Schema Part 1: Structures", W3C Recommendation, http://www.w3.org/TR/2001/REC-xmlschema-1-20010502, May 2001.

11.2. Informative References

[RFC3641] Legg, S., "Generic String Encoding Rules (GSER) for ASN.1 Types", RFC 3641, October 2003.

[ASD]     Legg, S. and D. Prager, "ASN.1 Schema: An XML Representation for ASN.1 Specifications", draft-legg-xed-asd-xx.txt, a work in progress, June 2004.

[CXSD]    Legg, S. and D. Prager, "Translation of ASN.1 Specifications into XML Schema", draft-legg-xed-xsd-xx.txt, a work in progress, to be published.

[X691]    ITU-T Recommendation X.691 (07/02) | ISO/IEC 8825-2:2002, Information technology - ASN.1 encoding rules: Specification of Packed Encoding Rules (PER).

[X693]    ITU-T Recommendation X.693 (12/01) | ISO/IEC 8825-4:2002, Information technology - ASN.1 encoding rules: XML encoding rules (XER).

[XSD2]    Biron, P.V. and A. Malhotra, "XML Schema Part 2: Datatypes", W3C Recommendation, http://www.w3.org/TR/2001/REC-xmlschema-2-20010502, May 2001.

Authors' Addresses

Dr. Steven Legg
Adacel Technologies Ltd.
250 Bay Street
Brighton, Victoria 3186
AUSTRALIA

Phone: +61 3 8530 7710
  Fax: +61 3 8530 7888
EMail: steven.legg@adacel.com.au

Dr. Daniel Prager
C/o Professor Lynn Batten
Department of Computing and Mathematics
Deakin University
Geelong, Victoria 3217
AUSTRALIA

EMail: dan@layabout.net
EMail: lmbatten@deakin.edu.au

Full Copyright Statement

Copyright (C) The Internet Society (2004).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Changes in Draft 00

The Directory XML Encoding Rules (DXER) have been renamed to the Robust XML Encoding Rules (RXER). The previous file name for this draft was draft-legg-xed-dxer-00.txt.
The rules for forming the [local name] and [namespace name] of the root element of a Standalone DXER Encoding have been changed to remove any dependency on type reference names.

Changes in Draft 01

The namespace name for the ASN.1 namespace has been shortened.

Additional insignificant leading and trailing white space is permitted in the encodings for some of the simple ASN.1 types in order to align them fully with their analogous XML Schema types.

Changes in Draft 02

The AnyType ASN.1 type from [GLUE] has been revised to be a CHOICE whose only alternative is the previous SEQUENCE type. The description of the RXER encoding of values of AnyType has been revised to account for the change.

Examples of RXER encodings have been added to the specification.