INTERNET-DRAFT                                          A. Jaffer
Expires: June 2001                                      January 2001

            draft-jaffer-metric-interchange-format-03.txt

          Representation of numerical values and SI units in
            character strings for information interchanges

                         Status of this Memo

        This document is an Internet-Draft and is in full conformance
        with all provisions of Section 10 of RFC2026.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as
     Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use
     Internet-Drafts as reference material or to cite them other than
     as "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.

The previous four paragraphs are required verbatim by
http://www.ietf.org/ietf/1id-guidelines.txt.  But as of January 10,
2001, the page at the URL cited for 1id-abstracts was last updated in
August 1997.  The list of current Internet-Drafts is actually at
http://www.ietf.org/1id-abstracts.html.

                               Abstract

This document describes a character string encoding for numerical
values and units which:

 * is unambiguous in all locales;

 * uses only [TOG] "Portable Character Set" characters matching "Basic
   Latin" characters in Plane 0 of the Universal Character Set [UCS];

 * is transparent to [UTF-7] and [UTF-8] UCS transformation formats;

 * is human readable and writable;

 * is machine readable and writable;

 * incorporates SI prefixes and units;

 * incorporates [ISO 6093] numbers; and

 * incorporates [IEC 60027-2] binary prefixes.

                              Motivation

According to [NASA 1999], Arthur Stephenson, chairman of the Mars
Climate Orbiter Mission Failure Investigation Board, stated:

    "The 'root cause' of the loss of the spacecraft was the failed
    translation of English units into metric units in a segment of
    ground-based, navigation-related mission software, ..."

Although the [ISO 6093] standard for automated interchange of numerical
data is widely used, standardized measurement units (other than for page
formatting) are not routinely attached to interchange data.

                      Relation to Previous Work

The 1986 standard "Representations for U.S. Customary, SI, and Other
Units to Be Used in Systems with Limited Character Sets" [ANSI X3.50]
states:

    This standard was not designed for ... usage by humans as input
    to, or output from, data systems. ...  They should never be
    printed out for publication or for other forms of public
    information transfer.

[ANSI X3.50] representations of units are ambiguous.  "min" is both
"minute" and "milliinch"; "cd" is both "candela" and "centiday".

Apart from SI units, [ANSI X3.50] supports only U.S. local units, is
not complete in that support, and has no provision for extension to
other locales.  But non-SI unit systems are in such disarray that
using them for interchange is not practical.  The same unit name can
signify different quantities in different locales; the Canadian gallon
is 4.54609 liters, while the U.S. gallon is 3.785412 liters.  The "CRC
Handbook of Chemistry and Physics" [CRC] lists no fewer than six
distinct (incompatible) systems of wire gauges.

The character set limitations targeted by [ANSI X3.50], namely single
alphabetic case, are no longer common in data interchanges.  But many
of its mixed-case "Form I" SI unit representations are similar to
those presented here.

The audience for metric standards has changed and grown.  In the
preface to "Guide for the Use of the International System of Units
(SI)" [NIST 811], B. Taylor writes:

    The International System of Units, universally abbreviated SI, is
    the modern metric system of measurement.  Long the dominant
    measurement system used in science, the SI is becoming the
    dominant measurement system used in international commerce.

[NIST 811] details a methodology for expressing measurement units in
both text and symbolic form in scientific and other documents.  Its
unit expressions combine over 40 metric base and derived unit symbols
unambiguously.  Taylor's unit symbols are the basis for this metric
interchange format.

                      Metric Interchange Format

In the expression for the value of a quantity, the unit symbol is placed
after the numerical value.  A dot (PERIOD, ".") is placed between the
numerical value and the unit symbol.
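Because the PERIOD serves both as a decimal point and as the separator
before the unit, a reader must decide which PERIOD begins the unit.
The following Python sketch uses a hypothetical helper, split_quantity,
that is not part of this format.  It assumes the number may carry a
leading HYPHEN-MINUS and an exponent, as in the "real" production of the
Metric Interchange Syntax, and relies on every unit symbol beginning
with a letter.  It does not validate the unit itself.

```python
import re

# Hypothetical helper: the number follows ISO 6093 (PERIOD or COMMA as the
# decimal point, optional exponent, no PLUS-SIGN); a unit, if present,
# follows a PERIOD and begins with a letter.  The regex engine backtracks,
# so "12.m" is read as the number 12 followed by the unit m.
QUANTITY = re.compile(
    r"(-?(?:[0-9]+(?:[.,][0-9]*)?|[.,][0-9]+)(?:[eE]-?[0-9]+)?)"  # number
    r"(?:\.([A-Za-z].*))?"                                         # unit
)


def split_quantity(text):
    """Split 'number.unit' into (number, unit); unit is '' when absent."""
    m = QUANTITY.fullmatch(text)
    if m is None:
        raise ValueError(f"not a quantity value: {text!r}")
    return m.group(1), m.group(2) or ""


print(split_quantity("9.8.m/s^2"))   # ('9.8', 'm/s^2')
print(split_quantity("1.5e3.km"))    # ('1.5e3', 'km')
```

With this reading "1.eV" is one electronvolt, while "1.e3" is the number
1000 with no unit, because an exponent marker must be followed by a digit
or HYPHEN-MINUS.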

Within a compound unit, each of the base and derived symbols can
optionally have an attached SI prefix.

Unit symbols formed from other unit symbols by multiplication are
indicated by means of a dot (PERIOD, ".") placed between them.

Unit symbols formed from other unit symbols by division are indicated by
means of a SOLIDUS ("/") or negative exponents.  The SOLIDUS must not be
repeated in the same compound unit unless contained within a
parenthesized subexpression.

The grouping formed by a prefix symbol attached to a unit symbol
constitutes a new inseparable symbol (forming a multiple or submultiple
of the unit concerned) which can be raised to a positive or negative
power and which can be combined with other unit symbols to form compound
unit symbols.

The grouping formed by surrounding compound unit symbols with
parentheses ("(" and ")") constitutes a new inseparable symbol which can
be raised to a positive or negative power and which can be combined with
other unit symbols to form compound unit symbols.

Compound prefix symbols, that is, prefix symbols formed by the
juxtaposition of two or more prefix symbols, are not permitted.

Prefix symbols are not used with the time-related unit symbols min
(minute), h (hour), and d (day).  No prefix symbol may be used with dB
(decibel) or u (unified atomic mass unit).  Only submultiple prefix
symbols may be used with the unit symbols L (liter), Np (neper), o
(degree), oC (degree Celsius), rad (radian), and sr (steradian).
Submultiple prefix symbols may not be used with the unit symbols t
(metric ton), r (revolution), Bd (baud), or B (byte).

A unit exponent follows the unit, separated by a CIRCUMFLEX ("^").
Exponents may be positive or negative.  Fractional exponents must be
parenthesized.
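The rules for products and exponents can be sketched by splitting a unit
product (a unit without a SOLIDUS) at each PERIOD and reading an optional
CIRCUMFLEX exponent.  This Python sketch uses a hypothetical function,
factors; it checks only the shape of each factor, not whether the
symbols or prefixes are valid.

```python
import re
from fractions import Fraction

# One factor: letters (prefix plus unit symbol), optionally followed by '^'
# and either an integer or a parenthesized fraction such as (1/2) or (-3/2).
FACTOR = re.compile(r"([A-Za-z]+)(?:\^(?:(-?[0-9]+)|\((-?[0-9]+)/([0-9]+)\)))?")


def factors(product):
    """Split a unit product like 'm.kg.s^-2' into (symbol, exponent) pairs."""
    result = []
    for part in product.split("."):
        m = FACTOR.fullmatch(part)
        if m is None:
            raise ValueError(f"malformed unit factor: {part!r}")
        symbol, whole, num, den = m.groups()
        if whole is not None:
            exponent = Fraction(int(whole))
        elif num is not None:
            if int(den) == 0:
                raise ValueError(f"zero denominator in {part!r}")
            exponent = Fraction(int(num), int(den))
        else:
            exponent = Fraction(1)        # no CIRCUMFLEX means power 1
        result.append((symbol, exponent))
    return result


print(factors("m.kg.s^-2"))   # m and kg with exponent 1, s with exponent -2
```

An unparenthesized fractional exponent such as "s^1/2" is rejected,
since the text after the CIRCUMFLEX must be an integer or a
parenthesized fraction.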

                           Alphabetic Case

The case of letters in unit symbols must match the symbols specified
here.  Unit symbols are composed of lower-case letters except that:

 * a letter is upper-case where it begins the part of a symbol derived
   from the name of a person (as in Hz, Pa, Ohm, eV, and oC);

 * the symbol for the liter is L; and

 * the symbol for the byte is B.

The decimal prefix symbols Y (yotta), Z (zetta), E (exa), P (peta),
T (tera), G (giga), and M (mega) are written in upper-case letters, and
the binary prefix symbols (Ki, Mi, Gi, Ti, Pi, Ei) begin with an
upper-case letter; all other prefix symbols are written in lower-case
letters.

                             SI Prefixes

       Factor     Name    Symbol  |  Factor     Name    Symbol
       ======     ====    ======  |  ======     ====    ======
        1e24      yotta      Y    |   1e-1      deci       d
        1e21      zetta      Z    |   1e-2      centi      c
        1e18      exa        E    |   1e-3      milli      m
        1e15      peta       P    |   1e-6      micro      u
        1e12      tera       T    |   1e-9      nano       n
        1e9       giga       G    |   1e-12     pico       p
        1e6       mega       M    |   1e-15     femto      f
        1e3       kilo       k    |   1e-18     atto       a
        1e2       hecto      h    |   1e-21     zepto      z
        1e1       deka       da   |   1e-24     yocto      y
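The prefix table and the prefix restrictions stated under "Metric
Interchange Format" can be combined into a lookup that splits a prefixed
symbol into prefix and unit.  In this Python sketch, split_prefix and
the set names are hypothetical; the sets mirror the unit_p_symbol,
unit_n_symbol, unit_b_symbol, and unit___symbol productions of the
Metric Interchange Syntax, and binary prefixes are not handled.  An
unprefixed symbol is tried first, so "cd" is the candela and "min" the
minute rather than prefixed forms.

```python
# Decimal prefixes from the SI Prefixes table: symbol -> power of ten.
MULTIPLE = {"da": 1, "h": 2, "k": 3, "M": 6, "G": 9, "T": 12,
            "P": 15, "E": 18, "Z": 21, "Y": 24}
SUBMULTIPLE = {"d": -1, "c": -2, "m": -3, "u": -6, "n": -9, "p": -12,
               "f": -15, "a": -18, "z": -21, "y": -24}

# Unit symbols grouped by which decimal prefixes they accept.
MULTIPLE_ONLY = {"B", "Bd", "r", "t"}
SUBMULTIPLE_ONLY = {"L", "Np", "o", "oC", "rad", "sr"}
NO_PREFIX = {"d", "dB", "h", "min", "u"}
ANY_PREFIX = {"A", "Bq", "C", "F", "Gy", "H", "Hz", "J", "K", "N", "Ohm",
              "Pa", "S", "Sv", "T", "V", "W", "Wb", "bit", "cd", "eV", "g",
              "kat", "lm", "lx", "m", "mol", "s"}
ALL_UNITS = MULTIPLE_ONLY | SUBMULTIPLE_ONLY | NO_PREFIX | ANY_PREFIX


def split_prefix(token):
    """Return (prefix, unit_symbol, power_of_ten) for a decimal-prefixed unit."""
    if token in ALL_UNITS:                  # unprefixed symbols win
        return "", token, 0
    for prefixes, allowed in ((MULTIPLE, ANY_PREFIX | MULTIPLE_ONLY),
                              (SUBMULTIPLE, ANY_PREFIX | SUBMULTIPLE_ONLY)):
        for prefix, power in prefixes.items():
            symbol = token[len(prefix):]
            if token.startswith(prefix) and symbol in allowed:
                return prefix, symbol, power
    raise ValueError(f"not a valid decimally prefixed unit: {token!r}")


print(split_prefix("km"))    # ('k', 'm', 3)
print(split_prefix("moC"))   # ('m', 'oC', -3)
```

Symbols such as "kL" (kiloliter), "mt" (millitonne), and "kh"
(kilohour) raise ValueError, as the restrictions require.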

                            Binary Prefixes

These binary prefixes are valid only with the units B (byte) and bit.
However, decimal prefixes can also be used with bit; and decimal
multiple (not submultiple) prefixes can also be used with B (byte).

                Factor       (power-of-2)  Name  Symbol
                ======       ============  ====  ======
       1.152921504606846976e18  (2^60)     exbi    Ei
          1.125899906842624e15  (2^50)     pebi    Pi
             1.099511627776e12  (2^40)     tebi    Ti
                1.073741824e9   (2^30)     gibi    Gi
                   1.048576e6   (2^20)     mebi    Mi
                      1.024e3   (2^10)     kibi    Ki
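The difference between decimal and binary multiples can be made concrete
by counting bits.  This Python sketch (bits_in is a hypothetical name)
handles only multiple prefixes on B and bit, uses exact integer
arithmetic, and checks the 2^60 entry of the table above.

```python
# Binary prefixes (power of two) and the decimal multiple prefixes that may
# precede B or bit (power of ten).  Submultiples such as mbit are not handled.
BINARY = {"Ki": 10, "Mi": 20, "Gi": 30, "Ti": 40, "Pi": 50, "Ei": 60}
DECIMAL = {"da": 1, "h": 2, "k": 3, "M": 6, "G": 9, "T": 12, "P": 15,
           "E": 18, "Z": 21, "Y": 24}


def bits_in(symbol):
    """Exact number of bits in one unit such as 'KiB', 'kB', or 'Mibit'."""
    for unit, bits in (("bit", 1), ("B", 8)):
        if symbol.endswith(unit):
            prefix = symbol[: -len(unit)]
            if prefix == "":
                return bits
            if prefix in BINARY:
                return bits * 2 ** BINARY[prefix]
            if prefix in DECIMAL:
                return bits * 10 ** DECIMAL[prefix]
    raise ValueError(f"not a multiple of B or bit: {symbol!r}")


assert 2 ** 60 == 1152921504606846976          # exbi, from the table
print(bits_in("KiB"), bits_in("kB"))           # 8192 8000
```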

                             Unit Symbols

    Type of Quantity      Name          Symbol   Equivalent
    ================      ====          ======   ==========
time                      second           s
time                      minute           min = 60.s
time                      hour             h   = 60.min
time                      day              d   = 24.h
frequency                 hertz            Hz    s^-1
signaling rate            baud             Bd    s^-1
length                    meter            m
volume                    liter            L     dm^3
plane angle               radian           rad
solid angle               steradian        sr    rad^2
plane angle               revolution     * r   = 6.283185307179586.rad
plane angle               degree         * o   = 2.777777777777778e-3.r
information capacity      bit              bit
information capacity      byte, octet      B   = 8.bit
mass                      gram             g
mass                      ton              t     Mg
mass              unified atomic mass unit u   = 1.66053873e-27.kg
amount of substance       mole             mol
catalytic activity        katal            kat   mol/s
thermodynamic temperature kelvin           K
temperature               degree Celsius   oC
luminous intensity        candela          cd
luminous flux             lumen            lm    cd.sr
illuminance               lux              lx    lm/m^2
force                     newton           N     m.kg.s^-2
pressure, stress          pascal           Pa    N/m^2
energy, work, heat        joule            J     N.m
energy                    electronvolt     eV  = 1.602176462e-19.J
power, radiant flux       watt             W     J/s
logarithm of power ratio  neper            Np
logarithm of power ratio  decibel        * dB  = 0.1151293.Np
electric current          ampere           A
electric charge           coulomb          C     s.A
electric potential, EMF   volt             V     W/A
capacitance               farad            F     C/V
electric resistance       ohm              Ohm   V/A
electric conductance      siemens          S     A/V
magnetic flux             weber            Wb    V.s
magnetic flux density     tesla            T     Wb/m^2
inductance                henry            H     Wb/A
radionuclide activity     becquerel        Bq    s^-1
absorbed dose energy      gray             Gy    m^2.s^-2
dose equivalent           sievert          Sv    m^2.s^-2

* The exact formulas are:

r/rad = 8 * atan(1)
o/r   = 1 / 360
dB/Np = ln(10) / 20
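The starred factors in the table can be recomputed from these exact
formulas.  The following Python sketch does so in double precision and
also derives the radian value of one degree that appears in the UCF
examples.

```python
import math

r_per_rad = 8 * math.atan(1)        # one revolution in radians (2*pi)
o_per_r = 1 / 360                   # one degree in revolutions
dB_per_Np = math.log(10) / 20       # one decibel in nepers
rad_per_o = o_per_r * r_per_rad     # one degree in radians

print(f"{r_per_rad:.16g}  {o_per_r:.16g}  {dB_per_Np:.7g}  {rad_per_o:.6g}")
```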

                            Unit Examples

    Type of Quantity        Name                            Symbol
    ================        ====                            ======
area                        square meter                    m^2
volume                      cubic meter                     m^3
speed, velocity             meter per second                m/s
acceleration                meter per second squared        m/s^2
wave number                 reciprocal meter                m^-1
mass density (density)      kilogram per cubic meter        kg/m^3
specific volume             cubic meter per kilogram        m^3/kg
current density             ampere per square meter         A/m^2
magnetic field strength     ampere per meter                A/m
concentration               mole per cubic meter            mol/m^3
luminance                   candela per square meter        cd/m^2
angular velocity            radian per second               rad/s
angular acceleration        radian per second squared       rad/s^2
dynamic viscosity           pascal second                   Pa.s
moment of force             newton meter                    N.m
surface tension             newton per meter                N/m
heat flux density           watt per square meter           W/m^2
radiant intensity           watt per steradian              W/sr
radiance                    watt per square meter steradian W/(m^2.sr)
heat capacity, entropy      joule per kelvin                J/K
specific heat or entropy    joule per kilogram kelvin       J/(kg.K)
specific energy             joule per kilogram              J/kg
thermal conductivity        watt per meter kelvin           W/(m.K)
energy density              joule per cubic meter           J/m^3
electric field strength     volt per meter                  V/m
electric charge density     coulomb per cubic meter         C/m^3
electric flux density       coulomb per square meter        C/m^2
permittivity                farad per meter                 F/m
permeability                henry per meter                 H/m
molar energy                joule per mole                  J/mol
molar entropy or heat       joule per mole kelvin           J/(mol.K)
exposure (X and gamma rays) coulomb per kilogram            C/kg
rotational speed            revolution per minute           r/min
catalytic concentration     katal per cubic meter           kat/m^3
data rate                   mebibit per second              Mibit/s
noise voltage density       nanovolt per root hertz         nV/Hz^(1/2)

               Use of Metric Units by Computer Programs

Metric units attached to individual numerical values have the format
described above.  An unattached unit can be used to specify the units
applying to a row, column, or entire table of numerical values; or for
other purposes.

Programming language support for metric interchange should be provided
by a function of two unit arguments returning a conversion factor.
Multiplying a numerical value expressed in the second unit by the
returned conversion factor yields the numerical value expressed in the
first unit.  This function must return a non-positive number if either
of its arguments is not a syntactically valid unit, or if no conversion
factor exists between the two units.

                              Examples

    UCF("km/s", "m/s" ) --> 0.001     UCF("N"   , "m/s" ) --> 0
    UCF("moC" , "oC"  ) --> 1000      UCF("mK"  , "oC"  ) --> 0
    UCF("rad" , "o"   ) --> 0.0174533 UCF("K"   , "o"   ) --> 0
    UCF("K"   , "K"   ) --> 1         UCF("oK"  , "oK"  ) --> -3
    UCF(""    , "s/s" ) --> 1         UCF("km/h", "mph" ) --> -2
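The conversion-factor function can be sketched in Python for a small
subset of the Unit Symbols table.  Everything here is an assumption for
illustration: the name ucf, the unit and prefix subsets, the dimension
vector, and the value -1 returned for every syntax error (the examples
above use other negative values, whose meanings this document does not
specify).  Degree Celsius is given its own dimension so that, as
required, no factor is found between oC and K.

```python
import math
import re
from fractions import Fraction

# Assumed subset of the Unit Symbols table: symbol -> (factor to coherent SI,
# exponents of m, kg, s, K, rad, oC).  oC gets its own dimension because it
# is not a constant multiple of K.
UNITS = {"m": (1.0, (1, 0, 0, 0, 0, 0)), "g": (1e-3, (0, 1, 0, 0, 0, 0)),
         "s": (1.0, (0, 0, 1, 0, 0, 0)), "min": (60.0, (0, 0, 1, 0, 0, 0)),
         "h": (3600.0, (0, 0, 1, 0, 0, 0)), "K": (1.0, (0, 0, 0, 1, 0, 0)),
         "rad": (1.0, (0, 0, 0, 0, 1, 0)),
         "o": (math.pi / 180, (0, 0, 0, 0, 1, 0)),
         "oC": (1.0, (0, 0, 0, 0, 0, 1)), "N": (1.0, (1, 1, -2, 0, 0, 0))}
MULTIPLE = {"da": 1e1, "h": 1e2, "k": 1e3, "M": 1e6, "G": 1e9}   # subset
SUBMULTIPLE = {"d": 1e-1, "c": 1e-2, "m": 1e-3, "u": 1e-6, "n": 1e-9}
NO_PREFIX, SUBMULTIPLE_ONLY = {"min", "h"}, {"rad", "o", "oC"}

def punit(token):
    """Factor and dimensions of a possibly prefixed symbol, or None."""
    if token in UNITS:                        # unprefixed symbols win
        return UNITS[token]
    for table in (MULTIPLE, SUBMULTIPLE):
        for prefix, scale in table.items():
            sym = token[len(prefix):]
            if (token.startswith(prefix) and sym in UNITS and sym not in NO_PREFIX
                    and not (table is MULTIPLE and sym in SUBMULTIPLE_ONLY)):
                return scale * UNITS[sym][0], UNITS[sym][1]
    return None

def raise_to(u, text):
    """Raise (factor, dims) u to a uxponent: n, -n, (n/d) or (-n/d)."""
    m = re.fullmatch(r"(-?[0-9]+)|\((-?[0-9]+)/([0-9]*[1-9][0-9]*)\)", text)
    if u is None or m is None:
        return None
    if m.group(1) is not None:
        e = Fraction(int(m.group(1)))
    else:
        e = Fraction(int(m.group(2)), int(m.group(3)))
    return u[0] ** float(e), tuple(d * e for d in u[1])

def single(token):
    """single_unit: punit, optionally followed by '^' and an exponent."""
    sym, caret, exp = token.partition("^")
    return raise_to(punit(sym), exp if caret else "1")

def product(text):
    """unit_product: single units joined by '.'; '' is dimensionless."""
    result = (1.0, (0,) * 6)
    for token in (text.split(".") if text else []):
        u = single(token)
        if u is None:
            return None
        result = (result[0] * u[0],
                  tuple(a + b for a, b in zip(result[1], u[1])))
    return result

def unit(text):
    """unit: product ['/' single_unit | '/(' product ')' ['^' uxponent]]."""
    depth = 0
    for i, ch in enumerate(text):             # locate the top-level SOLIDUS
        depth += (ch == "(") - (ch == ")")
        if ch == "/" and depth == 0:
            break
    else:
        return product(text)                  # no SOLIDUS at all
    num = product(text[:i]) if i > 0 else None
    den = text[i + 1:]
    if den.startswith("("):
        depth = 0
        for j, ch in enumerate(den):          # find the matching ')'
            depth += (ch == "(") - (ch == ")")
            if depth == 0:
                break
        inner = product(den[1:j]) if depth == 0 else None
        rest = den[j + 1:]
        if rest.startswith("^"):
            d = raise_to(inner, rest[1:])
        else:
            d = inner if rest == "" else None
    else:
        d = single(den)
    if num is None or d is None:
        return None
    return num[0] / d[0], tuple(a - b for a, b in zip(num[1], d[1]))

def ucf(to_unit, from_unit):
    """Factor f such that (value in from_unit) * f = (value in to_unit)."""
    a, b = unit(to_unit), unit(from_unit)
    if a is None or b is None:
        return -1          # assumed code for a syntactically invalid unit
    if a[1] != b[1]:
        return 0           # different dimensions: no conversion factor exists
    return b[0] / a[0]

print(ucf("km/s", "m/s"), ucf("N", "m/s"), ucf("km/h", "mph"))  # 0.001 0 -1
```

Comparing dimension vectors, rather than unit names, is what lets the
sketch return 0 for "N" versus "m/s" while still accepting "m/(s.s)" as
equivalent to "m/s^2".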

                    Programming Language Extension

Lexical numerical constants in the programming languages C, Pascal,
and Scheme could be extended to incorporate Metric Interchange Syntax
compatibly with their current syntaxes; but this is not required for
supporting input and output of units.

                              Rationales
                              ==========

                        Portability of Numbers

"Representation of numerical values in character strings for information
interchanges", [ISO 6093], specifies the three machine-readable
presentations in widespread use (Integer, Decimal, and Exponential
notations) using only the characters:

 <space>
 <left-parenthesis>     (
 <right-parenthesis>    )
 <comma>                ,
 <plus-sign>            +
 <hyphen-minus>         -
 <period>               .
 <E>                    E
 <e>                    e
 <digit>                0 - 9

In [UTF-7] the character PLUS-SIGN ("+") is the shift character that
introduces base64-encoded text, so a literal PLUS-SIGN must itself be
encoded as the two octets "+-".  But every [ISO 6093] numeric value can
be expressed without the use of PLUS-SIGN, so the number syntax given
here does not include PLUS-SIGN.

Locale charsets all support the digits 0 to 9.  There are only 3
LC_NUMERIC attributes: decimal_point, thousands_sep, and grouping.  [ISO
6093] specifies use of either "." or "," for the decimal point.  [ISO
6093] does not allow grouping.  There is no LC_NUMERIC attribute for
exponent.  Thus Latin characters ("e" or "E") must be available in all
languages which support [ISO 6093].
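A reader can accept both decimal-point conventions and still reject
locale-specific grouping.  This Python sketch (parse_number is a
hypothetical name) validates the text against the number syntax first,
because Python's float() alone would also accept forms outside this
format, such as a leading PLUS-SIGN, surrounding spaces, underscores,
or "nan".

```python
import re

# Number syntax of this format: optional HYPHEN-MINUS, digits with PERIOD or
# COMMA as decimal point, optional exponent; no PLUS-SIGN, no grouping.
NUMBER = re.compile(r"-?(?:[0-9]+(?:[.,][0-9]*)?|[.,][0-9]+)(?:[eE]-?[0-9]+)?")


def parse_number(text):
    """Convert an interchange number such as '0,25' or '1.5e-3' to a float."""
    if NUMBER.fullmatch(text) is None:
        raise ValueError(f"not an interchange number: {text!r}")
    return float(text.replace(",", "."))   # ',' and '.' mean the same here
```

The character class [0-9] is used instead of \d because, for Python
strings, \d also matches non-ASCII digits.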

The programming languages C, Fortran, PL/I, Pascal, and Scheme accept
[ISO 6093] numbers both as lexical constants and as input data.

                        Portable Character Set

Of the SI symbols, the "micro" prefix (GREEK-SMALL-LETTER-MU or
MICRO-SIGN), "ohm" symbol (GREEK-CAPITAL-LETTER-OMEGA), and "degree"
symbol (DEGREE-SIGN) are not supported by all charset encodings.  By
substituting "u", "Ohm", and "o" respectively, the unit symbols remain
readable while preserving the system's unambiguity.

Taylor recommends using the MIDDLE-DOT character between multiplied unit
symbols.  To support those charset encodings lacking MIDDLE-DOT, metric
interchange format instead uses PERIOD (".").

The unit superscript exponents could be formed using SUPERSCRIPT-MINUS,
SUPERSCRIPT-ONE, SUPERSCRIPT-TWO, SUPERSCRIPT-THREE, etc.  But these
characters are not universally supported.  So a CIRCUMFLEX ("^") is
placed between a unit and its exponent, which is written with portable
digits, preceded by a HYPHEN-MINUS if negative.

The symbol for the liter, L, was adopted by the General Conference on
Weights and Measures in order to avoid the risk of confusion between the
letter l and the number 1.

Metric Interchange Format (including numbers) uses only the characters:

 <left-parenthesis>     (
 <right-parenthesis>    )
 <comma>                ,
 <hyphen-minus>         -
 <period>               .
 <solidus>              /
 <circumflex>           ^
 <digit>                0 - 9
 <upper>                A - Z
 <lower>                a - z
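An encoder can check that its output stays within this repertoire.  In
this small Python sketch, is_portable is a hypothetical name; it checks
characters only, not syntax.

```python
# The characters enumerated above: ( ) , - . / ^ digits and Latin letters.
PORTABLE = frozenset("(),-./^0123456789"
                     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")


def is_portable(text):
    """True if every character of text is in the metric interchange repertoire."""
    return all(ch in PORTABLE for ch in text)


print(is_portable("4,7.kOhm"), is_portable("4,7 k\u03a9"))   # True False
```

Because every character in the set is Basic Latin, such text is encoded
identically in ASCII and [UTF-8].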

                      Binary Units and Prefixes

Computer professionals sometimes use the term "kilobyte" to mean 1024
bytes.  However, standards for data interchange must be unambiguous in
all contexts.  In December 1998 the International Electrotechnical
Commission (IEC) approved as an IEC International Standard [IEC 60027-2]
names and symbols for prefixes for binary multiples for use in the
fields of data processing and data transmission.

As of 2000, the units bit and byte had not been accepted for use with
the SI, but they are in widespread use.  The IEC symbols are "B" for
byte and "bit" for bit.  To avoid a conflict over the symbol "B", this
format omits the bel and uses the decibel (dB) instead.

                               Miscellany

Because white-noise power in a band is proportional to the bandwidth,
the corresponding noise voltage is proportional to the square root of
the bandwidth.  Electronic noise densities therefore carry fractional
exponents, as in nV/Hz^(1/2) (nanovolt per root hertz).
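As a worked example of why the half power appears, the total RMS noise
voltage over a band is the noise density multiplied by the square root
of the bandwidth.  The Python sketch below (rms_noise is a hypothetical
name, and the figures are illustrative only) computes that product: a
density of 10.nV/Hz^(1/2) over 10.kHz gives about 1.uV.

```python
import math


def rms_noise(density, bandwidth):
    """RMS noise voltage in V from a white-noise density in V/Hz^(1/2)
    and a bandwidth in Hz."""
    return density * math.sqrt(bandwidth)


print(rms_noise(10e-9, 10e3))   # approximately 1e-06 (1.uV)
```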

Degree Celsius (oC) is not convertible to kelvin (K) by multiplication
by a constant.  Thus the formula "oC = K - 273.15" does not appear in
the "Unit Symbols" table, and the conversion-factor function must return
a non-positive number when called to convert between oC and K.
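The obstacle is the offset.  A Python sketch of the exact relation
(celsius_to_kelvin is a hypothetical name) shows that the ratio between
kelvin and degree Celsius values changes with temperature, so no single
factor exists.

```python
def celsius_to_kelvin(t_celsius):
    """Thermodynamic temperature in K for a Celsius temperature in oC."""
    return t_celsius + 273.15


# The ratio T/t differs at every temperature, so no constant factor exists.
for t in (1.0, 10.0, 100.0):
    print(t, celsius_to_kelvin(t) / t)
```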

                Programming Language Syntax Extension

Because a PERIOD (".") after a numerical lexical constant is not
specified in the syntax of the programming languages C, Pascal, and
Scheme, the syntax of their lexical constants could be extended to
incorporate SI unit symbols.  The syntax of "double" in Java could
similarly be extended.

                            Acknowledgements

Arnold G. Reinhold helped complete and clarify ideas and presentation.

                             Author Address
Aubrey Jaffer                                      agj @ alum.mit.edu
33 Buehler Rd.
Bedford MA 01730, USA

                              References
[ANSI X3.50]
     ANSI, "Representations for U.S. customary, SI, and other units to
     be used in systems with limited character sets", ANSI X3.50, 1986.

[CODATA 1998]
     P. Mohr and B. Taylor, "CODATA Recommended Values of the
     Fundamental Physical Constants", National Institute of Standards
     and Technology, 1998.

[CRC]
     Chemical Rubber Company, "CRC handbook of chemistry and physics",
     CRC Press, 67th edition, 1986.

[IEC 60027-2]
     IEC, Amendment 2 to IEC International Standard IEC 60027-2:
     "Letter symbols to be used in electrical technology - Part 2:
     Telecommunications and electronics", January 1999.

[ISO 2955]
     ISO, "Information processing-Representation of SI and other units
     in systems with limited character sets", ISO 2955:1983.

[ISO 6093]
     ISO, "Representation of numerical values in character strings for
     information interchanges", ISO 6093:1985.

[NASA 1999]
     NASA, "Mars Climate Orbiter Failure Board Releases Report",
     http://mars.jpl.nasa.gov/msp98/news/mco991110.html, November 1999.

[NIST 811]
     Taylor, B., "Guide for the Use of the International System of
     Units (SI)", NIST Special Publication 811, 1995 Edition.

[TOG]
     The Open Group, "The Single UNIX Specification, Version 2",
     http://www.opennc.org/onlinepubs/7908799/xbd/charset.html, 1997.

[UCS]
     ISO, "Universal Multiple-Octet Coded Character Set (UCS) - Part 1:
     Architecture and Basic Multilingual Plane (BMP)", ISO/IEC 10646-1,
     March 2000.

[UNICODE]
     The Unicode Consortium, "The Unicode Standard, Version 3.0",
     Addison-Wesley, February 2000.

[UTF-7]
     D. Goldsmith, "UTF-7, A Mail-Safe Transformation Format of
     Unicode", RFC 2152, May 1997.

[UTF-8]
     F. Yergeau, "UTF-8, a transformation format of ISO 10646", RFC 2279,
     January 1998.

                      Metric Interchange Syntax

 quantity_value
        : real
        | real '.' unit
        ;

 unit
        : unit_product
        | unit_product '/(' unit_product ')'
        | unit_product '/(' unit_product ')^' uxponent
        | unit_product '/' single_unit
        ;

 unit_product
        : unit_product '.' single_unit
        | single_unit
        ;

 uxponent
        : uinteger
        | '-' uinteger
        | '(' uinteger '/' uinteger ')'
        | '(-' uinteger '/' uinteger ')'
        ;

 single_unit
        : punit
        | punit '^' uxponent
        ;

 punit
        : decimal_multiple_prefix unit_p_symbol
        | decimal_submultiple_prefix unit_n_symbol
        | decimal_multiple_prefix unit_b_symbol
        | decimal_submultiple_prefix unit_b_symbol
        | binary_prefix 'B'
        | binary_prefix 'bit'
        | unit_p_symbol
        | unit_n_symbol
        | unit_b_symbol
        | unit___symbol
        ;

 decimal_multiple_prefix
        : 'E' | 'G' | 'M' | 'P' | 'T' | 'Y' | 'Z' | 'da' | 'h' | 'k'
        ;

 decimal_submultiple_prefix
        : 'a' | 'c' | 'd' | 'f' | 'm' | 'n' | 'p' | 'u' | 'y' | 'z'
        ;

 binary_prefix
        : 'Ei' | 'Gi' | 'Ki' | 'Mi' | 'Pi' | 'Ti'
        ;

 unit_p_symbol
        : 'B' | 'Bd' | 'r' | 't'
        ;

 unit_n_symbol
        : 'L' | 'Np' | 'o' | 'oC' | 'rad' | 'sr'
        ;

 unit_b_symbol
        : 'A' | 'Bq' | 'C' | 'F' | 'Gy' | 'H' | 'Hz' | 'J' | 'K' | 'N'
        | 'Ohm' | 'Pa' | 'S' | 'Sv' | 'T' | 'V' | 'W' | 'Wb' | 'bit'
        | 'cd' | 'eV' | 'g' | 'kat' | 'lm' | 'lx' | 'm' | 'mol' | 's'
        ;

 unit___symbol
        : 'd' | 'dB' | 'h' | 'min' | 'u'
        ;

 real
        : ureal
        | '-' ureal
        ;

 ureal
        : numerical_value
        | numerical_value suffix
        ;

 numerical_value
        : uinteger
        | dot uinteger
        | uinteger dot uinteger
        | uinteger dot
        ;

 dot
        : '.' | ','
        ;

 uinteger
        : digit uinteger
        | digit
        ;

 suffix
        : exponent_marker uinteger
        | exponent_marker '-' uinteger
        ;

 exponent_marker
        : 'e' | 'E'
        ;

 digit
        : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
        ;

                              Revisions

2001-02-16  draft-jaffer-metric-interchange-format-03.txt:

        * (Metric Interchange Syntax) moved to end.
        * (b) removed as symbol for bit.
        * (bit) allowed with all decimal prefixes.
        * (Portability of Numbers) Fortran and PL/I added.
        * (Motivation) [CNN 1999] replaced with [NASA 1999].

2001-01-18  draft-jaffer-metric-interchange-format-02.txt:

        * ([IEC 60027-2] Prefixes): Added binary prefixes.
        * (Unit Symbols): Added 'katal', byte, and bit.
        * (Unit Symbols): Replaced bel (B) with decibel (dB) to avoid
          confusion with byte.
        * (References): Added [IEC 60027-2].
        * (Abstract): Clarified character encoding claims.
        * (Metric Interchange Syntax): Removed use of '+'.
        * (Portability of Numbers): Described '+' conflict and
          resolution.
        * (Portable Character Set): Enumerated characters used.
        * (References): Added [UTF-7], [UTF-8], [UCS], and [UNICODE].
        * (Use of Metric Units by Computer Programs): Error returns
          generalized to being non-positive.
        * (Rationales): Put enumerated characters into tables.
        * (Unit Symbols): Reordered; added formulas for dB, r, and o.

2001-01-15  draft-jaffer-metric-interchange-format-01.txt:

        * Added section "Revisions".
        * (Unit Symbols): Added derived unit "revolution" ('r').
        * (Unit Examples): Added "revolution per minute" ('r/min').
        * (Unit Symbols): sr is rad^2; Numeric equivalence for degree.
        * (Metric Interchange Syntax): Removed 'ua'.
        * (Unit Symbols): B (bel) now derived from Np (neper).
        * (Unit Examples): Added returned values.
        * (Metric Interchange Syntax): Added uxponent.
        * (Unit Symbols): Updated eV and u from [CODATA 1998].
        * (References): Added [CODATA 1998].
        * (Use of Metric Units by Computer Programs): swapped argument
          order.

2001-01-12  draft-jaffer-metric-interchange-format-00.txt:

        * Initial release.
