INTERNET-DRAFT                                                 A. Jaffer
Expires: June 2001                                          January 2001

             draft-jaffer-metric-interchange-format-00.txt

       Representation of numerical values and SI units in character
                 strings for information interchanges

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   The previous four paragraphs are required verbatim by
   http://www.ietf.org/ietf/1id-guidelines.txt.  But as of January 10,
   2001, the file at the URL cited for 1id-abstracts had last been
   updated in August 1997.  The list of current Internet-Drafts is
   actually at http://www.ietf.org/1id-abstracts.html.

Abstract

   This document describes a character string encoding for numerical
   values and units which:

   * is unambiguous in all locales;
   * uses only "Portable Character Set" characters;
   * is human readable and writable;
   * is machine readable and writable;
   * incorporates SI prefixes and units; and
   * incorporates [ISO 6093] numbers.

Motivation

   In 1999, [CNN 1999] reported in its article "European Mars mission
   looks for lessons in polar lander loss":

      The Climate Orbiter was lost due to human error when mission
      personnel failed to convert numerical data from English units
      into metric.

   Although the [ISO 6093] standard for automated interchange of
   numerical data is widely used, standardized measurement units (other
   than for page formatting) are not routinely attached to interchange
   data.

Relation to Previous Work

   The 1986 standard "Representations for U.S. Customary, SI, and Other
   Units to Be Used in Systems with Limited Character Sets" [ANSI X3.50]
   states:

      This standard was not designed for ... usage by humans as input
      to, or output from, data systems. ... They should never be
      printed out for publication or for other forms of public
      information transfer.

   [ANSI X3.50] representations of units are ambiguous: "min" is both
   "minute" and "milliinch"; "cd" is both "candela" and "centiday".

   Apart from SI units, [ANSI X3.50] supports only U.S. local units, is
   not complete in that support, and has no provision for extension to
   other locales.  But non-SI unit systems are in such disarray that
   using them for interchange is not practical.  The same unit name can
   signify different volumes in different locales; the Canadian gallon
   is 4.54609 liters, while the U.S. gallon is 3.785412 liters.  The
   "CRC Handbook of Chemistry and Physics" [CRC] lists no fewer than six
   distinct (incompatible) systems of wire gauges.

   The character set limitation targeted by [ANSI X3.50], namely a
   single alphabetic case, is no longer common in data interchanges.
   But many of its double-case "Form I" SI unit representations are
   similar to those presented here.

   The audience for metric standards has changed and grown.  In the
   preface to "Guide for the Use of the International System of Units
   (SI)" [NIST 811], B. Taylor writes:

      The International System of Units, universally abbreviated SI, is
      the modern metric system of measurement.  Long the dominant
      measurement system used in science, the SI is becoming the
      dominant measurement system used in international commerce.

   [NIST 811] details a methodology for expressing measurement units in
   both text and symbolic form in scientific and other documents.
   Its unit expressions combine over 40 metric base and derived unit
   symbols unambiguously.  Taylor's unit symbols are the basis for this
   metric interchange format.

International Units

   In the expression for the value of a quantity, the unit symbol is
   placed after the numerical value.  A dot (FULL STOP, ".") is placed
   between the numerical value and the unit symbol; for example, 1.5
   kilometers is written "1.5.km".

   Within a compound unit, each of the base and derived symbols can
   optionally have an attached SI prefix.

   Unit symbols formed from other unit symbols by multiplication are
   indicated by means of a dot (FULL STOP, ".") placed between them.

   Unit symbols formed from other unit symbols by division are
   indicated by means of a SOLIDUS ("/") or negative exponents.  The
   SOLIDUS must not be repeated in the same compound unit unless it is
   contained within a parenthesized subexpression.

   The grouping formed by a prefix symbol attached to a unit symbol
   constitutes a new inseparable symbol (forming a multiple or
   submultiple of the unit concerned) which can be raised to a positive
   or negative power and which can be combined with other unit symbols
   to form compound unit symbols.

   The grouping formed by surrounding compound unit symbols with
   parentheses ("(" and ")") constitutes a new inseparable symbol which
   can be raised to a positive or negative power and which can be
   combined with other unit symbols to form compound unit symbols.

   Compound prefix symbols, that is, prefix symbols formed by the
   juxtaposition of two or more prefix symbols, are not permitted.

   Prefix symbols are not used with the time-related unit symbols min
   (minute), h (hour), and d (day).

   Only submultiple prefix symbols may be used with the unit symbols L
   (liter), B (bel), Np (neper), o (degree), oC (degree Celsius), rad
   (radian), and sr (steradian).

   Submultiple prefix symbols may not be used with the unit symbols t
   (metric ton) or Bd (baud).

   A unit exponent follows the unit, separated by a circumflex ("^").
   Exponents may be positive or negative.
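
   The prefix rules above are simple enough to check mechanically.  The
   following Python sketch is illustrative only.  This memo defines no
   programming interface, and the names prefix_allowed,
   MULTIPLE_PREFIXES, SUBMULTIPLE_PREFIXES, NO_PREFIX, SUBMULTIPLE_ONLY,
   and MULTIPLE_ONLY are hypothetical.  The sketch encodes only the
   restrictions stated in this section and takes the prefix symbols
   from the SI Prefixes table.  It rejects compound prefixes because it
   accepts exactly one prefix symbol.

```python
# Hypothetical sketch: these names are illustrative, not part of this memo.
# Prefix symbols and factors from the SI Prefixes table ("u" is micro).
MULTIPLE_PREFIXES = {"da": 1e1, "h": 1e2, "k": 1e3, "M": 1e6, "G": 1e9,
                     "T": 1e12, "P": 1e15, "E": 1e18, "Z": 1e21, "Y": 1e24}
SUBMULTIPLE_PREFIXES = {"d": 1e-1, "c": 1e-2, "m": 1e-3, "u": 1e-6,
                        "n": 1e-9, "p": 1e-12, "f": 1e-15, "a": 1e-18,
                        "z": 1e-21, "y": 1e-24}

# Restrictions stated in the International Units rules.
NO_PREFIX = {"min", "h", "d"}                       # time-related units
SUBMULTIPLE_ONLY = {"L", "B", "Np", "o", "oC", "rad", "sr"}
MULTIPLE_ONLY = {"t", "Bd"}


def prefix_allowed(prefix: str, symbol: str) -> bool:
    """Return True if the single prefix `prefix` may precede unit `symbol`.

    A compound prefix such as "mk" is not a key of either prefix table,
    so it is rejected.  Whether `symbol` is itself a known unit symbol
    is not checked here.
    """
    if symbol in NO_PREFIX:
        return False
    if prefix in MULTIPLE_PREFIXES:
        return symbol not in SUBMULTIPLE_ONLY
    if prefix in SUBMULTIPLE_PREFIXES:
        return symbol not in MULTIPLE_ONLY
    return False
```

   With these tables, prefix_allowed("k", "m") (kilometer) and
   prefix_allowed("m", "L") (milliliter) are true.  By contrast,
   prefix_allowed("k", "L"), prefix_allowed("m", "t"), and
   prefix_allowed("k", "h") are false.  The grammar's unit___symbol
   production also lists u and ua, so a fuller check would add them to
   NO_PREFIX.
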
   Fractional exponents must be parenthesized.

Alphabetic Case

   The case of letters in unit symbols must match the symbols specified
   here.  Unit symbols are composed of lower-case letters except that:

   * the first letter of the symbol is an upper-case letter when the
     name of the unit is derived from the name of a person; and
   * the symbol for the liter is L.

   The prefix symbols Y (yotta), Z (zetta), E (exa), P (peta), T (tera),
   G (giga), and M (mega) are printed in upper-case letters, while all
   other prefix symbols are printed in lower-case letters.

SI Prefixes

   Factor  Name   Symbol  |  Factor  Name   Symbol
   ======  =====  ======  |  ======  =====  ======
   1e24    yotta  Y       |  1e-1    deci   d
   1e21    zetta  Z       |  1e-2    centi  c
   1e18    exa    E       |  1e-3    milli  m
   1e15    peta   P       |  1e-6    micro  u
   1e12    tera   T       |  1e-9    nano   n
   1e9     giga   G       |  1e-12   pico   p
   1e6     mega   M       |  1e-15   femto  f
   1e3     kilo   k       |  1e-18   atto   a
   1e2     hecto  h       |  1e-21   zepto  z
   1e1     deka   da      |  1e-24   yocto  y

Unit Symbols

   Type of Quantity          Name                     Symbol Equivalent
   ================          ====                     ====== ==========
   length                    meter                    m
   mass                      gram                     g
   time                      second                   s
   electric current          ampere                   A
   thermodynamic temperature kelvin                   K
   amount of substance       mole                     mol
   luminous intensity        candela                  cd
   plane angle               radian                   rad    m/m
   solid angle               steradian                sr     m^2/m^2
   frequency                 hertz                    Hz     s^-1
   force                     newton                   N      m.kg.s^-2
   pressure, stress          pascal                   Pa     N/m^2
   energy, work, heat        joule                    J      N.m
   power, radiant flux       watt                     W      J/s
   electric charge           coulomb                  C      s.A
   electric potential, EMF   volt                     V      W/A
   capacitance               farad                    F      C/V
   electric resistance       ohm                      Ohm    V/A
   electric conductance      siemens                  S      A/V
   magnetic flux             weber                    Wb     V.s
   magnetic flux density     tesla                    T      Wb/m^2
   inductance                henry                    H      Wb/A
   luminous flux             lumen                    lm     cd.sr
   illuminance               lux                      lx     lm/m^2
   radionuclide activity     becquerel                Bq     s^-1
   absorbed dose,            gray                     Gy     m^2.s^-2
     specific energy
   dose equivalent           sievert                  Sv     m^2.s^-2
   volume                    liter                    L      dm^3
   mass                      ton                      t      Mg
   logarithm of power ratio  neper                    Np
   logarithm of power ratio  bel                      B
   signaling rate            baud                     Bd     s^-1
   plane angle               degree                   o      = [pi/180]
   temperature               degree Celsius           oC
   time                      minute                   min    = 60.s
   time                      hour                     h      = 60.min
   time                      day                      d      = 24.h
   energy                    electronvolt             eV     = 1.60217733e-19.J
   mass                      unified atomic mass unit u      = 1.6605402e-27.kg

Unit Examples

   Type of Quantity            Name                             Symbol
   ================            ====                             ======
   area                        square meter                     m^2
   volume                      cubic meter                      m^3
   speed, velocity             meter per second                 m/s
   acceleration                meter per second squared         m/s^2
   wave number                 reciprocal meter                 m^-1
   mass density (density)      kilogram per cubic meter         kg/m^3
   specific volume             cubic meter per kilogram         m^3/kg
   current density             ampere per square meter          A/m^2
   magnetic field strength     ampere per meter                 A/m
   concentration               mole per cubic meter             mol/m^3
   luminance                   candela per square meter         cd/m^2
   angular velocity            radian per second                rad/s
   angular acceleration        radian per second squared        rad/s^2
   dynamic viscosity           pascal second                    Pa.s
   moment of force             newton meter                     N.m
   surface tension             newton per meter                 N/m
   heat flux density           watt per square meter            W/m^2
   radiant intensity           watt per steradian               W/sr
   radiance                    watt per square meter steradian  W/(m^2.sr)
   heat capacity, entropy      joule per kelvin                 J/K
   specific heat or entropy    joule per kilogram kelvin        J/(kg.K)
   specific energy             joule per kilogram               J/kg
   thermal conductivity        watt per meter kelvin            W/(m.K)
   energy density              joule per cubic meter            J/m^3
   electric field strength     volt per meter                   V/m
   electric charge density     coulomb per cubic meter          C/m^3
   electric flux density       coulomb per square meter         C/m^2
   permittivity                farad per meter                  F/m
   permeability                henry per meter                  H/m
   molar energy                joule per mole                   J/mol
   molar entropy or heat       joule per mole kelvin            J/(mol.K)
   exposure (x and gamma rays) coulomb per kilogram             C/kg
   noise voltage density       nanovolt per root hertz          nV/Hz^(1/2)

Use of Metric Units by Computer Programs

   Metric units attached to individual numerical values have the format
   described above.  An unattached unit can be used to specify the
   units applying to a row, column, or entire table of numerical
   values, or for other purposes.
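
   Before a unit can be converted, a program reading interchange data
   has to separate the numerical value from its attached unit.  The
   Python sketch below does this with one regular expression.  The
   expression is modeled on the real, ureal, and numerical_value
   productions of the Metric Interchange Syntax given later in this
   memo.  The function name split_quantity is hypothetical.  The unit
   is returned as an unparsed string and is not validated against the
   unit grammar.

```python
import re

# Hypothetical helper; this memo specifies syntax, not an API.
# Number: optional sign, then [ISO 6093] Integer, Decimal, or Exponential
# form with "." or "," as decimal point.  Unit: FULL STOP, then a letter.
_QUANTITY = re.compile(
    r"""^(?P<number>[+-]?(?:\d+[.,]\d+|\d+[.,]?|[.,]\d+)(?:[eE][+-]?\d+)?)
        (?:\.(?P<unit>[A-Za-z].*))?$""",
    re.VERBOSE,
)


def split_quantity(text: str):
    """Split a quantity such as "9.81.m/s^2" into (9.81, "m/s^2").

    The unit is None for a bare number.  The unit string is returned
    as-is; it is not checked against the unit grammar.
    Raises ValueError if `text` does not have the form number[.unit].
    """
    match = _QUANTITY.match(text)
    if match is None:
        raise ValueError(f"not a metric quantity: {text!r}")
    # [ISO 6093] permits "," as the decimal point; Python's float() does not.
    value = float(match.group("number").replace(",", "."))
    return value, match.group("unit")
```

   FULL STOP serves both as a decimal point and as the separator before
   the unit.  The sketch therefore relies on every unit symbol beginning
   with a letter.  In "3.5.km", the first FULL STOP is followed by a
   digit and so belongs to the number.  As a result,
   split_quantity("3.5.km") returns (3.5, "km") and
   split_quantity("2,5.kg") returns (2.5, "kg").
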

   Programming language support for metric interchange should be
   provided by a function of two unit arguments returning a conversion
   factor.  Multiplying a numerical value expressed in the first unit
   by the returned conversion factor yields the numerical value
   expressed in the second unit.  This function must return 0.0 (zero)
   if either of its arguments is not a syntactically valid unit, or if
   no conversion factor exists between the two units.

   In the following examples, UCF names that function.  Each call in
   the left column returns a nonzero conversion factor, and each call
   in the right column returns 0.0.

   Examples:

      UCF("m/s", "km/s")   but not:  UCF("m/s", "N")
      UCF("oC", "moC")     but not:  UCF("oC", "mK")
      UCF("o", "rad")      but not:  UCF("o", "K")
      UCF("K", "K")        but not:  UCF("oK", "oK")
      UCF("s/s", "")       but not:  UCF("mph", "km/h")

Programming Language Extension

   Lexical numerical constants in the programming languages C, Pascal,
   and Scheme could be extended to incorporate Metric Interchange
   Syntax compatibly with their current syntaxes, but this is not
   required for supporting input and output of units.

Metric Interchange Syntax

   Here is a YACC-like syntax for metric quantities (with [ISO 6093]
   numbers).

   quantity_value : real
                  | real '.' unit
                  ;
   unit : unit_product
        | unit_product '/(' unit_product ')'
        | unit_product '/(' unit_product ')^' digit
        | unit_product '/' single_unit
        ;
   unit_product : unit_product '.' single_unit
                | single_unit
                ;
   single_unit : punit
               | punit '^' digit
               | punit '^(1/2)'
               | punit '^-' digit
               | punit '^(-1/2)'
               ;
   punit : decimal_multiple_prefix unit_p_symbol
         | decimal_submultiple_prefix unit_n_symbol
         | decimal_multiple_prefix unit_b_symbol
         | decimal_submultiple_prefix unit_b_symbol
         | unit_p_symbol
         | unit_n_symbol
         | unit_b_symbol
         | unit___symbol
         ;
   decimal_multiple_prefix : 'E' | 'G' | 'M' | 'P' | 'T' | 'Y' | 'Z'
                           | 'da' | 'h' | 'k'
                           ;
   decimal_submultiple_prefix : 'a' | 'c' | 'd' | 'f' | 'm' | 'n'
                              | 'p' | 'u' | 'y' | 'z'
                              ;
   unit_p_symbol : 'Bd' | 't' ;
   unit_n_symbol : 'B' | 'L' | 'Np' | 'o' | 'oC' | 'rad' | 'sr' ;
   unit_b_symbol : 'A' | 'Bq' | 'C' | 'F' | 'Gy' | 'H' | 'Hz' | 'J'
                 | 'K' | 'N' | 'Ohm' | 'Pa' | 'S' | 'Sv' | 'T' | 'V'
                 | 'W' | 'Wb' | 'cd' | 'eV' | 'g' | 'lm' | 'lx' | 'm'
                 | 'mol' | 's'
                 ;
   unit___symbol : 'd' | 'h' | 'min' | 'u' | 'ua' ;
   real : ureal
        | sign ureal
        ;
   ureal : numerical_value
         | numerical_value suffix
         ;
   numerical_value : uinteger
                   | dot uinteger
                   | uinteger dot uinteger
                   | uinteger dot
                   ;
   dot : '.' | ',' ;
   uinteger : digit uinteger
            | digit
            ;
   suffix : exponent_marker uinteger
          | exponent_marker sign uinteger
          ;
   exponent_marker : 'e' | 'E' ;
   sign : '+' | '-' ;
   digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;

Rationales
==========

Portability of Numbers

   "Representation of numerical values in character strings for
   information interchanges" [ISO 6093] specifies the three
   machine-readable presentations in widespread use (Integer, Decimal,
   and Exponential notations).  These presentations use only the
   characters "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ",",
   ".", "+", "-", "E", "e", and space (" ").

   Locale charsets all support the digits 0 to 9.  There are only 3
   LC_NUMERIC attributes: decimal_point, thousands_sep, and grouping.
   [ISO 6093] specifies use of either "." or "," for the decimal point.
   [ISO 6093] does not allow grouping.  There is no LC_NUMERIC
   attribute for the exponent.
   Thus the Latin characters "e" and "E" must be available in all
   languages which support [ISO 6093].

   The programming languages C, Pascal, and Scheme accept [ISO 6093]
   numbers both as lexical constants and as input data.

Portable Character Set

   Of the SI symbols, the "micro" prefix (GREEK SMALL LETTER MU or
   MICRO SIGN), the "ohm" symbol (GREEK CAPITAL LETTER OMEGA), and the
   "degree" symbol (DEGREE SIGN) are not supported by all charset
   encodings.  The substitutes "u", "Ohm", and "o" are used for them
   respectively.  With these substitutions, the unit symbols remain
   readable and the system remains unambiguous.

   Taylor recommends using the MIDDLE DOT character between multiplied
   unit symbols.  To support those charset encodings lacking MIDDLE
   DOT, the metric interchange format instead uses FULL STOP (".").

   The unit superscript exponents could be formed using SUPERSCRIPT
   MINUS, SUPERSCRIPT ONE, SUPERSCRIPT TWO, SUPERSCRIPT THREE, etc.
   But these characters are not universal.  So the CIRCUMFLEX ACCENT
   ("^") is placed between a unit and its exponent.  The exponent is
   written with portable digits, preceded by HYPHEN-MINUS ("-") when it
   is negative.

   The symbol for the liter, L, was adopted by the General Conference
   on Weights and Measures in order to avoid the risk of confusion
   between the letter l and the number 1.

Programming Language Syntax Extension

   A FULL STOP (".") after a numerical lexical constant is not
   specified in the syntax of the programming languages C, Pascal, and
   Scheme.  Therefore the syntax of their lexical constants could be
   extended to incorporate SI unit symbols.

Radical Units

   White noise power in a bandwidth is proportional to that bandwidth.
   The noise voltage is therefore proportional to the square root of
   the bandwidth.  For this reason, electronic noise units can have
   fractional exponents, as in nV/Hz^(1/2) (nanovolt per root hertz).

Author's Address

   Aubrey Jaffer
   agj @ alum.mit.edu
   33 Buehler Rd.
   Bedford MA 01730, USA

References

   [ANSI X3.50] ANSI, "Representations for U.S. customary, SI, and
                other units to be used in systems with limited
                character sets", ANSI X3.50, 1986.

   [CNN 1999]   CNN, "European Mars mission looks for lessons in polar
                lander loss",
                http://cnn.com/1999/TECH/space/12/29/mars.express/,
                December 1999.

   [CRC]        Chemical Rubber Company, "CRC handbook of chemistry and
                physics", CRC Press, 67th edition, 1986.

   [ISO 2955]   ISO, "Information processing - Representation of SI and
                other units in systems with limited character sets",
                ISO 2955:1983.

   [ISO 6093]   ISO, "Representation of numerical values in character
                strings for information interchanges", ISO 6093:1985.

   [NIST 811]   Taylor, B., "Guide for the Use of the International
                System of Units (SI)", NIST Special Publication 811,
                1995 Edition.

   [TOG]        The Open Group, "The Single UNIX Specification, Version
                2",
                http://www.opennc.org/onlinepubs/7908799/xbd/charset.html,
                1997.