Internet Draft                                         Robert D. Cameron
Document: draft-cameron-tatu-bibp-02.txt                  Serban G. Tatu
4 December 2000                                  Simon Fraser University
Expires: 4 June 2001

   Bibliographic Protocol Level 1: Link Resolution and Metapage Retrieval

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) Robert D. Cameron and Serban G. Tatu (2000). All Rights Reserved.

Table of Contents

1. Introduction
2. Universal Serial Item Names
   2.1 Grammatical Notation
   2.2 Character Set
   2.3 Lexical Elements and Generic Grammar
   2.4 Syntactic Framework
   2.5 Publication Domains
   2.6 Collections and Collection Labels
   2.7 Items and Item Extensions
   2.8 Attributes and Attribute Specifiers
3. BibP Level 1: Description and Rationale
   3.1 BibP Link Syntax
   3.2 Server Identification Hierarchy

Cameron and Tatu              Informational                     [Page 1]

INTERNET-DRAFT                 BibP Level 1                December 2000

   3.3 Default Local Server
   3.4 The Document-Specified Server
   3.5 Default Global Server
   3.6 Link Translation
   3.7 Metapage Response
   3.8 Fault Handling
   3.9 Future Server Requirements
4. A JavaScript Resolver for BibP
   4.1 Setting BibP_BaseURL
   4.2 Translating and Displaying BibP Links
   4.3 Future Development of Client-Side Resolvers
5. Security Considerations
6. Conclusions
7. References

Abstract

BibP (bibliographic protocol) links bibliographic identifiers of published works to bibliographic services for those works. Identifiers follow the Universal Serial Item Name (USIN) scheme, providing a scholar-friendly conventional notation for journal articles, books and institutional publications, as well as a generic framework that can scale to identify documents in any organized collection. A hierarchical resolution model emphasizes bibliographic services available through local libraries backed up by publisher-specified and global services. Resolution is achieved through existing DNS technology coupled with appropriate client-side support. Deployment of BibP clients with most of the popular web browsers is possible today; this paper presents one such client, written in JavaScript.

1. Introduction

BibP (bibliographic protocol) is a web-based protocol for linking bibliographic references via Universal Serial Item Names [USIN].
It is intended to allow linking to each bibliographic item as a conceptual entity, independent of any particular copy or service with respect to that item. Indeed, it is even intended to allow linking to items which may not exist on-line; resolution of such a link could yield a metapage that identifies existing print-based services (library holdings, document delivery) for accessing the item. In this regard, BibP is a proposed reference linking solution that seeks to maintain integrated access to both newly published on-line items and the vast body of print-based literature.

The BibP/USIN approach applies the principles underlying the Uniform Resource Name (URN) concept [RFC1737] to the particular problem of bibliographic linking. Focussing on a library-based network in this narrower domain, BibP defines a much simpler resolution model than that required for URN support in general [RFC2276]. Based on existing DNS support for relative domain names [RFC1034], the model requires no new development or deployment of DNS technology. In addition, the problems of namespace definition and management [RFC2611] are considerably simplified by restriction to bibliographic identifiers of the USIN system.

From the author's perspective, reference linking with BibP is intended to be as simple and scholar-friendly as possible. For example, to denote the paper by Norman Paskin entitled "Information Identifiers" as it appears on pages 135-6 of volume 10, issue 2 of the journal Learned Publishing, the BibP link is formed from the journal ISSN, volume and page using the USIN conventional syntax: bibp:ISSN/0953-1513:10@135. Similarly, bibp:RDNS(ietf.org)/RFC:2396 is the minimal syntax that denotes the report by T. Berners-Lee, R. Fielding and L. Masinter entitled "Uniform Resource Identifiers (URI): Generic Syntax," published as Request for Comments 2396 of the Internet Engineering Task Force.
In general, BibP links to most published documents can be constructed using elements of existing identification standards combined in a minimal way according to USIN syntactic conventions. In the parlance of Paskin [Idents], USINs are compound identifiers; this contrasts with the simple identifiers (or dumb pointers) of the DOI system [DOI].

Ultimately, the BibP framework is envisioned to facilitate access to bibliographic items through a library-based network of BibP servers. In essence, each library-operated server will provide information about and access to items, emphasizing locally available resources and agreements; networking will provide access to items not locally available. For example, a university library may operate a BibP server as the default server for its students and faculty, providing access to bibliographic items in accord with university holdings, interlibrary loan options and site licensing arrangements. However, the framework is also envisioned to allow other options. For example, commercial document delivery services may compete to provide BibP service to industrial clients.

As a first step in the staged development of a multi-level specification for BibP, this report addresses the basic client-server interaction in resolving an individual BibP link and retrieving an appropriate metapage. In particular, we present both a Level 1 specification for this interaction and a scalable client-side implementation of that specification. This work is sufficient for initial deployment of BibP-based links and servers.

The remainder of this paper is organized as follows. Section 2 describes the syntactic framework and conventions for Universal Serial Item Names under BibP Level 1. BibP Level 1 itself is addressed in Section 3, with accompanying notes discussing the rationale and planning for further development.
A scalable client-side implementation of the specification, in the form of a JavaScript program, is presented in Section 4. Section 6 concludes the paper with a discussion of the further development of BibP.

2. Universal Serial Item Names

Previous work has proposed a system of Universal Serial Item Names (USINs) for the persistent identification of documents published or otherwise organized in serial collections [USIN]. The overall framework defines a concept of publication domains within which standardized codes are used to identify particular collections. In principle, each collection may then have its own particular system of hierarchical enumeration and labeling to identify particular published items within the collection. In this way, the USIN framework is generic and extensible; it can be readily scaled to provide for unambiguous identification of documents in any organized collection.

This section defines a precise syntactic framework for USINs, slightly modified from the original proposal to better account for the encoding requirements of HTML documents. Within this framework, each collection potentially has its own syntax. However, the USIN proposal also outlines a conventional predefined syntax that provides substantial coverage of the existing literature published as journal articles, books, book articles (including papers in published proceedings) and institutional reports in numbered series. The conventional syntax is formalized here and is used as the basis of identification under BibP Level 1. Mechanisms for defining customized syntax for particular publication domains or collections are left for future work.

2.1 Grammatical Notation

The grammatical notation used for describing the syntax of USINs is based on EBNF. Terminal symbols (symbols that will actually appear in the syntactic forms) are enclosed in quotation marks.
Nonterminal symbols (names of syntactic classes) are expressed as identifiers with possible embedded hyphens or underscores. Alternative syntactic forms are separated by the vertical bar ("|"). Parentheses ("(" and ")") are used to group syntactic phrases. Square brackets ("[" and "]") are used for optional phrases. Braces ("{" and "}") are used for phrases to be repeated zero or more times. Names of nonprinting characters are enclosed in angle brackets ("<" and ">").

2.2 Character Set

Under BibP Level 1, USINs are character strings composed of characters in the following classes.

   UC_LETTER    = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                  "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                  "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
   LC_LETTER    = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                  "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                  "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
   LETTER       = UC_LETTER | LC_LETTER
   DIGIT        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" |
                  "9"
   ALPHANUMERIC = LETTER | DIGIT
   EXTENDER     = "_" | "-"
   SEPARATOR    = "/" | ":" | "!" | "@" | "$" | "*" | "~" | "+" | "," |
                  "."
   PAREN        = "(" | ")"
   WHITE        = |

The USIN framework is designed to accommodate future extension of the USIN character set in support of internationalization. That is, non-ASCII characters of Unicode/ISO 10646 [Unicode] may be added to the LETTER, DIGIT, and EXTENDER character classes. However, USINs are designed to be parsed based on recognition of SEPARATOR and PAREN characters. Thus, carefully written USIN parsers under BibP Level 1 may accommodate future extensions to the USIN character set without modification.

Closely related to the USIN is the notion of a USIN Octet Sequence (UOS), an encoding of a USIN as a sequence of 8-bit bytes. USINs themselves are simply character strings without any particular constraint on their representation.
Thus a USIN may be represented as a sequence of handwritten or printed marks on paper. Alternatively, it may be represented as a series of 16-bit quantities in the UCS-2 format of Unicode/ISO 10646 [Unicode]. However, when a USIN is to be communicated under BibP Level 1, it is always encoded as a USIN Octet Sequence, as described in Section 3.1 following.

2.3 Lexical Elements and Generic Grammar

USINs are made up of lexical elements known as symbols, operators and phrases.

   symbol   = ALPHANUMERIC {[EXTENDER] ALPHANUMERIC}
   operator = SEPARATOR {SEPARATOR}
   phrase   = "(" {ALPHANUMERIC | EXTENDER | SEPARATOR} ")"

Symbols are generally names or numerals that identify particular entities within some level of the identification hierarchy. Parenthesized phrases play a similar role but provide wider-ranging syntax for imported notations and/or internal structure. Operators are generally syntactic markers that guide the interpretation of symbols and phrases.

WHITE characters (whitespace) may be embedded in a USIN only in accord with the following hyphenation convention. A hyphenation substring consisting of a single hyphen ("-") followed by zero or more whitespace characters may be inserted after an operator or parenthesized phrase. However, a hyphenation substring may not appear at the end of a USIN. Following these rules, hyphenation substrings may be assumed to be implicitly permitted after any operator or parenthesized phrase appearing in the USIN grammar.

The hyphenation convention permits a USIN appearing in plain text to be formatted over more than one line. Cut-and-paste operations on USINs displayed in this manner may thus extract USINs with embedded whitespace. USIN processing software will normally remove the embedded whitespace prior to further work.
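
As an illustration of the hyphenation convention, the following Python sketch removes hyphenation substrings from a USIN that has been copied out of wrapped plain text. It is not part of the BibP specification. The function name strip_hyphenation is hypothetical, and the set of WHITE characters is assumed here to be space, tab, CR and LF. A hyphen is treated as a hyphenation substring only when it directly follows an operator or a closing parenthesis and lies outside a parenthesized phrase.

```python
SEPARATORS = "/:!@$*~+,."   # the SEPARATOR class of Section 2.2
WHITE = " \t\r\n"           # assumption: space, tab, CR and LF


def strip_hyphenation(text: str) -> str:
    """Return text with hyphenation substrings removed (hypothetical helper)."""
    out = []            # characters kept so far
    in_phrase = False   # True while inside a parenthesized phrase
    i = 0
    while i < len(text):
        ch = text[i]
        prev = out[-1] if out else ""
        after_break_point = prev != "" and (prev in SEPARATORS or prev == ")")
        if ch == "-" and not in_phrase and after_break_point:
            # Hyphenation substring: skip the hyphen and any whitespace after it.
            i += 1
            while i < len(text) and text[i] in WHITE:
                i += 1
            if i == len(text):
                raise ValueError("hyphenation substring at end of USIN")
            continue
        if ch in WHITE:
            # Whitespace is only legal as part of a hyphenation substring.
            raise ValueError(f"unexpected whitespace at offset {i}")
        if ch == "(":
            in_phrase = True
        elif ch == ")":
            in_phrase = False
        out.append(ch)
        i += 1
    return "".join(out)


print(strip_hyphenation("ISSN/-\n   0953-1513:-\n   10@135"))
```

The hyphen inside 0953-1513 is kept because it follows a digit rather than an operator, so it can only be an EXTENDER within a symbol.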

The USIN framework allows symbols, operators and phrases to be combined in a variety of ways, depending on the identification needs of particular publication domains and collections. However, a USIN must always satisfy the following generic grammar of permissible USIN forms (after removal of hyphenation substrings).

   form = symbol | form phrase | form operator symbol

The generic grammar of forms reflects the hierarchical left-to-right structure of USINs. The most elementary form of a USIN is a single symbol. All other USINs are formed hierarchically by extending known forms with additional identification elements consisting of phrases or operator-symbol combinations.

2.4 Syntactic Framework

The syntactic framework for USINs identifies publication-domains, collections, items, and attributes as the four key syntactic structures. The term USIN may refer to any one of these structures, which are hierarchically related as follows.

   USIN       = publication-domain | collection | item | attribute
   collection = publication-domain "/" collection-label
   item       = (collection | item) item-extension
   attribute  = (collection | item | attribute) "!" attribute-specifier

For example, consider the USIN ISSN/0953-1513:10@135!title. The publication domain is ISSN, the space of all serial publications registered with an International Standard Serial Number [ISO3297]. The collection is the set of all articles published in the journal whose ISSN is 0953-1513, namely, Learned Publishing. The item extensions :10 and @135 specify respectively volume 10 of the journal and the article that appears on page 135 of that volume (using the conventional syntax described later). Attribute notation is used to specify the title of the article as the object of interest.
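
The lexical elements and the generic grammar of forms from Section 2.3 are regular, so they can be checked with ordinary regular expressions. The Python sketch below is one possible transcription for the ASCII character set of BibP Level 1, not a normative parser. The names is_valid_form and tokenize are hypothetical, and the input is assumed to be free of hyphenation substrings.

```python
import re

# Lexical elements of Section 2.3, restricted to the ASCII classes of Section 2.2.
SYMBOL = r"[A-Za-z0-9](?:[_-]?[A-Za-z0-9])*"
OPERATOR = r"[/:!@$*~+,.]+"
PHRASE = r"\([A-Za-z0-9_/:!@$*~+,.-]*\)"

# form = symbol | form phrase | form operator symbol
FORM = re.compile(rf"{SYMBOL}(?:{PHRASE}|{OPERATOR}{SYMBOL})*")
TOKEN = re.compile(
    rf"(?P<symbol>{SYMBOL})|(?P<operator>{OPERATOR})|(?P<phrase>{PHRASE})"
)


def is_valid_form(usin: str) -> bool:
    """True if usin satisfies the generic grammar of forms."""
    return FORM.fullmatch(usin) is not None


def tokenize(usin: str) -> list:
    """Split a valid form into (kind, text) pairs, left to right."""
    if not is_valid_form(usin):
        raise ValueError(f"not a valid USIN form: {usin!r}")
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(usin)]


for kind, text in tokenize("ISSN/0953-1513:10(2)@135"):
    print(kind, text)
```

Because the check relies only on the SEPARATOR and PAREN characters to delimit symbols, extending the SYMBOL pattern with further letter or digit classes would leave the rest of the sketch unchanged.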

2.5 Publication Domains

Publication domains represent namespaces within which publications and other collections are assigned identifiers according to a specific scheme and/or authority. The syntax presented here is used both for the three initial domains supported under BibP Level 1 (namely ISSN, ISBN, and RDNS) and for future domains. Although the initial domains provide for substantial coverage of referenced literature, the general syntax accommodates future development of a richer hierarchical domain structure to provide for both greater coverage and the development of more mnemonic forms.

Publication domains may be simple, hierarchical, and/or parameterized.

   publication-domain = symbol |
                        publication-domain "." symbol |
                        publication-domain phrase

When a parenthesized phrase is appended to a publication domain, it may be considered to instantiate that domain for the particular string value given in parentheses.

Under BibP Level 1, two simple domains are predefined, represented by the symbols ISSN and ISBN. As noted previously, the ISSN domain consists of those serial publications that may be identified by an International Standard Serial Number. Similarly, the ISBN domain is the space of those publications identified by an International Standard Book Number [ISO2108].

RDNS is a parameterized domain that uses a restricted subset of names assigned under the Domain Name System [RFC1034] to identify publication namespaces for individual institutions. For example, RDNS(sfu.ca) denotes a publication namespace for Simon Fraser University, while RDNS(ietf.org) denotes a similar namespace for the Internet Engineering Task Force. Here, the parameter string must be a well-established domain name under DNS that is both owned by the institution and has a clear interpretation as a code for that institution.

The domain parameter for RDNS is case-insensitive, following the conventions for DNS.
For example, RDNS(sfu.ca) and RDNS(SFU.CA) are equivalent. Following DNS tradition, the lower case version of the RDNS parameter is considered the canonical and preferred form.

Hierarchical divisions of an institution may be identified by hierarchical RDNS domains. The subdomains are identified by unambiguous codes for the divisions as used by the institution itself. For example, RDNS(sfu.ca).CMPT denotes the School of Computing Science at Simon Fraser University using the four-letter code CMPT unambiguously used by SFU for the School. Alternatively, RDNS(cs.sfu.ca) also denotes the School, using its well-established DNS name.

The astute reader may note that the parameterized domain syntax used for RDNS differs from the quoted DNS names originally proposed [USIN]. It is slightly cleaner and simplifies the USIN Octet Sequence representation (see Section 3.1) by eliminating the need for escape-encoding of quotation marks.

2.6 Collections and Collection Labels

Collections are sets of documents organized by a particular serial numbering scheme. For example, a journal is typically a collection organized using volume, issue and page numbering, while a technical report series is a collection organized by a numbering scheme specified by the issuing institution. A book may be a collection of articles (for example, the proceedings of a conference) or may be considered a singleton collection (a single document in its own right).

Collection labels are symbols that identify particular collections within the context of a publication domain.

   collection-label = symbol

Collection labels must always conform to this syntax, but particular publication domains may impose further restrictions.

In the context of the ISSN domain, collection labels are restricted to the following ISSN syntax [ISSN].

   collection-label(ISSN) = ISSN
   ISSN   = DIGIT DIGIT DIGIT DIGIT ["-"] DIGIT DIGIT DIGIT DIGITX
   DIGITX = DIGIT | "X" | "x"

The embedded hyphen within an ISSN is preferred and canonical for USIN syntax, but may be omitted. Similarly, the upper case X is the preferred and canonical form for the ISSN check digit denoting 10, but x is considered equivalent. BibP servers must accept serial-codes in any of these forms. However, when generating or otherwise reporting USINs within the ISSN domain, BibP servers must use the canonical forms.

Collection labels within the ISBN domain similarly follow ISBN syntax [ISO2108].

   collection-label(ISBN) = ISBN
   ISBN    = INTEGER "-" INTEGER "-" INTEGER "-" DIGITX |
             DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGITX
   INTEGER = DIGIT {DIGIT}

The preferred and canonical form of ISBNs includes the correct hyphenation to separate it into four fields for the country/group coding, publisher coding, title coding and check digit. Each of the first three fields is variable length, but in total these fields must contain exactly nine digits. As with ISSNs, x may be used for the check digit, but X is preferred and canonical.

With RDNS domains, collection labels should be the identifiers actually used by the institution. For example, the Internet Engineering Task Force uses RFC to refer to documents in its Request for Comments Series, so this collection may be identified by the USIN RDNS(ietf.org)/RFC. The technical report series of SFU's School of Computing Science is denoted RDNS(sfu.ca).CMPT/TR. Theses published by an institution are conventionally denoted by the abbreviation for the degree, so RDNS(sfu.ca).CMPT/PhD denotes the Ph.D. thesis series of the School.

2.7 Items and Item Extensions

Item is the generic term used to refer to an individual document or group of documents that form an identified division within the hierarchical identification scheme of a collection. For example, the volumes, issues and articles of a journal are all items.

Within the context of a specific collection, item extensions are the USIN suffixes that specify items. In general, the syntax and interpretation of item extensions depends on the particular collection or publication domain involved. However, each item extension conforms to the generic grammar of Section 2.3.

The USIN conventional syntax predefines a number of item extensions for common forms of hierarchical identification. These involve the operators ":" for introducing the principal enumeration scheme of a collection, "@" for page-based article specification and "$" for direct article specification by symbol or count.

Whenever a collection is explicitly divided into enumerated divisions, the ":" operator is used to introduce the division label. Volumes of a journal are a typical use, so :10 is the item-extension specifying volume 10 of Learned Publishing in the USIN ISSN/0953-1513:10@135. Although volumes will be denoted by integer numerals in most cases, the conventional syntax also permits arbitrary symbols. For example, ISSN/0098-5589:SE-12 denotes Volume SE-12 of IEEE Transactions on Software Engineering.

Report numbers, year numbers and other top-level enumeration elements are also introduced using the ":" operator. For example, RDNS(ietf.org)/RFC:2396 denotes RFC 2396 of the IETF Request for Comments series, while RDNS(sfu.ca).CMPT/PhD:2000 denotes PhD theses published in the year 2000 by the SFU School of Computing Science.

The USIN convention for journals also includes a syntax for issue numbers as a second level of enumeration, namely a parenthesized phrase.
Thus ISSN/0953-1513:10(2) denotes volume 10, issue 2 of Learned Publishing. Special issues and combined issues typically use non-numeric issue strings. For example, ISSN/0038-0644:20(S2) denotes special issue S2 of volume 20 of Software--Practice & Experience (December 1990), while ISSN/0361-526X:36(3/4) denotes combined issue 3/4 of volume 36 of Serials Librarian (1999). Note that the parenthesized notation for issues is also quite common in bibliographic citations; the USIN convention takes advantage of this for mnemonic effect.

The second conventional operator under the USIN system is the "@" operator for specifying articles in books or journals by starting page number. For example, the article at page 135 of Learned Publishing 10(2) is denoted ISSN/0953-1513:10(2)@135. For journals that are paginated by volume, such as this one, the issue number may be omitted; ISSN/0953-1513:10@135 is thus equivalent to the USIN just given.

In the event that more than one article starts on a given page, the articles are numbered sequentially with an alphabetic code: a for the first, b for the second, c for the third, and so on. In the rare event that there are more than 26 articles on the page, the code allows arbitrary base 26 numerals such as aa for the 27th item, ab for the 28th and so on.

The final form of item extension in the USIN conventional syntax uses the "$" operator to specify articles in unpaginated e-journals or other contexts by a numeric or symbolic label. Numeric labels indicate either explicit or implicit enumeration within contents lists. However, where a clear symbolic label exists either in plain text or encoded in the article URL, then the symbolic form is preferred and canonical.
For example, the article "Towards Universal Serial Item Names" published in Volume 1, Issue 3 of the Journal of Digital Information is denoted ISSN/1368-7506:1(3)$Cameron, where Cameron is the symbolic code clearly used in the JoDI URL to distinguish this article from others in the same issue.

The generic USIN grammar supports the definition of many additional forms of item extension. Future developments of the USIN system will likely introduce additional conventional syntax as well as mechanisms for specifying domain- or collection-dependent syntax.

2.8 Attributes and Attribute Specifiers

Attributes are properties or metadata elements that pertain to a particular collection or item. The USIN syntax reserves the "!" operator for introducing attribute-specifiers, as shown in the grammar of Section 2.4 above. An attribute-specifier itself consists of a symbol naming the attribute, with an optional parenthesized phrase to specify a parameter value.

   attribute-specifier = symbol [phrase]

For example, ISSN/0953-1513!title denotes the title of the journal whose ISSN is 0953-1513, namely Learned Publishing; ISSN/0953-1513:10@135!title denotes the article title "Information Identifiers"; and ISSN/0953-1513:10@135!author(1) denotes the first (and only, in this case) author of this article, namely, Norman Paskin.

In general, attributes denote publication facts about particular items or collections. Attributes are not intended to account for classification or other metadata that may be attributed to items by third parties. Philosophically, third-party metadata is considered to be interpretative, not factual. Different third parties may well describe and/or classify the same document in quite different ways. Thus the attribute sets used with USINs may be expected to be substantially narrower than general metadata element sets such as those of Dublin Core [RFC2413].
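
Taken together, the grammars of Sections 2.4 through 2.8 suggest a simple decomposition of a USIN into its publication domain, collection label, item extensions and attribute specifiers. The following Python sketch shows one way this might be done. It covers only the conventional item extensions (":", "@", "$" and the parenthesized issue phrase), assumes hyphenation substrings have already been removed, and uses the hypothetical names ParsedUsin and decompose, none of which are defined by BibP.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

SYMBOL = r"[A-Za-z0-9](?:[_-]?[A-Za-z0-9])*"
PHRASE = r"\([A-Za-z0-9_/:!@$*~+,.-]*\)"
# publication-domain = symbol | domain "." symbol | domain phrase
DOMAIN = rf"{SYMBOL}(?:\.{SYMBOL}|{PHRASE})*"
# Conventional item extensions: ":", "@" or "$" plus a symbol, or an issue phrase.
EXTENSION = rf"(?:[:@$]{SYMBOL}|{PHRASE})"
# attribute-specifier = symbol [phrase], introduced by "!"
ATTRIBUTE = rf"!{SYMBOL}(?:{PHRASE})?"

USIN_RE = re.compile(
    rf"(?P<domain>{DOMAIN})"
    rf"(?:/(?P<label>{SYMBOL})(?P<ext>{EXTENSION}*)(?P<attrs>(?:{ATTRIBUTE})*))?"
)


@dataclass
class ParsedUsin:
    domain: str              # e.g. "ISSN" or "RDNS(sfu.ca).CMPT"
    label: Optional[str]     # collection label; None for a bare domain
    extensions: List[str]    # item extensions, in order
    attributes: List[str]    # attribute specifiers without the leading "!"


def decompose(usin: str) -> ParsedUsin:
    """Split a conventional-syntax USIN into its syntactic structures."""
    m = USIN_RE.fullmatch(usin)
    if m is None:
        raise ValueError(f"not a conventional USIN: {usin!r}")
    extensions = re.findall(EXTENSION, m.group("ext") or "")
    attributes = re.findall(rf"!({SYMBOL}(?:{PHRASE})?)", m.group("attrs") or "")
    return ParsedUsin(m.group("domain"), m.group("label"), extensions, attributes)


print(decompose("ISSN/0953-1513:10@135!author(1)"))
```

Since BibP Level 1 requires only that attribute syntax be recognized as valid, the sketch records attribute specifiers without interpreting them.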

Of particular importance to the further development of the BibP network as it evolves towards the concept of a universal citation database [UCD] is the parameterized ref attribute. This attribute refers to the bibliographic references in a document, identified by numeric or symbolic citation tag. For example, RDNS(ietf.org)/RFC:2XXX!ref(UCD) denotes the document cited as UCD in this RFC.

The ref attribute supports even broader coverage of the literature than that provided by the direct identification provisions of the USIN conventional syntax. Any documents that are cited within other documents may be identified by specification of the citing document and a citation tag. Effectively, this provides for universal coverage of all documents that are transitively reachable by citation.

The attribute framework of the USIN scheme is substantially an area for future work, however. No requirements for attribute processing are specified under BibP Level 1, except to recognize that attribute syntax is valid.

3. BibP Level 1: Description and Rationale

BibP Level 1 establishes the syntax of BibP links together with requirements on HTTP-based client-server interactions for resolving individual links and retrieving bibliographic metapages for display to the user. A BibP client is a web browser or other user agent that either has built-in support for BibP (BibP-aware user agent) or operates in conjunction with an appropriate client-side script. (Section 4 presents one such script-based implementation of BibP link resolution.) The BibP client resolves BibP links by identifying an appropriate BibP server and generating a well-formatted BibP request to that server. Upon receiving the request, the BibP server is responsible for generating an HTML page presenting relevant bibliographic and service information with respect to the cited item.

3.1 BibP Link Syntax

A BibP link is a uniform resource identifier (URI) [RFC2396] of the form bibp:UOS, where UOS is a USIN Octet Sequence as described below. In the parlance of RFC2396, BibP links are absolute URIs whose scheme is bibp and whose scheme-specific-part is a UOS of a cited USIN. The UOS is considered an opaque part because its structure has no meaning with respect to the network.

In the normal case, a UOS is simply the representation of a USIN as an ASCII character string [ASCII]. Under BibP Level 1, the only exception is that WHITE characters must be encoded according to the following grammar.

   WHITESPACE = {CR | LF | HT | SPACE}
   CR         = "%" "0" ("D" | "d")
   LF         = "%" "0" ("A" | "a")
   HT         = "%" "0" "9"
   SPACE      = "%" "2" "0"

That is, the URI transformation of escape encoding [URI] must be applied to the normal ASCII spacing control characters to produce the UOS. Also note that the WHITESPACE grammar permits newlines to be encoded using any of the common file format conventions with various combinations of CR and LF characters.

As described previously, the USIN character set is subject to future extension to include non-ASCII characters of Unicode/ISO 10646 for the purpose of internationalization. These characters may be represented within a UOS by first expressing them as octet sequences in the UTF-8 format of Unicode and then applying the URI-encoding transformation to the octets. Because UTF-8 octet sequences for non-ASCII characters always have their high-order bit set, the first hex digit of the escaped encoding will be 8 through F. Thus character sequences of the following grammar may be expected.

   A_F           = "A" | "B" | "C" | "D" | "E" | "F" |
                   "a" | "b" | "c" | "d" | "e" | "f"
   HEX8_F        = "8" | "9" | A_F
   HEX           = DIGIT | A_F
   UTF-8_encoded = "%" HEX8_F HEX

Finally, although the canonical and preferred representation of the USIN characters under BibP Level 1 is indeed as normal ASCII octets, URI-encoded forms thereof are permitted and considered equivalent. Thus a UOS may also contain character sequences of the following grammar.

   HEX2_7        = "2" | "3" | "4" | "5" | "6" | "7"
   ASCII_encoded = "%" HEX2_7 HEX

The USIN syntax is designed to make considerations of escape encoding completely transparent to the user. Under BibP Level 1, the whitespace-free form of every USIN may be entered directly as a normal ASCII character sequence. Escaped forms will normally only be generated by BibP-aware document composition software supporting the cut-and-paste of USINs or other software that performs escape encoding as a part of general URI processing.

3.2 Server Identification Hierarchy

The first step in BibP link resolution is identification of an appropriate BibP server to handle the request. In order of preference, a BibP client must select from the following servers.

- The local bibhost, if it exists (Section 3.3).

- The document-specified citehost, if it exists (Section 3.4).

- A known global server (such as usin.org) (Section 3.5).

This server identification hierarchy provides for a scalable BibP network with particular provisions for library- and publisher-operated BibP servers. Library-operated servers that provide access to local holdings and site licensing information will generally be made available through the bibhost mechanism. Publisher-operated servers that provide particular support for the BibP links contained in a given document may be specified with the citehost mechanism.
The citehost is consulted directly if the local bibhost is unavailable, and is also passed as a parameter in bibhost-based resolution to provide for indirect consultation (see Section 3.7). Both mechanisms provide for scalability by reducing the load on global BibP servers as the overall BibP network grows.

A BibP-aware user agent may provide a finer-grained hierarchy for server identification by allowing users to specify overriding servers at any position within the hierarchy. For example, a particular BibP-aware web browser may specify three separate configuration settings, one each for overriding the BibP server determination at the bibhost, citehost and global levels.

3.3 Default Local Server

The key characteristic of BibP Level 1 is the ability for a locally available server to act as the default BibP server for a particular user environment. The following conventions apply.

- The DNS (Domain Name System) alias bibhost is used to identify the default BibP server (if one exists) in the local environment of the web browser or other user agent. For example, if a web browser accessing a BibP link is operating in the univ.edu domain, then the typical configuration of the local DNS resolver would interpret the relative domain name bibhost as the fully qualified domain name bibhost.univ.edu (if it exists). In accord with the recommendations of RFC 2219 [RFC2219], bibhost is the conventional DNS alias for the BibP protocol.

- A bibhost server must signal its ability to respond to BibP Level 1 requests by providing HTTP access to the BibP Identification Icon at the URL http://bibhost/bibp1.0/bibpicon.jpg. A user agent tests for the existence of a conforming bibhost by issuing an HTTP HEAD or GET request for this URL. If an error response is received, the user agent directs resolution of the BibP link to the next level in the server hierarchy.
- A bibhost server must provide HTTP-based access to a JavaScript-based implementation of BibP link resolution at the URL http://bibhost/bibp1.0/bibres.js. See Section 4 for a suitable script. A user agent may implement link resolution through this script or by some other method. If the script is unavailable, a user agent may direct resolution of a BibP link to the next level in the server hierarchy.

Use of the DNS alias bibhost provides a browser-independent and highly configurable mechanism for identifying local BibP servers. Using the normal configuration options available with typical DNS software, it is possible to configure a local BibP server on either a per-client basis or a per-domain basis. Configuration of the DNS resolver on a client machine can specify the machine to be used as bibhost for that client only. However, configuration of a DNS nameserver to provide a bibhost definition for an entire local domain will normally be a much more convenient option. Such a configuration can generally be made without requiring any configuration actions on individual client machines on the network, assuming only that the DNS resolvers on those machines follow the usual practice of including the local domain in the search list for resolution of relative domain names.

The BibP Identification Icon has four roles. First, it provides a graphical trademark serving to visually identify a particular bibhost as a participating server with respect to the BibP network. Second, it provides an extra level of assurance to user agents that bibhost does indeed denote a BibP server rather than a machine that just happens to have that name. Third, it allows distinction between different levels and versions of the BibP protocol that may be supported by a particular BibP server.
Fourth and finally, given the restrictive security model of JavaScript and other client-side scripting languages, it also provides for feasible script-based testing of bibhost existence using image preloading.

The availability of a JavaScript-based resolver on the bibhost server provides for flexibility, scalability and maintainability. Although other resolution mechanisms exist, the local script nevertheless provides authors, publishers and user agents the flexibility to delegate link resolution to the local service. Such delegation represents an inherently scalable design in comparison to an implementation that relies on JavaScript served from a single global source. Furthermore, as the BibP protocol evolves, previously published documents can benefit from updated resolution scripts installed on local bibhosts.

The use of the path component bibp1.0 in the URLs for the identification icon and local resolution script identifies specific support for Level 1 of BibP. Future clients dependent on services defined at Level 2 must not assume that these are available from a bibhost identifying itself as a provider of Level 1 services only.

3.4 The Document-Specified Server

In order to identify a BibP server that provides specific and known support for the links in a particular document, publishers or authors may use the citehost mechanism. In the absence of a local bibhost, the citehost denotes the actual BibP server to be used for link resolution and metapage retrieval. When a local bibhost is known, the citehost setting is passed on to the bibhost for consultation or citation as a service relevant to the identified document.

A citehost is specified by the http URL of a server or server subdirectory. To set the citehost to http://www.pubhost.com/bibpserver/, for example, two declarations should be included in the HEAD element of the document.
These two declarations respectively identify the citehost to BibP-aware user agents and JavaScript-based user agents. The latter declaration is defined to work with the JavaScript resolvers presented in Section 4. Other client-side resolvers (for example, using different scripting languages) may use different conventions for establishing the citehost.

3.5 Default Global Server

In the absence of either a local bibhost or a document-specified citehost, a web browser or other user agent must use a known global server as the default BibP server. At the time of writing of this report, the prototype BibP server at usin.org is available and is being further developed as the recommended global server for this purpose.

3.6 Link Translation

After identification of an appropriate server to resolve BibP links, the second step in link resolution is to generate well-formed HTTP requests to that server. The form of those requests is specified using the following translational semantics.

A BibP URI of the form bibp:UOS is equivalent to an HTTP URL of one of the following forms.

- http://server/bibp1.0/resolve?usin=UOS
- http://server/bibp1.0/resolve?citehost=citehost&usin=UOS

The second form is used when a document-specified citehost is defined in accordance with Section 3.4. In both cases, server denotes the BibP server determined by the rules of Sections 3.2 through 3.5 above. The path component bibp1.0 indicates that the client is expecting resolution services defined at this level (Level 1) of BibP.

A user agent may use this translation rule either explicitly or implicitly to generate well-formed HTTP requests. If used explicitly, the form of the required HTTP request follows directly from the HTTP 1.1 specification [RFC2616]. However, the translation may be implicit, so long as the HTTP request generated is the same as that specified by the explicit translational semantics.
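
The translation rule of Section 3.6 can be sketched in JavaScript as follows. This is a minimal illustration under stated assumptions, not the resolver of Section 4: the function names encodeWhitespace and translateBibpLink are hypothetical, the server argument is whatever host Sections 3.2 through 3.5 select, and percent-encoding the citehost value with encodeURIComponent is an assumption, since the rule does not say how that parameter is to be escaped.

```javascript
// Hypothetical helper: escape the ASCII spacing characters of a USIN
// (CR, LF, HT, SPACE) to produce a UOS, in the manner of Section 3.1.
// HT is octet 09 in ASCII, so a tab becomes "%09".
function encodeWhitespace(usin) {
  return usin.replace(/[\r\n\t ]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "%" + (hex.length < 2 ? "0" + hex : hex);
  });
}

// Hypothetical helper: translate a bibp: link into the equivalent HTTP URL.
// `server` is the chosen BibP server host; `citehost` is optional.
function translateBibpLink(link, server, citehost) {
  var marker = "bibp:";
  var start = link.indexOf(marker);
  if (start < 0) {
    throw new Error("not a BibP link: " + link);
  }
  // The UOS is everything after the first occurrence of "bibp:".
  var uos = encodeWhitespace(link.substring(start + marker.length));
  var url = "http://" + server + "/bibp1.0/resolve?";
  if (citehost) {
    // Assumption: the citehost URL is percent-encoded as a query value.
    url += "citehost=" + encodeURIComponent(citehost) + "&";
  }
  return url + "usin=" + uos;
}
```

For instance, translateBibpLink("bibp:ISSN/1082-9873:5(5)$paskin", "usin.org") yields the first form of the rule with usin.org as server; supplying a third argument yields the second form.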
3.7 Metapage Response

Given an HTTP request constructed according to the specifications of Section 3.6, a BibP server must generate an appropriate response in the form of an HTML document [HTML]. When a UOS corresponding to a valid USIN for a known document has been cited, the response page should report bibliographic and service metadata in a format intended for human readers as follows.

- The canonical form for the USIN should be reported, removing whitespace and performing transformations as described previously.

- Basic bibliographic metadata for the appropriate document type should be provided. For journal articles, this typically includes authors, title, journal, volume, issue, year, month and pagination. For books, author, title, publisher, publisher address, date and total pagination are usual. Other document types include the appropriate bibliographic elements commonly accepted to establish a bibliographic citation.

- Additional bibliographic metadata may be provided. This may include an abstract, keywords, classification metadata, known reviews, citations received, additional author information and so on. However, a BibP server must provide only factual metadata in the public domain or copyrighted metadata where explicit permission has been obtained. Links to copyrighted materials (e.g., reviews) are preferred.

- Known service metadata for the cited item should be provided. This may include on-line full-text access, paper-based library holdings, document delivery options, additional bibliographic sources and so on. A BibP server operating as bibhost for a particular domain will be expected to emphasize locally available services for that domain.

- If citehost has been specified, a link to, or information from, the appropriate document metapage at citehost must be provided.
Under BibP 1.0, no additional constraints are placed on the metadata to be provided on the response page or its format. The intent is to provide a relatively open framework to allow the development of alternative models for document metapages. BibP 1.0 servers may freely format metadata for human readers, without consideration of how this data may be extracted under program control. However, subsequent development of BibP is expected to specify formal requirements for server-to-server interaction for sharing of basic bibliographic and service metadata.

3.8 Fault Handling

BibP servers must provide mechanisms to handle errors, ambiguities and unknowns.

- BibP servers should check the syntax of resolve requests for consistency with Section 3.6 and the individual syntax of USINs for consistency with Section 2 and report any errors. However, if a syntax error in a resolve request consists of additional keyword=value parameters with & separators, then the server should simply warn of unknown parameters and continue to respond to the request by ignoring these parameters.

- In the event that a USIN is known not to denote any entity, a BibP server must report this fact and should provide links to allow exploration of nearby valid USINs. For example, if a USIN refers to an article starting on a particular page of a journal and it is known that no article starts on that page, the server may provide links to the article that starts on the closest previous page as well as links to the apparent issue and volume that were intended.

- In the event that an article-level USIN is known to be ambiguous (incomplete), a BibP server must report all matching articles and their USINs. For example, when more than one article starts on a particular page of a journal, omission of the disambiguating suffixes a, b, and so on, is likely to be common.
  By reporting all articles on the page, the BibP server nevertheless provides a credible response to the ambiguous USIN.

- In the event that a BibP server has insufficient data to fully resolve a USIN, the server should nevertheless report any information that is known. For example, if a server request involves the USIN of an article in a known journal, but the article metadata itself is unavailable, the server should report the journal title, volume and page.

3.9 Future Server Requirements

It is anticipated that BibP Level 2 will impose additional requirements on BibP servers, particularly in the areas of server-to-server interactions and acceptance of metadata submissions. BibP Level 3 is further planned to incorporate the capture and dissemination of the citing relationship (from citing works to cited works) as metadata, as a step towards the universal citation database [UCD]. BibP Level 1 server software should be designed to accommodate these evolving requirements.

4. A JavaScript Resolver for BibP

This section presents and documents a JavaScript program for client-side resolution of BibP. This program is intended to be embedded in the HEAD element of HTML documents to implement client-side resolution with browsers that provide JavaScript support. The program has been written to use only those JavaScript features that are relatively standard and are expected to remain so. The script is effective with Netscape Navigator (versions 3 through 6), Internet Explorer (versions 4 through 5.5) and Opera (version 4), although full bibhost support is not yet available in the latter.

The prefix BibP_ is used for all global functions and variables of the resolver so that the resolver can be freely mixed with other client-side JavaScript that respects this prefix.
The full script, in a relatively condensed form for ease of cut-and-paste, is presented immediately below and documented in the following subsections.

4.1 Setting BibP_BaseURL

The core strategy of the resolver is to define and use the global variable BibP_BaseURL as the common prefix for resolution of BibP links. That is, given a link of the form bibp:USIN, the link translation of Section 3.6 is performed by concatenation of BibP_BaseURL and USIN. The function BibP_SetBaseURL constructs the prefix given a BibP server as its input parameter and using the global setting of BibP_citehost as described in Section 3.4.

The determination of the server to be used for BibP_BaseURL follows the server identification hierarchy of Section 3.2. Initially, the value of BibP_BaseURL is set based on the document-specified BibP_citehost if it exists, or the global server usin.org, otherwise. However, if a test for the availability of a local bibhost subsequently proves successful, BibP_BaseURL will be adjusted to use bibhost (BibP_onIcon function).

The test for the availability of bibhost uses the image preloading feature of common web browsers to check the required identification icon at http://bibhost/bibp1.0/bibpicon.jpg. After the document has loaded, and links have been processed with the initial value of BibP_BaseURL, the assignment of the src property of BibP_Icon initiates the test for that icon (BibP_onLoad function). On a successful load event, the BibP_onIcon handler is called. If an icon of nonzero height is reported, bibhost is used to establish BibP_BaseURL. A zero height icon indicates either that images were turned off in the browser (an empty icon is trivially loaded without verifying the existence of bibhost) or an erroneous icon.
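
The two-stage choice of base URL described in Section 4.1 can be sketched without any browser dependencies. This is not the resolver script itself: makeBaseURL, initialBaseURL and baseURLAfterIconLoad are hypothetical names, servers are assumed to be given as URL prefixes ending in "/", the resolve path is assumed to sit directly beneath a citehost prefix, and the icon argument merely stands in for a preloaded image object whose height the browser would report.

```javascript
// Assumption: each server is written as a URL prefix ending in "/".
var GLOBAL_SERVER = "http://usin.org/";
var LOCAL_SERVER = "http://bibhost/";

// Hypothetical: build the common prefix to which a UOS is appended.
// When a citehost is declared it is passed along as a parameter,
// following the second form of Section 3.6.
function makeBaseURL(serverPrefix, citehost) {
  var base = serverPrefix + "bibp1.0/resolve?";
  if (citehost) {
    // Assumption: the citehost URL is percent-encoded as a query value.
    base += "citehost=" + encodeURIComponent(citehost) + "&";
  }
  return base + "usin=";
}

// First pass: use the citehost if the document declares one,
// otherwise fall back to the global server.
function initialBaseURL(citehost) {
  return makeBaseURL(citehost || GLOBAL_SERVER, citehost);
}

// Second pass: run when the identification icon reports a load event.
// Only an icon of nonzero height confirms the local bibhost; a
// zero-height icon leaves the current base URL unchanged.
function baseURLAfterIconLoad(icon, currentBaseURL, citehost) {
  if (icon && icon.height > 0) {
    return makeBaseURL(LOCAL_SERVER, citehost);
  }
  return currentBaseURL;
}
```

Keeping the decision in pure functions separates it from the image-preloading mechanics, so the same logic could be exercised outside a browser; in a page, the icon object would be the preloaded image whose src was set to the bibhost icon URL.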
4.2 Translating and Displaying BibP Links

The function BibP_ProcessLink is responsible for translating BibP links to the appropriate URLs as well as for arranging for correct display of the links in the browser status bar on MouseOver events. It is first used within the BibP_onLoad function to process links based on the initial BibP_BaseURL value (before bibhost testing). Subsequently, it may also be used within the BibP_onIcon function to update the translation if bibhost availability is confirmed.

The two-pass approach assures the availability of BibP link service as soon as a document is loaded. Because the test for bibhost availability proceeds asynchronously with user action, it is possible that a user may access a BibP link after document loading but before bibhost availability is known. In this case, service from the citehost or the global server will be provided.

Link translations are effected within BibP_ProcessLink by changing the stored href attribute associated with each BibP link. In the first pass, the USIN is determined as the substring following the first occurrence of the string "bibp:" and the transformed link is formed by appending this USIN to the value of BibP_BaseURL. The second pass, invoked if bibhost availability has been confirmed, performs a similar transformation, replacing the initial BibP_BaseURL prefix with the updated value.

A complication of href modification is that the link value displayed in the browser status bar on mouseover events would normally be the actual stored value, not the original bibp: form. The BibP_onMouseOver function arranges to display the original form, while the BibP_onMouseOut function ensures that the status bar is cleared when the mouse is moved off a BibP link.

4.3 Future Development of Client-Side Resolvers

It is anticipated that future versions of web browsers and other user agents will provide direct BibP support.
In this case, it will likely be desirable to disable the JavaScript resolver. The resolver provides for this with the test on the property navigator.bibpSupport. Any appropriately defined value for this property will prevent BibP link processing that would otherwise be initiated by the window.onload event.

It is also anticipated that future developments of the JavaScript resolver may expand the coverage of browser support and perhaps add functionality. Updated resolvers should be available through each local bibhost of the BibP network. These resolvers may be directly used in documents served from the domain. For example, if bibhost.xxx.tld has been implemented, it is recommended that documents served from within the xxx.tld domain incorporate client-side resolution using the following declaration in the HEAD element.

This provides for automatic updating of client-side resolvers without document modification.

5. Security Considerations

BibP Level 1 defines read-only access to networked bibliographic information. The security concerns are therefore minimal. It is possible that spoofing of the bibhost for a particular domain could provide inaccurate bibliographic or metaservice information. However, such an effect would be localized and should be easy to address by the local domain administrator. A second security concern is the security of the default global service at usin.org as well as the potential use of www.bibhost.com to capture global services. Both of these domains have been registered by the first author of this document; eventually they should be turned over to an appropriate institutional authority.

A potential security concern is the substitution of a malicious JavaScript applet in place of the JavaScript resolver under bibp1.0/bibres.js. Server administrators should ensure the security of installed resolvers.

6. Conclusions

Bibliographic Protocol Level 1 provides a new layer of abstraction for web-based reference linking. In essence, the linking to a copy or service with respect to a referenced document is eliminated in favor of a link to the document itself. The link specifies what the cited document is, not how to access it.

Link resolution under BibP is based on an open-architecture model involving a network of library-based and publisher-based servers. A global BibP service is also defined, but can be overridden by document-specific (publisher-based) servers, which can in turn be overridden by library-based bibhost servers.

A JavaScript-based client-side resolver can be incorporated into HTML documents to enable BibP with most recent versions of Netscape Navigator and Internet Explorer. The protocol anticipates the development of clients that understand bibliographic protocol natively, without the use of JavaScript. However, JavaScript is a necessary part of our deployment model. That is, JavaScript allows use of the protocol within a critical mass of existing web browsers.

A trivial BibP server implementation can be easily installed as a scaffold for experimental work and monitoring of BibP traffic. A non-trivial prototype implementation of BibP has also been constructed with full support for BibP Level 1 and several features of BibP Level 2 [BibP-MSc].

7. References

[BibP-MSc] Serban Tatu. "Bibliographic Protocol: Distributed Reference Linking to Document Metaservices on the Web." M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2000.
  RDNS(sfu.ca).CMPT/MSc:2000$SerbanTatu

[DOI] Norman Paskin. "DOI: Current Status and Outlook," D-Lib Magazine Volume 5, Number 5, May 1999.
  ISSN/1082-9873:5(5)$paskin

[Idents] Norman Paskin. "Information Identifiers," Learned Publishing Volume 10, Number 2, April 1997, pp. 135-156.
  ISSN/0953-1513:10@135

[ISO2108] International Organization for Standardization, Information and documentation - International standard book numbering (ISBN), ISO 2108:1992, 1992.
  RDNS(iso.ch)/ISO:2108(1992)

[ISO3297] International Organization for Standardization, Information and documentation - International standard serial numbering (ISSN), ISO 3297:1998, 1998.
  RDNS(iso.ch)/ISO:3297(1998)

[RFC1034] P. Mockapetris. "Domain Names - Concepts and Facilities," Request for Comments 1034, Internet Engineering Task Force, November 1987.
  RDNS(ietf.org)/RFC:1034

[RFC1737] K. Sollins and L. Masinter. "Functional Requirements for Uniform Resource Names," Request for Comments 1737, Internet Engineering Task Force, December 1994.
  RDNS(ietf.org)/RFC:1737

[RFC2219] M. Hamilton and R. Wright. "Use of DNS Aliases for Network Services," Request for Comments 2219, Internet Engineering Task Force, October 1997.
  RDNS(ietf.org)/RFC:2219

[RFC2276] K. Sollins. "Architectural Principles of Uniform Resource Name Resolution," Request for Comments 2276, Internet Engineering Task Force, January 1998.
  RDNS(ietf.org)/RFC:2276

[RFC2396] T. Berners-Lee, R. Fielding and L. Masinter. "Uniform Resource Identifiers (URI): Generic Syntax," Request for Comments 2396, Internet Engineering Task Force, August 1998.
  RDNS(ietf.org)/RFC:2396

[RFC2413] S. Weibel, J. Kunze, C. Lagoze and M. Wolf. "Dublin Core Metadata for Resource Discovery," Request for Comments 2413, Internet Engineering Task Force, September 1998.
  RDNS(ietf.org)/RFC:2413

[RFC2611] L. Daigle, D. van Gulik, R. Iannella and P. Faltstrom. "URN Namespace Definition Mechanisms," Request for Comments 2611, Internet Engineering Task Force, June 1999.
  RDNS(ietf.org)/RFC:2611

[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee. "Hypertext Transfer Protocol -- HTTP/1.1," Request for Comments 2616, Internet Engineering Task Force, June 1999.
  RDNS(ietf.org)/RFC:2616

[UCD] Robert D. Cameron. "A Universal Citation Database as a Catalyst for Reform in Scholarly Communication." First Monday, Volume 2, No. 4, April 1997.
  ISSN/1396-0466:2(4)$cameron

[Unicode] The Unicode Consortium. The Unicode Standard, Version 3.0, Addison Wesley Longman, Reading, Massachusetts, 2000.
  ISBN/0-201-61633-5

[USIN] Robert D. Cameron. "Towards Universal Serial Item Names," Journal of Digital Information, Volume 1, Number 3, October 1998.
  ISSN/1368-7506:1(3)$Cameron

Authors' Addresses

Robert D. Cameron
School of Computing Science
Simon Fraser University
8888 University Drive
Burnaby, B.C. V5A 1S6
Canada

Phone: +1 604 291 3241
EMail: cameron@cs.sfu.ca

Serban G. Tatu
EMail: statu@cs.sfu.ca