Internet Draft                                        Robert D. Cameron
Document: draft-cameron-tatu-bibp-00.txt                 Serban G. Tatu
24 August 2000                                  Simon Fraser University
Expires: 24 February 2001


                    Bibliographic Protocol Level 1:
                Link Resolution and Metapage Retrieval

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) Robert D. Cameron and Serban G. Tatu (2000). All Rights
Reserved.

Table of Contents

1. Introduction
2. Universal Serial Item Names
   2.1 Grammatical Notation
   2.2 Character Set
   2.3 Lexical Elements and Generic Grammar
   2.4 Syntactic Framework
   2.5 Publication Domains
   2.6 Collections and Collection Labels
   2.7 Items and Item Extensions
   2.8 Attributes and Attribute Specifiers
3. BibP Level 1: Description and Rationale
   3.1 BibP Link Syntax
   3.2 Server Identification Hierarchy

Cameron and Tatu              Experimental                     [Page 1]

INTERNET-DRAFT                BibP Level 1                  August 2000

   3.3 Default Local Server
   3.4 The Document-Specified Server
   3.5 Default Global Server
   3.6 Link Translation
   3.7 Metapage Response
   3.8 Fault Handling
   3.9 Future Server Requirements
4. A JavaScript Resolver for BibP
   4.1 Setting BibP_BaseURL
   4.2 Translating and Displaying BibP Links
   4.3 Future Development of Client-Side Resolvers
5. Security Considerations
6. Conclusions
7. References

Abstract

BibP (bibliographic protocol) is a proposed web-based protocol for
linking bibliographic references via Universal Serial Item Names
[USIN]. It is intended to allow linking to each bibliographic item as
a conceptual entity, independent of any particular copy or service
with respect to that item. Indeed, it is even intended to allow
linking to items which may not exist on-line; resolution of such a
link could yield a metapage that identifies existing print-based
services (library holdings, document delivery) for accessing the
item. In this regard, BibP is a proposed reference linking solution
that seeks to maintain integrated access to both newly published
on-line items and the vast body of print-based literature. This is in
marked contrast to both the DOI initiative, which focuses on articles
as digital objects [DOI], and the URN initiative, which focuses on
general web resources [RFC1737].

1. Introduction

BibP (bibliographic protocol) is a proposed web-based protocol for
linking bibliographic references via Universal Serial Item Names
[USIN]. It is intended to allow linking to each bibliographic item as
a conceptual entity, independent of any particular copy or service
with respect to that item. Indeed, it is even intended to allow
linking to items which may not exist on-line; resolution of such a
link could yield a metapage that identifies existing print-based
services (library holdings, document delivery) for accessing the
item. In this regard, BibP is a proposed reference linking solution
that seeks to maintain integrated access to both newly published
on-line items and the vast body of print-based literature. This is in
marked contrast to both the DOI initiative, which focuses on articles
as digital objects [DOI], and the URN initiative, which focuses on
general web resources [RFC1737].

From the author's perspective, reference linking with BibP is
intended to be as simple and scholar-friendly as possible. For
example, to denote the paper by Norman Paskin entitled "Information
Identifiers" as it appears on pages 135-6 of volume 10, issue 2 of
the journal Learned Publishing, the BibP link is formed from the
journal ISSN, volume and page using the USIN conventional syntax:
bibp:ISSN/0953-1513:10@135. Similarly, bibp:RDNS(ietf.org)/RFC:2396
is the minimal syntax that denotes the report by T. Berners-Lee, R.
Fielding and L. Masinter entitled "Uniform Resource Identifiers
(URI): Generic Syntax," published as Request for Comments 2396 of the
Internet Engineering Task Force. In general, BibP links to most
published documents can be constructed using elements of existing
identification standards combined in a minimal way according to USIN
syntactic conventions.
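
As an illustration only, the two link forms above can be assembled
mechanically from their parts. The JavaScript sketch below is not
part of the BibP specification: the function names
buildJournalBibpLink and buildReportBibpLink are hypothetical, and no
validation of the inputs is attempted.

```javascript
// Hypothetical helper (not defined by the draft): builds a BibP link for
// a journal article from its ISSN, volume and starting page, following
// the USIN conventional syntax  ISSN/<issn>:<volume>@<page>.
function buildJournalBibpLink(issn, volume, startPage) {
  return "bibp:ISSN/" + issn + ":" + volume + "@" + startPage;
}

// Hypothetical helper: builds a BibP link for a numbered report issued by
// an institution identified by its DNS name. The DNS name is lower-cased
// because the lower-case RDNS parameter is the canonical form.
function buildReportBibpLink(dnsName, series, number) {
  return "bibp:RDNS(" + dnsName.toLowerCase() + ")/" + series + ":" + number;
}
```

For instance, buildJournalBibpLink("0953-1513", 10, 135) yields the
Learned Publishing link shown above, and
buildReportBibpLink("ietf.org", "RFC", 2396) yields the RFC 2396 link.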

Ultimately, the BibP framework is envisioned to facilitate access to
bibliographic items through a library-based network of BibP servers.
In essence, each library-operated server will provide information
about and access to items, emphasizing locally available resources
and agreements; networking will provide access to items not locally
available. For example, a university library may operate a BibP
server as the default server for its students and faculty, providing
access to bibliographic items in accord with university holdings,
interlibrary loan options and site licensing arrangements. However,
the framework is also envisioned to allow other options. For example,
commercial document delivery services may compete to provide BibP
service to industrial clients.

As a first step in the staged development of a multi-level standard
for BibP, this report addresses the basic client-server interaction
in resolving an individual BibP link and retrieving an appropriate
metapage. In particular, we present both a proposed Level 1 standard
for this interaction and a scalable client-side implementation of
that standard. This work is sufficient for initial deployment of
BibP-based links and servers and is demonstrated by the linking in
the on-line version of this report!

The remainder of this paper is organized as follows. Section 2
describes the syntactic framework and conventions for Universal
Serial Item Names under BibP Level 1. BibP Level 1 itself is
addressed in Section 3, with accompanying notes discussing the
rationale and planning for further development. A scalable
client-side implementation of the standard, in the form of a
JavaScript program, is presented in Section 4. Section 6 concludes
the paper with a discussion of the further development of BibP.

2. Universal Serial Item Names

Previous work has proposed a system of Universal Serial Item Names
(USINs) for the persistent identification of documents published or
otherwise organized in serial collections [USIN]. The overall
framework defines a concept of publication domains within which
standardized codes are used to identify particular collections. In
principle, each collection may then have its own particular system of
hierarchical enumeration and labeling to identify particular
published items within the collection. In this way, the USIN
framework is generic and extensible; it can be readily scaled to
provide for unambiguous identification of documents in any organized
collection.

This section defines a precise syntactic framework for USINs,
slightly modified from the original proposal to better account for
the encoding requirements of HTML documents. Within this framework,
each collection potentially has its own syntax. However, the USIN
proposal also outlines a conventional predefined syntax that provides
substantial coverage of the existing literature published as journal
articles, books, book articles (including papers in published
proceedings) and institutional reports in numbered series. The
conventional syntax is formalized here and is used as the basis of
identification under BibP Level 1. Mechanisms for defining customized
syntax for particular publication domains or collections are left for
future work.

2.1 Grammatical Notation

The grammatical notation used for describing the syntax of USINs is
based on EBNF. Terminal symbols (symbols that will actually appear in
the syntactic forms) are enclosed in quotation marks. Nonterminal
symbols (names of syntactic classes) are expressed as identifiers
with possible embedded hyphens or underscores. Alternative syntactic
forms are separated by the vertical bar ("|"). Parentheses ("(" and
")") are used to group syntactic phrases. Square brackets ("[" and
"]") are used for optional phrases.
Braces ("{" and "}") are used for phrases to be repeated zero or more
times. Names of nonprinting characters are enclosed in angle brackets
("<" and ">").

2.2 Character Set

Under BibP Level 1, USINs are character strings composed of
characters in the following classes.

   UC_LETTER    = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                  "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                  "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
   LC_LETTER    = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                  "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                  "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
   LETTER       = UC_LETTER | LC_LETTER
   DIGIT        = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" |
                  "9"
   ALPHANUMERIC = LETTER | DIGIT
   EXTENDER     = "_" | "-"
   SEPARATOR    = "/" | ":" | "!" | "@" | "$" | "*" | "~" | "+" | "," |
                  "."
   PAREN        = "(" | ")"
   WHITE        = <SPACE> | <HT> | <CR> | <LF>

The USIN framework is designed to accommodate future extension of the
USIN character set in support of internationalization. That is,
non-ASCII characters of Unicode/ISO 10646 [Unicode] may be added to
the LETTER, DIGIT, and EXTENDER character classes. However, USINs are
designed to be parsed based on recognition of SEPARATOR and PAREN
characters. Thus, carefully written USIN parsers under BibP Level 1
may accommodate future extensions to the USIN character set without
modification.

Closely related to the USIN is the notion of a USIN Octet Sequence
(UOS), an encoding of a USIN as a sequence of 8-bit bytes. USINs
themselves are simply character strings without any particular
constraint on their representation. Thus a USIN may be represented as
a sequence of handwritten or printed marks on paper. Alternatively,
it may be represented as a series of 16-bit quantities in the UCS-2
format of Unicode/ISO 10646 [Unicode].
However, when a USIN is to be communicated under BibP Level 1, it is
always encoded as a USIN Octet Sequence, as described in Section 3.1
following.

2.3 Lexical Elements and Generic Grammar

USINs are made up of lexical elements known as symbols, operators and
phrases.

   symbol   = ALPHANUMERIC {[EXTENDER] ALPHANUMERIC}
   operator = SEPARATOR {SEPARATOR}
   phrase   = "(" {ALPHANUMERIC | EXTENDER | SEPARATOR} ")"

Symbols are generally names or numerals that identify particular
entities within some level of the identification hierarchy.
Parenthesized phrases play a similar role but provide wider-ranging
syntax for imported notations and/or internal structure. Operators
are generally syntactic markers that guide the interpretation of
symbols and phrases.

WHITE characters (whitespace) may be embedded in a USIN only in
accord with the following hyphenation convention. A hyphenation
substring consisting of a single hyphen ("-") followed by zero or
more whitespace characters may be inserted after an operator or
parenthesized phrase. However, a hyphenation substring may not appear
at the end of a USIN. Following these rules, hyphenation substrings
may be assumed to be implicitly permitted after any operator or
parenthesized phrase appearing in the USIN grammar.

The hyphenation convention permits a USIN appearing in plain text to
be formatted over more than one line. Cut-and-paste operations on
USINs displayed in this manner may thus extract USINs with embedded
whitespace. USIN processing software will normally remove the
embedded whitespace prior to further work.

The USIN framework allows symbols, operators and phrases to be
combined in a variety of ways, depending on the identification needs
of particular publication domains and collections. However, a USIN
must always satisfy the following generic grammar of permissible USIN
forms (after removal of hyphenation substrings).

   form = symbol | form phrase | form operator symbol

The generic grammar of forms reflects the hierarchical left-to-right
structure of USINs. The most elementary form of a USIN is a single
symbol. All other USINs are formed hierarchically by extending known
forms with additional identification elements consisting of phrases
or operator-symbol combinations.

2.4 Syntactic Framework

The syntactic framework for USINs identifies publication-domains,
collections, items, and attributes as the four key syntactic
structures. The term USIN may refer to any one of these structures,
which are hierarchically related as follows.

   USIN       = publication-domain | collection | item | attribute
   collection = publication-domain "/" collection-label
   item       = (collection | item) item-extension
   attribute  = (collection | item | attribute) "!" attribute-specifier

For example, consider the USIN ISSN/0953-1513:10@135!title. The
publication domain is ISSN, the space of all serial publications
registered with an International Standard Serial Number [ISO3297].
The collection is the set of all articles published in the journal
whose ISSN is 0953-1513, namely, Learned Publishing. The item
extensions :10 and @135 specify respectively volume 10 of the journal
and the article that appears on page 135 of that volume (using the
conventional syntax described later). Attribute notation is used to
specify the title of the article as the object of interest.

2.5 Publication Domains

Publication domains represent namespaces within which publications
and other collections are assigned identifiers according to a
specific scheme and/or authority. The syntax presented here is used
both for the three initial domains supported under BibP Level 1
(namely ISSN, ISBN, and RDNS) and for future domains.
Although the initial domains provide for substantial coverage of
referenced literature, the general syntax accommodates future
development of a richer hierarchical domain structure to provide for
both greater coverage and the development of more mnemonic forms.

Publication domains may be simple, hierarchical, and/or
parameterized.

   publication-domain = symbol |
                        publication-domain "." symbol |
                        publication-domain phrase

When a parenthesized phrase is appended to a publication domain, it
may be considered to instantiate that domain for the particular
string value given in parentheses.

Under BibP Level 1, two simple domains are predefined, represented by
the symbols ISSN and ISBN. As noted previously, the ISSN domain
consists of those serial publications that may be identified by an
International Standard Serial Number. Similarly, the ISBN domain is
the space of those publications identified by an International
Standard Book Number [ISO2108].

RDNS is a parameterized domain that uses a restricted subset of names
assigned under the Domain Name System [RFC1034] to identify
publication namespaces for individual institutions. For example,
RDNS(sfu.ca) denotes a publication namespace for Simon Fraser
University, while RDNS(ietf.org) denotes a similar namespace for the
Internet Engineering Task Force. Here, the parameter string must be a
well-established domain name under DNS that is both owned by the
institution and has a clear interpretation as a code for that
institution.

The domain parameter for RDNS is case-insensitive, following the
conventions for DNS. For example, RDNS(sfu.ca) and RDNS(SFU.CA) are
equivalent. Following DNS tradition, the lower case version of the
RDNS parameter is considered the canonical and preferred form.

Hierarchical divisions of an institution may be identified by
hierarchical RDNS domains. The subdomains are identified by
unambiguous codes for the divisions as used by the institution
itself. For
For example, RDNS(sfu.ca).CMPT denotes the School of Computing Science at Cameron and Tatu Experimental [Page 7] INTERNET-DRAFT BibP Level 1 August 2000 Simon Fraser University using the four-letter code CMPT unambiguously used by SFU for the School. Alternatively, RDNS(cs.sfu.ca) also denotes the School, using its well-established DNS name. The astute reader may note that the parameterized domain syntax used for RDNS differs from the quoted DNS names original proposed [USIN]. It is slightly cleaner and simplifies the USIN Octet Sequence representation (see Section 3.1) by eliminating the need for escape- encoding of quotation marks. 2.6 Collections and Collection Labels Collections are sets of documents organized by a particular serial numbering scheme. For example, a journal is typically a collection organized using volume, issue and page numbering, while a technical report series is a collection organized by a numbering scheme speci- fied by the issuing institution. A book may be a collection of arti- cles (for example, the proceedings of a conference) or may be con- sidered a singleton collection (a single document in its own right). Collection labels are symbols that identify particular collections within the context of a publication domain. collection-label = symbol Collection labels must always conform to this syntax, but particular publication domains may impose further restrictions. In the context of the ISSN domain, collection labels are restricted to the following ISSN syntax [ISSN]. collection-label(ISSN) = ISSN ISSN = DIGIT DIGIT DIGIT DIGIT ["-"] DIGIT DIGIT DIGIT DIGITX DIGITX = DIGIT | "X" | "x" The embedded hyphen within an ISSN is preferred and canonical for USIN syntax, but may be omitted. Similarly, the upper case X is the preferred and canonical form for the ISSN check digit denoting 10, but x is considered equivalent. BibP servers must accept serial-codes in any of these forms. 
However, when generating or otherwise reporting USINs within the ISSN
domain, BibP servers must use the canonical forms.

Collection labels within the ISBN domain similarly follow ISBN syntax
[ISO2108].

   collection-label(ISBN) = ISBN
   ISBN    = INTEGER "-" INTEGER "-" INTEGER "-" DIGITX |
             DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT DIGITX
   INTEGER = DIGIT {DIGIT}

The preferred and canonical form of ISBNs includes the correct
hyphenation to separate it into four fields for the country/group
coding, publisher coding, title coding and check digit. Each of the
first three fields is variable length, but in total these fields must
contain exactly nine digits. As with ISSNs, x may be used for the
check digit, but X is preferred and canonical.

With RDNS domains, collection labels should be the identifiers
actually used by the institution. For example, the Internet
Engineering Task Force uses RFC to refer to documents in its Request
for Comments Series, so this collection may be identified by the USIN
RDNS(ietf.org)/RFC. The technical report series of SFU's School of
Computing Science is denoted RDNS(sfu.ca).CMPT/TR. Theses published
by an institution are conventionally denoted by the abbreviation for
the degree, so RDNS(sfu.ca).CMPT/PhD denotes the Ph.D. thesis series
of the School.

2.7 Items and Item Extensions

Item is the generic term used to refer to an individual document or
group of documents that form an identified division within the
hierarchical identification scheme of a collection. For example, the
volumes, issues and articles of a journal are all items.

Within the context of a specific collection, item extensions are the
USIN suffixes that specify items. In general, the syntax and
interpretation of item extensions depends on the particular
collection or publication domain involved. However, each item
extension conforms to the generic grammar of Section 2.3.
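
Conformance to the generic grammar of forms can be checked
mechanically. The following JavaScript sketch is illustrative rather
than normative: the names removeHyphenation and isUsinForm are
assumptions introduced here, and the regular expressions are one
possible transcription of the symbol, operator, phrase and form rules
together with the hyphenation convention, restricted to the ASCII
character set of BibP Level 1.

```javascript
// One possible transcription of the lexical rules (ASCII only).
const SYMBOL = "[A-Za-z0-9](?:[_-]?[A-Za-z0-9])*";
const OPERATOR = "[/:!@$*~+,.]+";
const PHRASE = "\\([A-Za-z0-9_\\-/:!@$*~+,.]*\\)";

// form = symbol | form phrase | form operator symbol
const FORM = new RegExp(
  "^" + SYMBOL + "(?:" + PHRASE + "|" + OPERATOR + SYMBOL + ")*$"
);

// Characters that can end an operator or a parenthesized phrase.
const OPERATOR_OR_PHRASE_END = /[\/:!@$*~+,.)]/;
const WHITE = /[ \t\r\n]/;

// Removes hyphenation substrings: a hyphen plus any following whitespace,
// when it directly follows an operator or a closing parenthesis and is
// not inside a parenthesized phrase.
function removeHyphenation(text) {
  let out = "";
  let inPhrase = false;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const prev = out.length > 0 ? out[out.length - 1] : "";
    if (ch === "-" && !inPhrase && OPERATOR_OR_PHRASE_END.test(prev)) {
      i++; // skip the hyphen
      while (i < text.length && WHITE.test(text[i])) i++; // and whitespace
      continue;
    }
    if (ch === "(") inPhrase = true;
    else if (ch === ")") inPhrase = false;
    out += ch;
    i++;
  }
  return out;
}

// True if the (already de-hyphenated) string satisfies the generic grammar.
function isUsinForm(usin) {
  return FORM.test(usin);
}
```

A typical use would be isUsinForm(removeHyphenation(text)) on a USIN
pasted from multi-line plain text. The sketch checks only the generic
grammar; it does not check domain-specific rules such as the ISSN and
ISBN label syntax.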

The USIN conventional syntax predefines a number of item extensions
for common forms of hierarchical identification. These involve the
operators ":" for introducing the principal enumeration scheme of a
collection, "@" for page-based article specification and "$" for
direct article specification by symbol or count.

Whenever a collection is explicitly divided into enumerated
divisions, the ":" operator is used to introduce the division label.
Volumes of a journal are a typical use, so :10 is the item-extension
specifying volume 10 of Learned Publishing in the USIN
ISSN/0953-1513:10@135. Although volumes will be denoted by integer
numerals in most cases, the conventional syntax also permits
arbitrary symbols. For example, ISSN/0098-5589:SE-12 denotes Volume
SE-12 of IEEE Transactions on Software Engineering.

Report numbers, year numbers and other top-level enumeration elements
are also introduced using the ":" operator. For example,
RDNS(ietf.org)/RFC:2396 denotes RFC 2396 of the IETF Request for
Comments series, while RDNS(sfu.ca).CMPT/PhD:2000 denotes PhD theses
published in the year 2000 by the SFU School of Computing Science.

The USIN convention for journals also includes a syntax for issue
numbers as a second level of enumeration, namely a parenthesized
phrase. Thus ISSN/0953-1513:10(2) denotes volume 10, issue 2 of
Learned Publishing. Special issues and combined issues typically use
non-numeric issue strings. For example, ISSN/0038-0644:20(S2) denotes
special issue S2 of volume 20 of Software--Practice & Experience
(December 1990), while ISSN/0361-526X:36(3/4) denotes combined issue
3/4 of volume 36 of Serials Librarian (1999). Note that the
parenthesized notation for issues is also quite common in
bibliographic citations; the USIN convention takes advantage of this
for mnemonic effect.
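
To make the volume and issue conventions concrete, the JavaScript
sketch below splits a journal USIN into its ISSN, volume and optional
issue, reporting the ISSN in its canonical form (hyphenated, upper
case X). It is an illustration under stated assumptions, not a
reference parser: parseJournalIssue is a hypothetical name, only the
":" volume and parenthesized issue extensions are handled (any other
extension makes the function return null), and the ISSN check digit
is not verified.

```javascript
// Hypothetical helper: parses  ISSN/<issn>:<volume>[(<issue>)]  and
// returns { issn, volume, issue }, or null if the string has another shape.
function parseJournalIssue(usin) {
  const pattern = new RegExp(
    "^ISSN/(\\d{4})-?(\\d{3})([\\dXx])" +       // ISSN, hyphen optional
    ":([A-Za-z0-9](?:[_-]?[A-Za-z0-9])*)" +     // ":" volume symbol
    "(?:\\(([A-Za-z0-9_\\-/:!@$*~+,.]*)\\))?$"  // optional "(issue)" phrase
  );
  const m = pattern.exec(usin);
  if (m === null) return null;
  return {
    // Canonical ISSN: embedded hyphen and upper-case check digit.
    issn: m[1] + "-" + m[2] + m[3].toUpperCase(),
    volume: m[4],
    issue: m[5] === undefined ? null : m[5],
  };
}
```

Under these assumptions, parseJournalIssue("ISSN/0361526x:36(3/4)")
reports the ISSN as 0361-526X, the volume as 36 and the issue as 3/4,
while parseJournalIssue("ISSN/0098-5589:SE-12") reports volume SE-12
with no issue.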

The second conventional operator under the USIN system is the "@"
operator for specifying articles in books or journals by starting
page number. For example, the article at page 135 of Learned
Publishing 10(2) is denoted ISSN/0953-1513:10(2)@135. For journals
that are paginated by volume, such as this one, the issue number may
be omitted; ISSN/0953-1513:10@135 is thus equivalent to the USIN just
given. In the event that more than one article starts on a given
page, the articles are numbered sequentially with an alphabetic code:
a for the first, b for the second, c for the third, and so on. In the
rare event that there are more than 26 articles on the page, the code
allows arbitrary base 26 numerals such as aa for the 27th item, ab
for the 28th and so on.

The final form of item extension in the USIN conventional syntax uses
the "$" operator to specify articles in unpaginated e-journals or
other contexts by a numeric or symbolic label. Numeric labels
indicate either explicit or implicit enumeration within contents
lists. However, where a clear symbolic label exists either in plain
text or encoded in the article URL, then the symbolic form is
preferred and canonical. For example, the article "Towards Universal
Serial Item Names" published in Volume 1, Issue 3 of the Journal of
Digital Information is denoted ISSN/1368-7506:1(3)$Cameron, where
Cameron is the symbolic code clearly used in the JoDI URL to
distinguish this article from others in the same issue.

The generic USIN grammar supports the definition of many additional
forms of item extension. Future developments of the USIN system will
likely introduce additional conventional syntax as well as mechanisms
for specifying domain- or collection-dependent syntax.

2.8 Attributes and Attribute Specifiers

Attributes are properties or metadata elements that pertain to a
particular collection or item. The USIN syntax reserves the "!"
operator for introducing attribute-specifiers, as shown in the
grammar of Section 2.4 above. An attribute-specifier itself consists
of a symbol naming the attribute, with an optional parenthesized
phrase to specify a parameter value.

   attribute-specifier = symbol [phrase]

For example, ISSN/0953-1513!title denotes the title of the journal
whose ISSN is 0953-1513, namely Learned Publishing;
ISSN/0953-1513:10@135!title denotes the article title "Information
Identifiers"; and ISSN/0953-1513:10@135!author(1) denotes the first
(and only, in this case) author of this article, namely, Norman
Paskin.

In general, attributes denote publication facts about particular
items or collections. Attributes are not intended to account for
classification or other metadata that may be attributed to items by
third parties. Philosophically, third-party metadata is considered to
be interpretative, not factual. Different third parties may well
describe and/or classify the same document in quite different ways.
Thus the attribute sets used with USINs may be expected to be
substantially narrower than general metadata element sets such as
those of Dublin Core [RFC2413].

Of particular importance to the further development of the BibP
network as it evolves towards the concept of a universal citation
database [UCD] is the parameterized ref attribute. This attribute
refers to the bibliographic references in a document, identified by
numeric or symbolic citation tag. For example,
RDNS(sfu.ca).CMPT/TR:2000-XX!ref(UCD) denotes the document cited as
UCD in this paper.

The ref attribute supports even broader coverage of the literature
than that provided by the direct identification provisions of the
USIN conventional syntax. Any documents that are cited within other
documents may be identified by specification of the citing document
and a citation tag. Effectively, this provides for universal coverage
of all documents that are transitively reachable by citation.
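
As a small illustration of attribute notation, the following
JavaScript sketch separates the final attribute-specifier of a USIN
from the collection, item or attribute it applies to. It is a sketch
only: splitAttribute is a hypothetical name, the "!" operator is
assumed to stand alone rather than as part of a longer operator, and
no meaning is attached to the attribute name or its parameter.

```javascript
// Hypothetical helper: splits  <subject>!<name>[(<parameter>)]  at the
// last attribute-specifier. Returns null if the USIN carries no attribute.
function splitAttribute(usin) {
  const pattern = new RegExp(
    "^(.+)!" +                                  // subject, then "!"
    "([A-Za-z0-9](?:[_-]?[A-Za-z0-9])*)" +      // attribute name (symbol)
    "(?:\\(([A-Za-z0-9_\\-/:!@$*~+,.]*)\\))?$"  // optional parameter phrase
  );
  const m = pattern.exec(usin);
  if (m === null) return null;
  return {
    subject: m[1],
    name: m[2],
    parameter: m[3] === undefined ? null : m[3],
  };
}
```

For example, splitAttribute("ISSN/0953-1513:10@135!author(1)") gives
the subject ISSN/0953-1513:10@135, the name author and the parameter
1, while a USIN without "!" gives null.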

The attribute framework of the USIN scheme is substantially an area
for future work, however. No requirements for attribute processing
are specified under BibP Level 1, except to recognize that attribute
syntax is valid.

3. BibP Level 1: Description and Rationale

BibP Level 1 establishes the syntax of BibP links together with
requirements on HTTP-based client-server interactions for resolving
individual links and retrieving bibliographic metapages for display
to the user. A BibP client is a web browser or other user agent that
either has built-in support for BibP (BibP-aware user agent) or
operates in conjunction with an appropriate client-side script.
(Section 4 presents one such script-based implementation of BibP link
resolution.) The BibP client resolves BibP links by identifying an
appropriate BibP server and generating a well-formatted BibP request
to that server. Upon receiving the request, the BibP server is
responsible for generating an HTML page presenting relevant
bibliographic and service information with respect to the cited item.

3.1 BibP Link Syntax

A BibP link is a uniform resource identifier (URI) [RFC2396] of the
form bibp:UOS, where UOS is a USIN Octet Sequence as described below.
In the parlance of RFC2396, BibP links are absolute URIs whose scheme
is bibp and whose scheme-specific-part is a UOS of a cited USIN. The
UOS is considered an opaque part because its structure has no meaning
with respect to the network.

In the normal case, a UOS is simply the representation of a USIN as
an ASCII character string [ASCII]. Under BibP Level 1, the only
exception is that WHITE characters must be encoded according to the
following grammar.

   WHITESPACE = {CR | LF | HT | SPACE}
   CR         = "%" "0" ("D" | "d")
   LF         = "%" "0" ("A" | "a")
   HT         = "%" "0" "9"
   SPACE      = "%" "2" "0"

That is, the URI transformation of escape encoding [URI] must be
applied to the normal ASCII spacing control characters to produce the
UOS. Also note that the WHITESPACE grammar permits newlines to be
encoded using any of the common file format conventions with various
combinations of CR and LF characters.

As described previously, the USIN character set is subject to future
extension to include non-ASCII characters of Unicode/ISO 10646 for
the purpose of internationalization. These characters may be
represented within a UOS by first expressing them as octet sequences
in the UTF-8 format of Unicode and then applying the URI-encoding
transformation to the octets. Because UTF-8 octet sequences for
non-ASCII characters always have their high-order bit set, the first
hex digit of the escaped encoding will be 8 through F. Thus character
sequences of the following grammar may be expected.

   A_F           = "A" | "B" | "C" | "D" | "E" | "F" |
                   "a" | "b" | "c" | "d" | "e" | "f"
   HEX8_F        = "8" | "9" | A_F
   HEX           = DIGIT | A_F
   UTF-8_encoded = "%" HEX8_F HEX

Finally, although the canonical and preferred representation of the
USIN characters under BibP Level 1 is indeed as normal ASCII octets,
URI-encoded forms thereof are permitted and considered equivalent.
Thus a UOS may also contain character sequences of the following
grammar.

   HEX2_7        = "2" | "3" | "4" | "5" | "6" | "7"
   ASCII_encoded = "%" HEX2_7 HEX

The USIN syntax is designed to make considerations of escape encoding
completely transparent to the user. Under BibP Level 1, the
whitespace-free form of every USIN may be entered directly as a
normal ASCII character sequence.
Escaped forms will normally only be generated by BibP-aware document
composition software supporting the cut-and-paste of USINs or by
other software that performs escape encoding as part of general URI
processing.

3.2 Server Identification Hierarchy

The first step in BibP link resolution is identification of an
appropriate BibP server to handle the request. In order of
preference, a BibP client must select from the following servers.

- The local bibhost, if it exists (Section 3.3).

- The document-specified citehost, if it exists (Section 3.4).

- A known global server (such as usin.org) (Section 3.5).

This server identification hierarchy provides for a scalable BibP
network with particular provisions for library- and
publisher-operated BibP servers. Library-operated servers that
provide access to local holdings and site licensing information will
generally be made available through the bibhost mechanism.
Publisher-operated servers that provide particular support for the
BibP links contained in a given document may be specified with the
citehost mechanism. The citehost is consulted directly if the local
bibhost is unavailable, and is also passed as a parameter in
bibhost-based resolution to provide for indirect consultation (see
Section 3.7). Both mechanisms provide for scalability by reducing the
load on global BibP servers as the overall BibP network grows.

A BibP-aware user agent may provide a finer-grained hierarchy for
server identification by allowing users to specify overriding servers
at any position within the hierarchy. For example, a particular
BibP-aware web browser may specify three separate configuration
settings, one each for overriding the BibP server determination at
the bibhost, citehost and global levels.
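
The order of preference in the server identification hierarchy can be
summarized in a few lines of JavaScript. This is a sketch of the
selection rule only, not the resolver of Section 4: selectBibpServer
is a hypothetical name, each argument is assumed to be a host name
string or null when that level is unavailable, and how availability
is determined is left to the caller.

```javascript
// Hypothetical helper: picks the BibP server according to the hierarchy
// bibhost > citehost > global server.
function selectBibpServer(bibhost, citehost, globalHost) {
  if (bibhost) {
    // Local default server; the citehost (if any) is kept so that it can
    // be passed along for indirect consultation.
    return { level: "bibhost", host: bibhost, citehost: citehost || null };
  }
  if (citehost) {
    return { level: "citehost", host: citehost, citehost: null };
  }
  return { level: "global", host: globalHost, citehost: null };
}
```

With no local bibhost and no citehost, for example,
selectBibpServer(null, null, "usin.org") falls through to the global
level.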
3.3 Default Local Server

The key characteristic of BibP Level 1 is the ability for a locally available server to act as the default BibP server for a particular user environment. The following conventions apply.

- The DNS (Domain Name System) alias bibhost is used to identify the default BibP server (if one exists) in the local environment of the web browser or other user agent. For example, if a web browser accessing a BibP link is operating in the univ.edu domain, then the typical configuration of the local DNS resolver would interpret the relative domain name bibhost as the fully qualified domain name bibhost.univ.edu (if it exists). In accord with the recommendations of RFC 2219 [RFC2219], bibhost is the conventional DNS alias for the BibP protocol.
- A bibhost server must signal its ability to respond to BibP Level 1 requests by provision of HTTP access to the BibP Identification Icon at the URL http://bibhost/bibp1.0/bibpicon.jpg. If this icon is unavailable, a user agent may direct resolution of a BibP link to the next level in the server hierarchy.
- A bibhost server must provide HTTP-based access to a JavaScript-based implementation of BibP link resolution at the URL http://bibhost/bibp1.0/bibres.js. See Section 4 for a suitable script. A user agent may implement link resolution through this script or by some other method. If the script is unavailable, a user agent may direct resolution of a BibP link to the next level in the server hierarchy.

Use of the DNS alias bibhost provides a browser-independent and highly configurable mechanism for identifying local BibP servers. Using the normal configuration options available with typical DNS software, it is possible to configure a local BibP server on either a per-client basis or a per-domain basis. Configuration of the DNS resolver on a client machine can specify the machine to be used as bibhost for that client only.
However, configuration of a DNS nameserver to provide a bibhost definition for an entire local domain will normally be a much more convenient option. Such a configuration can generally be made without requiring any configuration actions on individual client machines on the network, assuming only that the DNS resolvers on those machines follow the usual practice of including the local domain in the search list for resolution of relative domain names.

The BibP Identification Icon has four roles. First, it provides a graphical trademark serving to visually identify a particular bibhost as a participating server with respect to the BibP network. Second, it provides an extra level of assurance to user agents that bibhost does indeed denote a BibP server rather than a machine that just happens to have that name. Third, it allows distinction between different levels and versions of the BibP protocol that may be supported by a particular BibP server. Fourth and finally, given the restrictive security model of JavaScript and other client-side scripting languages, it also provides for feasible script-based testing of bibhost existence using image preloading.

The availability of a JavaScript-based resolver on the bibhost server provides for flexibility, scalability and maintainability. Although other resolution mechanisms exist, the local script nevertheless provides authors, publishers and user agents the flexibility to delegate link resolution to the local service. Such delegation represents an inherently scalable design in comparison to an implementation that relies on JavaScript served from a single global source. Furthermore, as the BibP protocol evolves, previously published documents can benefit from updated resolution scripts installed on local bibhosts.
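
The script-based test of bibhost existence by image preloading, mentioned among the roles of the identification icon, might look roughly like the following. This is only a sketch under stated assumptions: probeBibhost, onResult and createImage are hypothetical names, and the optional createImage parameter exists solely so that the logic can be exercised outside a browser. The zero-height check follows the rule this draft gives in Section 4.1.

```javascript
// Hypothetical sketch: test for a local bibhost by preloading its icon.
// probeBibhost, onResult and createImage are illustrative names only.
function probeBibhost(onResult, createImage) {
  // In a browser the built-in Image constructor is used; a factory
  // may be supplied instead (an assumption made here for testing).
  var icon = createImage ? createImage() : new Image();

  icon.onload = function () {
    // A zero-height icon means images are turned off or the icon is
    // erroneous, so it does not confirm that bibhost exists.
    onResult(icon.height > 0);
  };
  icon.onerror = function () {
    // The icon could not be fetched: treat bibhost as unavailable.
    onResult(false);
  };

  // Assigning src starts the asynchronous load of the icon.
  icon.src = "http://bibhost/bibp1.0/bibpicon.jpg";
  return icon;
}
```

Because the load completes asynchronously, a caller would normally start from the citehost or global server and switch to bibhost only when onResult reports true.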
The use of the path component bibp1.0 in the URLs for the identification icon and local resolution script identifies specific support for Level 1 of BibP. Future clients dependent on services defined at Level 2 must not assume that these are available from a bibhost identifying itself as a provider of Level 1 services only.

3.4 The Document-Specified Server

In order to identify a BibP server that provides specific and known support for the links in a particular document, publishers or authors may use the citehost mechanism. In the absence of a local bibhost, the citehost denotes the actual BibP server to be used for link resolution and metapage retrieval. When a local bibhost is known, the citehost setting is passed on to the bibhost for consultation or citation as a service relevant to the identified document.

To set the citehost to http://www.pubhost.com/, for example, two declarations should be included in the HEAD element of the document.

These two declarations respectively identify the citehost to BibP-aware user agents and JavaScript-based user agents. The latter declaration is defined to work with the JavaScript resolvers presented in Section 4. Other client-side resolvers (for example, using different scripting languages) may use different conventions for establishing the citehost.

3.5 Default Global Server

In the absence of either a local bibhost or a document-specified citehost, a web browser or other user agent must use a known global server as the default BibP server. At the time of writing of this report, the prototype BibP server at usin.org is available and is being further developed as the recommended global server for this purpose.

3.6 Link Translation

After identification of an appropriate server to resolve BibP links, the second step in link resolution is to generate well-formed HTTP requests to that server. The form of those requests is specified using the following translational semantics.
A BibP URI of the form bibp:UOS is equivalent to an HTTP URL of one of the following forms.

- http://server/bibp1.0/resolve?usin=UOS
- http://server/bibp1.0/resolve?citehost=citehost&usin=UOS

The second form is used when a document-specified citehost is defined in accordance with Section 3.4. In both cases, server denotes the BibP server determined by the rules of Sections 3.2 through 3.5 above. The path component bibp1.0 indicates that the client is expecting resolution services defined at this level (Level 1) of BibP.

A user agent may use this translation rule either explicitly or implicitly to generate well-formed HTTP requests. If used explicitly, the form of the required HTTP request follows directly from the HTTP 1.1 specification [RFC2616]. However, the translation may be implicit, so long as the HTTP request generated is the same as that specified by the explicit translational semantics.

3.7 Metapage Response

Given an HTTP request constructed according to the specifications of Section 3.6, a BibP server must generate an appropriate response in the form of an HTML document [HTML]. When a UOS corresponding to a valid USIN for a known document has been cited, the response page should report bibliographic and service metadata in a format intended for human readers as follows.

- The canonical form for the USIN should be reported, removing whitespace and performing transformations as described previously.
- Basic bibliographic metadata for the appropriate document type should be provided. For journal articles, this typically includes authors, title, journal, volume, issue, year, month and pagination. For books, author, title, publisher, publisher address, date and total pagination are usual. Other document types include the appropriate bibliographic elements commonly accepted to establish a bibliographic citation.
- Additional bibliographic metadata may be provided.
  This may include an abstract, keywords, classification metadata, known reviews, citations received, additional author information and so on. However, a BibP server must provide only factual metadata in the public domain or copyrighted metadata where explicit permission has been obtained. Links to copyrighted materials (e.g., reviews) are preferred.
- Known service metadata for the cited item should be provided. This may include on-line full-text access, paper-based library holdings, document delivery options, additional bibliographic sources and so on. A BibP server operating as bibhost for a particular domain will be expected to emphasize locally available services for that domain.
- If citehost has been specified, a link to, or information from, the appropriate document metapage at citehost must be provided.

Under BibP 1.0, no additional constraints are placed on the metadata to be provided on the response page or its format. The intent is to provide a relatively open framework to allow the development of alternative models for document metapages. BibP 1.0 servers may freely format metadata for human readers, without consideration of how this data may be extracted under program control. However, subsequent development of BibP is expected to specify formal requirements for server-to-server interaction for sharing of basic bibliographic and service metadata.

3.8 Fault Handling

BibP servers must provide mechanisms to handle errors, ambiguities and unknowns.

- BibP servers should check the syntax of resolve requests for consistency with Section 3.6 and the individual syntax of USINs for consistency with Section 2 and report any errors. However, if a syntax error in a resolve request consists of additional keyword=value parameters with & separators, then the server should simply warn of unknown parameters and continue to respond to the request by ignoring these parameters.
- In the event that a USIN is known not to denote any entity, a BibP server must report this fact and should provide links to allow exploration of nearby valid USINs. For example, if a USIN refers to an article starting on a particular page of a journal and it is known that no article starts on that page, the server may provide links to the article that starts on the closest previous page as well as links to the apparent issue and volume that were intended.
- In the event that an article-level USIN is known to be ambiguous (incomplete), a BibP server must report all matching articles and their USINs. For example, when more than one article starts on a particular page of a journal, omission of the disambiguating suffixes a, b, and so on, is likely to be common. By reporting all articles on the page, the BibP server nevertheless provides a credible response to the ambiguous USIN.
- In the event that a BibP server has insufficient data to fully resolve a USIN, the server should nevertheless report any information that is known. For example, if a server request involves the USIN of an article in a known journal, but the article metadata itself is unavailable, the server should report the journal title, volume and page.

3.9 Future Server Requirements

It is anticipated that BibP Level 2 will impose additional requirements on BibP servers, particularly in the areas of server-to-server interactions and acceptance of metadata submissions. BibP Level 3 is further planned to incorporate the capture and dissemination of the citing relationship (from citing works to cited works) as metadata, as a step towards the universal citation database [UCD]. BibP Level 1 server software should be designed to accommodate these evolving requirements.

4. A JavaScript Resolver for BibP

This section presents and documents a JavaScript program for client-side resolution of BibP.
This program is intended to be embedded in the HEAD element of HTML documents to implement client-side resolution with versions of Netscape Navigator (3.0 and above) and Internet Explorer (4.0 and above). The prefix BibP_ is used for all global functions and variables of the resolver so that the resolver can be freely mixed with other client-side JavaScript that respects this prefix. The full script, in a relatively condensed form for ease of cut-and-paste, is presented immediately below and documented in the following subsections.

4.1 Setting BibP_BaseURL

The core strategy of the resolver is to define and use the global variable BibP_BaseURL as the common prefix for resolution of BibP links. That is, given a link of the form bibp:USIN, the link translation of Section 3.6 is performed by concatenation of BibP_BaseURL and USIN. The function BibP_SetBaseURL constructs the prefix given a BibP server as its input parameter and using the global setting of the BibP_citehost as described in Section 3.4.

The determination of the server to be used for BibP_BaseURL follows the server identification hierarchy of Section 3.2. Initially, the value of BibP_BaseURL is set based on the document-specified BibP_citehost if it exists, or the global server usin.org otherwise. Next, a test for the availability of a local bibhost is initiated. However, this test proceeds asynchronously and conceivably may not be complete by the time the document has been loaded and the first BibP link accessed. It is thus valuable to initialize using citehost or the global server to ensure that a BibP service is available.

The test for the availability of bibhost uses the image preloading feature of common web browsers to check the required identification icon at http://bibhost/bibp1.0/bibpicon.jpg. The test is initiated with the assignment of the src property of Bibp_Icon. On a successful load event, the Bibp_onIcon handler is called.
If an icon of nonzero height is reported, bibhost is used to establish BibP_BaseURL. A zero-height icon indicates either that images are turned off in the browser (in which case an empty icon is trivially loaded without verifying the existence of bibhost) or that the icon is erroneous.

4.2 Translating and Displaying BibP Links

Once the BibP_BaseURL setting has been established, BibP links can be resolved by translating them to the appropriate URLs with respect to this base. The resolver carries out this task by changing the actual stored href value on the first MouseOver event for that link. A complication of this technique is that the link value displayed to the user in the window status bar would normally be the actual stored href value, not the original bibp: form. To display the original form, the strategy is to ensure that the BibP_BaseURL prefix is removed and replaced with the string "bibp:" whenever the link is to be displayed in the window status bar.

The function BibP_ShowResolved takes a URL as a parameter and determines whether that URL is the resolved form of a BibP link (by checking for BibP_BaseURL as a prefix of the URL value). If so, window.status is set to display the link in its original bibp: form. A value of true or false is also returned to indicate whether or not a resolved link was found and displayed appropriately.

The main function to perform resolution is BibP_TryResolve. It takes as parameters theURI as a potential bibp: link and target as the object whose href value should be changed to store the resolved form if appropriate. If the string bibp: is found within the URL, it is treated as a BibP link with everything after this string interpreted as a USIN and everything prior to it discarded. The discarding of a prefix prior to bibp: is necessary for some browser versions in which the browser would interpret the link as a relative path, adding a document base URL as a prefix.
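
The translation and display logic described in Sections 4.1 and 4.2 can be illustrated with three pure functions. This is a simplified sketch, not the resolver script of this draft: makeBaseURL, tryResolve and showResolved are hypothetical names that only loosely mirror BibP_SetBaseURL, BibP_TryResolve and BibP_ShowResolved, they take the base URL as a parameter instead of using a global variable, and they return strings rather than updating href or window.status. The citehost value is passed through verbatim; whether it needs escaping is left open here.

```javascript
// Hypothetical sketch of Section 3.6 link translation.
// These helper names are illustrative and not defined by BibP.

// Build the common prefix for resolved links on a given server.
function makeBaseURL(server, citehost) {
  var base = "http://" + server + "/bibp1.0/resolve?";
  if (citehost) {
    // Second form of Section 3.6: pass the citehost along
    // (verbatim here; any escaping is left to the caller).
    base += "citehost=" + citehost + "&";
  }
  return base + "usin=";
}

// Translate a possible bibp: link; returns null if it is not one.
function tryResolve(theURI, baseURL) {
  var at = theURI.indexOf("bibp:");
  if (at < 0) {
    return null;
  }
  // Discard anything before "bibp:" (for example a document base
  // URL added by the browser) and treat the rest as the USIN.
  return baseURL + theURI.substring(at + "bibp:".length);
}

// Recover the original bibp: form for display, or null if the
// URL is not a resolved BibP link under this base.
function showResolved(url, baseURL) {
  if (url.indexOf(baseURL) !== 0) {
    return null;
  }
  return "bibp:" + url.substring(baseURL.length);
}

var base = makeBaseURL("usin.org");
var resolved = tryResolve("http://example.com/docs/bibp:ISSN/1082-9873:5(5)$paskin", base);
console.log(resolved);
console.log(showResolved(resolved, base));
```

In a browser, the first result would be stored in the link's href and the second shown in the status bar, which is the division of labour the text above describes.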
The methods by which event processing occurs in Netscape Navigator and Internet Explorer are slightly different, so distinct MouseOver event handlers must be registered for each of them. In principle, though, each handler will first display any link that has already been resolved before going on to attempt link resolution. The onmouseout handler ensures that the status bar is cleared when the mouse is moved off a BibP link.

4.3 Future Development of Client-Side Resolvers

It is anticipated that future developments of client-side resolvers will expand the coverage of browser support and perhaps add functionality. Updated resolvers should be available through each local bibhost of the BibP network. These resolvers may be directly used in documents served from the domain. For example, if bibhost.xxx.tld has been implemented, it is recommended that documents served from within the xxx.tld domain incorporate client-side resolution using the following declaration in the HEAD element.

This provides for automatic updating of client-side resolvers without document modification.

5. Security Considerations

BibP Level 1 defines read-only access to networked bibliographic information. The security concerns are therefore minimal. It is possible that spoofing of the bibhost for a particular domain could provide inaccurate bibliographic or metaservice information. However, such an effect would be localized and should be easy to address by the local domain administrator. A second security concern is the security of the default global service at usin.org as well as the potential use of www.bibhost.com to capture global services. Both of these domains have been registered by the first author of this document; eventually they should be turned over to an appropriate institutional authority.

6. Conclusions

Bibliographic Protocol Level 1 provides a new layer of abstraction for web-based reference linking.
In essence, the linking to a copy or service with respect to a referenced document is eliminated in favor of a link to the document itself. The link specifies what the cited document is, not how to access it.

Link resolution under BibP is based on an open-architecture model involving a network of library-based and publisher-based servers. A global BibP service is also defined, but can be overridden by document-specific (publisher-based) servers, which can in turn be overridden by library-based bibhost servers.

A JavaScript-based client-side resolver can be incorporated into HTML documents to enable BibP with most recent versions of Netscape Navigator and Internet Explorer. A trivial BibP server implementation can be easily installed as a scaffold for experimental work and monitoring of BibP traffic. A nontrivial prototype implementation of BibP has also been constructed with full support for BibP Level 1 and several features of BibP Level 2 [BibP-MSc].

7. References

[BibP-MSc] Serban Tatu. "Bibliographic Protocol: Distributed Reference Linking to Document Metaservices on the Web." M.Sc. Thesis, School of Computing Science, Simon Fraser University, July 2000.
USIN: RDNS(sfu.ca).CMPT/MSc:2000$SerbanTatu

[DOI] Norman Paskin. "DOI: Current Status and Outlook," D-Lib Magazine Volume 5, Number 5, May 1999.
USIN: ISSN/1082-9873:5(5)$paskin

[ISO2108] International Organization for Standardization, Information and documentation - International standard book numbering (ISBN), ISO 2108:1992, 1992.
USIN: RDNS(iso.ch)/ISO:2108(1992)

[ISO3297] International Organization for Standardization, Information and documentation - International standard serial numbering (ISSN), ISO 3297:1998, 1998.
USIN: RDNS(iso.ch)/ISO:3297(1998)

[RFC1034] P. Mockapetris. "Domain Names - Concepts and Facilities," Request for Comments 1034, Internet Engineering Task Force, November 1987.
USIN: RDNS(ietf.org)/RFC:1034

[RFC1737] K. Sollins and L. Masinter. "Functional Requirements for Uniform Resource Names," Request for Comments 1737, Internet Engineering Task Force, December 1994.
USIN: RDNS(ietf.org)/RFC:1737

[RFC2219] M. Hamilton and R. Wright. "Use of DNS Aliases for Network Services," Request for Comments 2219, Internet Engineering Task Force, October 1997.
USIN: RDNS(ietf.org)/RFC:2219

[RFC2396] T. Berners-Lee, R. Fielding and L. Masinter. "Uniform Resource Identifiers (URI): Generic Syntax," Request for Comments 2396, Internet Engineering Task Force, August 1998.
USIN: RDNS(ietf.org)/RFC:2396

[RFC2413] S. Weibel, J. Kunze, C. Lagoze and M. Wolf. "Dublin Core Metadata for Resource Discovery," Request for Comments 2413, Internet Engineering Task Force, September 1998.
USIN: RDNS(ietf.org)/RFC:2413

[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. "Hypertext Transfer Protocol -- HTTP/1.1," Request for Comments 2616, Internet Engineering Task Force, June 1999.
USIN: RDNS(ietf.org)/RFC:2616

[UCD] Robert D. Cameron. "A Universal Citation Database as a Catalyst for Reform in Scholarly Communication." First Monday, Volume 2, No. 4, April 1997.
USIN: ISSN/1396-0466:2(4)$cameron

[Unicode] The Unicode Consortium. The Unicode Standard, Version 3.0, Addison Wesley Longman, Reading, Massachusetts, 2000.
USIN: ISBN/0-201-61633-5

[USIN] Robert D. Cameron. "Towards Universal Serial Item Names," Journal of Digital Information, Volume 1, Number 3, October 1998.
USIN: ISSN/1368-7506:1(3)$Cameron

Authors' Addresses

Robert D. Cameron
School of Computing Science
Simon Fraser University
8888 University Drive
Burnaby, B.C. V5A 1S6
Canada
Phone: +1 604 291 3241
EMail: cameron@cs.sfu.ca

Serban G. Tatu
EMail: statu@cs.sfu.ca