Network Working Group                                        Juha Hakala
Internet-Draft                               Helsinki University Library
Category: Informational                                    February 2000
draft-hakala-nbn-00.txt
Expires: August 25, 2000

      Using National Bibliography Numbers as Uniform Resource Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on August 25, 2000.

Abstract

This document discusses how national bibliography numbers (persistent and unique identifiers assigned by the national libraries) can be supported within the URN framework and the syntax for URNs defined in RFC 2141 [Moats]. Much of the discussion below is based on the ideas expressed in RFC 2288 [Lynch]. Section 5 contains a URN namespace registration request modelled according to the template in RFC 2611 [Daigle et al.].

1. Introduction

As part of the validation process for the development of URNs, the IETF working group agreed that it is important to demonstrate that the current URN syntax proposal can accommodate existing identifiers from well-established namespaces. One such infrastructure for assigning and managing names comes from the bibliographic community. Bibliographic identifiers function as names for objects that exist both in print and, increasingly, in electronic formats. RFC 2288 [Lynch]
investigated the feasibility of using three identifiers (ISBN, ISSN and SICI) as URNs. This document analyses the usage of national bibliography numbers (NBNs) as URNs. The need to extend the analysis to new identifier systems was briefly discussed in RFC 2288 as well, with the following summary: "The issues involved in supporting those additional identifiers are anticipated to be broadly similar to those involved in supporting ISBNs, ISSNs, and SICIs".

Note that this document does not purport to define the "official" standard way of using national bibliography numbers as URNs; it merely demonstrates feasibility.

A registration request for acquiring the Namespace Identifier (NID) "NBN" for national bibliography numbers has been written by the National Library of Finland at the request of the Conference of Directors of National Libraries (CDNL) and the Conference of European National Librarians (CENL). The request is included in Section 5 of this document.

This document is part of a global co-operation of the national libraries to foster identification of electronic documents in general and utilisation of URNs in particular. Some national libraries, including the national libraries of Finland, Norway and Sweden, are already assigning NBN-based URNs for electronic documents.

Following the registration request, we have used the URN Namespace Identifier "NBN" for the national bibliography numbers in the examples below.

2. Identification vs. Resolution

As a rule, national bibliography numbers identify finite, manageably-sized objects, but these objects may still be large enough that resolution into a hierarchical system is appropriate.

The materials identified by a national bibliography number may exist only in printed or other physical form, not electronically.
The best that a resolver will be able to offer in this case is bibliographic data from a national bibliography database, including information about where the physical resource is stored in the national library's holdings. The URN Framework provides resolution services that may be used to describe any differences between the resource identified by a URN and the resource that would be returned as a result of resolving that URN.

However, NBNs will also be used, for instance, to identify resources in digital Web archives created by harvester robot applications. In this case, the NBN will identify exactly the resource the user expects to see.

3. National bibliography numbers

3.1 Overview

National Bibliography Number (NBN) is a generic name referring to a group of identifier systems utilised by the national libraries, and only by them, for identification of deposited publications which lack an identifier, or of the descriptive metadata (cataloguing) that describes the resources.

Each national library uses its own NBN strings independently of other national libraries; there is no global authority which controls them. For this reason NBNs are unique only on the national level. When used as URNs, NBN strings must be augmented with a controlled prefix such as a country code. These prefixes guarantee uniqueness of the NBN-based URNs on the global scale.

NBNs have traditionally been given to documents that do not have a publisher-assigned identifier, but are catalogued to the national bibliography. NBNs can be seen as a fall-back mechanism: if no other, better established identifier such as ISBN can be given, an NBN is assigned.

In principle, NBN usage enables identification of any Internet document. Local policies may limit the NBN usage to a much smaller subset of documents.

Some national libraries (e.g. Finland, Norway, Sweden) have established Web-based URN generators, which enable authors and publishers to fetch NBN-based URNs for their network documents.
At least the national libraries of Sweden and Finland are harvesting and archiving domestic Web documents (and a number of other libraries plan to start this activity), and long-term preservation of these materials requires persistent and unique identification. NBNs can be, and in fact already are, used as internal identifiers in these Web archives.

Both the syntax and the scope of NBNs can be decided by each national library independently. Typically, an NBN consists of one or more letters and/or a number. This simple syntax makes NBNs infinitely extensible and very suitable for e.g. naming of Web documents. For instance, the application used by the national library of Finland for Web harvesting creates NBNs which are based on the MD5 checksum of the archived resource.

3.2 LCCN

Two examples of NBN systems are the LCCN (Library of Congress Control Number) used by the Library of Congress, and the F-code assigned by the National Library of Finland.

The Library of Congress Card Number was the number used to identify and control catalog cards. With the development of the MARC format and the first distribution of machine-readable records for book materials in the late 1960s, the name of the LCCN was changed to Library of Congress Control Number.

LCCNs are currently structured as follows:

   Element              Length   Positions
   Alphabetic Prefix    3        00-02
   Year                 2        03-04
   Serial Number        6        05-10
   Supplement Number    1        11

The uniqueness of the LCCN is determined by the first 11 positions (positions 00-10). The Supplement Number has never been used by the Library of Congress and this position is always blank.

The Supplement Number may be followed by two kinds of variable length data known as Suffix/Alphabetic Identifier and Revision Date. Each Suffix/Alphabetic Identifier is preceded by a slash, as is the Revision Date. If there is no Suffix/Alphabetic Identifier, the Revision Date is preceded by two slashes.

According to RFC 2141, "RFC 1630 [2] reserves the characters "/", "?", and "#" for particular purposes.
The URN-WG has not yet debated the applicability and precise semantics of those purposes as applied to URNs. Therefore, these characters are RESERVED for future developments. Namespace developers SHOULD NOT use these characters in unencoded form, but rather use the appropriate %-encoding for each character". Thus the slash character ("/") has to be encoded according to the requirements of RFC 2141. There are no other characters in the LCCN that need encoding.

For more information about the LCCN, see http://lcweb.loc.gov/cds/mdslccn.html.

3.3 F-code

F-codes have been used since the early 20th century to identify and control catalogue cards and later MARC records in the national bibliography. In 1998 the national library of Finland decided to enable Finnish authors to fetch F-codes for their Internet documents, if these documents do not qualify for other identifiers such as ISBN.

Authors and publishers can retrieve F-codes, embedded into URNs, from the URN generator (http://www.lib.helsinki.fi/cgi-bin/urn.pl) developed in co-operation between the national library of Finland and the Lund University library, NETLAB unit. There is a user guide, which tells the users how to embed the NBN-based URNs into the identified documents.

F-codes are also used within the Web harvesting and archiving software, which has been built in the Networked European Deposit Library (NEDLIB) project (see http://www.konbib.nl/nedlib). This application calculates an MD5 checksum for each archived resource, and then builds an NBN-based URN from the checksum. The URN then serves as a unique identifier for the archived resource. Traditional identifiers cannot be used for this purpose, since there may for instance be several variants of a book which (quite rightly so) all have the same ISBN. Moreover, identifiers embedded into a document do not necessarily belong to the document itself; the Web archiver cannot trust the identifier information it finds.
The F-code built by the URN generator consists of:

   Prefix (for example fe)
   Year (YYYY; for example 1999)
   Number (for example 1055)

The generator also adds the namespace identifier "NBN" and the ISO 3166 country code. Thus a URN based on an F-code would in this case be, for instance, urn:nbn:fi-fe19991055.

URNs created by the Web archiver have a similar overall structure, except that the prefix (which may be defined by the operator) is fea and the year is not used. An example of a URN built by the Web archiver: urn:nbn:fi-fea-5c5875e6e49ae649cad63e5ee4f6c346.

F-codes never need any special encoding when used as URNs, since they consist of alphanumeric characters only (0-9, a-z). This is often the case for other NBN systems as well.

3.4 Encoding Considerations and Lexical Equivalence

Embedding NBNs within the URN framework usually presents no particular encoding problems, since all of the characters that can appear in commonly used NBN systems are either valid in the namespace specific string or can be expressed in %-encoded form, as described in RFC 2141 [Moats].

When an NBN is used as a URN, the namespace specific string will consist of three parts: a prefix, consisting of either a two-letter ISO 3166 country code or another string; a delimiting character (hyphen, colon or hash sign); and the NBN string assigned by the national library.

Non-ISO 3166 prefixes must be registered. The Library of Congress will maintain the central register of reserved codes, and make it available to the national libraries. All two-letter codes are reserved for existing and possible future ISO country codes and may not be used as non-ISO prefixes.

If there are several national libraries in one country that use the same prefix (for instance, a country code), they need to agree on how to split the sub-namespace between them.

Models:

   URN:NBN:<ISO 3166 country code>-<assigned NBN string>
   URN:NBN:<registered non-ISO prefix>-<assigned NBN string>

Examples:

   URN:NBN:fi-fe19981001 (A "real" URN assigned by the National Library of Finland).
   URN:NBN:LCCN:2001000168 (A hypothetical LCCN-based URN assigned by the Library of Congress).
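The three-part structure of the namespace specific string and the %-encoding requirement can be illustrated with a minimal Python sketch. It is not part of the registration: the function names, the NbnUrn type and the LCCN string with a revision date are hypothetical. The sketch also assumes that the prefix is purely alphanumeric and handles only the hyphen and colon delimiters, since "#" is one of the characters RFC 2141 reserves.

```python
import re
from typing import NamedTuple
from urllib.parse import quote, unquote


class NbnUrn(NamedTuple):
    """The three parts of an NBN namespace specific string."""
    prefix: str      # ISO 3166 country code or other registered string
    delimiter: str   # "-" or ":" in this sketch
    nbn: str         # NBN string assigned by the national library


# Assumption: the prefix is alphanumeric and ends at the first hyphen or colon.
_NBN_URN = re.compile(r"^urn:nbn:([A-Za-z0-9]+)([-:])(.+)$", re.IGNORECASE)


def parse_nbn_urn(urn: str) -> NbnUrn:
    """Split an NBN-based URN into prefix, delimiter and decoded NBN string."""
    match = _NBN_URN.match(urn)
    if match is None:
        raise ValueError(f"not an NBN-based URN: {urn!r}")
    prefix, delimiter, nbn = match.groups()
    return NbnUrn(prefix, delimiter, unquote(nbn))


def build_nbn_urn(prefix: str, nbn: str, delimiter: str = "-") -> str:
    """Build a URN, %-encoding reserved characters such as "/" in the NBN."""
    if delimiter not in ("-", ":"):
        raise ValueError("this sketch supports only hyphen and colon")
    return f"urn:nbn:{prefix}{delimiter}{quote(nbn, safe='')}"


print(parse_nbn_urn("URN:NBN:fi-fe19981001"))
print(build_nbn_urn("LCCN", "2001000168//r99", ":"))  # hypothetical LCCN
```

Alphanumeric strings such as F-codes pass through unchanged, while the slashes of the hypothetical LCCN are emitted as %2F.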
3.5 Resolution of NBN-based URNs

As a dumb code, an NBN would be difficult to resolve globally as such. The prefix part of the URN namespace specific string, which is usually based on a country code, will provide a guide to where to find a resolution service, and the NBN register will identify the assigning agency. Once NBN-based URN resolution is in global usage, the number of prefixes will slowly become equal to or even slightly larger than the number of national libraries.

If NBN assignment is limited to the national bibliography database, then all NBN-based URNs for that country will be resolved there. In one model these databases contain detailed resource descriptions including URLs, which will point both to the copy of the document in the Internet and to the copy in the national library's (legal) deposit collection. Due to the limitations in the usage of legal deposit documents, it is possible that the deposited electronic materials cannot be delivered outside the premises of the national library.

If it is possible for authors and publishers to retrieve NBNs for Web documents and there is no obligation to deposit the documents thus identified with the national library, a URN resolution service is not possible without a national Web index and archive, maintained by the national library or by one or more other organisations. The Web index/archive will also resolve URNs that were machine-generated for the archived Web documents.

3.6 Additional considerations

Guidelines adopted by each national library define when different versions of a work should be assigned the same or differing NBNs. These rules apply only if identifier assignment is done manually. If identifiers are allocated programmatically, the only criterion that can be used is that two documents which are identical on the bit level (have the same MD5 checksum) are deemed identical and should receive the same NBN. According to RFC 1321, the difficulty of producing two dissimilar documents with the same MD5 checksum is conjectured to be on the order of 2^64 operations, so it is extremely unlikely that dissimilar documents will receive the same NBN by accident.
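The checksum-based assignment described above can be sketched in a few lines of Python. This is only an illustration, not the NEDLIB harvester's actual code: the function name is hypothetical, and the URN layout (country code, hyphen, operator-defined prefix, hyphen, 32-digit hexadecimal MD5 checksum) is assumed from the single example URN given in Section 3.3.

```python
import hashlib


def archive_urn(content: bytes, country: str = "fi", prefix: str = "fea") -> str:
    """Build an NBN-based URN from the MD5 checksum of an archived resource.

    Assumed layout: urn:nbn:<country>-<prefix>-<md5 hex digest>.
    """
    digest = hashlib.md5(content).hexdigest()  # 32 lowercase hex digits
    return f"urn:nbn:{country}-{prefix}-{digest}"


page_v1 = b"<html><body>Annual report</body></html>"
page_v2 = b"<html><body>Annual report, revised</body></html>"

# Bit-identical resources receive the same URN; any change yields a new one.
print(archive_urn(page_v1) == archive_urn(page_v1))
print(archive_urn(page_v1) == archive_urn(page_v2))
```

Because the identifier is derived from the content alone, harvesting the same unchanged resource twice does not create a second identifier, whereas every modified version is identified separately.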
The rules governing the usage of NBNs are less strict than those specifying the usage of ISBN or other, better established identifiers. Since NBNs have up to now been given only by the personnel (cataloguers) working in the national libraries, the identifier assignment has in practice been well co-ordinated.

An NBN-based URN will evidently resolve to a single instance of the work if the identifier has been assigned automatically. Given the nature of NBNs, it is also likely that different versions of the same work will receive different NBNs even if the identifier is given manually.

4. Security Considerations

This document proposes means of encoding several existing bibliographic identifiers within the URN framework. This document does not discuss resolution except at a very generic level; thus questions of secure or authenticated resolution mechanisms are out of scope. It does not address means of validating the integrity or authenticating the source or provenance of URNs that contain bibliographic identifiers. Issues regarding intellectual property rights associated with objects identified by the various bibliographic identifiers are also beyond the scope of this document, as are questions about rights to the databases that might be used to construct resolvers.

5. Namespace registration

URN Namespace ID Registration for the National Bibliography Number (NBN)

Namespace ID:

   NBN

   This Namespace ID has been in production use in demonstrator systems since summer 1998; at least hundreds of URNs from this namespace have already been delivered in Finland and Sweden.

Registration Information:

   Version: 2
   Date: 2000-02-25

   The first registration of the NID "NBN" was done via the URN WG in November 1998.
Declared registrant of the namespace:

   Name: Juha Hakala
   E-mail: juha.hakala@helsinki.fi
   Affiliation: Helsinki University Library - The National Library of Finland, Conference of European National Librarians (CENL) and Conference of Directors of National Libraries (CDNL)
   Address: P.O.Box 26, 00014 Helsinki University, Finland

   Both CENL and CDNL made decisions to foster the usage of URNs during 1998. Both organisations have set up a working group for this purpose. One item in the common work plan is utilisation of national bibliography numbers (NBNs; see below) as URNs for identification of grey literature published on the Internet. The NBN namespace will enable the national libraries to do this. The namespace will be available for all national libraries in the world.

Declaration of syntactic structure:

   The namespace specific string will consist of three parts: a prefix, consisting of either a two-letter ISO 3166 country code or another string; a delimiting character (hyphen, colon or hash sign); and the NBN string assigned by the national library. A namespace specific string must be unique when normalised to omit the delimiter between the prefix and the string.

   Non-ISO prefixes must be registered. A global registry, maintained by the Library of Congress, will be created and made available via the Web. Contact information: nbn.register@loc.gov.us. All two-letter codes are reserved for existing and possible future ISO country codes and may not be used as non-ISO prefixes.

   If there are several national libraries in one country that want to use the same prefix (for instance, a country code), they need to agree on how to split the namespace between them into smaller sub-domains. These smaller domains must be registered if they are resolved on different sites. Similarly, a single national library may utilise various sub-domains; for instance, the National Library of Finland already has two domains, fi-fe for author-assigned URNs and fi-fea for URNs built by the Web harvesters.
   Models:

      URN:NBN:<ISO 3166 country code>-<assigned NBN string>
      URN:NBN:<registered non-ISO prefix>-<assigned NBN string>

   Examples:

      A country-code-based URN: URN:NBN:fi-fe19981001 (A URN assigned by the National Library of Finland).
      A non-country-code-based URN: URN:NBN:LCCN:2001000168 (A hypothetical URN assigned by the Library of Congress).

Relevant ancillary documentation:

   National Bibliography Number (NBN) is a generic name referring to a group of identifier systems used by the national libraries for identification of deposited publications which lack an identifier, or of the descriptive metadata (cataloguing) that describes the resources.

   Each national library uses its own NBN strings independently of other libraries; there is no global authority which controls them. For this reason NBNs are unique only on the national level, and the controlled prefix guarantees uniqueness on the global scale.

   NBNs have traditionally been given to documents that do not have a publisher-assigned identifier, but are catalogued to the national bibliography. When assigned as URNs, these NBNs will fit into the global URN resolution services.

   Some national libraries (Finland, Norway, Sweden) have established Web-based URN generators, which enable authors and publishers to fetch NBN-based URNs for their network documents.

   Both the syntax and the scope of NBNs can be decided by each national library independently. Typically, an NBN consists of one or more letters and a number.

Identifier uniqueness considerations:

   NBN strings assigned by two national libraries may be identical. For this reason the usage of a prefix in the namespace specific string is obligatory for guaranteeing global uniqueness of NBN-based URNs.

   At the national level, libraries utilise different policies for guaranteeing uniqueness. A national library may automate the delivery of NBN-based URNs. In this case, the NBNs are assigned sequentially by a program (URN generator).
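As an illustration of sequential assignment and of the rule that a namespace specific string must be unique when normalised to omit the delimiter, the following Python sketch may be helpful. It is not the code of any existing URN generator: the class, its parameters and the normalised_key helper are hypothetical, and the F-code layout (prefix, four-digit year, running number) is taken from the example in Section 3.3.

```python
import itertools
import re


def normalised_key(nss: str) -> str:
    """Omit the first delimiter (hyphen, colon or hash sign) from an NSS.

    Two namespace specific strings with the same key would clash under
    the uniqueness rule of the syntactic structure declaration.
    """
    return re.sub(r"[-:#]", "", nss, count=1)


class UrnGenerator:
    """Hands out F-code style URNs in sequence (hypothetical sketch)."""

    def __init__(self, country: str = "fi", prefix: str = "fe",
                 year: int = 1999, start: int = 1001) -> None:
        self._stem = f"{country}-{prefix}{year}"
        self._numbers = itertools.count(start)  # running number

    def next_urn(self) -> str:
        """Return the next unused URN for this country, prefix and year."""
        return f"urn:nbn:{self._stem}{next(self._numbers)}"


generator = UrnGenerator(start=1055)
print(generator.next_urn())
print(generator.next_urn())

# The same prefix and NBN string joined by different delimiters clash.
print(normalised_key("fi-fe19981001") == normalised_key("fi:fe19981001"))
```

A production generator would additionally have to persist the counter so that no number is ever handed out twice; that is outside the scope of this sketch.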
Identifier persistence considerations:

   Persistence of the NBNs as identifiers is guaranteed by the persistence of national libraries and of the information systems, such as national bibliographies, maintained by them. NBNs have been used for several centuries for printed materials. NBN-based identification of electronic documents is a recent practice, but it is likely to continue for a very long time.

Process of identifier assignment:

   Assignment of NBN-based URNs is always controlled at the national level by the national library or national libraries. In Europe, the Conference of European National Librarians will co-ordinate the URN practices in member libraries via a working group established in 1998. At the global level, the Conference of Directors of National Libraries (CDNL) established a task force with similar aims in 1999.

   National libraries may choose different strategies in assigning NBN-based URNs. One option is assignment by the library personnel only. This is typically done when the document is catalogued into the national bibliography.

   A national library may also set up one or more URN generators, and allow publishers and authors to retrieve NBN-based URNs from there. In this case there is no guarantee that the document will be catalogued into the national bibliography.

   Besides URN generators, the national libraries may develop other applications, such as Web harvesters/archivers, which utilise URNs for identification purposes.

Process for identifier resolution:

   URNs based on NBNs will be primarily resolved via the national bibliography databases. In one model these databases contain detailed resource descriptions including URLs, which will point both to the copy of the document in the Internet and to the copy in the national library's (legal) deposit collection. Due to the limitations in the usage of legal deposit documents, it is possible that the deposited materials cannot be delivered outside the premises of the national library.
   For those documents not catalogued into the national bibliography database, URN resolution may take place via national or international Web indexes and/or archives. Nordic national libraries have established a joint initiative called Nordic Web Index / Nordic Web Archive (NWI/NWA), which aims at creating national Web archives and indexes in all Nordic countries.

   As a dumb code, an NBN would be difficult to resolve globally as such. The prefix part of the URN namespace specific string will provide a guide to where to find a resolution service, and the NBN register will identify the assigning agency.

   It will be necessary to establish a DNS NAPTR resource record for each prefix; the total number of these records may in the end be about 200. Initially, only a handful of records will be needed. Within each record, there will be one or more resolution services specified, depending on the assignment policy of the national library. If NBN assignment is limited to the national bibliography database, then all NBN-based URNs for that country will be resolved there. If it is possible to retrieve NBNs for Web documents, a full-scale URN resolution service is not possible without a national Web index and archive.

Rules for Lexical Equivalence:

   None at the global level. Any national library may provide its own rules, on the basis of its NBN syntax.

Conformance with URN Syntax:

   All NBNs we know of are ASCII strings consisting of letters (a-z) and digits (0-9). If an NBN contains characters that are reserved in the URN syntax, this data must be presented in %-encoded (hex) form as defined in RFC 2141. A national library may limit the full scope of its NBN strings in URN usage in such a way that there are no reserved characters in the URN namespace specific strings.

Validation mechanism:

   None specified at the global level. A national library may use NBNs which contain a checksum and can therefore be validated, but this is for the time being not a common practice.

Scope:

   Global.

6.
References

[Daigle et al.] Daigle, L., van Gulik, D., Iannella, R. and P. Faltstrom, "URN Namespace Definition Mechanisms", RFC 2611, June 1999.

[Lynch] Lynch, C., "Using Existing Bibliographic Identifiers as Uniform Resource Names", RFC 2288, February 1998.

[Moats] Moats, R., "URN Syntax", RFC 2141, May 1997.

7. Authors' Address

Juha Hakala
Helsinki University Library - The National Library of Finland
P.O. Box 26
FIN-00014 Helsinki University
FINLAND

EMail: juha.hakala@helsinki.fi

8. Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.