Network Working Group                                       Juha Hakala
Internet-Draft                              Helsinki University Library
Category: Informational                                   February 2000
draft-hakala-nbn-00.txt
Expires: August 25, 2000





                Using National Bibliography Numbers as
                         Uniform Resource Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with all 
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task 
Force (IETF), its areas, and its working groups. Note that other groups 
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months 
and may be updated, replaced, or obsoleted by other documents at any 
time. It is inappropriate to use Internet-Drafts as reference material 
or to cite them other than as "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


This Internet-Draft will expire on August 25, 2000.

Abstract

This document discusses how national bibliography numbers (persistent 
and unique identifiers assigned by the national libraries) can be 
supported within the URN framework and the syntax for URNs defined in 
RFC 2141 [Moats]. Much of the discussion below is based on the ideas 
expressed in RFC 2288 [Lynch]. Chapter 5 contains a URN namespace 
registration request modelled according to the template in RFC 2611 
[Daigle et al.].


1. Introduction 

As part of the validation process for the development of URNs the IETF 
working group agreed that it is important to demonstrate that the 
current URN syntax proposal can accommodate existing identifiers from 
well established namespaces.  One such infrastructure for assigning and 
managing names comes from the bibliographic community.  Bibliographic 
identifiers function as names for objects that exist both in print and, 
increasingly, in electronic formats.  RFC 2288 [Lynch] 
investigated the feasibility of using three identifiers (ISBN, ISSN and 
SICI) as URNs. 

This document analyses the use of national bibliography numbers (NBNs)
as URNs. The need to extend the analysis to new identifier systems was
briefly discussed in RFC 2288 as well, with the following summary: "The
issues involved in supporting those additional identifiers are
anticipated to be broadly similar to those involved in supporting
ISBNs, ISSNs, and SICIs".

Note that this document does not purport to define the "official" 
standard way of using national bibliography numbers as URNs; it merely 
demonstrates feasibility. A registration request for acquiring the
Namespace Identifier (NID) "NBN" for national bibliography numbers has
been written by the National Library of Finland at the request of the
Conference of Directors of National Libraries (CDNL) and the Conference
of European National Librarians (CENL). The request is included in
chapter 5 of this text.

The document at hand is part of a global co-operation of the national 
libraries to foster identification of electronic documents in general 
and utilisation of URNs in particular. It should be noted that some
national libraries, including those of Finland, Norway and Sweden, are
already assigning NBN-based URNs to electronic documents.

Following the registration request, we have used the URN Namespace
Identifier "NBN" for national bibliography numbers in the examples
below.


2. Identification vs. Resolution

As a rule the national bibliography numbers identify finite, manageably-
sized objects, but these objects may still be large enough that 
resolution to a hierarchical system is appropriate.

The materials identified by a national bibliography number may exist 
only in printed or other physical form, not electronically. The best 
that a resolver will be able to offer in this case is bibliographic data 
from a national bibliography database, including information about where
the physical resource is stored in the national library's holdings.

The URN Framework provides resolution services that may be used to 
describe any differences between the resource identified by a URN and 
the resource that would be returned as a result of resolving that URN. 
However, NBNs will also be used, for instance, to identify resources in
digital Web archives created by harvester robot applications. In this
case, the NBN will identify exactly the resource the user expects to
see.


3. National bibliography numbers

3.1 Overview

National Bibliography Number (NBN) is a generic name referring to a
group of identifier systems utilised by the national libraries, and only
by them, for identification of deposited publications which lack an
identifier, or of the descriptive metadata (cataloguing) that describes
the resources. Each national library uses its own NBN strings
independently of other national libraries; there is no global authority
which controls them. For this reason NBNs are unique only on the
national level. When used as URNs, NBN strings must be augmented with a
controlled prefix such as a country code. These prefixes guarantee
uniqueness of the NBN-based URNs on the global scale.

NBNs have traditionally been given to documents that do not have a 
publisher-assigned identifier, but are catalogued to the national 
bibliography. NBNs can be seen as a fall-back mechanism: if no other, 
better established identifier such as ISBN can be given, an NBN is 
assigned. In principle, NBN usage enables identification of any Internet 
document. Local policies may limit NBN usage to a much smaller subset
of documents.

Some national libraries (e.g. Finland, Norway, Sweden) have established 
Web-based URN generators, which enable authors and publishers to fetch 
NBN-based URNs for their network documents. At least the national
libraries of Sweden and Finland are harvesting and archiving domestic
Web documents (and a number of other libraries plan to start this
activity), and long-term preservation of these materials requires
persistent and unique identification. NBNs can be, and in fact already
are, used as internal identifiers in these Web archives.

Both the syntax and the scope of NBNs can be decided by each national
library independently. Typically, an NBN consists of one or more letters
and/or a number. This simple syntax makes NBNs infinitely extensible and
very suitable for, e.g., naming Web documents. For instance, the
application used by the National Library of Finland for Web harvesting
creates NBNs which are based on the MD5 checksum of the archived
resource.


3.2 LCCN

Two examples of NBN systems are LCCNs (Library of Congress Control 
Number) used by the Library of Congress, and F-code assigned by the 
National Library of Finland. 

The Library of Congress Card Number was the number used to identify and 
control catalog cards. With the development of the MARC format and the 
first distribution of machine-readable records for book materials in the 
late 1960s, the name of the LCCN was changed to Library of Congress 
Control Number. LCCNs are currently structured as follows:

Element               Length        Positions
Alphabetic Prefix     3             00-02
Year                  2             03-04
Serial Number         6             05-10
Supplement Number     1             11


The uniqueness of the LCCN is determined by the first 11 positions 
(positions 00-10). The Supplement Number has never been used by the 
Library of Congress and this position is always blank. The Supplement 
Number may be followed by two kinds of variable length data known as 
Suffix/Alphabetic Identifier and Revision Date. Each Suffix/Alphabetic
Identifier is preceded by a slash, as is the Revision Date. If there is
no Suffix/Alphabetic Identifier, the Revision Date is preceded by two
slashes.
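
The fixed-width layout above can be illustrated with a short Python
sketch. It is not taken from any Library of Congress software; the names
Lccn, parse_lccn and uniqueness_key, as well as the sample string, are
assumptions made for illustration only.

```python
from typing import NamedTuple


class Lccn(NamedTuple):
    prefix: str      # positions 00-02 (alphabetic prefix, may be blank)
    year: str        # positions 03-04
    serial: str      # positions 05-10
    supplement: str  # position 11 (always blank in practice)
    trailing: str    # variable-length data after position 11


def parse_lccn(raw: str) -> Lccn:
    """Split a fixed-width LCCN string into its elements."""
    if len(raw) < 12:
        raise ValueError("an LCCN needs at least 12 positions")
    return Lccn(raw[0:3], raw[3:5], raw[5:11], raw[11:12], raw[12:])


def uniqueness_key(lccn: Lccn) -> str:
    """Return positions 00-10, which determine uniqueness."""
    return lccn.prefix + lccn.year + lccn.serial


# Hypothetical sample: blank prefix, year 85, serial 000002, blank
# supplement, no suffix, and a revision date after two slashes.
sample = parse_lccn("   85000002 //r86")
```

Here sample.trailing holds the raw "//r86" part; splitting it further
into Suffix/Alphabetic Identifier and Revision Date is left out of this
sketch.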

According to RFC 2141, "RFC 1630 [2] reserves the characters "/", 
"?", and "#" for particular purposes. The URN-WG has not yet debated the 
applicability and precise semantics of those purposes as applied to 
URNs. Therefore, these characters are RESERVED for future developments.  
Namespace developers SHOULD NOT use these characters in unencoded form, 
but rather use the appropriate %-encoding for each character".

Thus the slash character ("/") has to be encoded according to the
requirements of RFC 2141. There are no other characters in an LCCN that
need encoding.
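
As a minimal sketch of this encoding step, the following Python function
replaces each of the three characters reserved by RFC 2141 with its
%-encoded form. The function name encode_reserved and the sample value
are hypothetical; other characters that RFC 2141 excludes, such as
blanks, are deliberately not handled here.

```python
# The characters RFC 2141 reserves for future use in URNs.
RESERVED = "/?#"


def encode_reserved(nbn: str) -> str:
    """%-encode "/", "?" and "#" and leave all other characters as is."""
    return "".join(
        "%{:02X}".format(ord(char)) if char in RESERVED else char
        for char in nbn
    )


# Hypothetical LCCN-like value with a revision date after two slashes.
encoded = encode_reserved("85000002//r86")
```

With this input, encoded is "85000002%2F%2Fr86", since the slash is
character 2F in hexadecimal.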

For more information about the LCCN, see 
http://lcweb.loc.gov/cds/mdslccn.html.


3.3 F-code

F-codes have been used since the early 20th century to identify and
control catalogue cards and, later, MARC records in the national
bibliography. In 1998 the National Library of Finland decided to enable
Finnish authors to obtain F-codes for their Internet documents, if these
documents do not qualify for other identifiers such as ISBN. Authors and
publishers can retrieve F-codes, embedded in URNs, from the URN
generator (http://www.lib.helsinki.fi/cgi-bin/urn.pl) developed in
co-operation between the National Library of Finland and the NETLAB unit
of the Lund University Library. A user guide explains how to embed the
NBN-based URNs in the identified documents.

F-codes are also used within the Web harvesting and archiving software
that has been built for the Networked European Deposit Library (NEDLIB)
project (see http://www.konbib.nl/nedlib). This application calculates
an MD5 checksum for each archived resource and then builds an NBN-based
URN from the checksum. The URN then serves as a unique identifier for
the archived resource. Traditional identifiers cannot be used for this
purpose, since there may, for instance, be several variants of a book
which (quite rightly) all have the same ISBN. Moreover, identifiers
embedded in a document do not necessarily belong to the document itself;
the Web archiver cannot trust the identifier information it finds.

The F-code built by the URN generator consists of:

Prefix (for example fe)
Year (YYYY; for example 1999)
Number (for example 1055)

The generator also adds the namespace identifier "NBN" and the ISO 3166
country code. Thus a URN based on the F-code above would be
urn:nbn:fi-fe19991055.
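
The composition described above can be sketched in Python as follows.
This is not the code of the actual URN generator; the function name
fcode_urn and its parameters are assumptions made for illustration.

```python
def fcode_urn(prefix: str, year: int, number: int,
              country: str = "fi") -> str:
    """Build an NBN-based URN from the three parts of an F-code."""
    # F-code = prefix + four-digit year + running number
    fcode = "{}{:04d}{}".format(prefix, year, number)
    # The generator adds the NID "nbn" and the ISO 3166 country code.
    return "urn:nbn:{}-{}".format(country, fcode)


example = fcode_urn("fe", 1999, 1055)
```

With the example values from the list above, example is
"urn:nbn:fi-fe19991055".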

URNs created by the Web archiver have a similar overall structure,
except that the prefix (which may be defined by the operator) is fea and
the year is not used. An example of a URN built by the Web archiver:

urn:nbn:fi-fea-5c5875e6e49ae649cad63e5ee4f6c346
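
A checksum-based URN of this kind could be produced along the following
lines. This is a sketch only, not the NEDLIB harvester's code; the
function name archive_urn and its defaults are assumptions.

```python
import hashlib


def archive_urn(content: bytes, country: str = "fi",
                prefix: str = "fea") -> str:
    """Build an NBN-based URN from the MD5 checksum of a resource."""
    # hexdigest() yields 32 lower-case hexadecimal characters.
    checksum = hashlib.md5(content).hexdigest()
    return "urn:nbn:{}-{}-{}".format(country, prefix, checksum)


# Two byte-identical resources always receive the same URN.
first = archive_urn(b"<html>example</html>")
second = archive_urn(b"<html>example</html>")
```

Because the URN is derived from the content alone, any change to the
archived resource, however small, results in a different URN.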


F-codes never need any special encoding when used as URNs, since they
consist of alphanumeric characters only (0-9, a-z). This is often the
case for other NBN systems as well.

3.4 Encoding Considerations and Lexical Equivalence

Embedding NBNs within the URN framework usually presents no particular
encoding problems, since all of the characters that can appear in
commonly used NBN systems can be expressed in the namespace specific
string, either directly or in %-encoded form, as described in RFC 2141
[Moats].

When an NBN is used as a URN, the namespace specific string will
consist of three parts: a prefix, consisting of either a two-letter ISO
3166 country code or another string; a delimiting character (hyphen,
colon or hash sign); and the NBN string assigned by the national
library.

Non-ISO 3166 prefixes must be registered. The Library of Congress will
maintain the central register of reserved codes and make it available
to the national libraries. All two-letter codes are reserved for
existing and possible future ISO country codes and may not be used as
non-ISO prefixes. If several national libraries in one country use the
same prefix (for instance, a country code), they need to agree on how
to split the sub-namespace between them.

Models:
URN:NBN:<ISO 3166 country code>-<assigned NBN string>
URN:NBN:<non-ISO 3166 prefix>-<assigned NBN string>

Examples:
URN:NBN:fi-fe19981001 (A "real" URN assigned by the National Library of
Finland).
URN:NBN:LCCN:2001000168 (A hypothetical LCCN-based URN assigned by the
Library of Congress).
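
The three-part structure can be illustrated with a small Python parser.
This is a sketch under one stated assumption, namely that the prefix
consists of ASCII letters only, so that the first hyphen, colon or hash
sign ends the prefix. The names NBN_URN, split_nbn_urn and
is_country_prefix are hypothetical.

```python
import re

# Assumption: the prefix contains ASCII letters only. The "urn" and
# "nbn" parts are matched without regard to case.
NBN_URN = re.compile(r"^urn:nbn:([a-z]+)([-:#])(.+)$", re.IGNORECASE)


def split_nbn_urn(urn: str) -> tuple:
    """Split an NBN-based URN into (prefix, delimiter, NBN string)."""
    match = NBN_URN.match(urn)
    if match is None:
        raise ValueError("not an NBN-based URN: " + urn)
    return match.groups()


def is_country_prefix(prefix: str) -> bool:
    """All two-letter prefixes are reserved for ISO 3166 codes."""
    return len(prefix) == 2


finnish = split_nbn_urn("URN:NBN:fi-fe19981001")
lccn = split_nbn_urn("URN:NBN:LCCN:2001000168")
```

For the two examples above, finnish is ("fi", "-", "fe19981001") and
lccn is ("LCCN", ":", "2001000168"); only the first of these prefixes is
a country code.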

3.5 Resolution of NBN-based URNs

As a dumb code, an NBN would be difficult to resolve globally as such.
The prefix part of the URN namespace specific string (usually based on a
country code) will provide a guide to where to find a resolution
service, and the NBN register will identify the assigning agency. Once
NBN-based URN resolution is in global use, the number of prefixes will
gradually become equal to, or even slightly larger than, the number of
national libraries.

If NBN assignment is limited to the national bibliography database, then
all NBN-based URNs for that country will be resolved there. In one model
these databases contain detailed resource descriptions including URLs,
which will point both to the copy of the document on the Internet and to
the copy in the national library's (legal) deposit collection. Due to
the limitations on the use of legal deposit documents, it is possible
that the deposited electronic materials cannot be delivered outside the
premises of the national library.

If authors and publishers can obtain NBNs for Web documents and there
is no obligation to deposit the documents thus identified with the
national library, a URN resolution service is not possible without a
national Web index and archive, maintained by the national library or
by one or more other organisations. Such a Web index/archive will also
resolve the machine-generated URNs of archived Web documents.

3.6 Additional considerations

Guidelines adopted by each national library define when different
versions of a work should be assigned the same or differing NBNs. These
rules apply only if identifier assignment is done manually. If
identifiers are allocated programmatically, the only criterion that can
be used is that two documents which are identical on the bit level (and
therefore have the same MD5 checksum) are deemed identical and should
receive the same NBN. Dissimilar documents are very unlikely to share a
checksum: RFC 1321 conjectures that finding two messages with the same
MD5 digest takes on the order of 2^64 operations.

The rules governing the usage of NBNs are less strict than those
specifying the usage of ISBN or other, better established identifiers.
Since NBNs have up to now been given only by the personnel
(cataloguers) working in the national libraries, identifier assignment
has in practice been well co-ordinated.

It is obvious that an NBN URN will resolve to a single instance of the
work if identifier assignment has been automatic. Given the nature of
NBNs, it is also likely that different versions of the same work will
receive different NBNs even if the identifier is given manually.


4. Security Considerations

This document proposes means of encoding several existing bibliographic 
identifiers within the URN framework. This document does not discuss 
resolution except at a very generic level; thus questions of secure or 
authenticated resolution mechanisms are out of scope.  It does not 
address means of validating the integrity or authenticating the source 
or provenance of URNs that contain bibliographic identifiers.  Issues 
regarding intellectual property rights associated with objects 
identified by the various bibliographic identifiers are also beyond the 
scope of this document, as are questions about rights to the databases 
that might be used to construct resolvers.


5. Namespace registration


URN Namespace ID Registration for the National Bibliography Number (NBN)

Namespace ID:

NBN

This Namespace ID has been in production use in demonstrator systems 
since summer 1998; at least hundreds of URNs from this namespace have 
been delivered already in Finland and Sweden. 

Registration Information:

Version: 2
Date: 2000-02-25
The first registration of the NID "NBN" was done via the URN WG in 
November 1998.

Declared registrant of the namespace:

Name: Juha Hakala
E-mail: juha.hakala@helsinki.fi
Affiliation: Helsinki University Library - The National Library of 
Finland, Conference of European National Librarians (CENL) and 
Conference of Directors of National Libraries (CDNL)
Address: P.O.Box 26, 00014 Helsinki University, Finland

Both CENL and CDNL made decisions to foster the usage of URNs during
1998. Both organisations have set up a working group for this purpose.
One item in the common work plan is the utilisation of national
bibliography numbers (NBNs; see below) as URNs for identification of
grey literature published on the Internet. The NBN namespace will
enable the national libraries to do this. The namespace will be
available to all national libraries in the world.

Declaration of syntactic structure:

The namespace specific string will consist of three parts: a prefix,
consisting of either a two-letter ISO 3166 country code or another
string; a delimiting character (hyphen, colon or hash sign); and the
NBN string assigned by the national library. A namespace specific
string must be unique when normalised to omit the delimiter between the
prefix and the string.
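
The uniqueness rule can be illustrated with the following Python sketch,
which is not part of the registration itself. It assumes that the first
hyphen, colon or hash sign in the namespace specific string is the
delimiter; the function name normalise_nss is hypothetical.

```python
DELIMITERS = "-:#"


def normalise_nss(nss: str) -> str:
    """Drop the delimiter between the prefix and the NBN string."""
    for index, char in enumerate(nss):
        if char in DELIMITERS:
            # Only the first delimiter is removed; any later hyphen,
            # colon or hash sign belongs to the NBN string itself.
            return nss[:index] + nss[index + 1:]
    raise ValueError("no delimiter found in: " + nss)


with_hyphen = normalise_nss("fi-fe19981001")
with_colon = normalise_nss("fi:fe19981001")
```

Both calls yield "fife19981001", so under this rule the two strings
could not be assigned to different resources.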

Non-ISO prefixes must be registered. A global registry, maintained by 
the Library of Congress, will be created and made available via the Web. 
Contact information: nbn.register@loc.gov.us. All two-letter codes are 
reserved for existing and possible future ISO country codes and may not 
be used as non-ISO prefixes. 

If several national libraries in one country want to use the same
prefix (for instance, a country code), they need to agree on how to
split the namespace between them into smaller sub-domains. These
smaller domains must be registered if they are resolved on different
sites. Similarly, a single national library may utilise various
sub-domains; for instance, the National Library of Finland already has
two domains, fi-fe for author-assigned URNs and fi-fea for URNs built
by the Web harvesters.

Models:

URN:NBN:<ISO 3166 country code>-<assigned NBN string>
URN:NBN:<non-ISO 3166 prefix>-<assigned NBN string>

Examples:

A country-code-based URN: URN:NBN:fi-fe19981001 (A URN assigned by the
National Library of Finland).
A non-country-code-based URN: URN:NBN:LCCN:2001000168 (A hypothetical
URN assigned by the Library of Congress).

Relevant ancillary documentation:

National Bibliography Number (NBN) is a generic name referring to a
group of identifier systems used by the national libraries for
identification of deposited publications which lack an identifier, or
of the descriptive metadata (cataloguing) that describes the resources.
Each national library uses its own NBN strings independently of other
libraries; there is no global authority which controls them. For this
reason NBNs are unique only on the national level, and the controlled
prefix guarantees uniqueness on the global scale.

NBNs have traditionally been given to documents that do not have a 
publisher-assigned identifier, but are catalogued to the national 
bibliography. When assigned as URNs, these NBNs will fit into the global 
URN resolution services. Some national libraries (Finland, Norway, 
Sweden) have established Web-based URN generators, which enable authors 
and publishers to fetch NBN-based URNs for their network documents.

Both the syntax and the scope of NBNs can be decided by each national
library independently. Typically, an NBN consists of one or more letters
and a number.

Identifier uniqueness considerations:

NBN strings assigned by two national libraries may be identical. For
this reason the use of a prefix in the namespace specific string is
obligatory for guaranteeing the global uniqueness of NBN-based URNs.

At the national level, libraries utilise different policies for
guaranteeing uniqueness. A national library may automate the delivery of
NBN-based URNs. In this case, the NBNs are assigned sequentially by a
program (URN generator).

Identifier persistence considerations:

Persistence of the NBNs as identifiers is guaranteed by the persistence 
of national libraries and information systems, such as national 
bibliographies, maintained by them. NBNs have been used for several 
centuries for printed materials. NBN-based identification of electronic 
documents is a recent practice, but it is likely to continue for a very 
long time.

Process of identifier assignment:

Assignment of NBN-based URNs is always controlled at the national level
by the national library or libraries. In Europe, the Conference of
European National Librarians (CENL) will co-ordinate the URN practices
of its member libraries via a working group established in 1998. At the
global level, the Conference of Directors of National Libraries (CDNL)
established a task force with similar aims in 1999.

National libraries may choose different strategies in assigning
NBN-based URNs. One option is assignment by the library personnel only.
This is typically done when the document is catalogued into the national
bibliography. A national library may also set up one or more URN
generators and allow publishers and authors to retrieve NBN-based URNs
from there. In this case there is no guarantee that the document will be
catalogued into the national bibliography. Besides the generator, the
national libraries may develop other applications, such as Web
harvesters/archivers, which utilise URNs for identification purposes.

Process for identifier resolution:

URNs based on NBNs will be resolved primarily via the national
bibliography databases. In one model these databases contain detailed
resource descriptions including URLs, which will point both to the copy
of the document on the Internet and to the copy in the national
library's (legal) deposit collection. Due to the limitations on the use
of legal deposit documents, it is possible that the deposited materials
cannot be delivered outside the premises of the national library.

For documents not catalogued into the national bibliography database,
URN resolution may take place via national or international Web indexes
and/or archives. The Nordic national libraries have established a joint
initiative called Nordic Web Index / Nordic Web Archive (NWI/NWA), which
aims at creating national Web archives and indexes in all Nordic
countries.

As a dumb code, an NBN would be difficult to resolve globally as such.
The prefix part of the URN namespace specific string will provide a
guide to where to find a resolution service, and the NBN register will
identify the assigning agency. It will be necessary to establish a DNS
NAPTR resource record for each prefix; the total number of these records
may in the end be about 200. Initially, only a handful of records will
be needed.
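
The role of the prefix in locating a resolution service can be sketched
in Python with a simple lookup table standing in for the DNS NAPTR
records. No DNS query is made here. The host names are placeholders,
the names RESOLVERS, prefix_of and resolver_for are hypothetical, and
treating the prefix as case-insensitive is an assumption of this
sketch.

```python
# Hypothetical prefix-to-resolver table; the host names are
# placeholders and do not refer to real services.
RESOLVERS = {
    "fi": "http://fi.resolver.example/",
    "se": "http://se.resolver.example/",
}

DELIMITERS = "-:#"


def prefix_of(urn: str) -> str:
    """Return the prefix of an NBN-based URN in lower case."""
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn" or nid.lower() != "nbn":
        raise ValueError("not an NBN-based URN: " + urn)
    for index, char in enumerate(nss):
        if char in DELIMITERS:
            return nss[:index].lower()
    raise ValueError("no delimiter after the prefix: " + urn)


def resolver_for(urn: str) -> str:
    """Look up the resolution service registered for the prefix."""
    prefix = prefix_of(urn)
    if prefix not in RESOLVERS:
        raise LookupError("no resolver known for prefix " + prefix)
    return RESOLVERS[prefix]


service = resolver_for("urn:nbn:fi-fe19991055")
```

In a deployed system the table would be replaced by the per-prefix
records, and one record could list more than one resolution service.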

Within each record, one or more resolution services will be specified,
depending on the assignment policy of the national library. If NBN
assignment is limited to the national bibliography database, then all
NBN-based URNs for that country will be resolved there. If NBNs can be
obtained for Web documents, a full-scale URN resolution service is not
possible without a national Web index and archive.

Rules for Lexical Equivalence:

None at the global level. Any national library may provide its own
rules, on the basis of its NBN syntax.

Conformance with URN Syntax:

All NBNs we know of are ASCII strings consisting of letters (a-z) and
digits (0-9). If an NBN contains characters that are reserved in the URN
syntax, these characters must be presented in hex encoded form as
defined in RFC 2141. A national library may limit the full scope of its
NBN strings in URN usage in such a way that there are no reserved
characters in the URN namespace specific strings.

Validation mechanism:

None specified on the global level. A national library may use NBNs
that contain a checksum and can therefore be validated, but for the time
being this is not a common practice.

Scope:

Global.


6. References

[Daigle et al.] Daigle, L., van Gulik, D., Iannella, R. and Faltstrom,
P., "URN Namespace Definition Mechanisms", RFC 2611, June 1999.
[Lynch] Lynch, C., Preston, C. and Daniel, R., "Using Existing
Bibliographic Identifiers as Uniform Resource Names", RFC 2288,
February 1998.
[Moats] Moats, R., "URN Syntax", RFC 2141, May 1997.


7. Author's Address

   Juha Hakala
   Helsinki University Library - The National Library of Finland
   P.O. Box 26
   FIN-00014 Helsinki University
   FINLAND

   EMail: juha.hakala@helsinki.fi


8.  Full Copyright Statement

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.