Network Working Group                                      Juha Hakala
Internet-Draft                             Helsinki University Library
Category: Informational                                 28 August 2001
draft-hakala-sici-00.txt
Expires: 28 February 2002

Using Serial Item and Contribution Identifiers as Uniform Resource Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on 28 February 2002.

Abstract

This document discusses how Serial Item and Contribution Identifiers (SICIs; persistent and unique identifiers for serial issues and contributions such as articles) can be supported within the URN framework and the syntax for URNs defined in RFC 2141 [Moats]. Much of the discussion below is based on the ideas expressed in RFC 2288 [Lynch]. Chapter 5 contains a URN namespace registration request modelled according to the template in RFC 2611 [Daigle et al.].

1. Introduction

As part of the validation process for the development of URNs the IETF working group agreed that it is important to demonstrate that the current URN syntax proposal can accommodate existing identifiers from well-established namespaces. One such infrastructure for assigning and managing names comes from the bibliographic community.
Bibliographic identifiers function as names for objects that exist both in print and, increasingly, in electronic formats. RFC 2288 [Lynch] investigated the feasibility of using three identifiers (ISBN, ISSN and SICI) as URNs.

SICI is an American national standard defined by NISO/ANSI Z39.56-1996 [NISO]. NISO is at present investigating the need to develop a new version of the standard.

RFC 2288 does not analyse how SICI-based URNs can actually be resolved, nor was that the aim of its authors. This text specifies one solution to this question. There may be other, complementary resolution services.

Generally, the difficulty of designing a URN resolution service depends on two factors:

* Is the identifier dumb, or does it provide a hint on where to find a resolution service?
* How many potential resolution services are there?

ISBN (International Standard Book Number) is a good example of an intelligent identifier. Analysis of the ISBN will reveal not only the region where the ISBN has been assigned, but also the publisher who is responsible for the book. Resolution of ISBN-based URNs can be decentralised to national bibliography databases, maintained by the national libraries. If the ISBN were a dumb identifier, this would be impossible.

International Standard Serial Number (ISSN) is a dumb identifier. It does not have a publisher identifier; serials published by a certain company get seemingly random ISSNs. Although ISSNs are allocated to regional agencies in blocks, which gives the system some "intelligence", a resolution service should not rely on these blocks, but use the global ISSN database. It contains a bibliographic description of every periodical that has received an ISSN. Thus, it is easy to resolve ISSN-based URNs even though the identifier itself does not help in locating the resolution service.

SICI is based on ISSN (see below for a description of its syntax). Like ISSN, it is therefore a dumb identifier.
But there is not, and will never be, a global SICI database, which would contain bibliographic information about every serial issue and/or article published in the world. Most articles will not be catalogued at all, and the existing bibliographic information about articles is dispersed into a large number of databases maintained by publishers, libraries and other information intermediaries. Although it might be technically possible to merge records from these databases into a union catalogue, in practice such an enterprise is not politically possible.

As a "dumb" identifier with a large and ever growing number of potential resolution services SICI poses interesting challenges to the design of the URN resolution process. Generally, a combination of dumb identifier and multiple resolution services is a problem, since there is no simple way of finding out which resolution service is the correct one. A gateway service is needed for providing this valuable information. Below we propose that for SICI-based URNs, the global ISSN database will be capable of acting as a link between the user and the resolution service.

The registration request for acquiring a Namespace Identifier (NID) "SICI" for Serial Item and Contribution Identifiers has been written by the National Library of Finland on behalf of the National Information Standards Organization (NISO). The request is included in chapter 5 of this text.

The document at hand is part of a global co-operation of the national libraries to foster identification of electronic documents in general and utilisation of URNs in particular. This work is co-ordinated by a working group established by the Conference of Directors of National Libraries (CDNL).

We have used the URN Namespace Identifier "SICI" for the Serial Item and Contribution Identifiers in examples below.

2. Identification vs.
Resolution

As a rule the SICIs identify finite, manageably-sized objects, but these objects may still be large enough so that resolution to a hierarchical system, such as all articles published in a serial issue, is appropriate.

The materials identified by a SICI may exist only in printed or other physical form, not electronically. The best that a resolver service will be able to offer in this case is bibliographic data from the database providing resolution services, including information about where the physical resource is stored in the owner institution's holdings.

3. Serial Item and Contribution Identifier

3.1 Overview

The Serial Item and Contribution Identifier (SICI) standard defines a variable length code that provides unique identification of serial items (e.g., issues) and the contributions (e.g., articles) contained in a serial title.

SICI is specified in NISO/ANSI Z39.56-1996 [NISO]. Like other NISO standards, the SICI document is available for free on the Web.

SICI is based on ISSN (International Standard Serial Number), but augments it extensively. SICI is a combination of three segments, all of which are required:

* Item segment: the data elements needed to describe the serial item such as serial issue (ISSN, Chronology, Enumeration)
* Contribution segment: the data elements needed to identify contributions within an item (Location, Title Code)
* Control segment: the data elements needed to record those administrative elements that determine the validity, version, and format of the SICI code representation.

RFC 2288 provides the following example:

   0015-6914(19960101)157:1<62:KTSW>2.0.TX;2-F

The first nine characters are the ISSN identifying the serial title. The second component, in parentheses, is the chronology information giving the date the particular serial issue was published. In this example that date was January 1, 1996. The third component, 157:1, is enumeration information (volume, number) for the particular issue of the serial.
These three components comprise the "item segment" of a SICI code. By augmenting the ISSN with the chronology and/or enumeration information, specific issues of the serial can be identified. The next segment, <62:KTSW>, identifies a particular contribution within the issue. In this example we provide the starting page number and a title code constructed from the initial characters of the title. Identifiers assigned to a contribution can be used in the contribution segment if page numbers are inappropriate. The rest of the identifier is the control segment, which includes a check character. Interested readers are encouraged to consult the standard for an explanation of the fields in that segment. SICI can be seen as a logical extension of the ISSN to the items and individual contributions that make up a serial's hierarchical structure. The current version of the SICI does have some limitations; it does not allow identification of subsections of an article such as paragraphs or diagrams. If deemed necessary, the functionality needed for article subsection identification could be added to the standard. The current version of SICI guarantees uniqueness in most situations; however, the standard does not always differentiate between multiple variant formats in which an electronic article may be published. For instance, variants of a digitised article published in PDF and HTML formats will receive the same SICI, provided that the ISSN is the same. According to the rules of the ISSN centre, ISSN numbers can be applied retrospectively to old periodicals. If the original printed document has an ISSN, the same identifier is also valid for the digitised version. ISSN guidelines formulate this principle in the following way: A reproduction is a copy of an item and intended to function as a substitute for that item. The reproduction may be in a different medium from the original but it is not a different edition in itself. 
The ISSN assigned to the original is valid for the reproduction, a new ISSN is not assigned to the reproduction.

ISSN numbers are assigned by regional agencies, which receive ISSN blocks from the ISSN International Centre. SICI usage is not dependent on such formal agencies; the aim is that once ISSN is known, SICI codes can be created, manually or by computer program, by publishers, libraries, document delivery services or even by individual users.

Given the complexity of SICI codes, the recommended practice is to automate the SICI creation process. If an article is structured enough, all elements of SICI can be extracted from the document. A tool capable of this has been built by the E.U. project DIEPER; this tool, of course, only works properly if the document is structured in the way the DIEPER project recommends. Another, less challenging option is a SICI generator, which builds syntactically correct SICIs including the check character if the basic ingredients are typed in manually.

3.2 Encoding Considerations and Lexical Equivalence

RFC 2288 contains the following simple and yet sufficient analysis of SICI encoding:

The character set for SICIs is intended to be email-transport-transparent, so it does not present major problems. However, all printable excluded and reserved characters from the URN syntax are valid in the SICI character set and must be %-encoded.

Example of a SICI for an issue of a journal:

   URN:SICI:1046-8188(199501)13:1%3C%3E1.0.TX;2-F

For an article contained within that issue:

   URN:SICI:1046-8188(199501)13:1%3C69:FTTHBI%3E2.0.TX;2-4

Equivalence rules for SICIs are not appropriate for definition as part of the namespace and incorporation in areas such as cache management algorithms. It is best left to resolver systems which try to determine if two SICIs refer to the same content. Consequently, we do not propose any specific rules for equivalence testing through lexical manipulation.
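
The %-encoding rule quoted above can be illustrated with a short sketch. It is not part of the SICI standard or of RFC 2288; the function names are hypothetical, and the set of characters left unencoded is an assumption taken from the NSS character classes of RFC 2141 (letters, digits and the "other" punctuation). Every other character is %-encoded.

```python
import string
from urllib.parse import unquote

# Assumption: characters RFC 2141 allows unencoded in the namespace-specific
# string (letters, digits and the <other> punctuation class).
_URN_SAFE = set(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'")


def sici_to_urn(sici: str) -> str:
    """Build a SICI-based URN, %-encoding characters such as '<' and '>'."""
    encoded = "".join(
        ch if ch in _URN_SAFE
        else "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        for ch in sici
    )
    return "URN:SICI:" + encoded


def urn_to_sici(urn: str) -> str:
    """Recover the plain SICI string from a SICI-based URN."""
    prefix, nid, nss = urn.split(":", 2)
    if prefix.lower() != "urn" or nid.lower() != "sici":
        raise ValueError("not a SICI-based URN: " + urn)
    return unquote(nss)


print(sici_to_urn("1046-8188(199501)13:1<69:FTTHBI>2.0.TX;2-4"))
```

For the article SICI used in the RFC 2288 example this prints the URN shown above, and urn_to_sici() reverses the transformation.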
3.3 Resolution of SICI-based URNs Since ISSN is a dumb code, SICI does not contain any explicit hint on where to find the URN resolution service or services. However, an efficient and global resolution service can be accomplished by using the ISSN register as a way station. In spring 2001, the ISSN register contained about one million bibliographic records describing serials, including thousands of electronic journals. There are several other databases, which contain hundreds of thousands of serial records, but the ISSN register has the best coverage. The first step in resolving a SICI-based URN is a query to the ISSN register. The SICI resolution service in the ISSN register will parse the SICI code in order to extract the ISSN from it. ISSN will then be used as a search key for retrieving the bibliographic record of the serial from the ISSN register. Currently the ISSN register already contains thousands of records describing electronic journals. These records contain the URL of the serial's home page. This URL is appropriate for resolving the URN based on the ISSN of the periodical. The mechanism for resolving such URNs via the ISSN register has been specified in RFC 3044 [Rozenfeld]. The ISSN International Centre has already built a demonstration URN resolution service for ISSN-based URNs into their present information system. In order to resolve SICI-based URNs, a new data element has to be added into the records in the ISSN register. This data element would contain the network address (URL) of the database, which holds the article required and/or bibliographic information about it. It must also be possible to specify volumes and if necessary issues which are included in the database within this data element. The data element should be repeatable, since the same article may be available from multiple sources. 
For instance, the publisher, Library of Congress (http://www.loc.gov/), JSTOR (http://www.jstor.org/) and a number of host services such as EBSCO (http://www.ebsco.com/home/) may all have a copy of the same resource.

The SICI resolution service built into the ISSN register will check if database address information is available in the bibliographic record of the serial. Then it makes sure that the volume and/or issue needed is available via the service. If this is the case, the application will make the query, receive the result (the article or bibliographic information about it) and pass it on to the user.

The functionality described above was implemented in co-operation between the ISSN International Centre and the E.U. project DIEPER (http://gdz.sub.uni-goettingen.de/dieper/). The SICI resolution service is an extension of the service built for resolving ISSN-based URNs. By March 2001 a demonstrator service, via which several of the databases maintained by the project partners could be accessed, was released for internal use within the project. The ISSN IC and project partners wish to maintain the service also after the formal end of the project. Discussions about adding the new data element into bibliographic records in the ISSN register are under way.

Please note that the discussion herein applies to SICIs assigned to serial contributions. Since serial items (issues) have seldom been described or digitised as such, a search by serial item SICI will in practice be expanded into retrieval of all contributions (articles) within the serial item (issue) in question.

If a resolution service for the resource at hand does not exist, or the user is not authorised to utilise it, the user may still get the bibliographic description of the serial from the ISSN register.

3.4 Additional considerations

Electronic journals have rapidly become very popular in scientific publishing. The main reasons for this are the emergence of viable business models (e.g.
licensing) and the birth of a reliable and efficient delivery mechanism (the Web).

New content is being added via two different channels. A significant number of scientific journals are published in electronic form, usually alongside a printed version. On the other hand, old printed volumes are digitised and made available in electronic form. Digitisation is done by development projects such as DIEPER, established services such as JSTOR, or publishers; for instance, Elsevier is digitising all printed journals the company has published.

Reliable linking of articles to references and bibliographic data about the articles is an important issue. URLs are as of this writing the most common means used for linking, but their reliability is low; average lifetime for a URL is estimated to be two years. A more reliable linking mechanism than URLs is urgently needed.

Many scientific publishers are already using Digital Object Identifiers (DOI) for their materials. The DOI resolution service is based on the Handle system, which is "a comprehensive system for assigning, managing, and resolving persistent identifiers, known as 'handles,' for digital objects and other resources on the Internet" (see http://www.handle.net/introduction.html). Handles can be used as Uniform Resource Names (URNs).

URN is both an identifier and a non-commercial and technically advanced resolution service. Thanks to the co-operation of the ISSN International Centre, the URN resolution service for articles outlined in this document is global, and can accommodate an unlimited number of article services located anywhere in the world.

For instance, in order to establish URN-based links to articles digitised in the JSTOR service, a number of steps are necessary. First, each article must be identified by SICI, and these SICIs must be indexed in the JSTOR database.
Second, bibliographic records of JSTOR journals in the ISSN register must all be enriched with a link to the JSTOR search interface and volume/issue information. For instance, the bibliographic record describing the journal "Ecology" must contain the information that volumes 1-77 (1920-1996) are available via JSTOR. This information may be quite volatile, and maintenance of the ISSN register must therefore be frequent and efficient.

Apart from modification of the data, some programming work is needed. Due to the work done in the DIEPER project, the ISSN register already has the functionality needed for resolving SICI-based URNs. Adding the required functionality into the JSTOR database may or may not be difficult depending on the system architecture; in DIEPER some partners were able to implement the required functionality quite easily.

Since Web browsers do not support URN resolution yet, the final step in enabling resolution of SICI-based URNs is installation of the browser plug-in developed by the ISSN International Centre.

For various reasons, one article may be available in several locations. Every article copy may have a different set of users who are allowed access to it. For instance, a copy acquired by a national library via legal deposit may only be available within the library premises. Making the links context sensitive, that is, providing only those links that "work" for a given user, is a challenge. The OpenURL framework [Van de Sompel] provides a means for context-sensitive linking. As of this writing OpenURL is rapidly gaining popularity, and there are already a few integrated library systems which support it. The ISSN register may in the future support OpenURL usage; this would be very valuable when the same resource (article) is available from several sources which have different user populations.

In their present form the URN resolution services provided via the ISSN register best suit services which are available in the public domain and are reasonably stable.
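
The register lookup outlined in section 3.3, combined with the volume coverage information discussed above, could be sketched roughly as follows. This is an illustration only, not the implementation built by the ISSN International Centre or DIEPER. The ServiceLink record, the function names, the example URLs and the volume ranges are all hypothetical, and the regular expression covers only the simple item segment layout used in the examples in this text, not the full SICI syntax.

```python
import re
from dataclasses import dataclass


@dataclass
class ServiceLink:
    """Hypothetical form of the repeatable data element: a service address
    plus the range of volumes that the service holds."""
    url: str
    first_volume: int
    last_volume: int


# ISSN, chronology in parentheses, enumeration up to the contribution segment.
_ITEM_SEGMENT = re.compile(r"^(\d{4}-\d{3}[\dX])\(([^)]*)\)([^<]*)<")


def split_item_segment(sici):
    """Return (issn, chronology, enumeration) taken from a SICI string."""
    match = _ITEM_SEGMENT.match(sici)
    if match is None:
        raise ValueError("cannot find an item segment in: " + sici)
    return match.groups()


def find_services(sici, register):
    """Step 1 of resolution: list the services that cover the volume."""
    issn, _chronology, enumeration = split_item_segment(sici)
    volume = enumeration.split(":")[0]
    if not volume.isdigit():
        return []  # no numeric volume level; this sketch gives up here
    return [
        link.url
        for link in register.get(issn, [])
        if link.first_volume <= int(volume) <= link.last_volume
    ]


# Invented register content, for illustration only.
register = {
    "1046-8188": [
        ServiceLink("http://articles.example.org/search", 1, 20),
        ServiceLink("http://archive.example.net/query", 14, 30),
    ]
}
print(find_services("1046-8188(199501)13:1<69:FTTHBI>2.0.TX;2-4", register))
```

In the second step, not shown here, the resolver would forward the SICI to each address returned and pass the result on to the user.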
Numerous digitisation projects such as DIEPER are currently making printed articles available in the Web in digital form. An additional benefit of coding the needed location and volume information into the ISSN register would be that this database then could also serve as a global registry of serial digitisation efforts. Such a register is badly needed to avoid duplicate work. Since the number of SICI resolution services will eventually be high, the capacity of the server on which the ISSN register runs and its network connection may become a bottleneck, especially if the articles were delivered via the ISSN server to the users. Setting up mirror sites would in this case be the most efficient means for load control and balancing. Technically the setting up of mirror sites is not difficult. The ISSN register contains approximately a million bibliographic records, and is therefore not a very large database. 4. Security Considerations This document proposes means of encoding and using Serial Item and Contribution Identifiers within the URN framework. This document does not discuss resolution except at a generic level; thus questions of secure or authenticated resolution mechanisms in the ISSN register or in actual resolution services are out of scope. This text does not address means of validating the integrity or authenticating the source or provenance of URNs that contain SICIs. Issues regarding intellectual property rights associated with objects identified by the various bibliographic identifiers are also beyond the scope of this document, as are questions about rights to the databases that might be used to construct resolvers. 5. Namespace registration URN Namespace ID Registration for the Serial Item and Contribution Identifier (SICI) Namespace ID: SICI SICI is a well-established acronym for Serial Item and Contribution Identifiers; giving this NID for any other system would cause a lot of confusion. 
This namespace ID has already been used in SICI-based URNs in the E.U. project DIEPER.

Registration Information:

Version: 1
Date: 2001-08-28

Declared registrant of the namespace:

Name: Patricia Harris
E-mail: pharris@niso.org
Affiliation: National Information Standards Organization
Address: 4733 Bethesda Avenue, Suite 300, Bethesda, MD 20814

Declaration of syntactic structure:

Each SICI contains three segments:

* Item segment: the data elements needed to describe the serial item such as serial issue (ISSN, Chronology, Enumeration)
* Contribution segment: the data elements needed to identify contributions within an item (Location, Title Code)
* Control segment: the data elements needed to record those administrative elements that determine the validity, version, and format of the SICI code representation.

Example:

   0015-6914(19960101)157:1<62:KTSW>2.0.TX;2-F

SICI codes can be generated and parsed by computer programs.

Relevant ancillary documentation:

SICI is an American national standard defined by NISO/ANSI Z39.56-1996 [NISO]. A new version of the standard is currently under development.

Identifier uniqueness considerations:

SICI codes will almost always be unique. Since SICI is based on ISSN, articles from different journals will never get the same SICI. Since enumeration and chronology information must also be given, articles and other contributions published in different volumes and issues will also never get the same SICI.

SICIs may fail to be unique in the following cases:

* Two or more contributions are published on the same page(s) and have similar enough titles (the first letter of each word is the same).
* A single issue of an electronic journal (which lacks page numbers) contains two or more contributions with titles similar enough.
* There are several technical variants of an electronic serial contribution (multiple formats, multiple resolutions); the current version of SICI does not differentiate between these variants.
In this case the intellectual content will usually be the same, but layout will differ from one version to another.

The new version of the SICI standard will be enhanced in order to diminish the risk of non-unique SICIs.

Identifier persistence considerations:

Once assigned, SICI will never change. The same SICI will not be used again for other serial items and contributions.

Process of identifier assignment:

There will not be a national, regional or international agency governing the SICI assignment process. Publishers, libraries or other information intermediaries will create SICIs when needed. The most important prerequisite is that the journal must have an ISSN.

Although SICI assignment is decentralised, the national ISSN agencies and the ISSN International Centre may support publishers and other interested parties in SICI implementation.

SICI can, and should, be built via automated means. If the source document such as an article is sufficiently structured, SICI can be generated without human involvement. Another option is a semi-automated process, in which a human user types in the relevant data elements, and the application takes care of building the code.

Process for identifier resolution:

Resolution will take place in two steps as defined in chapter 3.3. First the ISSN register is used for finding the location of the resolution service(s) for the serial and volume at hand. Using the linking information stored in the serial's bibliographic record, the correct resolution service is contacted, and the requested resource is delivered to the user.

Rules for Lexical Equivalence:

We do not propose any specific rules for equivalence testing through lexical manipulation.

Conformance with URN Syntax:

According to RFC 2288:

The character set for SICIs is intended to be email-transport-transparent, so it does not present major problems.
However, all printable excluded and reserved characters from the URN syntax are valid in the SICI character set and must be %-encoded.

Example of a SICI for an issue of a journal:

   URN:SICI:1046-8188(199501)13:1%3C%3E1.0.TX;2-F

For an article contained within that issue:

   URN:SICI:1046-8188(199501)13:1%3C69:FTTHBI%3E2.0.TX;2-4

Validation mechanism:

The validity of a SICI string can be checked by means of its modulus 37 check character.

Scope:

Global.

6. References

[Daigle et al.] Daigle, L., van Gulik, D., Iannella, R. & Faltstrom, P., URN Namespace Definition Mechanisms, RFC 2611, June 1999.

[Lynch] Lynch, C., Using Existing Bibliographic Identifiers as Uniform Resource Names, RFC 2288, February 1998.

[Moats] Moats, R., URN Syntax, RFC 2141, May 1997.

[NISO] NISO/ANSI Z39.56-1996, Serial Item and Contribution Identifier. Electronic resource, available at http://www.techstreet.com/cgi-bin/pdf/free/152629/z39-56.pdf

[Rozenfeld] Rozenfeld, S., Using The ISSN (International Serial Standard Number) as URN (Uniform Resource Names) within an ISSN-URN Namespace, RFC 3044, January 2001.

[Van de Sompel] Van de Sompel, H. & Beit-Arie, O., Open Linking in the Scholarly Information Environment Using the OpenURL Framework. D-Lib Magazine, March 2001. Electronic resource, available at http://www.dlib.org/dlib/march01/vandesompel/03vandesompel.html

7. Author's Address

Juha Hakala
Helsinki University Library - The National Library of Finland
P.O. Box 26
FIN-00014 Helsinki University
FINLAND
E-mail: juha.hakala@helsinki.fi

8. Full Copyright Statement

Copyright (C) The Internet Society (2001). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.